From tim_one@email.msn.com Sat Apr 1 00:55:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 31 Mar 2000 19:55:54 -0500 Subject: [Python-Dev] A surprising case of cyclic trash Message-ID: <000d01bf9b75$08bf58e0$1aa2143f@tim> This comes (indirectly) from a user of my doctest.py, who noticed that sometimes tempfiles created by his docstring tests got cleaned up (via __del__), but other times not. Here's a hard-won self-contained program illustrating the true cause: class Critical: count = 0 def __init__(self): Critical.count = Critical.count + 1 self.id = Critical.count print "acquiring Critical", self.id def __del__(self): print "releasing Critical", self.id good = "temp = Critical()\n" bad = "def f(): pass\n" + good basedict = {"Critical": Critical} for test in good, bad, good: print "\nStarting test case:" print test exec compile(test, "", "exec") in basedict.copy() And here's output: D:\Python>python misc\doccyc.py Starting test case: temp = Critical() acquiring Critical 1 releasing Critical 1 Starting test case: def f(): pass temp = Critical() acquiring Critical 2 Starting test case: temp = Critical() acquiring Critical 3 releasing Critical 3 D:\Python> That is, in the "bad" case, which differs from the "good" case merely in defining an unreferenced function, temp.__del__ not only doesn't get executed "when expected", it never gets executed at all. This appears to be due to a cycle between the function object and the anonymous dict passed to exec, causing the entire dict to become immortal, thus making "temp" immortal too. I can fiddle the doctest framework to manually nuke the temp dict it creates for execution context; the same kind of leak likely occurs in any exec'ed string that contains a function defn. For future reference, note that the finalizer in question belongs to an object not itself in a cycle; it's an object reachable only from a dead cycle. the-users-don't-stand-a-chance-ly y'rs - tim From tismer@tismer.com Sat Apr 1 14:55:50 2000 From: tismer@tismer.com (Christian Tismer) Date: Sat, 01 Apr 2000 16:55:50 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: Message-ID: <38E60DF6.9C4C9443@tismer.com> Moshe Zadka wrote: > > On Fri, 31 Mar 2000, Guido van Rossum wrote: > > > + Christian Tismer > > + Christian Tismer > > Ummmmm....I smell something fishy here. Are there two Christian Tismers? Yes! From time to time I'm re-doing my cloning experiments. This isn't so hard as it seems. The hard thing is to keep them from killing each other. BTW: I'm the second copy from the last experiment (the survivor). > That would explain how Christian has so much time to work on Stackless. > > Well, between the both of them, Guido will have no chance but to put > Stackless in the standard distribution. Guido is stronger, even between three of me :-) ciao - chris-and-the-undead-heresy -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido@python.org Sat Apr 1 17:00:00 2000 From: guido@python.org (Guido van Rossum) Date: Sat, 1 Apr 2000 12:00:00 -0500 EST Subject: [Python-Dev] New Features in Python 1.6 Message-ID: <200004011740.MAA04675@eric.cnri.reston.va.us> New Features in Python 1.6 ========================== With the recent release of Python 1.6 alpha 1, a lot of people have been wondering what's new. This short note aims to explain the major changes in Python 1.6. Core Language ------------- 1. Unicode strings Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example: foo = 'hello' # Unicode foo = "hello" # ASCII foo = a'hello' # ASCII foo = u"hello" # Unicode You can still use the "r" character to quote strings in a manner convenient for the regular expression engine, but with subtle changes in semantics: see "New regular expression engine" below for more information. Also, for compatibility with most editors and operating systems, Python source code is still 7-bit ASCII. Thus, for portability it's best to write Unicode strings using one of two new escapes: \u and \N. \u lets you specify a Unicode character as a 16-bit hexadecimal number, and \N lets you specify it by name: message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' \ + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!' message = 'Bienvenue \u00E0 Python fran\u00E7ais!' 2. string methods Python strings have grown methods, just like lists and dictionaries. For instance, to split a string on spaces, you can now say this: tokens = "foo bar baz".split(" ") Or, equivalently, this: tokens = " ".split("foo bar baz") (Python figures out which string is the delimiter and which is the string to split by examining both strings to see which one occurs more frequently inside the other.) Be careful not to mix Unicode and ASCII strings when doing this, though. Other examples: foo = "The quick red fox jumped over the lazy brown dog." foo.find("dog") foo.strip() foo.lower() Note that use of any string method on a particular string renders it mutable. This is for consistency with lists, which are mutable and have methods like 'append()' and 'sort()' that modify the list. Thus, "foo.strip()" modifies the string 'foo' in-place. "strip(foo)" retains its old behavior of returning a modified copy of 'foo'. 3. extended call syntax The variable argument list and keyword argument syntax introduced in Python 1.3 has been extended. Previously, it only worked in function/method signatures; calling other functions with the same arguments required the use of 'apply()' def spam(arg1,arg2,*more_args,**keyword_args): # ... apply(foo,(arg1,arg2) + more_args,keyword_args) Now it works for calling functions too. For consistency with C and C++, asterisks in the function signature become ampersands in the function body: foo(arg1,arg2,&more_args,&&keyword_args) 4. assignment to None now works In previous version of Python, values assigned to None were lost. 
For example, this code: (username,None,None,None,realname,homedir,None) = getpwuid(uid) would only preserve the user name, real name, and home directory fields from a password file entry -- everything else of interest was lost. In Python 1.6, you can meaningfully assign to None. In the above example, None would be replaced by a tuple containing the four values of interest. You can also use the variable argument list syntax here, for example: (username,password,uid,uid,*None) = getpwuid(uid) would set None to a tuple containing the last three elements of the tuple returned by getpwuid. Library ------- 1. Distutils In the past, lots of people have complained about the lack of a standard mechanism for distributing and installing Python modules. This has been fixed by the Distutils, or Distribution Utilities. We took the approach of leveraging past efforts in this area rather than reinventing a number of perfectly good wheels. Thus, the Distutils take advantage of a number of "best-of-breed" tools for distributing, configuring, building, and installing software. The core of the system is a set of m4 macros that augment the standard macros supplied by GNU Autoconf. Where the Autoconf macros generate shell code that becomes a configure script, the Distutils macros generate Python code that creates a Makefile. (This is a similar idea to Perl's MakeMaker system, but of course this Makefile builds Python modules and extensions!) Using the Distutils is easy: you write a script called "setup.in" which contains both Autoconf and Distutils m4 macros; the Autoconf macros are used to create a "configure" script which examines the target system to find out how to build your extensions there, and the Distutils macros create a "setup.py" script, which generates a Makefile that knows how to build your particular collection of modules. You process "setup.in" before distributing your modules, and bundle the resulting "configure" and "setup.py" with your modules. Then, the user just has to run "configure", "setup.py", and "make" to build everything. For example, here's a small, simple "setup.in" for a hypothetical module distribution that uses Autoconf to check for a C library "frob" and builds a Python extension called "_frob" and a pure Python module "frob": AC_INIT(frobmodule.c) AC_CHECK_HEADER(frob.h) AC_HAVE_LIBRARY(frob) AC_OUTPUT() DU_INIT(Frob,1.0) DU_EXTENSION(_frob,frobmodule.c,-lfrob) DU_MODULE(frob,frob.py) DU_OUTPUT(setup.py) First, you run this setup.in using the "prepare_dist" script; this creates "configure" and "setup.py": % prepare_dist Next, you configure the package and create a makefile: % ./configure % ./setup.py Finally, to create a source distribution, use the "sdist" target of the generated Makefile: % make sdist This creates Frob-1.0.tar.gz, which you can then share with the world. A user who wishes to install your extension would download Frob-1.0.tar.gz and create local, custom versions of the "configure" and "setup.py" scripts: % gunzip -c Frob-1.0.tar.gz | tar xf - % cd Frob-1.0 % ./configure % ./setup.py Then, she can build and install your modules: % make % make install Hopefully this will foster even more code sharing in the Python community, and prevent unneeded duplication of effort by module developers. Note that the Python installer for Windows now installs GNU m4, the bash shell, and Autoconf, so that Windows users will be able to use the Distutils just like on Unix. 2. Imputils Complementary to the Distutils are the Imputils, or Import Utilities. 
Python's import mechanism has been reworked to make it easy for Python programmers to put "hooks" into the code that finds and loads modules. The default import mechanism now includes hooks, written in Python, to load modules via HTTP from a known URL. This has allowed us to drop most of the standard library from the distribution. Now, for example, when you import a less-commonly-needed module from the standard library, Python fetches the code for you. For example, if you say import tokenize then Python -- via the Imputils -- will fetch http://modules.python.org/lib/tokenize.py for you and install it on your system for future use. (This is why the Python interpreter is now installed as a setuid binary under Unix -- if you turn off this bit, you will be unable to load modules from the standard library!) If you try to import a module that's not part of the standard library, then the Imputils will find out -- again from modules.python.org -- where it can find this module. It then downloads the entire relevant module distribution, and uses the Distutils to build and install it on your system. It then loads the module you requested. Simplicity itself! 3. New regular expression engine Python 1.6 includes a new regular expression engine, accessed through the "sre" module, to support Unicode strings. Be sure to use the *old* engine for ASCII strings, though: import re, sre # ... re.match(r"(\d+)", "The number is 42.") # ASCII sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}') # Unicode If you're not sure whether a string is ASCII or Unicode, you can always determine this at runtime: from types import * # ... if type(s) is StringType: m = re.match(r"...", s) elif type(s) is UnicodeType: m = sre.match(r'...', s) From gvwilson@nevex.com Sat Apr 1 18:01:13 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 1 Apr 2000 13:01:13 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: > On Sat, 1 Apr 2000, Guido van Rossum wrote: > New Features in Python 1.6 > ========================== > [lots 'n' lots] > tokens = "foo bar baz".split(" ") > tokens = " ".split("foo bar baz") Has anyone started working up a style guide that'll recommend when to use these new methods, when to use the string module's calls, etc.? Ditto for the other changes --- where there are now two or more ways of doing something, how do I (or my students) tell which one is preferred? Greg p.s. "There's More Than One Way To Do It" == "No Matter How Much Of This Language You Learn, Other People's Code Will Always Look Strange" From gvwilson@nevex.com Sat Apr 1 18:45:16 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Sat, 1 Apr 2000 13:45:16 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <000101bf9c09$71011bc0$182d153f@tim> Message-ID: > >> On Sat, 1 Apr 2000, Guido van Rossum wrote: > >> New Features in Python 1.6 > >> ========================== > >> [lots 'n' lots] > >> tokens = "foo bar baz".split(" ") > >> tokens = " ".split("foo bar baz") > >> [and Python guesses which to split on by studying the contents] > > > Has anyone started working up a style guide that'll recommend when > > to use these new methods, when to use the string module's calls, > > etc.? Ditto for the other changes --- where there are now two or > > more ways of doing something, how do I (or my students) tell which > > one is preferred? > > Greg, you should pay real close attention to the date on Guido's msg. 
> It's quite a comment on the state of programming languages in general > that this all reads sooooooo plausibly! Well, you have to remember, I'm the guy who asked for "<" to be a legal Python token :-). Greg From est@hyperreal.org Sat Apr 1 22:00:54 2000 From: est@hyperreal.org (est@hyperreal.org) Date: Sat, 1 Apr 2000 14:00:54 -0800 (PST) Subject: [Python-Dev] linuxaudiodev minimal test Message-ID: <20000401220054.13820.qmail@hyperreal.org> The appended script works for me. I think the module should be called something like OSS (since it uses the Open Sound System API) with a -I entry in Setup.in to indicate that this will probably need to be specified to find (e.g., -I/usr/include/linux for Linux, -I/usr/include/machine for FreeBSD...). I'm sure I'll have other suggestions for the module, but they'll have to wait until I finish moving to California. :) Best, Eric #!/usr/bin/python import linuxaudiodev import math, struct, fcntl, FCNTL a = linuxaudiodev.open('w') a.setparameters(44100, 16, 1, linuxaudiodev.AFMT_S16_LE) N = 500 data = apply(struct.pack, ['<%dh' % N] + map(lambda n: 32767 * math.sin((2 * math.pi * n) / N), range(N))) fd = a.fileno() fcntl.fcntl(fd, FCNTL.F_SETFL, ~FCNTL.O_NONBLOCK & fcntl.fcntl(fd, FCNTL.F_GETFL)) for i in xrange(200): a.write(data) From Vladimir.Marangozov@inrialpes.fr Sat Apr 1 23:30:46 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sun, 2 Apr 2000 01:30:46 +0200 (CEST) Subject: [Python-Dev] python -t gets confused? Message-ID: <200004012330.BAA10022@python.inrialpes.fr> The tab/space checking code in the tokenizer seems to get confused by the recently checked in test_pyexpat.py With python -t or -tt, there are inconsistency reports at places where there doesn't seem to be one. (tabnanny seems to be confused too, btw :) ./python -tt Lib/test/test_pyexpat.py File "Lib/test/test_pyexpat.py", line 13 print 'Start element:\n\t', name, attrs ^ SyntaxError: inconsistent use of tabs and spaces in indentation Thus, "make test" reports a failure on test_pyexpat due to a syntax error, instead of a missing optional feature (expat not compiled in). I'm not an expert of the tokenizer code, so someone might want to look at it and tell us what's going on. Without -t or -tt, the code runs fine. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond@skippinet.com.au Sat Apr 1 23:53:50 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 2 Apr 2000 09:53:50 +1000 Subject: [Python-Dev] string.ato? and Unicode Message-ID: Is this an over-sight, or by design? >>> string.atoi(u"1") ... TypeError: argument 1: expected string, unicode found It appears easy to support Unicode - there is already an explicit StringType check in these functions, and it simply delegates to int(), which already _does_ work for Unicode A patch would leave the following behaviour: >>> string.atio(u"1") 1 >>> string.atio(u"1", 16) ... TypeError: can't convert non-string with explicit base IMO, this is better than what we have now. I'll put together a patch if one is wanted... Mark. From tim_one@email.msn.com Sun Apr 2 05:14:23 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 1 Apr 2000 23:14:23 -0500 Subject: [Python-Dev] python -t gets confused? 
In-Reply-To: <200004012330.BAA10022@python.inrialpes.fr> Message-ID: <000601bf9c59$ef8a3da0$752d153f@tim> [Vladimir Marangozov] > The tab/space checking code in the tokenizer seems to get confused > by the recently checked in test_pyexpat.py > > With python -t or -tt, there are inconsistency reports at places where > there doesn't seem to be one. (tabnanny seems to be confused too, btw :) They're not confused, they're simply reporting that the indentation is screwed up in this file -- which it is. It mixes tabs and spaces in ambiguous ways. > ... > I'm not an expert of the tokenizer code, so someone might want to look > at it and tell us what's going on. Without -t or -tt, the code runs fine. If you set your editor to believe that tab chars are 4 columns (as my Windows editor does), the problem (well, problems -- many lines are flawed) will be obvious. It runs anyway because tab=8 is hardcoded in the Python parser. Quickest fix is for someone at CNRI to just run this thru one of the Unix detabifier programs. From tim_one@email.msn.com Sun Apr 2 07:18:28 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:28 -0500 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <200003311547.KAA15538@eric.cnri.reston.va.us> Message-ID: <000c01bf9c6b$428af560$752d153f@tim> > The Windows installer is always hard to get just right. ... > ... > I'd love to hear that it also installs cleanly on Windows 95. Please > test IDLE from the start menu! All worked without incident for me under Win95. Nice! Would still prefer that it install to D:\Python-1.6\ by default, though (instead of burying it under "Program Files" -- if you're not on the Help list, you can't believe how hard it is to explain how to deal with embedded spaces in paths). So far I've seen one system crash in TK83.DLL upon closing an IDLE window, but haven't been able to reproduce. OK, I can, it's easy: Open IDLE. Ctrl+O, then navigate to e.g. Tools\idle\config.txt and open it. Click the "close window" button. Boom -- invalid page fault in TK83.DLL. No time to dig further now. From tim_one@email.msn.com Sun Apr 2 07:18:31 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:31 -0500 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: <000d01bf9c6b$447957e0$752d153f@tim> [Peter Funk] > -1 for C reformatting. The 4 space intendation seesm reasonable for > Python sources, but I disaggree for C code. C is not Python. Code is code. The project I work on professionally is a half million lines of C++, and 4-space indents are rigidly enforced -- works great. It makes just as much sense for C as for Python, and for all the same reasons. The one formal study I've seen on this showed that comprehension levels peaked at indent levels of 3 and 4, dropping off on both sides. However, tabs in C is one of Guido's endearing inconsistencies, and we don't want to lose the only two of those he has (his other is trying to avoid curly braces whenever possible in C, perhaps out of the same perverse sense of pride I used to take in avoiding redundant semicolons in Pascal <;{} wink>. From pf@artcom-gmbh.de Sun Apr 2 09:03:29 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 10:03:29 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: <000d01bf9c6b$447957e0$752d153f@tim> from Tim Peters at "Apr 2, 2000 1:18:31 am" Message-ID: Hi! 
> [Peter Funk] > > -1 for C reformatting. The 4 space intendation seesm reasonable for > > Python sources, but I disaggree for C code. C is not Python. Tim Peters: > Code is code. The project I work on professionally is a half million lines > of C++, and 4-space indents are rigidly enforced -- works great. It makes > just as much sense for C as for Python, and for all the same reasons. The > one formal study I've seen on this showed that comprehension levels peaked > at indent levels of 3 and 4, dropping off on both sides. Sigh... Well, if the Python-Interpreter C sources were indented with 4 spaces from the very beginning, I would have kept my mouth shut! But as we can't get the whole world to agree on how to indent C-Sources, we should at least try to avoid the loss of energy and time that the debate on this topic will cause. So what's my point? IMO reformatting the C-sources wouldn't do us any favor. There will always be people who like another indentation style more. The GNU software and the Linux kernel have set some standards within the open source community. These projects represent a reasonable fraction of programmers that may be potential contributors to other open source projects. So the only effect of a reformatting from 8 to 4 space indents would be to disturb the "8-spacers" and cause endless discussions like this one. Period. > However, tabs in C is one of Guido's endearing inconsistencies, and we don't > want to lose the only two of those he has (his other is trying to > avoid curly braces whenever possible in C, perhaps out of the same perverse > sense of pride I used to take in avoiding redundant semicolons in Pascal > <;{} wink>. Agreed. Best regards, Peter From Fredrik Lundh" one of my side projects for SRE is to create a regex-compatible frontend. since both engines have NFA semantics, this mostly involves writing an alternate parser. however, when I started playing with that, I completely forgot about the regex.set_syntax() function. supporting one extra syntax isn't that much work, but a whole bunch of them? so what should we do? 1. completely get rid of regex (bjorn would love that, don't you think?) 2. remove regex.set_syntax(), and tell people who've used it that they're SOL. 3. add all the necessary flags to the new parser... 4. keep regex around as before, and live with the extra code bloat. comments? From pf@artcom-gmbh.de Sun Apr 2 13:49:26 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 14:49:26 +0200 (MEST) Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 1, 2000 12: 0: 0 pm" Message-ID: Hi! Guido van Rossum on April 1st: [...] > With the recent release of Python 1.6 alpha 1, a lot of people have > been wondering what's new. This short note aims to explain the major > changes in Python 1.6. [...] > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating -------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > a Unicode string, while the double-quote character defaults to ASCII ----^^^^^^^^^^^^^^ > strings. As I read this my first thoughts were: "Huh? Is that really true? To me this sounds like an april fools joke.
But to be careful I checked first before I read on: pf@artcom0:ttyp4 ~/archiv/freeware/python/CVS_01_04_00/dist/src 41> ./python Python 1.6a1 (#2, Apr 1 2000, 19:19:18) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 'a' 'a' >>> 'ä' '\344' >>> u'ä' u'\344' Since www.python.org happens to be down at that moment, I was unable to check, whether my CVS tarball I downloaded from Davids starship account was recent enough and whether this single-quote-defaults-to-unicode has been discussed earlier before I got subscribed to python-dev. Better I should have read on first, before starting to wonder... [...] > tokens = "foo bar baz".split(" ") > Or, equivalently, this: > tokens = " ".split("foo bar baz") > > (Python figures out which string is the delimiter and which is the > string to split by examining both strings to see which one occurs more > frequently inside the other.) Now it becomes clearer that this *must* be an april fools joke! ;-) : >>> tokens = "foo bar baz".split(" ") >>> print tokens ['foo', 'bar', 'baz'] >>> tokens = " ".split("foo bar baz") >>> print tokens [' '] [...] > Note that use of any string method on a particular string renders it > mutable. [...] > For consistency with C and C++, > asterisks in the function signature become ampersands in the function > body: [...] > load modules via HTTP from a known URL. [...] > This has allowed us to drop most of the standard library from the > distribution... [...] Pheeew... Oh Well. And pigs can fly. Sigh! ;-) That was a well prepared April fools joke! Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From tismer@tismer.com Sun Apr 2 14:53:12 2000 From: tismer@tismer.com (Christian Tismer) Date: Sun, 02 Apr 2000 15:53:12 +0200 Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) References: Message-ID: <38E750C8.A559DF19@tismer.com> Peter Funk wrote: > > Hi! > > Guido van Rossum on april 1st: [turns into a Perli for a moment - well done! ] ... > Since www.python.org happens to be down at that moment, I was unable to check, > whether my CVS tarball I downloaded from Davids starship account > was recent enough and whether this single-quote-defaults-to-unicode > has been discussed earlier before I got subscribed to python-dev. Better > I should have read on first, before starting to wonder... You should not give up when python.org is down. As a fallback, I used to use www.cwi.nl which appears to be quite up-to-date. You can find the files and the *true* change list at http://www.cwi.nl/www.python.org/1.6/ Note that today is April 2, so you may believe me at-least-not-less-than-usually - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From fdrake@acm.org Sun Apr 2 21:34:39 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Sun, 2 Apr 2000 16:34:39 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <14567.44767.357265.167396@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1. 
completely get rid of regex (bjorn would love that, > don't you think?) The regex module has been documented as obsolete for a while now. Just leave the module alone and it will disappear in time. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Sun Apr 2 23:11:02 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 00:11:02 +0200 Subject: [Python-Dev] string.ato? and Unicode References: Message-ID: <38E7C576.5D3530E4@lemburg.com> Mark Hammond wrote: > > Is this an over-sight, or by design? > > >>> string.atoi(u"1") > ... > TypeError: argument 1: expected string, unicode found Probably an oversight... and it may well not be the only one: there are many explicit string checks in the code which might need to be fixed for Unicode support. As for string.ato? I'm not sure: these functions are obsoleted by int(), float() and long(). > It appears easy to support Unicode - there is already an explicit > StringType check in these functions, and it simply delegates to > int(), which already _does_ work for Unicode Right. I fixed the above three APIs to support Unicode. > A patch would leave the following behaviour: > >>> string.atio(u"1") > 1 > >>> string.atio(u"1", 16) > ... > TypeError: can't convert non-string with explicit base > > IMO, this is better than what we have now. I'll put together a > patch if one is wanted... BTW, the code in string.py for atoi() et al. looks really complicated: """ def atoi(*args): """atoi(s [,base]) -> int Return the integer represented by the string s in the given base, which defaults to 10. The string s must consist of one or more digits, possibly preceded by a sign. If base is 0, it is chosen from the leading characters of s, 0 for octal, 0x or 0X for hexadecimal. If base is 16, a preceding 0x or 0X is accepted. """ try: s = args[0] except IndexError: raise TypeError('function requires at least 1 argument: %d given' % len(args)) # Don't catch type error resulting from too many arguments to int(). The # error message isn't compatible but the error type is, and this function # is complicated enough already. if type(s) == _StringType: return _apply(_int, args) else: raise TypeError('argument 1: expected string, %s found' % type(s).__name__) """ Why not simply... def atoi(s, base=10): return int(s, base) ditto for atol() and atof()... ?! This would not only give us better performance, but also Unicode support for free. (I'll fix int() and long() to accept Unicode when using an explicit base too.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Mon Apr 3 10:44:52 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 02:44:52 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <000c01bf9c6b$428af560$752d153f@tim> Message-ID: On Sun, 2 Apr 2000, Tim Peters wrote: > > The Windows installer is always hard to get just right. ... > > ... > > I'd love to hear that it also installs cleanly on Windows 95. Please > > test IDLE from the start menu! > > All worked without incident for me under Win95. Nice! Would still prefer > that it install to D:\Python-1.6\ by default, though (instead of burying it > under "Program Files" -- if you're not on the Help list, you can't believe > how hard it is to explain how to deal with embedded spaces in paths). Ack! No way... Keep my top-level clean! :-) This is Windows. Apps go into Program Files.
That is Just The Way It Is. When was the last time you saw /python on a Unix box? Never? Always in .../bin/? Thought so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh" Message-ID: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Greg Stein wrote: > > All worked without incident for me under Win95. Nice! Would still prefer > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > under "Program Files" -- if you're not on the Help list, you can't believe > > how hard it is to explain how to deal with embedded spaces in paths). > > Ack! No way... Keep my top-level clean! :-) > > This is Windows. Apps go into Program Files. That is Just The Way It Is. if you're on a US windows box, sure. but "Program Files" isn't exactly an international standard... we install our python distribution under the \py, and we get a lot of positive responses. as far as I remember, nobody has ever reported problems setting up the path... > When was the last time you saw /python on a Unix box? Never? Always in > .../bin/? Thought so. if the Unix designers had come up with the bright idea of translating "bin" to "whatever might seem to make sense in this language", I think you'd see many more non-std in- stallations under Unix... especially if they'd made the root directory writable to everyone :-) From gstein@lyra.org Mon Apr 3 11:08:54 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:08:54 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > All worked without incident for me under Win95. Nice! Would still prefer > > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > > under "Program Files" -- if you're not on the Help list, you can't believe > > > how hard it is to explain how to deal with embedded spaces in paths). > > > > Ack! No way... Keep my top-level clean! :-) > > > > This is Windows. Apps go into Program Files. That is Just The Way It Is. > > if you're on a US windows box, sure. but "Program Files" > isn't exactly an international standard... Yes it is... if you use the appropriate Windows APIs (or registry... forget where). Windows specifies a way to get the localized name for Program Files. > we install our python distribution under the \py, > and we get lot of positive responses. as far as I remember, > nobody has ever reported problems setting up the path... *shrug* This doesn't dispute the standard Windows recommendation to install software into Program Files. > > When was the last time you saw /python on a Unix box? Never? Always in > > .../bin/? Thought so. > > if the Unix designers had come up with the bright idea of > translating "bin" to "whatever might seem to make sense > in this language", I think you'd see many more non-std in- > stallations under Unix... especially if they'd made the root > directory writable to everyone :-) heh :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Apr 3 11:18:30 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:18:30 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: Message-ID: I don't recall the termination of the discussion, but I don't know that consensus was ever reached.
Personally, I find this of little value over the similar (not exact) code: def supplement(dict, extra): d = extra.copy() d.update(dict) return d If the dictionary needs to be modified in place, then the loop from your UserDict.supplement would be used. Another view: why keep adding methods to service all possible needs? Cheers, -g On Mon, 3 Apr 2000, Peter Funk wrote: > Dear Python patcher! > > Please consider to apply the patch appended below and commit into the CVS tree. > It applies to: Python 1.6a1 as released on april 1st. > --=-- argument: --=--=--=--=--=--=--=--=--=--=-->8--=- > This patch adds a new method to dictionary and UserDict objects: > '.supplement()' is a "sibling" of '.update()', but it add only > those items that are not already there instead of replacing them. > > This idea has been discussed on python-dev last month. > --=-- obligatory disclaimer: -=--=--=--=--=--=-->8--=- > I confirm that, to the best of my knowledge and belief, this > contribution is free of any claims of third parties under > copyright, patent or other rights or interests ("claims"). To > the extent that I have any such claims, I hereby grant to CNRI a > nonexclusive, irrevocable, royalty-free, worldwide license to > reproduce, distribute, perform and/or display publicly, prepare > derivative versions, and otherwise use this contribution as part > of the Python software and its related documentation, or any > derivative versions thereof, at no cost to CNRI or its licensed > users, and to authorize others to do so. > > I acknowledge that CNRI may, at its sole discretion, decide > whether or not to incorporate this contribution in the Python > software and its related documentation. I further grant CNRI > permission to use my name and other identifying information > provided to CNRI by me for use in connection with the Python > software and its related documentation. > --=-- dry signature: =--=--=--=--=--=--=--=--=-->8--=- > Regards, Peter > -- > Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 > office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) > --=-- patch: --=--=--=--=--=--=--=--=--=--=--=-->8--=- > *** ../../cvs_01_04_00_orig/dist/src/Objects/dictobject.c Fri Mar 31 11:45:02 2000 > --- src/Objects/dictobject.c Mon Apr 3 10:30:11 2000 > *************** > *** 734,739 **** > --- 734,781 ---- > } > > static PyObject * > + dict_supplement(mp, args) > + register dictobject *mp; > + PyObject *args; > + { > + register int i; > + dictobject *other; > + dictentry *entry, *oldentry; > + if (!PyArg_Parse(args, "O!", &PyDict_Type, &other)) > + return NULL; > + if (other == mp) > + goto done; /* a.supplement(a); nothing to do */ > + /* Do one big resize at the start, rather than incrementally > + resizing as we insert new items. Expect that there will be > + no (or few) overlapping keys. */ > + if ((mp->ma_fill + other->ma_used)*3 >= mp->ma_size*2) { > + if (dictresize(mp, (mp->ma_used + other->ma_used)*3/2) != 0) > + return NULL; > + } > + for (i = 0; i < other->ma_size; i++) { > + entry = &other->ma_table[i]; > + if (entry->me_value != NULL) { > + oldentry = lookdict(mp, entry->me_key, entry->me_hash); > + if (oldentry->me_value == NULL) { > + /* TODO: optimize: > + 'insertdict' does another call to 'lookdict'. > + But for sake of readability and symmetry with > + 'dict_update' I didn't tried to avoid this. > + At least not now as we go into 1.6 alpha. 
*/ > + Py_INCREF(entry->me_key); > + Py_INCREF(entry->me_value); > + insertdict(mp, entry->me_key, entry->me_hash, > + entry->me_value); > + } > + } > + done: > + Py_INCREF(Py_None); > + return Py_None; > + } > + > + > + static PyObject * > dict_copy(mp, args) > register dictobject *mp; > PyObject *args; > *************** > *** 1045,1050 **** > --- 1087,1093 ---- > {"clear", (PyCFunction)dict_clear}, > {"copy", (PyCFunction)dict_copy}, > {"get", (PyCFunction)dict_get, METH_VARARGS}, > + {"supplement", (PyCFunction)dict_supplement}, > {NULL, NULL} /* sentinel */ > }; > > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_types.py Wed Feb 23 23:23:17 2000 > --- src/Lib/test/test_types.py Mon Apr 3 10:41:53 2000 > *************** > *** 242,247 **** > --- 242,250 ---- > d.update({2:20}) > d.update({1:1, 2:2, 3:3}) > if d != {1:1, 2:2, 3:3}: raise TestFailed, 'dict update' > + d.supplement({1:"not", 2:"neither", 4:4}) > + if d != {1:1, 2:2, 3:3, 4:4}: raise TestFailed, 'dict supplement' > + del d[4] > if d.copy() != {1:1, 2:2, 3:3}: raise TestFailed, 'dict copy' > if {}.copy() != {}: raise TestFailed, 'empty dict copy' > # dict.get() > *** ../../cvs_01_04_00_orig/dist/src/Lib/UserDict.py Wed Feb 2 16:10:14 2000 > --- src/Lib/UserDict.py Mon Apr 3 10:45:17 2000 > *************** > *** 32,36 **** > --- 32,45 ---- > else: > for k, v in dict.items(): > self.data[k] = v > + def supplement(self, dict): > + if isinstance(dict, UserDict): > + self.data.supplement(dict.data) > + elif isinstance(dict, type(self.data)): > + self.data.supplement(dict) > + else: > + for k, v in dict.items(): > + if not self.data.has_key(k): > + self.data[k] = v > def get(self, key, failobj=None): > return self.data.get(key, failobj) > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_userdict.py Fri Mar 26 16:32:02 1999 > --- src/Lib/test/test_userdict.py Mon Apr 3 10:50:29 2000 > *************** > *** 93,101 **** > --- 93,109 ---- > t.update(u2) > assert t == u2 > > + # Test supplement > + > + t = UserDict(d1) > + t.supplement(u2) > + assert t == u2 > + > # Test get > > for i in u2.keys(): > assert u2.get(i) == u2[i] > assert u1.get(i) == d1.get(i) > assert u0.get(i) == d0.get(i) > + > + # TODO: Add a test using dir({}) to test for unimplemented methods > > _______________________________________________ > Patches mailing list > Patches@python.org > http://www.python.org/mailman/listinfo/patches > -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh" Message-ID: <008b01bf9d57$0555fc20$34aab5d4@hagrid> Greg Stein wrote: > I don't recall the termination of the discussion, but I don't know that > consensus was ever reached. iirc, Ping liked it, but I'm not sure anybody else contributed much to that thread... (and to neutralize Ping, just let me say that I don't like it :-) > Personally, I find this of little value over the similar (not exact) code: > > def supplement(dict, extra): > d = extra.copy() > d.update(dict) > return d has anyone benchmarked this? for some reason, I doubt that the difference between copy/update and supplement is that large... > Another view: why keep adding methods to service all possible needs? exactly. From Fredrik Lundh" Message-ID: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Greg Stein wrote: > > we install our python distribution under the \py, > > and we get lot of positive responses. as far as I remember, > > nobody has ever reported problems setting up the path... > *shrug* This doesn't dispute the standard Windows recommendation to > install software into Program Files.
no, but Tim's and my experiences from doing user support show that the standard Windows recommendation doesn't work for command line applications. we don't care about Microsoft, we care about Python's users. to quote a Linus Torvalds, "bad standards _should_ be broken" (after all, Microsoft doesn't put their own command line applications down there -- there's no "\Program Files" [sub]directory in the default PATH, at least not on any of my boxes. maybe they've changed that in Windows 2000?) From gstein@lyra.org Mon Apr 3 11:49:27 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:49:27 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. Valid point. But there are other solutions, too. VC distributes a thing named "VCVARS.BAT" to set up paths and other environ vars. Python could certainly do the same thing (to overcome the embedded-space issue). > to quote a Linus Torvalds, "bad standards _should_ be broken" Depends on the audience of that standard. Programmers: yah. Consumers? They just want the damn thing to work like they expect it to. That expectation is usually "I can find my programs in Program Files." > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Incorrect. Site Server had command-line tools down there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ajung@sz-sb.de Mon Apr 3 12:17:20 2000 From: ajung@sz-sb.de (Andreas Jung) Date: Mon, 3 Apr 2000 13:17:20 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us>; from guido@python.org on Sat, Apr 01, 2000 at 12:00:00PM -0500 References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <20000403131720.A10313@sz-sb.de> On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: > > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating > a Unicode string, while the double-quote character defaults to ASCII > strings. If you need to create a Unicode string with double quotes, > just preface it with the letter "u"; likewise, an ASCII string can be > created by prefacing single quotes with the letter "a". For example: > > foo = 'hello' # Unicode > foo = "hello" # ASCII Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion. 
Cheers, Andreas From pf@artcom-gmbh.de Mon Apr 3 12:12:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 13:12:25 +0200 (MEST) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: <008b01bf9d57$0555fc20$34aab5d4@hagrid> from Fredrik Lundh at "Apr 3, 2000 12:25: 5 pm" Message-ID: Hi! > Greg Stein wrote: > > I don't recall the termination of the discussion, but I don't know that > > consensus was ever reached. > Fredrik Lundh: > iirc, Ping liked it, but I'm not sure anybody else contributed > much to that thread... That was my impression: It is hard to guess what you guys think from mere silence. ;-) > (and to neutralize Ping, just let me say that I don't like it :-) > > > Personally, I find this of little value over the similar (not exact) code: > > > > def supplement(dict, extra): [...] > > Another view: why keep adding methods to service all possible needs? > > exactly. A agree that we should avoid adding new methods all over the place. But IMO this is an exception: I proposed it for the sake of symmetry with 'update'. From my POV 'supplement' relates to 'update' as '+' relates to '-'. YMMV and I will not be angry, if this idea will be finally rejected. But it would have saved me an hour or two of coding and testing time if you had expressed your opinions a little bit earlier. ;-) But I know: you are all busy. To get an impression of possible uses for supplement, I sketch some code here: class MysticMegaWidget(MyMegaWidget): _config = { horizontal_elasticity = 1000, vertical_elasticity = 10, mentalplex_fg_color = "#FF0000", mentalplex_bg_color = "#0000FF", font = "Times", } def __init__(self, *args, **kw): if kw: self._config = kw self._config.supplement(self.__class__._config) .... Of course this can also be implemented using 'copy' and 'update'. It's only slightly more complicated. But you can also emulate any boolean operation using only NAND. Nevertheless any serious programming language contains at least OR, AND, NOT and possibly XOR. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal@lemburg.com Mon Apr 3 12:48:05 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 13:48:05 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de> Message-ID: <38E884F5.8F2FB271@lemburg.com> Andreas Jung wrote: > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: The above line has all the answers ;-) ... > > Python strings can now be stored as Unicode strings. To make it easier > > to type Unicode strings, the single-quote character defaults to creating > > a Unicode string, while the double-quote character defaults to ASCII > > strings. If you need to create a Unicode string with double quotes, > > just preface it with the letter "u"; likewise, an ASCII string can be > > created by prefacing single quotes with the letter "a". For example: > > > > foo = 'hello' # Unicode > > foo = "hello" # ASCII > > Is single-quoting for creating unicode clever ? I think there might be a problem > with old code when the operations on unicode strings are not 100% compatible to > the standard string operations. I don't know if this is a real problem - it's > just a point for discussion. 
just a point for discussion. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Mon Apr 3 13:22:17 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 3 Apr 2000 22:22:17 +1000 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <38E884F5.8F2FB271@lemburg.com> Message-ID: > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > Rossum wrote: > > The above line has all the answers ;-) ... That was pretty sneaky tho! Had the added twist of being half-true... Mark. From mal@lemburg.com Mon Apr 3 13:59:21 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 14:59:21 +0200 Subject: [Python-Dev] Unicode and numerics Message-ID: <38E895A9.94504851@lemburg.com> I've just posted a new patch set to the patches list which contains better support for Unicode in the int(), long(), float() and complex() builtins. There are some new APIs now which can be used by extension writers to convert from Unicode to integers, floats and longs. These APIs are fully Unicode aware, meaning that you can also pass them any Unicode characters with decimal mappings, not only the standard ASCII '0'-'9' ones. One thing I noticed, which needs some discussion: There are two separate APIs which convert long string literals to long objects: PyNumber_Long() and PyLong_FromString(). The first applies the same error checking as does the PyInt_FromString() API, while the latter does not apply this check... Question is: shouldn't the check for truncated data ("9.5" -> 9L) be moved into PyLong_FromString() ? BTW, should I also post patches to string.py which use the simplified versions for string.ato?() I posted a few days ago ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Apr 3 14:12:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 15:12:58 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: Message-ID: <38E898DA.B69D7ED6@lemburg.com> Mark Hammond wrote: > > > > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > > Rossum wrote: > > > > The above line has all the answers ;-) ... > > That was pretty sneaky tho! Had the added twist of being > half-true... ... and on time like a CRON-job ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Mon Apr 3 15:11:55 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:11:55 +0200 (CEST) Subject: [Python-Dev] Suggested PyMem & PyObject_NEW includes (fwd) Message-ID: <200004031411.QAA12486@python.inrialpes.fr> Vladimir Marangozov wrote: From Vladimir.Marangozov@inrialpes.fr Mon Apr 3 15:07:43 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:07:43 +0200 (CEST) Subject: Suggested PyMem & PyObject_NEW includes Sorry for the delay -- I simply couldn't progress on this as I wanted to. Here's the includes I suggest for PyMem and PyObject_New cs. PyMem is okay. Some questions arise with PyObject_NEW.
2) For the user, there's the principle to use functions if binary compatibility is desired, and macros if she needs to trade compatibility for speed. But there's the issue of allocating the user objects with PyMem, or, allocate the objects with a custom allocator. After scratching my head on how to preserve bin compatibility with old libraries and offer the freedom to the user, I ended with the following (subject to discussion): - Use the functions for bin compat (but have also an exception _PyObject_Del(), with a leading underscore, for the core...) Objects in this case are allocated with PyMem. - Use the macros for allocating the objects with the potentially custom allocator (through malloc, realloc, free -- see below) What do you think? -----------------------------[ mymalloc.h ]--------------------------- ... /* * Core memory allocator * ===================== */ /* To make sure the interpreter is user-malloc friendly, all memory and object APIs are implemented on top of this one. The PyCore_* macros can be changed to make the interpreter use a custom allocator. Note that they are for internal use only. Both the core and extension modules should use the PyMem_* API. */ #define PyCore_MALLOC_FUNC malloc #define PyCore_REALLOC_FUNC realloc #define PyCore_FREE_FUNC free #define PyCore_MALLOC_PROTO Py_PROTO((size_t)) #define PyCore_REALLOC_PROTO Py_PROTO((ANY *, size_t)) #define PyCore_FREE_PROTO Py_PROTO((ANY *)) #define PyCore_MALLOC(n) PyCore_MALLOC_FUNC(n) #define PyCore_REALLOC(p, n) PyCore_REALLOC_FUNC((p), (n)) #define PyCore_FREE(p) PyCore_FREE_FUNC(p) /* The following should never be necessary */ #ifdef NEED_TO_DECLARE_MALLOC_AND_FRIEND extern ANY *PyCore_MALLOC_FUNC PyCore_MALLOC_PROTO; extern ANY *PyCore_REALLOC_FUNC PyCore_REALLOC_PROTO; extern void PyCore_FREE_FUNC PyCore_FREE_PROTO; #endif /* BEWARE: Each interface exports both functions and macros. Extension modules should normally use the functions for ensuring binary compatibility of the user's code across Python versions. Subsequently, if Python switches to its own malloc (different from standard malloc), no recompilation is required for the extensions. The macro versions trade compatibility for speed. They can be used whenever there is a performance problem, but their use implies recompilation of the code for each new Python release. The Python core uses the macros because it *is* compiled on every upgrade. This might not be the case with 3rd party extensions in a custom setup (for example, a customer does not always have access to the source of 3rd party deliverables). You have been warned! */ /* * Raw memory interface * ==================== */ /* Functions */ /* Two sets of function wrappers around malloc and friends; useful if you need to be sure that you are using the same memory allocator as Python. Note that the wrappers make sure that allocating 0 bytes returns a non-NULL pointer, even if the underlying malloc doesn't. 
*/ /* These wrappers around malloc call PyErr_NoMemory() on failure */ extern DL_IMPORT(ANY *) Py_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) Py_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) Py_Free Py_PROTO((ANY *)); /* These wrappers around malloc *don't* call anything on failure */ extern DL_IMPORT(ANY *) PyMem_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) PyMem_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) PyMem_Free Py_PROTO((ANY *)); /* Macros */ #define PyMem_MALLOC(n) PyCore_MALLOC(n) #define PyMem_REALLOC(p, n) PyCore_REALLOC((ANY *)(p), (n)) #define PyMem_FREE(p) PyCore_FREE((ANY *)(p)) /* * Type-oriented memory interface * ============================== */ /* Functions */ #define PyMem_New(type, n) \ ( (type *) PyMem_Malloc((n) * sizeof(type)) ) #define PyMem_Resize(p, type, n) \ ( (p) = (type *) PyMem_Realloc((n) * sizeof(type)) ) #define PyMem_Del(p) PyMem_Free(p) /* Macros */ #define PyMem_NEW(type, n) \ ( (type *) PyMem_MALLOC(_PyMem_EXTRA + (n) * sizeof(type)) ) #define PyMem_RESIZE(p, type, n) \ if ((p) == NULL) \ (p) = (type *) PyMem_MALLOC( \ _PyMem_EXTRA + (n) * sizeof(type)); \ else \ (p) = (type *) PyMem_REALLOC((p), \ _PyMem_EXTRA + (n) * sizeof(type)) #define PyMem_DEL(p) PyMem_FREE(p) /* PyMem_XDEL is deprecated. To avoid the call when p is NULL, it's recommended to write the test explicitely in the code. Note that according to ANSI C, free(NULL) has no effect. */ #define PyMem_XDEL(p) if ((p) == NULL) ; else PyMem_DEL(p) ... -----------------------------[ mymalloc.h ]--------------------------- ... /* Functions and macros for modules that implement new object types. You must first include "object.h". PyObject_New(type, typeobj) allocates memory for a new object of the given type; here 'type' must be the C structure type used to represent the object and 'typeobj' the address of the corresponding type object. Reference count and type pointer are filled in; the rest of the bytes of the object are *undefined*! The resulting expression type is 'type *'. The size of the object is actually determined by the tp_basicsize field of the type object. PyObject_NewVar(type, typeobj, n) is similar but allocates a variable-size object with n extra items. The size is computed as tp_basicsize plus n * tp_itemsize. This fills in the ob_size field as well. PyObject_Del(op) releases the memory allocated for an object. Two versions of the object constructors/destructors are provided: 1) PyObject_{New, NewVar, Del} delegate the allocation of the objects to the Python allocator which places them within the bounds of the Python heap. This way, Python keeps control on the user's objects regarding their memory management; for instance, they may be subject to automatic garbage collection, once their reference count drops to zero. Binary compatibility is preserved and there's no need to recompile the extension every time a new Python release comes out. 2) PyObject_{NEW, NEW_VAR, DEL} use the allocator of the extension module which *may* differ from the one used by the Python library. Typically, in a C++ module one may wish to redefine the default allocation strategy by overloading the operators new and del. In this case, however, the extension does not cooperate with the Python memory manager. The latter has no control on the user's objects as they won't be allocated within the Python heap. Therefore, automatic garbage collection may not be performed, binary compatibility is not guaranteed and recompilation is required on every new Python release. 
Unless a specific memory management is needed, it's recommended to use 1). */ /* In pre-Python-1.6 times, only the PyObject_{NEW, NEW_VAR} macros were defined in terms of internal functions _PyObject_{New, NewVar}, the implementation of which used to differ for Windows and non-Windows platforms (see object.c -- these functions are left for backwards compatibility with old libraries). Starting from 1.6, an unified interface was introduced for both 1) & 2) */ extern DL_IMPORT(PyObject *) PyObject_FromType Py_PROTO((PyTypeObject *, PyObject *)); extern DL_IMPORT(PyVarObject *) PyObject_VarFromType Py_PROTO((PyTypeObject *, int, PyVarObject *)); extern DL_IMPORT(void) PyObject_Del Py_PROTO((PyObject *)); /* Functions */ #define PyObject_New(type, typeobj) \ ((type *) PyObject_FromType(typeobj, NULL)) #define PyObject_NewVar(type, typeobj, n) \ ((type *) PyObject_VarFromType((typeobj), (n), NULL)) #define PyObject_Del(op) PyObject_Del((PyObject *)(op)) /* XXX This trades binary compatibility for speed. */ #include "mymalloc.h" #define _PyObject_Del(op) PyMem_FREE((PyObject *)(op)) /* Macros */ #define PyObject_NEW(type, typeobj) \ ((type *) PyObject_FromType(typeobj, \ (PyObject *) malloc((typeobj)->tp_basicsize))) #define PyObject_NEW_VAR(type, typeobj, n) \ ((type *) PyObject_VarFromType(typeobj, \ (PyVarObject *) malloc((typeobj)->tp_basicsize + \ n * (typeobj)->tp_itemsize))) #define PyObject_DEL(op) free(op) ---------------------------------------------------------------------- So with this, I'm planning to "give the example" by renaming everywhere in the distrib PyObject_NEW with PyObject_New, but use for the core _PyObject_Del instead of PyObject_Del. I'll use PyObject_Del for the objects defined in extension modules. The point is that I don't want to define PyObject_Del in terms of PyMem_FREE (or define PyObject_New in terms of PyMem_MALLOC) as this would break the principle of binary compatibility when the Python allocator is changed to a custom malloc from one build to another. OTOH, I don't like the underscore... Do you have a better suggestion? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Mon Apr 3 15:50:25 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 16:50:25 +0200 Subject: [Python-Dev] Re: Unicode and numerics References: <38E895A9.94504851@lemburg.com> Message-ID: <38E8AFB1.9798186E@lemburg.com> "M.-A. Lemburg" wrote: > > BTW, should I also post patches to string.py which use the > simplified versions for string.ato?() I posted a few days ago ? I've just added these to the patch set... they no longer use the same error string, but the error type still is the same when e.g. string.atoi() is called with a non-string. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Apr 3 17:04:02 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 12:04:02 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: Your message of "Sat, 01 Apr 2000 12:00:00 EST." 
<200004011740.MAA04675@eric.cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <200004031604.MAA05283@eric.cnri.reston.va.us> Not only was it an April fool's joke, but it wasn't mine! It was forged by an insider. I know by who, but won't tell, because it was so good. It shows that I can trust to delegate way more to the Python community than I think I can! :-) BTW, the biggest give-away that it wasn't mine was the absence of my standard sign-off line: --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@cnri.reston.va.us Mon Apr 3 17:36:24 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 3 Apr 2000 12:36:24 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: References: Message-ID: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> I agree with Greg. Jeremy From bwarsaw@cnri.reston.va.us Mon Apr 3 18:20:19 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 3 Apr 2000 13:20:19 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' References: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> Message-ID: <14568.53971.777162.624760@anthem.cnri.reston.va.us> -0 on dict.supplement(), not the least because I'll always missspell it :) -Barry From pf@artcom-gmbh.de Mon Apr 3 19:01:50 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 20:01:50 +0200 (MEST) Subject: [Python-Dev] {}.supplement() -- poll results so far Message-ID: Look's like I should better forget my proposal to add a new method '.supplement()' to dictionaries, which should do the opposite of the already available method '.update()'. I summarize in cronological order: Ka-Ping Yee: +1 Fred Drake: +0 Greg Stein: -1 Fredrik Lundh: -1 Jeremy Hylton: -1 Barry Warsaw: -0 Are there other opinions which may change the picture? <0.1 wink> Regards, Peter From gstein@lyra.org Mon Apr 3 19:31:33 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 11:31:33 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: On Mon, 3 Apr 2000, Peter Funk wrote: > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. > I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> Guido's :-) -- Greg Stein, http://www.lyra.org/ From Fredrik Lundh" >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (innermost last): File "", line 1, in ? TypeError: expected a character buffer object From guido@python.org Mon Apr 3 20:48:25 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 15:48:25 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Your message of "Mon, 03 Apr 2000 11:31:33 PDT." References: Message-ID: <200004031948.PAA05532@eric.cnri.reston.va.us> > On Mon, 3 Apr 2000, Peter Funk wrote: > > Look's like I should better forget my proposal to add a new method > > '.supplement()' to dictionaries, which should do the opposite of > > the already available method '.update()'. 
> > I summarize in cronological order: > > > > Ka-Ping Yee: +1 > > Fred Drake: +0 > > Greg Stein: -1 > > Fredrik Lundh: -1 > > Jeremy Hylton: -1 > > Barry Warsaw: -0 > > > > Are there other opinions which may change the picture? <0.1 wink> > > Guido's :-) If I have to, it's a -1. I personally wouldn't be able to remember which one was update() and which one was supplement(). --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Mon Apr 3 20:57:26 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 12:57:26 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: <200004031948.PAA05532@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: > > On Mon, 3 Apr 2000, Peter Funk wrote: > > > Look's like I should better forget my proposal to add a new method > > > '.supplement()' to dictionaries, which should do the opposite of > > > the already available method '.update()'. > > > I summarize in cronological order: > > > > > > Ka-Ping Yee: +1 > > > Fred Drake: +0 > > > Greg Stein: -1 > > > Fredrik Lundh: -1 > > > Jeremy Hylton: -1 > > > Barry Warsaw: -0 > > > > > > Are there other opinions which may change the picture? <0.1 wink> > > > > Guido's :-) > > If I have to, it's a -1. You don't have to, but yours *is* the only one that counts. Ours are "merely advisory" ;-) hehe... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gward@cnri.reston.va.us Mon Apr 3 21:56:21 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Mon, 3 Apr 2000 16:56:21 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004031604.MAA05283@eric.cnri.reston.va.us>; from guido@python.org on Mon, Apr 03, 2000 at 12:04:02PM -0400 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> Message-ID: <20000403165621.A9955@cnri.reston.va.us> On 03 April 2000, Guido van Rossum said: > Not only was it an April fool's joke, but it wasn't mine! It was > forged by an insider. I know by who, but won't tell, because it was > so good. It shows that I can trust to delegate way more to the Python > community than I think I can! :-) > > BTW, the biggest give-away that it wasn't mine was the absence of my > standard sign-off line: > > --Guido van Rossum (home page: http://www.python.org/~guido/) D'ohhh!!! Hasn't anyone noticed that the largest amount of text in the joke feature list was devoted to the Distutils? I thought *that* would give it away "fer shure". You people are *so* gullible! ;-) And for my next trick... *poof*! Greg From mal@lemburg.com Mon Apr 3 22:45:20 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 23:45:20 +0200 Subject: [Python-Dev] unicode: strange exception References: <020701bf9da4$670d8580$34aab5d4@hagrid> Message-ID: <38E910F0.5EB00566@lemburg.com> Fredrik Lundh wrote: > > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object Good catch. The same happens when you try to compare Unicode and a different non-string type: >>> '1' == None 0 >>> u'1' == None Traceback (most recent call last): File "", line 1, in ? TypeError: expected a character buffer object The reason is the same in both cases: failing auto-coercion. I will send a patch for this tomorrow. 
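In the meantime, code that has to deal with mixed-type containers can mask the failing coercion itself. A minimal sketch (the helper name is invented; this is not the patch):

    def safe_contains(item, seq):
        # Membership test that treats a comparison which fails to coerce
        # (TypeError) as a plain mismatch instead of an error.
        for element in seq:
            try:
                if item == element:
                    return 1
            except TypeError:
                pass
        return 0

    safe_contains(u"!", ("a", None))    # returns 0 instead of raising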
--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From mhammond@skippinet.com.au Tue Apr 4 00:11:13 2000
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Tue, 4 Apr 2000 09:11:13 +1000
Subject: [Python-Dev] DLL in the system directory on Windows.
Message-ID: 

The 1.6a1 installer on Windows copies Python16.dll into the Python directory, rather than the system32 directory like 1.5.x. We discussed not too long ago on this list why this was probably not going to work. I guess Guido decided to "suck it and see" - which is fine.

But guess what - it doesn't work :-( I couldn't get past the installer!

The win32all installer executes some Python code at the end of the install (to generate the .pyc files and install the COM objects). This Python code is executed directly by the installation .EXE, by loading and executing a "shim" DLL I wrote for the purpose. Problem is, try as I might, my shim DLL could not load Python16.dll. The shim DLL _was_ in the same directory as Python16.dll. The only way I could have solved it was to insist that the WISE installation .EXE be run from the main Python directory - obviously not an option. And the problem is quite obviously going to exist with COM objects.

The problem would appear to go away if the universe switched over to LoadLibraryEx() - but we don't have that control in most cases (e.g., COM, WISE etc. dictate this to us).

So, my solution was to copy Python16.dll to the system directory during win32all installation. This results in duplicate copies of this DLL, so to my mind, it is preferable that Python itself go back to using the System32 directory.

The problem this will lead to is that Python 1.6.0 and 1.6.1 will not be able to be installed concurrently. Putting entries on the PATH doesn't solve the underlying problem - you will only be able to have one Python 1.6 directory on your path, else you end up with the same conflicts for the DLL.

I don't see any better answer than System32 :-( Thoughts?

Mark.

From gstein@lyra.org Tue Apr 4 01:32:12 2000
From: gstein@lyra.org (Greg Stein)
Date: Mon, 3 Apr 2000 17:32:12 -0700 (PDT)
Subject: [Python-Dev] DLL in the system directory on Windows.
In-Reply-To: 
Message-ID: 

On Tue, 4 Apr 2000, Mark Hammond wrote:
>...
> The problem this will lead to is that Python 1.6.0 and 1.6.1 will
> not be able to be installed concurrently.

Same thing happened with Python 1.5, so we're no worse off. If we do want this behavior, then we need to add another version digit...

> Putting entries on the
> PATH doesn't solve the underlying problem - you will only be able to
> have one Python 1.6 directory on your path, else you end up with the
> same conflicts for the DLL.
>
> I don't see any better answer than System32 :-( Thoughts?

I don't have a better answer, as you and I explained on several occasions. Dunno why Guido decided to skip our recommendations, but hey... it happens :-).

IMO, put the DLL back into System32. If somebody can *demonstrate* (not hypothesize) a mechanism that works, then it can be switched.

The underlying issue is this: Python16.dll in the app directory works for Python as an executable. However, it completely disables any possibility for *embedding* Python. On Windows, embedding is practically required because of the COM stuff (sure... a person could avoid COM but...).
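For anyone poking at this, a quick way to see which copy of the interpreter DLL a process actually resolved is to ask the loader. A rough sketch, assuming the win32api extension can be imported in that process:

    import sys, win32api
    # sys.dllhandle is the module handle of the Python DLL in this process;
    # GetModuleFileName maps it back to the path the loader picked up.
    win32api.GetModuleFileName(sys.dllhandle)
    # e.g. 'C:\\WINNT\\System32\\Python16.dll' versus a per-app copy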
Cheers, -g -- Greg Stein, http://www.lyra.org/ From nascheme@enme.ucalgary.ca Tue Apr 4 02:38:41 2000 From: nascheme@enme.ucalgary.ca (Neil Schemenauer) Date: 4 Apr 2000 01:38:41 -0000 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> <20000403165621.A9955@cnri.reston.va.us> Message-ID: <20000404013841.15629.qmail@cranky.arctrix.com> In comp.lang.python, you wrote: >You people are *so* gullible! ;-) Well done. You had me going for a while. You had just enough truth in there. Guido releasing the alpha at that time helped your cause as well. Neil -- Tact is the ability to tell a man he has an open mind when he has a hole in his head. From guido@python.org Tue Apr 4 03:52:52 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 22:52:52 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." References: Message-ID: <200004040252.WAA06637@eric.cnri.reston.va.us> > > The problem this will lead to is that Python 1.6.0 and 1.6.1 will > > not be able to be installed concurrently. > > Same thing happened with Python 1.5, so we're no worse off. If we do want > this behavior, then we need to add another version digit... Actually, I don't plan on releasing a 1.6.1. The next one will be 1.7. Of course, alpha and beta versions for 1.6 won't be able to live along, but I can live with that. > > Putting entries on the > > PATH doesnt solve the underlying problem - you will only be able to > > have one Python 1.6 directory on your path, else you end up with the > > same coflicts for the DLL. > > > > I dont see any better answer than System32 :-( Thoughts? > > I don't have a better answer, as you and I explained on several occasions. > Dunno why Guido decided to skip our recommendations, but hey... it > happens :-). Actually, I just wanted to get the discussion started. It worked. :-) I'm waiting for Tim Peters' response in this thread -- if I recall he was the one who said that python1x.dll should not go into the system directory. Note that I've made it easy to switch: the WISE script defines a separate variable DLLDEST which is currently set to MAINDIR, but which I could easily change to SYS32 to get the semantics you prefer. Hey, we could even give the user a choice here! <0.4 wink> > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > hypothesize) a mechanism that works, then it can be switched. > > The underlying issue is this: Python16.dll in the app directory works for > Python as an executable. However, it completely disables any possibility > for *embedding* Python. On Windows, embedding is practically required > because of the COM stuff (sure... a person could avoid COM but...). Yes, I know this. I'm just not happy with it, and I've definitely heard people complain that it is evil to install directories in the system directory. Seems there are different schools of thought... Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into the system directory. I will now be distributing with the VC++ 6.0 servicepack 1 versions of these files. Won't this be a problem for installations that already have an older version? (Now that I think of it, this is another reason why I decided that at least the alpha release should install everything in MAINDIR -- to limit the damage. Any informed opinions?) 
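The usual installer answer is to compare the version resources and only ever upgrade, never downgrade. Roughly, and only as a sketch (a real installer such as WISE does this check natively; the paths and the win32api calls below are just one way to express it):

    import win32api

    def dll_version(path):
        # Return the four-part version stored in the DLL's version resource.
        info = win32api.GetFileVersionInfo(path, '\\')
        ms, ls = info['FileVersionMS'], info['FileVersionLS']
        return (win32api.HIWORD(ms), win32api.LOWORD(ms),
                win32api.HIWORD(ls), win32api.LOWORD(ls))

    installed = dll_version('C:\\WINNT\\System32\\MSVCRT.DLL')
    shipped = dll_version('MSVCRT.DLL')     # the copy in the installer
    overwrite = shipped > installed         # never downgrade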
David Ascher: if you're listening, could you forward this to someone at ActiveState who might understand the issues here? They should have the same problems with ActivePerl, right? Or don't they have COM support? (Personally, I think that it wouldn't be so bad if we made it so that if you install just Python, the DLLs go into MAINDIR -- if you install the COM support, it can move/copy them to the system directory. But you may find this inelegant...) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Tue Apr 4 04:11:33 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 20:11:33 -0700 (PDT) Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: >... > Actually, I just wanted to get the discussion started. It worked. :-) hehe. True :-) > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. What's his physical address again? I have this nice little package to send him... >... > > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > > hypothesize) a mechanism that works, then it can be switched. > > > > The underlying issue is this: Python16.dll in the app directory works for > > Python as an executable. However, it completely disables any possibility > > for *embedding* Python. On Windows, embedding is practically required > > because of the COM stuff (sure... a person could avoid COM but...). > > Yes, I know this. I'm just not happy with it, and I've definitely > heard people complain that it is evil to install directories in the > system directory. Seems there are different schools of thought... It is evil, but it is also unavoidable. The alternative is to munge the PATH variable, but that is a Higher Evil than just dropping DLLs into the system directory. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? Not at all. In fact, Microsoft explicitly recommends including those in the distribution and installing them over the top of *previous* versions. They should never be downgraded (i.e. always check their version stamp!), but they should *always* be upgraded. Microsoft takes phenomenal pains to ensure that OLD applications are compatible with NEW runtimes. It is certainly possible that you could have a new app was built against a new runtime, and breaks when used against an old runtime. But that is why you always upgrade :-) And note that I do mean phenomenal pains. It is one of their ship requirements that you can always drop in a new RT without breaking old apps. So: regardless of where you decide to put python16.dll, you really should be upgrading the RT DLLs. > David Ascher: if you're listening, could you forward this to someone > at ActiveState who might understand the issues here? They should have > the same problems with ActivePerl, right? Or don't they have COM > support? ActivePerl does COM, but I dunno much more than that. > (Personally, I think that it wouldn't be so bad if we made it so that > if you install just Python, the DLLs go into MAINDIR -- if you install > the COM support, it can move/copy them to the system directory. But > you may find this inelegant...) Eek. 
Now you're talking about one guy reaching into another installation and munging it around. Especially for a move (boy, would that throw off the uninstall!). If you copied, then it is possible to have *two* copies of the DLL loaded into a process. The primary key is the pathname. I've had two pythoncom DLLs loaded in a process, and boy does that suck! The bugs are quite interesting, to say the least :-) And a total bear to track down until you have seen the double-load several times and can start to recognize the effects. In other words, moving is bad for elegance/uninstall reasons, and copy is bad for (potential) runtime reasons. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one@email.msn.com Tue Apr 4 05:28:54 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:54 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: <000201bf9dee$49638760$162d153f@tim> [Mark Hammond] > The 1.6a1 installer on Windows copies Python16.dll into the Python > directory, rather than the system32 directory like 1.5.x. We > discussed too long ago on this list not why this was probably not > going to work. I guess Guido decided to "suck it and see" - which > is fine. > > But guess what - it doesnt work :-( > ... > I dont see any better answer than System32 :-( Thoughts? Same as yours! Guido went off and innovated here -- always a bad sign . OTOH, I've got no use for "Program Files" -- make the cmdline version easy to use too. From tim_one@email.msn.com Tue Apr 4 05:28:59 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:59 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Message-ID: <000401bf9dee$4bf9c2a0$162d153f@tim> [/F] > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. [Greg Stein] > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). And put the .bat file where, exactly? In the Python root, somewhere under "Program Files"? Begs the question. MS doesn't want you to put stuff in System32 either, but it's the only rational place to put the DLL. Likewise the only rational place to put the cmdline EXE is in an easy-to-get-at directory. If C:\Quickenw\ is good enough for the best-selling non-MS Windows app, C:\Python-1.6\ is good enough for Python . Besides, it's a *default*. If you love MS guidelines and are savvy enough to know what the heck they are, you're savvy enough to install it under "Program Files" yourself. The people we're trying to help here have scant idea what they're doing, and dealing with the embedded space drives them nuts at the very start of their experience. Other languages understand this. For example, here are pieces of the PATH on my machine: C:\PERL5\BIN D:\JDK1.1.5\BIN C:\WINICON\BIN E:\OCAML\BIN From tim_one@email.msn.com Tue Apr 4 05:28:56 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:56 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: <000301bf9dee$4acba2e0$162d153f@tim> [Peter Funk] > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. 
> I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> -1 on dict.supplement(), -0 on an optional arg to dict.update(), dict.update(otherdict, overwrite=1) From guido@python.org Tue Apr 4 06:25:26 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 01:25:26 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 20:11:33 PDT." References: Message-ID: <200004040525.BAA11585@eric.cnri.reston.va.us> > What's his physical address again? I have this nice little package to send > him... Now, now, you don't want to sound like Ted Kazinsky, do you? :-) > It is evil, but it is also unavoidable. The alternative is to munge the > PATH variable, but that is a Higher Evil than just dropping DLLs into the > system directory. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? > > Not at all. In fact, Microsoft explicitly recommends including those in > the distribution and installing them over the top of *previous* versions. > They should never be downgraded (i.e. always check their version stamp!), > but they should *always* be upgraded. > > Microsoft takes phenomenal pains to ensure that OLD applications are > compatible with NEW runtimes. It is certainly possible that you could have > a new app was built against a new runtime, and breaks when used against an > old runtime. But that is why you always upgrade :-) > > And note that I do mean phenomenal pains. It is one of their ship > requirements that you can always drop in a new RT without breaking old > apps. > > So: regardless of where you decide to put python16.dll, you really should > be upgrading the RT DLLs. OK. That means I need two separate variables: where to install the MS DLLs and where to install the Py DLLs. > > David Ascher: if you're listening, could you forward this to someone > > at ActiveState who might understand the issues here? They should have > > the same problems with ActivePerl, right? Or don't they have COM > > support? > > ActivePerl does COM, but I dunno much more than that. I just downloaded and installed it. I've never seen an installer like this -- they definitely put a lot of effort in it. Annoying nit: they tell you to install "MS Windows Installer" first, and of course, being a MS tool, it requires a reboot. :-( Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So there. It also didn't change PATH for me, even though the docs mention that it does -- maybe only on NT? (PATH on Win9x is still a mystery to me. Is it really true that in order to change PATH an installer has to edit autoexec.bat? Or is there a better way? Anything that claims to change PATH for me doesn't seem to do so. Could I have screwed something up?) > > (Personally, I think that it wouldn't be so bad if we made it so that > > if you install just Python, the DLLs go into MAINDIR -- if you install > > the COM support, it can move/copy them to the system directory. But > > you may find this inelegant...) > > Eek. Now you're talking about one guy reaching into another installation > and munging it around. Especially for a move (boy, would that throw off > the uninstall!). 
If you copied, then it is possible to have *two* copies > of the DLL loaded into a process. The primary key is the pathname. I've > had two pythoncom DLLs loaded in a process, and boy does that suck! The > bugs are quite interesting, to say the least :-) And a total bear to track > down until you have seen the double-load several times and can start to > recognize the effects. > > In other words, moving is bad for elegance/uninstall reasons, and copy is > bad for (potential) runtime reasons. OK, got it. But I'm still hoping that there's something we can do differently. Didn't someone tell me that at least on Windows 2000 installing app-specific files (as opposed to MS-provided files) in the system directory is a no-no? What's the alternative there? Is the same mechanism supported on NT or Win98? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one@email.msn.com Tue Apr 4 05:28:48 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:48 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> Message-ID: <000001bf9dee$45f318c0$162d153f@tim> [Greg Ward, fesses up] > Hasn't anyone noticed that the largest amount of text in the joke > feature list was devoted to the Distutils? I thought *that* would > give it away "fer shure". You people are *so* gullible! ;-) Me too! My first suspect was me, but for the life of me, me couldn't remember writing that. You were only second on me list (it had to be one of us, as nobody else could have described legitimate Python features as if they had been implemented in Perl <0.9 wink>). > And for my next trick... *poof*! Nice try. You're not only not invisible, I've posted your credit card info to a hacker list. crushing-guido's-enemies-cuz-he's-too-much-of-a-wuss-ly y'rs - tim From tim_one@email.msn.com Tue Apr 4 06:00:55 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:00:55 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: <000901bf9df2$c224c7a0$162d153f@tim> [Guido] > ... > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. Not that I don't say a lot of dumb-ass things , but I strongly doubt I would have said this one. In my brief career as a Windows app provider, I learned four things, the first three loudly gotten across by seriously unhappy users: 1. Contra MS guidelines, dump the core DLLs in the system directory. 2. Contra MS guidelines, install the app by default in C:\name_of_app\. 3. Contra MS guidelines, put all the config options you can in a text file C:\name_of_app\name_of_app.ini instead of the registry. 4. This one was due to my boss: Contra MS guidelines, put a copy of every MS system DLL you rely on under C:\name_of_app\, so you don't get screwed when MS introduces an incompatible DLL upgrade. In the end, the last one is the only one I disagreed with (in recent years I believe MS DLL upgrades have gotten much more likely to fix bugs than to introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on my home machine that use their own copy of msvcrt.dll -- /F, if you're reading, how come the Pythonworks beta does this?). > ... > I've definitely heard people complain that it is evil to install > directories in the system directory. Seems there are different > schools of thought... 
Well, mucking with the system directories is horrid! Nobody likes doing it. AFAIK, though, there's really no realistic alternative. It's the only place you *know* will be on the PATH, and if an app embedding Python can't rely on PATH, it will have to hardcode the Python DLL path itself. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? (Now that I think > of it, this is another reason why I decided that at least the alpha > release should install everything in MAINDIR -- to limit the damage. > Any informed opinions?) You're using a std installer, and MS has rigid rules for these DLLs that the installer will follow by magic. Small comfort if things break, but this one is (IMO) worth playing along with. From tim_one@email.msn.com Tue Apr 4 06:42:55 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:42:55 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <000201bf9df8$a066b8c0$6d2d153f@tim> [Guido, on changing socket.connect() to require a single arg] > ... > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I think this one already caused too much pain: it appears virtually everyone uses the two-argument form routinely, and the reason for getting rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, Constructing a spurious "address" object (which has no behavior, and exists only to be torn apart inside the implementation) seems a foolish consistency, beyond doubt. So offer to back off on this one, in return for making 1/2 yield 0.5 . From guido@python.org Tue Apr 4 08:03:58 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 03:03:58 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 01:42:55 EDT." <000201bf9df8$a066b8c0$6d2d153f@tim> References: <000201bf9df8$a066b8c0$6d2d153f@tim> Message-ID: <200004040703.DAA11944@eric.cnri.reston.va.us> > I think this one already caused too much pain: it appears virtually > everyone uses the two-argument form routinely, and the reason for getting > rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, > > Constructing a spurious "address" object (which has no behavior, and > exists only to be torn apart inside the implementation) seems a > foolish consistency, beyond doubt. No more foolish than passing a point as an (x, y) tuple instead of separate x and y arguments. There are good reasons for passing it as a tuple, such as being able to store and recall it as a single entity. > So offer to back off on this one, in return for making 1/2 yield 0.5 . Unfortunately, I think I will have to. And it will have to be documented. The problem is that I can't document it as connect(host, port) -- there are Unix domain sockets that only take a single string argument (a filename). Also, sendto() takes a (host, port) tuple only. It has other arguments so that's the only form. Maybe I'll have to document it as connect(address) with a backwards compatible syntax connect(a, b) being equivalent to connect((a, b)). At least that sets the record straight without breaking old code. 
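Concretely, the two spellings look like this (a quick sketch; host and port are arbitrary):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect(("www.python.org", 80))    # documented form: one address tuple
    # s.connect("www.python.org", 80)    # two-argument form, kept only for
    #                                    # backwards compatibility
    s.close()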
Still torn, --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Tue Apr 4 09:59:02 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 4 Apr 2000 18:59:02 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: > 2. Contra MS guidelines, install the app by default in > C:\name_of_app\. Ive got to agree here. While I also see Greg's point, the savvy user can place it where they want, while the "average user" is better of with a more reasonable default. However, I would tend to go for "\name_of_app" rooted from the Windows drive. It is likely that this will be the default drive when a command prompt is open, so a simple "cd \python1.6" will work. This is also generally the same drive the default "Program Files" is on too. > You're using a std installer, and MS has rigid rules for > these DLLs that the > installer will follow by magic. Small comfort if things > break, but this one > is (IMO) worth playing along with. I checked the installer, and these MSVC dlls are indeed set to install only if the existing version is the "same or older". Annoyingly, it doesnt have an option for only "if older"! They are also set to correctly reference count in the registry. I believe that by installing a single custom DLL into the system directory, plus correctly installing some MS system DLLs into the system directory we are being perfect citizens. [Interestingly, Windows 2000 has a system process that continually monitors the system directory. If it detects that a "protected file" has been changed, it promptly copies the original back over the top! I believe the MSVC*.dlls are in the protected list, so can only be changed with a service pack release anyway. Everything _looks_ like it updates - Windows just copies it back!] Mark. From mal@lemburg.com Tue Apr 4 10:26:53 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 11:26:53 +0200 Subject: [Python-Dev] Unicode and comparisons Message-ID: <38E9B55D.F2B6409C@lemburg.com> Fredrik bug report made me dive a little deeper into compares and contains tests. Here is a snapshot of what my current version does: >>> '1' == None 0 >>> u'1' == None 0 >>> '1' == 'aäöü' 0 >>> u'1' == 'aäöü' Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data >>> '1' in ('a', None, 1) 0 >>> u'1' in ('a', None, 1) 0 >>> '1' in (u'aäöü', None, 1) 0 >>> u'1' in ('aäöü', None, 1) Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data The decoding errors occur because 'aäöü' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode). Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too ? 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From joachim@medien.tecmath.de Tue Apr 4 10:28:37 2000 From: joachim@medien.tecmath.de (Joachim Koenig-Baltes) Date: Tue, 4 Apr 2000 11:28:37 +0200 (MEST) Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <20000403131720.A10313@sz-sb.de> References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de> Message-ID: <20000404092837.944E889@tmpc200.medien.tecmath.de> In comp.lang.python, you wrote: >On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: >> >> Python strings can now be stored as Unicode strings. To make it easier >> to type Unicode strings, the single-quote character defaults to creating >> a Unicode string, while the double-quote character defaults to ASCII >> strings. If you need to create a Unicode string with double quotes, >> just preface it with the letter "u"; likewise, an ASCII string can be >> created by prefacing single quotes with the letter "a". For example: >> >> foo = 'hello' # Unicode >> foo = "hello" # ASCII > >Is single-quoting for creating unicode clever ? I think there might be a problem >with old code when the operations on unicode strings are not 100% compatible to >the standard string operations. I don't know if this is a real problem - it's >just a point for discussion. > >Cheers, >Andreas > Hallo Andreas, hast Du mal auf das Datum des Beitrages von Guido geschaut? Echt guter April- Scherz, da er die Scherze sehr gut mit der Realität mischt. Liebe Grüße, auch an die anderen, Joachim From guido@python.org Tue Apr 4 12:51:42 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 07:51:42 -0400 Subject: [Python-Dev] Unicode and comparisons In-Reply-To: Your message of "Tue, 04 Apr 2000 11:26:53 +0200." <38E9B55D.F2B6409C@lemburg.com> References: <38E9B55D.F2B6409C@lemburg.com> Message-ID: <200004041151.HAA12035@eric.cnri.reston.va.us> > Fredrik bug report made me dive a little deeper into compares > and contains tests. > > Here is a snapshot of what my current version does: > > >>> '1' == None > 0 > >>> u'1' == None > 0 > >>> '1' == 'aäöü' > 0 > >>> u'1' == 'aäöü' > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > >>> '1' in ('a', None, 1) > 0 > >>> u'1' in ('a', None, 1) > 0 > >>> '1' in (u'aäöü', None, 1) > 0 > >>> u'1' in ('aäöü', None, 1) > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > The decoding errors occur because 'aäöü' is not a valid > UTF-8 string (Unicode comparisons coerce both arguments > to Unicode by interpreting normal strings as UTF-8 > encodings of Unicode). > > Question: is this behaviour acceptable or should I go > even further and mask decoding errors during compares > and contains tests too ? I think this is right -- I expect it will catch more errors than it will cause. This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-). 
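Note that the errors only show up when the 8-bit string is left to the default UTF-8 coercion; decoding it explicitly with the encoding it was actually written in avoids the exception (illustrative only; the escapes below spell 'aäöü' in Latin-1):

    s = 'a\344\366\374'              # Latin-1 bytes, not valid UTF-8
    u'1' == unicode(s, 'latin-1')    # 0 -- no UnicodeError is raised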
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 4 14:24:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 09:24:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 01:00:55 EDT." <000901bf9df2$c224c7a0$162d153f@tim> References: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: <200004041324.JAA12173@eric.cnri.reston.va.us> > [Guido] > > ... > > I'm waiting for Tim Peters' response in this thread -- if I recall he > > was the one who said that python1x.dll should not go into the system > > directory. [Tim] > Not that I don't say a lot of dumb-ass things , but I strongly doubt I > would have said this one. OK, it must be my overworked tired brain that is playing games with me. It might have been Jim Ahlstrom then, our resident Windows 3.1 supporter. :-) > In my brief career as a Windows app provider, I > learned four things, the first three loudly gotten across by seriously > unhappy users: > > 1. Contra MS guidelines, dump the core DLLs in the system directory. > 2. Contra MS guidelines, install the app by default in C:\name_of_app\. It's already been said that the drive letter could be chosen more carefully. I wonder if the pathname should also be an 8+3 (max) name, so that it can be relyably typed into a DOS window. > 3. Contra MS guidelines, put all the config options you can in a text file > C:\name_of_app\name_of_app.ini > instead of the registry. > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. > > In the end, the last one is the only one I disagreed with (in recent years I > believe MS DLL upgrades have gotten much more likely to fix bugs than to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). Probably because Pythonworks doesn't care about COM or embedding. Anyway, I now agree with you on 1-2 and on not following 4. As for 3, I think that for Mark's COM support to work, the app won't necessarily be able to guess what \name_of_app\ is, so that's where the registry comes in handy. PATH info is really about all that Python puts in the registry, so I think we're okay here. (Also if you read PC\getpathp.c in 1.6, you'll see that it now ignores most of the registry when it finds the installation through a search based on argv[0].) > > ... > > I've definitely heard people complain that it is evil to install > > directories in the system directory. Seems there are different > > schools of thought... > > Well, mucking with the system directories is horrid! Nobody likes doing it. > AFAIK, though, there's really no realistic alternative. It's the only place > you *know* will be on the PATH, and if an app embedding Python can't rely on > PATH, it will have to hardcode the Python DLL path itself. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? (Now that I think > > of it, this is another reason why I decided that at least the alpha > > release should install everything in MAINDIR -- to limit the damage. > > Any informed opinions?) 
> > You're using a std installer, and MS has rigid rules for these DLLs that the > installer will follow by magic. Small comfort if things break, but this one > is (IMO) worth playing along with. One more thing that I just realized. There are a few Python extension modules (_tkinter and the new pyexpat) that rely on external DLLs: _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs xmlparse.dll and xmltok.dll. If I understand correctly how the path rules work, these have to be on PATH too (although the pyd files don't have to be). This worries me -- these aren't official MS DLLs and neither are the our own, so we could easily stomp on some other app's version of the same... (The tcl folks don't change their filename when the 3rd version digit changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) Is there a better solution? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Tue Apr 4 15:20:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 10:20:19 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200004040703.DAA11944@eric.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> Message-ID: <14569.64035.285070.760022@seahag.cnri.reston.va.us> Guido van Rossum writes: > Maybe I'll have to document it as connect(address) with a backwards > compatible syntax connect(a, b) being equivalent to connect((a, b)). > At least that sets the record straight without breaking old code. If you *must* support the two-arg flavor (which I've never actually seen outside this discussion), I'd suggest not documenting it as a backward compatibility, only that it will disappear in 1.7. This can be done fairly easily and cleanly in the library reference. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" Message-ID: <005101bf9e44$71bade60$34aab5d4@hagrid> Tim Peters wrote: > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. >=20 > In the end, the last one is the only one I disagreed with (in recent = years I > believe MS DLL upgrades have gotten much more likely to fix bugs than = to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 = apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). we've been lazy... in the pre-IE days, some machines came without any msvcrt.dll at all. so since we have to ship it, I guess it was easier to ship it along with all the other components, rather than implementing the "install in system directory only if newer" stuff... (I think it's on the 2.0 todo list ;-) From guido@python.org Tue Apr 4 15:52:30 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:52:30 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 10:20:19 EDT." 
<14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <200004041452.KAA12455@eric.cnri.reston.va.us> > If you *must* support the two-arg flavor (which I've never actually > seen outside this discussion), I'd suggest not documenting it as a > backward compatibility, only that it will disappear in 1.7. This can > be done fairly easily and cleanly in the library reference. Yes, I must. Can you fix up the docs? --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <006301bf9e45$5b168dc0$34aab5d4@hagrid> Guido van Rossum wrote: > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. "\py" is reserved ;-) From guido@python.org Tue Apr 4 15:56:17 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:56:17 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:52:08 +0200." <006301bf9e45$5b168dc0$34aab5d4@hagrid> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <006301bf9e45$5b168dc0$34aab5d4@hagrid> Message-ID: <200004041456.KAA12509@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > I wonder if the pathname should also be an 8+3 (max) name, so that it > > can be relyably typed into a DOS window. > > "\py" is reserved ;-) OK, it'll be \python16 then. --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" Message-ID: <009701bf9e47$1cf13660$34aab5d4@hagrid> > Socket methods: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for = backwards > + compatibility.) how about threatening to remove this in 1.7? IOW: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for = backwards > + compatibility. This is deprecated, and will be removed in future = versions.) From skip@mojam.com (Skip Montanaro) Tue Apr 4 15:23:44 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 4 Apr 2000 09:23:44 -0500 (CDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <14569.64240.80221.587062@beluga.mojam.com> Fred> If you *must* support the two-arg flavor (which I've never Fred> actually seen outside this discussion), I'd suggest not Fred> documenting it as a backward compatibility, only that it will Fred> disappear in 1.7. Having surprisingly little opportunity to call socket.connect directly in my work (considering the bulk of my programming is for the web), I'll note for the record that the direct calls I've made to socket.connect all have two arguments: host and port. It never occurred to me that there would even be a one-argument version. After all, why look at the docs for help if what you're doing already works? 
Skip

From gvwilson@nevex.com Tue Apr 4 16:34:38 2000
From: gvwilson@nevex.com (gvwilson@nevex.com)
Date: Tue, 4 Apr 2000 11:34:38 -0400 (EDT)
Subject: [Python-Dev] re: division
In-Reply-To: <38E9B55D.F2B6409C@lemburg.com>
Message-ID: 

Random thought (hopefully more sensible than my last one):

Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce 'ö' for math-style division (intöint -> float-when-necessary)?

Greg

From gmcm@hypernet.com Tue Apr 4 16:39:52 2000
From: gmcm@hypernet.com (Gordon McMillan)
Date: Tue, 4 Apr 2000 11:39:52 -0400
Subject: [Python-Dev] DLL in the system directory on Windows.
In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us>
References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT."
Message-ID: <1257259699-4963377@hypernet.com>

[Guido]
> I'm waiting for Tim Peters' response in this thread -- if I recall he
> was the one who said that python1x.dll should not go into the system
> directory.

Some time ago Tim and I said that the place for a DLL that is intimately tied to an EXE is in the EXE's directory. The search path:
 1) the EXE's directory
 2) the current directory (useless)
 3) the system directory
 4) the Windows directory
 5) the PATH

For a general purpose DLL, that makes the system directory the only sane choice (if modifying PATH was sane, then PATH would be saner, but a SpecTCL will just screw you up).

Things that go in the system directory should maintain backwards compatibility. For a DLL, that means all the old entry points are still there, in the same order with new ones at the end. For Python, there's no crying need to conform for now, but if (when?) embedding Python becomes ubiquitous, this (or some other scheme) may need to be considered.

- Gordon

From guido@python.org Tue Apr 4 16:45:39 2000
From: guido@python.org (Guido van Rossum)
Date: Tue, 04 Apr 2000 11:45:39 -0400
Subject: [Python-Dev] DLL in the system directory on Windows.
In-Reply-To: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com>
References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." <1257259699-4963377@hypernet.com>
Message-ID: <200004041545.LAA12635@eric.cnri.reston.va.us>

> Some time ago Tim and I said that the place for a DLL that is
> intimately tied to an EXE is in the EXE's directory.

But the conclusion seems to be that python1x.dll is not closely tied to python.exe -- it may be invoked via COM.

> The search path:
> 1) the EXE's directory
> 2) the current directory (useless)
> 3) the system directory
> 4) the Windows directory
> 5) the PATH
>
> For a general purpose DLL, that makes the system directory
> the only sane choice (if modifying PATH was sane, then
> PATH would be saner, but a SpecTCL will just screw you up).
>
> Things that go in the system directory should maintain
> backwards compatibility. For a DLL, that means all the old
> entry points are still there, in the same order with new ones at
> the end. For Python, there's no crying need to conform for
> now, but if (when?) embedding Python becomes ubiquitous,
> this (or some other scheme) may need to be considered.

Where should I put tk83.dll etc.? In the Python\DLLs directory, where _tkinter.pyd also lives?

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org Tue Apr 4 16:43:49 2000
From: guido@python.org (Guido van Rossum)
Date: Tue, 04 Apr 2000 11:43:49 -0400
Subject: [Python-Dev] re: division
In-Reply-To: Your message of "Tue, 04 Apr 2000 11:34:38 EDT."
References: Message-ID: <200004041543.LAA12616@eric.cnri.reston.va.us> > Random thought (hopefully more sensible than my last one): > > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce 'ö' for math-style > division (intöint -> float-when-necessary)? Careful with your character sets there... The symbol you typed looks like a lowercase o with dieresis to me. :-( Assuming you're proposing something like this: . --- . I'm not so sure that choosing a non-ASCII symbol is going to work. For starters, it's on very few keyboards, and that won't change soon! In the past we've talked about using // for integer division and / for regular (int/int->float) division. This would mean that we have to introduce // now as an alias for /, and encourage people to use it for int division (only); then in 1.7 using / between ints will issue a compatibility warning, and in Py3K int/int will yield a float. It's still going to be painful, though. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Apr 4 16:52:52 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 17:52:52 +0200 Subject: [Python-Dev] Unicode and comparisons References: <38E9B55D.F2B6409C@lemburg.com> <200004041151.HAA12035@eric.cnri.reston.va.us> Message-ID: <38EA0FD4.DB0D96BF@lemburg.com> Guido van Rossum wrote: > > > Fredrik bug report made me dive a little deeper into compares > > and contains tests. > > > > Here is a snapshot of what my current version does: > > > > >>> '1' == None > > 0 > > >>> u'1' == None > > 0 > > >>> '1' == 'aäöü' > > 0 > > >>> u'1' == 'aäöü' > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > >>> '1' in ('a', None, 1) > > 0 > > >>> u'1' in ('a', None, 1) > > 0 > > >>> '1' in (u'aäöü', None, 1) > > 0 > > >>> u'1' in ('aäöü', None, 1) > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > The decoding errors occur because 'aäöü' is not a valid > > UTF-8 string (Unicode comparisons coerce both arguments > > to Unicode by interpreting normal strings as UTF-8 > > encodings of Unicode). > > > > Question: is this behaviour acceptable or should I go > > even further and mask decoding errors during compares > > and contains tests too ? > > I think this is right -- I expect it will catch more errors than it > will cause. Ok, I'll only mask the TypeErrors then. (UnicodeErrors are subclasses of ValueErrors and thus do not get masked.) > This made me go out and see what happens if you compare a numeric > class instance (one that defines __int__) to another int -- it doesn't > even call the __int__ method! This should be fixed in 1.7 when we do > the smart comparisons and rich coercions (or was it the other way > around? :-). Not sure ;-) I think both go hand in hand. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" Message-ID: <010901bf9e4d$eb097840$34aab5d4@hagrid> gvwilson@nevex.com wrote: > Random thought (hopefully more sensible than my last one): >=20 > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce '=F6' for math-style > division (int=F6int -> float-when-necessary)? where's the =F6 key? (oh, look, my PC keyboard has one. but if I press it, I get a /. hmm...) 
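To spell out what the proposal would mean, here is a rough sketch of the two operations as plain functions (nothing below is implemented in 1.6; the names are invented):

    def truediv(a, b):
        # what int/int would eventually return under the proposal
        return float(a) / b

    def floordiv(a, b):
        # today's int/int behaviour; the proposed spelling would be a // b
        return a / b

    truediv(1, 2)      # 0.5
    floordiv(1, 2)     # 0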
From martin@loewis.home.cs.tu-berlin.de Tue Apr 4 16:44:17 2000 From: martin@loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 4 Apr 2000 17:44:17 +0200 Subject: [Python-Dev] Re: Unicode and comparisons Message-ID: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> > Question: is this behaviour acceptable or should I go even further > and mask decoding errors during compares and contains tests too ? I always thought it is a core property of cmp that it works between all objects. Because of that, >>> x=[u'1','aäöü'] >>> x.sort() Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data fails. As always in cmp, I'd expect to get a consistent outcome here (ie. cmp should give a total order on objects). OTOH, I'm not so sure why cmp between plain and unicode strings needs to perform UTF-8 conversion? IOW, why is it desirable that >>> 'a' == u'a' 1 Anyway, I'm not objecting to that outcome - I only think that, to get cmp consistent, it may be necessary to drop this result. If it is not necessary, the better. Regards, Martin From jim@interet.com Tue Apr 4 17:06:27 2000 From: jim@interet.com (James C. Ahlstrom) Date: Tue, 04 Apr 2000 12:06:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <38EA1303.B393D7F8@interet.com> Guido van Rossum wrote: > OK, it must be my overworked tired brain that is playing games with > me. It might have been Jim Ahlstrom then, our resident Windows 3.1 > supporter. :-) I think I've been insulted. What's wrong with Windows 3.1?? :-) > > 1. Contra MS guidelines, dump the core DLLs in the system directory. The Python DLL must really go in the Windows system directory. I don't see any other choice. This is in accordance with Microsoft guidelines AFAIK, or anyway, that's the only way it Just Works. The Python16.dll is a system file if you are using COM, and it supports an embedded scripting language, so it goes into the system dir. QED. > > 3. Contra MS guidelines, put all the config options you can in a text file > > C:\name_of_app\name_of_app.ini > > instead of the registry. This is an excellent practice, and there should be a standard module to deal with .ini files. But, as you say, the registry is sometimes needed. > > 4. This one was due to my boss: Contra MS guidelines, put a copy of > > every MS system DLL you rely on under C:\name_of_app\, so you don't > > get screwed when MS introduces an incompatible DLL upgrade. Yuk. More trouble than it's worth. > > > I've definitely heard people complain that it is evil to install > > > directories in the system directory. Seems there are different > > > schools of thought... It is very illegal to install directories as opposed to DLL's. Do you really mean directories? If so, don't do that. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > > the system directory. I will now be distributing with the VC++ 6.0 If you distribute these, you must check version numbers and only replace old versions. Wise and other installers do this easily. Doing otherwise is evil and unacceptable. Checking file dates is not good enough either. > > > servicepack 1 versions of these files. Won't this be a problem for > > > installations that already have an older version? Probably not, thanks to Microsoft's valiant testing efforts. 
> > > (Now that I think > > > of it, this is another reason why I decided that at least the alpha > > > release should install everything in MAINDIR -- to limit the damage. > > > Any informed opinions?) Distribute these files with a valid Wise install script which checks VERSIONS. > One more thing that I just realized. There are a few Python extension > modules (_tkinter and the new pyexpat) that rely on external DLLs: > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > xmlparse.dll and xmltok.dll. Welcome to the club. > If I understand correctly how the path rules work, these have to be on > PATH too (although the pyd files don't have to be). This worries me > -- these aren't official MS DLLs and neither are the our own, so we > could easily stomp on some other app's version of the same... > (The tcl folks don't change their filename when the 3rd version digit > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > Is there a better solution? This is a daily annoyance and risk in the Windows world. If you require Tk, then you need to completely understand how to produce a valid Tk distribution. Same with PIL (which requires Tk). Often you won't know that some pyd requires some other obscure DLL. To really do this you need something high level. Like rpm's on linux. On Windows, people either write complex install programs with Wise et al, or run third party installers provided with (for example) Tk from simpler install scripts. It is then up to the Tk people to know how to install it, and how to deal with version upgrades. JimA From gmcm@hypernet.com Tue Apr 4 17:10:38 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 12:10:38 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041545.LAA12635@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com> Message-ID: <1257257855-5074057@hypernet.com> [Gordon] > > Some time ago Tim and I said that the place for a DLL that is > > intimately tied to an EXE is in the EXE's directory. [Guido] > But the conclusion seems to be that python1x.dll is not closely tied > to python.exe -- it may be invoked via COM. Right. > Where should I put tk83.dll etc.? In the Python\DLLs directory, where > _tkinter.pyd also lives? Won't work (unless there are some tricks in MSVC 6 I don't know about). Assuming no one is crazy enough to use Tk in a COM server, (or rather, that their insanity need not be catered to), then I'd vote for the directory where python.exe and pythonw.exe live. - Gordon From gvwilson@nevex.com Tue Apr 4 17:20:22 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 4 Apr 2000 12:20:22 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: > Assuming you're proposing something like this: > > . > --- > . > > I'm not so sure that choosing a non-ASCII symbol is going to work. For > starters, it's on very few keyboards, and that won't change soon! I realize that, but neither are many of the accented characters used in non-English names (said the Canadian). If we assume 18-24 months until P3K, will it be safe to assume support for non-7-bit characters, or will we continue to be constrained by what was available on PDP-11's in 1975? (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. 
Haven't done any field tests, though...) Greg From bwarsaw@cnri.reston.va.us Tue Apr 4 18:56:23 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 4 Apr 2000 13:56:23 -0400 (EDT) Subject: [Python-Dev] re: division References: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: <14570.11463.83210.17189@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If we assume 18-24 months until P3K, will it be safe to gvwilson> assume support for non-7-bit characters, or will we gvwilson> continue to be constrained by what was available on gvwilson> PDP-11's in 1975? Undoubtedly. From gvwilson@nevex.com Tue Apr 4 19:08:36 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 4 Apr 2000 14:08:36 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case Message-ID: Here's a longer, and hopefully more coherent, argument for using the divided-by sign in P3K: 1. If P3K source is allowed to be Unicode, then all Python programming systems (custom-made or pre-existing) are going to have to be able to handle more than just 1970s-vintage 7-bit ASCII. If that support has to be there, it seems a shame not to make use of it in the language itself where that would be helpful. [1,2] 2. As I understand it, support for (int,int)->float division is being added to help people who think that arithmetic on computers ought to behave like arithmetic did in grade 4. I have no data to support this, but I expect that such people will understand the divided-by sign more readily than a forward slash. [3] 3. I also expect, again without data, that '//' vs. '/' will lead to as high a proportion of errors as '==' vs. '='. These errors may even prove harder to track down, since the result is a slightly wrong answer instead of a state change leading (often) to early loop termination or something equally noticeable. Greg [1] I'm aware that there are encoding issues (the replies to my first post mentioned at least two different ways for "my" divided-by sign to display), but this is an issue that will have to be tackled in general in order to support Unicode anyway. [2] I'd be grateful if everyone posting objections along the lines of, "But what about emacs/vi/some other favored bit of legacy technology?" could also indicate whether they use lynx(1) as their web browser, and/or are sure that 100% of the web pages they have built are accessible to people who don't have bit-mapped graphics. I am *not* trying to be inflammatory, I just think that if a technology is taken for granted as part of one tool, then it is legitimate to ask that it be taken for granted in another. [3] Please note that I am not asking for a multiplication sign, a square root sign, or any of APL's mystic runes. From fdrake@acm.org Tue Apr 4 19:27:08 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 14:27:08 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.13308.147675.434718@seahag.cnri.reston.va.us> gvwilson@nevex.com writes: > 1. If P3K source is allowed to be Unicode, then all Python programming > systems (custom-made or pre-existing) are going to have to be able > to handle more than just 1970s-vintage 7-bit ASCII. If that support > has to be there, it seems a shame not to make use of it in the language > itself where that would be helpful. [1,2] I don't recall any requirement that the host be able to deal with Unicode specially (meaning "other than as binary data"). Perhaps I missed that? > 2. 
As I understand it, support for (int,int)->float division is being > added to help people who think that arithmetic on computers ought to > behave like arithmetic did in grade 4. I have no data to support this, > but I expect that such people will understand the divided-by sign more > readily than a forward slash. [3] I don't think the division sign itself is a problem. Re-training experianced programmers might be; I don't think there's any intention of alienating that audience. > 3. I also expect, again without data, that '//' vs. '/' will lead to as > high a proportion of errors as '==' vs. '='. These errors may even > prove harder to track down, since the result is a slightly wrong answer > instead of a state change leading (often) to early loop termination or > something equally noticeable. A agree. > [3] Please note that I am not asking for a multiplication sign, a square > root sign, or any of APL's mystic runes. As I indicated above, I don't think the specific runes are the problem (outside of programmer alienation). The *biggest* problem (IMO) is that the runes are not on our keyboards. This has nothing to do with the appropriateness of the runes to the semantic meanings bound to them in the language definition, this has to do convenience for typing without any regard to cultured habits in the current programmer population. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson@nevex.com Tue Apr 4 19:38:29 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 4 Apr 2000 14:38:29 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: Hi, Fred; thanks for your mail. > gvwilson@nevex.com writes: > > 1. If P3K source is allowed to be Unicode > I don't recall any requirement that the host be able to deal with > Unicode specially (meaning "other than as binary data"). Perhaps I > missed that? I'm sorry, I didn't mean to imply that this decision had been taken --- hence the "if". However, allowing Unicode in source doesn't seem to have slowed down adoption of Java... :-) > I don't think the division sign itself is a problem. Re-training > experianced programmers might be; I don't think there's any intention > of alienating that audience. I think this comes down to spin. If this is presented as, "We're adding a symbol that isn't on your keyboard in order to help newbies," it'll be flamed. If it's presented as, "Python is the first scripting language to fully embrace internationalization, so get with the twenty-first century!" (or something like that), I could see it getting a much more positive response. I also think that, despite their grumbling, experienced programmers are pretty adaptable. After all, I switch from Emacs Lisp to Python to C++ half-a-dozen times a day... :-) > The *biggest* problem (IMO) is that the runes are not on our > keyboards. Agreed. Perhaps non-native English speakers could pitch in and describe how easy/difficult it is for them to (for example) put properly-accented Spanish comments in code? Thanks, Greg From klm@digicool.com Tue Apr 4 19:48:52 2000 From: klm@digicool.com (Ken Manheimer) Date: Tue, 4 Apr 2000 14:48:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: On Tue, 4 Apr 2000, Fred L. Drake, Jr. wrote: > gvwilson@nevex.com writes: > > 1. 
If P3K source is allowed to be Unicode, then all Python programming > > systems (custom-made or pre-existing) are going to have to be able > > to handle more than just 1970s-vintage 7-bit ASCII. If that support > > has to be there, it seems a shame not to make use of it in the language > > itself where that would be helpful. [1,2] > [...] > As I indicated above, I don't think the specific runes are the > problem (outside of programmer alienation). The *biggest* problem > (IMO) is that the runes are not on our keyboards. This has nothing to > do with the appropriateness of the runes to the semantic meanings > bound to them in the language definition, this has to do convenience > for typing without any regard to cultured habits in the current > programmer population. In general, it seems that there are some places where a programming language implementation should not be on the leading edge, and this is one. I think we'd have to be very confident that this new division sign (or whatever) is going to be in ubiquitous use, on everyone's keyboard, etc, before we could even consider making it a necessary part of the standard language. Do you have that confidence? Ken Manheimer klm@digicool.com From gvwilson@nevex.com Tue Apr 4 19:53:52 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Tue, 4 Apr 2000 14:53:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: HI, Ken; thanks for your mail. > In general, it seems that there are some places where a programming > language implementation should not be on the leading edge, and this is > one. I think we'd have to be very confident that this new division > sign (or whatever) is going to be in ubiquitous use, on everyone's > keyboard, etc, before we could even consider making it a necessary > part of the standard language. Do you have that confidence? I wouldn't expect the division sign to be on keyboards. On the other hand, I would expect that having to type a two-stroke sequence every once in a while would help native English speakers appreciate what people in other countries sometimes have to go through in order to spell their names correctly... :-) Greg From pf@artcom-gmbh.de Tue Apr 4 19:48:11 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 20:48:11 +0200 (MEST) Subject: .ini files - was Re: [Python-Dev] DLL in the system dir In-Reply-To: <38EA1303.B393D7F8@interet.com> from "James C. Ahlstrom" at "Apr 4, 2000 12: 6:27 pm" Message-ID: Hi! [...] > > > 3. Contra MS guidelines, put all the config options you can in a text file > > > C:\name_of_app\name_of_app.ini > > > instead of the registry. James C. Ahlstrom: > This is an excellent practice, and there should be a standard module to > deal > with .ini files. [...] One half of it is already there in the standard library: 'ConfigParser'. From my limited knowledge about windows (shrug) this can at least read .ini files. Writing this info again out to a file shouldn't be too hard. Regards, Peter From Fredrik Lundh" Message-ID: <024401bf9e67$9a1a53e0$34aab5d4@hagrid> gvwilson@nevex.com wrote: > > The *biggest* problem (IMO) is that the runes are not on our > > keyboards. >=20 > Agreed. Perhaps non-native English speakers could pitch in and = describe > how easy/difficult it is for them to (for example) put = properly-accented > Spanish comments in code? you know, people who do use foreign languages a lot tend to use keyboards designed for their language. 
I have keys for all swedish characters on my keyboard -- att skriva korrekt svenska på mitt tangentbord är hur enkelt som helst... [writing correct Swedish on my keyboard is as easy as can be...] to type less common latin 1 characters, I üsûàllÿ oñlÿ have to use twö keys -- one "dèád keý" for the åccent, föllowéd by thë cörrespõnding chäråctèr. (visst, ü och é används ibland i svensk text, och fanns förr ofta som separata tangenter -- i alla fall innan pc'n kom och förstörde allting). [sure, ü and é do turn up in Swedish text now and then, and they often used to have keys of their own -- at least until the PC came along and ruined everything.] besides, the use of indentation causes enough problems when doing trivial things like mailing, posting, and typesetting Python code. adding odd characters to the mix won't exactly help... From gmcm@hypernet.com Tue Apr 4 20:46:34 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 15:46:34 -0400 Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <1257244899-5853377@hypernet.com> Greg Wilson wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Certain stuffy (and now deceased) members of my family, despite emigrating to the Americas during the Industrial Revolution, insisted that the proper spelling of McMillan involved elevating the "c". Wonder if there's a unicode character for that, so I can get righteously indignant whenever people fail to use it. Personally, I'm delighted when people don't add extra letters to my name, and even that's pretty silly, since all the variations on M*M*ll*n come down to how some government clerk chose to spell it. - Gordon From guido@python.org Tue Apr 4 20:49:32 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:49:32 -0400 Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: Your message of "Tue, 04 Apr 2000 17:44:17 +0200." <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <200004041949.PAA13102@eric.cnri.reston.va.us> > I always thought it is a core property of cmp that it works between > all objects. Not any more. Comparisons can raise exceptions -- this has been so since release 1.5. This is rarely used between standard objects, but not unheard of; and class instances can certainly do anything they want in their __cmp__. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Tue Apr 4 20:51:14 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 15:51:14 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64240.80221.587062@beluga.mojam.com> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> <14569.64240.80221.587062@beluga.mojam.com> Message-ID: <14570.18354.151349.452329@seahag.cnri.reston.va.us> Skip Montanaro writes: > arguments: host and port. It never occurred to me that there would even be > a one-argument version. After all, why look at the docs for help if what > you're doing already works? And it never occurred to me that there would be two args; I vaguely recall the C API having one argument (a structure). Ah, well. I've patched up the documents to warn those who expect intuitive APIs. ;) -Fred -- Fred L. Drake, Jr.
Corporation for National Research Initiatives From guido@python.org Tue Apr 4 20:57:47 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:57:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Sun, 02 Apr 2000 10:37:11 +0200." <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <200004041957.PAA13168@eric.cnri.reston.va.us> > one of my side projects for SRE is to create a regex-compatible > frontend. since both engines have NFA semantics, this mostly > involves writing an alternate parser. > > however, when I started playing with that, I completely forgot > about the regex.set_syntax() function. supporting one extra > syntax isn't that much work, but a whole bunch of them? > > so what should we do? > > 1. completely get rid of regex (bjorn would love that, > don't you think?) (Who's bjorn?) > 2. remove regex.set_syntax(), and tell people who've > used it that they're SOL. > > 3. add all the necessary flags to the new parser... > > 4. keep regex around as before, and live with the > extra code bloat. > > comments? I'm for 4, then deprecating it, and eventually switching to 1. This saves you effort debugging compatibility with an obsolete module. If it ain't broken, don't "fix" it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 4 21:10:07 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:10:07 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> Message-ID: <200004042010.QAA13180@eric.cnri.reston.va.us> [me] > > One more thing that I just realized. There are a few Python extension > > modules (_tkinter and the new pyexpat) that rely on external DLLs: > > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > > xmlparse.dll and xmltok.dll. [Jim A] > Welcome to the club. I'm not sure what you mean by this? > > If I understand correctly how the path rules work, these have to be on > > PATH too (although the pyd files don't have to be). This worries me > > -- these aren't official MS DLLs and neither are the our own, so we > > could easily stomp on some other app's version of the same... > > (The tcl folks don't change their filename when the 3rd version digit > > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > > > Is there a better solution? > > This is a daily annoyance and risk in the Windows world. If you require > Tk, then you need to completely understand how to produce a valid Tk > distribution. Same with PIL (which requires Tk). Often you won't > know that some pyd requires some other obscure DLL. To really do this > you need something high level. Like rpm's on linux. On Windows, people > either write complex install programs with Wise et al, or run third > party installers provided with (for example) Tk from simpler install > scripts. It is then up to the Tk people to know how to install it, and > how to deal with version upgrades. Calculating the set of required DLLs isn't the problem. I have a tool (Dependency Viewer) that shows me exactly the dependencies (it recurses down any DLLs it finds and shows their dependencies too, using the nice MFC tree widget). The problem is where should I install these extra DLLs. 
In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer from Scriptics). The feedback over the past year showed that this was a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk installations stomped over it, people chose to install Tcl/Tk on a different volume than Python, etc. In 1.6, I am copying the necessary files from the Tcl/Tk installation into the Python directory. This actually installs fewer files than the full Tcl/Tk installation (but you don't get the Tcl/Tk docs). It gives me complete control over which Tcl/Tk version I use without affecting other Tcl/Tk installations that might exist. This is how professional software installations deal with inclusions. However the COM DLL issue might cause problems: if the Python directory is not in the search path because we're invoked via COM, there are only two places where the Tcl/Tk DLLs can be put so they will be found: in the system directory or somewhere along PATH. Assuming it is still evil to modify PATH, we would end up with Tcl/Tk in the system directory, where it could once again interfere with (or be interfered by) other Tcl/Tk installations! Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk DLLs can live in the Python tree. I'm not so sure -- I can at least *imagine* that someone would use Tcl/Tk to give their COM object a bit of a GUI. Moreover, this argument doesn't work for pyexpat -- COM apps are definitely going to expect to be able to use pyexpat! It's annoying. I have noticed, however, that you can use os.putenv() (or assignment to os.environ[...]) to change the PATH environment variable. The FixTk.py script in Python 1.5.2 used this -- it looked in a few places for signs of a Tcl/Tk installation, and then adjusted PATH to include the proper directory before trying to import _tkinter. Maybe there's a solution here? The Python DLL could be the only thing in the system directory, and from the registry it could know where the Python directory was. It could then prepend this directory to PATH. This is not so evil as mucking with PATH at install time, I think, since it is only done when Python16.dll is actually loaded. Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run out of environment space? Wouldn't it break other COM apps? Is the PATH truly separate per process? --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" <200004040703.DAA11944@eric.cnri.reston.va.us><14569.64035.285070.760022@seahag.cnri.reston.va.us><14569.64240.80221.587062@beluga.mojam.com> <14570.18354.151349.452329@seahag.cnri.reston.va.us> Message-ID: <02d501bf9e71$e85f0e60$34aab5d4@hagrid> Fred L. Drake wrote: > Skip Montanaro writes: > > arguments: host and port. It never occurred to me that there would = even be > > a one-argument version. After all, why look at the docs for help = if what > > you're doing already works? >=20 > And it never occurred to me that there would be two args; I vaguely > recall the C API having one argument (a structure). Ah, well. I've > patched up the documents to warn those who expect intuitive APIs. ;) while you're at it, and when you find the time, could you perhaps grep for "pair" and change places which use "pair" to mean a tuple with two elements to actually say "tuple" or "2-tuple"... after all, numerous people have claimed that stuff like "a pair (host, port)" isn't enough to make them understand that "pair" actually means "tuple". unless pair refers to a return value, of course. 
and only if the function doesn't use the optional argument syntax, of course. etc. (I suspect they're making it up as they go, but that's another story...) From skip@mojam.com (Skip Montanaro) Tue Apr 4 20:15:26 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 4 Apr 2000 14:15:26 -0500 (CDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.16206.210676.756348@beluga.mojam.com> Greg> On the other hand, I would expect that having to type a two-stroke Greg> sequence every once in a while would help native English speakers Greg> appreciate what people in other countries sometimes have to go Greg> through in order to spell their names correctly. I'm sure this is a practical problem, but aren't there country-specific keyboards available to Finnish, Spanish, Russian and non-English-speaking users to avoid precisely these problems? I grumble every time I have to enter some accented characters, but that's just because I do it rarely and use a US ASCII keyboard. I suspect François Pinard has a keyboard with a "ç" key. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From Fredrik Lundh" <14570.16206.210676.756348@beluga.mojam.com> Message-ID: <02eb01bf9e73$9d4c1560$34aab5d4@hagrid> Skip wrote: > I'm sure this is a practical problem, but aren't there country-specific > keyboards available to Finnish, Spanish, Russian and non-English-speaking > users to avoid precisely these problems? fwiw, my windows box supports about 80 different language-related keyboard layouts. that's western european and american keyboard layouts only, of course (mostly latin-1). haven't installed all the others... From gmcm@hypernet.com Tue Apr 4 21:35:23 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 16:35:23 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004042010.QAA13180@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> Message-ID: <1257241967-6030557@hypernet.com> [Guido] > Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk > DLLs can live in the Python tree. I'm not so sure -- I can at least > *imagine* that someone would use Tcl/Tk to give their COM object a bit > of a GUI. Moreover, this argument doesn't work for pyexpat -- COM > apps are definitely going to expect to be able to use pyexpat! Me. Would you have any sympathy for someone who wanted to make a GUI an integral part of a web server? Or would you tell them to get a brain and write a GUI that talks to the web server? Same issue. (Though not, I guess, for pyexpat). > It's annoying. > > I have noticed, however, that you can use os.putenv() (or assignment > to os.environ[...]) to change the PATH environment variable. The > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > for signs of a Tcl/Tk installation, and then adjusted PATH to include > the proper directory before trying to import _tkinter. Maybe there's > a solution here? The Python DLL could be the only thing in the system > directory, and from the registry it could know where the Python > directory was. It could then prepend this directory to PATH. This is > not so evil as mucking with PATH at install time, I think, since it is > only done when Python16.dll is actually loaded.
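As an aside, here is a minimal sketch of the PATH-prepending idea quoted just above -- purely illustrative, not code from FixTk.py or any installer, and the "DLLs" directory name is an assumption. The point is simply to put the directory holding the dependent DLLs on the process's PATH before _tkinter is imported for the first time, so the Windows loader can resolve tcl83.dll and tk83.dll:

    import os, sys

    # Sketch only: make tcl83.dll / tk83.dll findable by prepending their
    # (assumed) directory to this process's PATH before importing _tkinter.
    dll_dir = os.path.join(sys.prefix, "DLLs")
    if os.path.isdir(dll_dir):
        os.environ["PATH"] = dll_dir + ";" + os.environ["PATH"]

    import _tkinter   # its dependent DLLs can now be found via the search path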
The drawback of relying on PATH is that then some other jerk (eg you, last year ) will stick something of the same name in the system directory and break your installation. > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > out of environment space? Wouldn't it break other COM apps? Is the > PATH truly separate per process? Are there any exceptions to this: - dynamically load a .pyd - .pyd implicitly loads the .dll ? If that's always the case, then you can temporarily cd to the right directory before the dynamic load, and the implicit load should work. As for the others: probably not; can't see how; yes. - Gordon From guido@python.org Tue Apr 4 21:45:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:45:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:35:23 EDT." <1257241967-6030557@hypernet.com> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> <1257241967-6030557@hypernet.com> Message-ID: <200004042045.QAA13343@eric.cnri.reston.va.us> > Me. Would you have any sympathy for someone who wanted > to make a GUI an integral part of a web server? Or would you > tell them to get a brain and write a GUI that talks to the web > server? Same issue. (Though not, I guess, for pyexpat). Not all COM objects are used in web servers. Some are used in GUI contexts (aren't Word and Excel and even IE really mostly COM objects these days?). > > It's annoying. > > > > I have noticed, however, that you can use os.putenv() (or assignment > > to os.environ[...]) to change the PATH environment variable. The > > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > > for signs of a Tcl/Tk installation, and then adjusted PATH to include > > the proper directory before trying to import _tkinter. Maybe there's > > a solution here? The Python DLL could be the only thing in the system > > directory, and from the registry it could know where the Python > > directory was. It could then prepend this directory to PATH. This is > > not so evil as mucking with PATH at install time, I think, since it is > > only done when Python16.dll is actually loaded. > > The drawback of relying on PATH is that then some other jerk > (eg you, last year ) will stick something of the same > name in the system directory and break your installation. Yes, that's a problem, especially since it appears that PATH is searched *last*. (I wonder if this could explain the hard-to-reproduce crashes that people report when quitting IDLE?) > > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > > out of environment space? Wouldn't it break other COM apps? Is the > > PATH truly separate per process? > > Are there any exceptions to this: > - dynamically load a .pyd > - .pyd implicitly loads the .dll > ? I think this is always the pattern (except that some DLLs will implicitly load other DLLs, and so on). > If that's always the case, then you can temporarily cd to the > right directory before the dynamic load, and the implicit load > should work. Hm, I would think that the danger of temporarily changing the current directory is at least as big as that of changing PATH. (What about other threads? What if you run into an error and don't get a chance to cd back?) > As for the others: probably not; can't see how; yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From jim@interet.com Tue Apr 4 21:53:20 2000 From: jim@interet.com (James C. 
Ahlstrom) Date: Tue, 04 Apr 2000 16:53:20 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> Message-ID: <38EA5640.D3FC112F@interet.com> Guido van Rossum wrote: > [Jim A] > > Welcome to the club. > > I'm not sure what you mean by this? It sounded like you were joining the Microsoft afflicted... > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > from Scriptics). The feedback over the past year showed that this was > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > installations stomped over it, people chose to install Tcl/Tk on a > different volume than Python, etc. My first thought was that this was the preferred solution. It is up to Scriptics to provide an installer for Tk and for Tk customers to use it. Any problems with installing Tk are Scriptics' problem. I don't know the reasons it stomped over other installs etc. But either Tk customers are widely using non-standard installs, or the Scriptics installer is broken, or there is no such thing as a standard Tk install. This is fundamentally a Scriptics problem, but I understand it is a Python problem too. There may still be the problem that a standard Tk install might not be accessible to Python. This needs to be worked out with Scriptics. An environment variable could be set, the registry used etc. Assuming there is a standard Tk install and a way for external apps to use Tk, then we can still use the (fixed) Scriptics installer. > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > in the system directory, where it could once again interfere with (or > be interfered by) other Tcl/Tk installations! I seems to me that the correct Tk install script would put Tk DLL's in the system dir, and use the registry to find the libraries and other needed files. The exe's could go in a program directory somewhere. This is what I have to come to expect from professional software for DLL's which are expected to be used from multiple apps, as opposed to DLL's which are peculiar to one app. If the Tk installer did this, Tk would Just Work, and it would Just Work with third party apps (Tk clients) like Python too. Sorry, I have to run to a class. To be continued tomorrow.... JimA From guido@python.org Tue Apr 4 21:58:08 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:58:08 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:53:20 EDT." <38EA5640.D3FC112F@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> Message-ID: <200004042058.QAA13437@eric.cnri.reston.va.us> > > [Jim A] > > > Welcome to the club. [me] > > I'm not sure what you mean by this? > > It sounded like you were joining the Microsoft afflicted... Indeed :-( > > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > > from Scriptics). The feedback over the past year showed that this was > > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > > installations stomped over it, people chose to install Tcl/Tk on a > > different volume than Python, etc. > > My first thought was that this was the preferred solution. 
It is up > to Scriptics to provide an installer for Tk and for Tk customers > to use it. Any problems with installing Tk are Scriptics' problem. > I don't know the reasons it stomped over other installs etc. But > either Tk customers are widely using non-standard installs, or > the Scriptics installer is broken, or there is no such thing > as a standard Tk install. This is fundamentally a Scriptics > problem, but I understand it is a Python problem too. > > There may still be the problem that a standard Tk install might not > be accessible to Python. This needs to be worked out with Scriptics. > An environment variable could be set, the registry used etc. Assuming > there is a standard Tk install and a way for external apps to use Tk, > then we can still use the (fixed) Scriptics installer. The Tk installer has had these problems for a long time. I don't want to have to argue with them, I think it would be a waste of time. > > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > > in the system directory, where it could once again interfere with (or > > be interfered by) other Tcl/Tk installations! > > I seems to me that the correct Tk install script would put Tk > DLL's in the system dir, and use the registry to find the libraries > and other needed files. The exe's could go in a program directory > somewhere. This is what I have to come to expect from professional > software for DLL's which are expected to be used from multiple > apps, as opposed to DLL's which are peculiar to one app. If > the Tk installer did this, Tk would Just Work, and it would > Just Work with third party apps (Tk clients) like Python too. OK, you go argue with the Tcl folks. They create a vaguely unix-like structure under c:\Program Files\Tcl: subdirectories lib, bin, include, and then they dump their .exe and their .dll files in the bin directory. They also try to munge PATH to include their bin directory, but that often doesn't work (not on Windows 95/98 anyway). --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Tue Apr 4 22:14:59 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 23:14:59 +0200 (MEST) Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: <200004041949.PAA13102@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 3:49:32 pm" Message-ID: Hi! Guido van Rossum: > > I always thought it is a core property of cmp that it works between > > all objects. > > Not any more. Comparisons can raise exceptions -- this has been so > since release 1.5. This is rarely used between standard objects, but > not unheard of; and class instances can certainly do anything they > want in their __cmp__. Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a = '1' >>> b = 2 >>> a < b 0 >>> a > b # Newbies are normally baffled here 1 >>> a = 'ä' >>> b = u'ä' >>> a < b Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: unexpected end of data IMO we will have a *very* hard to time to explain *this* behaviour to newbiews! Unicode objects are similar to normal string objects from the users POV. It is unintuitive that objects that are far less similar (like for example numbers and strings) compare the way they do now, while the attempt to compare an unicode string with a standard string object containing the same character raises an exception. 
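To make the alternative raised earlier in this thread concrete -- masking the decoding error during compares -- here is a purely illustrative sketch of what that would amount to at the Python level; no interpreter version does this. It falls back to an arbitrary but consistent ordering whenever the mixed comparison raises UnicodeError, so that something like list.sort() at least terminates:

    def lenient_cmp(a, b):
        # illustrative only: swallow the UTF-8 decoding error and fall back
        # to an arbitrary but consistent ordering
        try:
            return cmp(a, b)
        except UnicodeError:
            return cmp((str(type(a)), id(a)), (str(type(b)), id(b)))

    items = [u'1', 'a\344\366\374']   # the byte string is latin-1, not valid UTF-8
    items.sort(lenient_cmp)           # plain items.sort() would raise UnicodeError
    print items

Whether a silently meaningless ordering is better than the exception shown above is exactly the trade-off under discussion.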
Mit freundlichen Grüßen (Regards), Peter (BTW: using an 12year old US keyboard and a custom xmodmap all the time to write umlauts lots of other interisting chars: ÷× ± ²³ ½¼ ° µ «» ¿? ¡! ;-) From mal@lemburg.com Tue Apr 4 17:47:51 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 18:47:51 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <38EA1CB7.BBECA305@lemburg.com> "Martin v. Loewis" wrote: > > > Question: is this behaviour acceptable or should I go even further > > and mask decoding errors during compares and contains tests too ? > > I always thought it is a core property of cmp that it works between > all objects. It does, but not necessarily without exceptions. I could easily mask the decoding errors too and then have cmp() work exactly as for strings, but the outcome may be different to what the user had expected due to the failing conversion. Sorting order may then look quite unsorted... > Because of that, > > >>> x=[u'1','aäöü'] > >>> x.sort() > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > fails. As always in cmp, I'd expect to get a consistent outcome here > (ie. cmp should give a total order on objects). > > OTOH, I'm not so sure why cmp between plain and unicode strings needs > to perform UTF-8 conversion? IOW, why is it desirable that > > >>> 'a' == u'a' > 1 This is needed to enhance inter-operability between Unicode and normal strings. Note that they also have the same hash value (provided both use the ASCII code range), making them interchangeable in dictionaries: >>> d={u'a':1} >>> d['a'] = 2 >>> d[u'a'] 2 >>> d['a'] 2 This is per design. > Anyway, I'm not objecting to that outcome - I only think that, to get > cmp consistent, it may be necessary to drop this result. If it is not > necessary, the better. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Apr 4 22:47:16 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 23:47:16 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: Message-ID: <38EA62E4.7E2B0E43@lemburg.com> Peter Funk wrote: > > Hi! > > Guido van Rossum: > > > I always thought it is a core property of cmp that it works between > > > all objects. > > > > Not any more. Comparisons can raise exceptions -- this has been so > > since release 1.5. This is rarely used between standard objects, but > > not unheard of; and class instances can certainly do anything they > > want in their __cmp__. > > Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> a = '1' > >>> b = 2 > >>> a < b > 0 > >>> a > b # Newbies are normally baffled here > 1 > >>> a = 'ä' > >>> b = u'ä' > >>> a < b > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: unexpected end of data > > IMO we will have a *very* hard to time to explain *this* behaviour > to newbiews! > > Unicode objects are similar to normal string objects from the users POV. 
> It is unintuitive that objects that are far less similar (like for > example numbers and strings) compare the way they do now, while the > attempt to compare an unicode string with a standard string object > containing the same character raises an exception. I don't think newbies will really want to get into the UTF-8 business right from the start... when they do, they probably know about the above problems already. Changing this behaviour to silently swallow the decoding error would cause more problems than do good, IMHO. Newbies sure would find (u'a' not in 'aäöü') == 1 just as sursprising... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Tue Apr 4 23:51:01 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:51:01 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: > I wonder if the pathname should also be an 8+3 (max) > name, so that it > can be relyably typed into a DOS window. To be honest, I can not see a good reason for this any more. The installation package only works on Win95/98/NT/2000 - all of these support long file names on all their supported file systems. So, any where that this installer will run, the "command prompt" on this system will correctly allow "cd \Python-1.6-and-any-thing-else-I-like-ly" :-) [OTOH, I tend to prefer "Python1.6" purely from an "easier to type" POV] Mark. From mhammond@skippinet.com.au Tue Apr 4 23:59:24 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:59:24 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257259699-4963377@hypernet.com> Message-ID: [Gordon writes] > Things that go in the system directory should maintain > backwards compatibility. For a DLL, that means all the old > entry points are still there, in the same order with new ones at > the end. Actually, the order is not important unless people link to you by ordinal (in which case you are likely to specify the ordinal in the .def file anyway). The Win32 loader is smart enough to be able to detect that all ordinals are the same as when it was linked, and use a fast-path. If ordinal name-to-number mappings have changed, the runtime loader takes a slower path that fixes up these differences. So what you suggest is ideal, but not really necessary. > For Python, there's no crying need to conform for > now, but if (when?) embedding Python becomes ubiquitous, > this (or some other scheme) may need to be considered. I believe Python will already do this, almost by accident, due to the conservative changes with each minor Python release. Eg, up until Python 1.6 was branded as 1.6, I was still linking my win32all extensions against the CVS version. When I remembered I would switch back to the 1.5.2 release ones, but when I forgot I never had a problem. People running a release version 1.5.2 could happily use my extensions linked with the latest 1.5.2+ binaries. We-could-even-blame-the-time-machine-at-a-strecth-ly, Mark. From mhammond@skippinet.com.au Wed Apr 5 00:08:58 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 09:08:58 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257257855-5074057@hypernet.com> Message-ID: > > Where should I put tk83.dll etc.? 
In the Python\DLLs > directory, where > > _tkinter.pyd also lives? > > Won't work (unless there are some tricks in MSVC 6 I don't > know about). Assuming no one is crazy enough to use Tk in a > COM server, (or rather, that their insanity need not be catered > to), then I'd vote for the directory where python.exe and > pythonw.exe live. What we can do is have Python itself use LoadLibraryEx() to load the .pyd files. This _will_ allow any dependant DLLs to be found in the same directory as the .pyd. [And as I mentioned, if the whole world would use LoadLibraryEx(), our problem would go away] LoadLibraryEx() is documented as working on all Win9x and NT from 3.1. From the LoadLibraryEx() documentation: -- If a path [to the DLL] is specified, and the dwFlags parameter is set to LOAD_WITH_ALTERED_SEARCH_PATH, the LoadLibraryEx function uses an alternate file search strategy to find any executable modules that the specified module causes to be loaded. This alternate strategy searches for a file in the following sequence: * The directory specified by the lpLibFileName path. In other words, the directory that the specified executable module is in. * [The search path as described for LoadLibrary()] -- Id be happy to knock up a patch - would be quite trivial... Mark. From guido@python.org Wed Apr 5 00:14:22 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 19:14:22 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Wed, 05 Apr 2000 09:08:58 +1000." References: Message-ID: <200004042314.TAA15407@eric.cnri.reston.va.us> > What we can do is have Python itself use LoadLibraryEx() to load the > .pyd files. This _will_ allow any dependant DLLs to be found in the > same directory as the .pyd. [And as I mentioned, if the whole world > would use LoadLibraryEx(), our problem would go away] Doh! [Sound of forehead being slapped violently] We already use LoadLibraryEx()! So we can drop all the dependent dlls in the DLLs directory which has the PYD files as well. Case closed. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Wed Apr 5 02:20:44 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:20:44 -0700 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. > > to quote a Linus Torvalds, "bad standards _should_ be broken" > > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Sorry I'm late -- I've been out of town. Just two FYIs: 1) ActivePerl goes into /Perl5.6, and my guess is that it's based on user feedback. 2) I've switched to changing the default installation to C:/Python in all my installs, and am much happier since I made that switchover. 
--david From DavidA@ActiveState.com Wed Apr 5 02:24:57 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:24:57 -0700 Subject: FW: [Python-Dev] Windows installer pre-prelease Message-ID: Forgot to cc: python-dev on my reply to Greg -----Original Message----- From: David Ascher [mailto:DavidA@ActiveState.com] Sent: Tuesday, April 04, 2000 6:23 PM To: Greg Stein Subject: RE: [Python-Dev] Windows installer pre-prelease > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). I hate VCVARS -- it doesn't work from my Cygnus shell, it has to be invoked by the user as opposed to automatically started by the installer, etc. > Depends on the audience of that standard. Programmers: yah. Consumers? > They just want the damn thing to work like they expect it to. That > expectation is usually "I can find my programs in Program Files." In my experience, the /Program Files location works fine for tools which have strictly GUI interfaces and which are launched by the Start menu or other GUI mechanisms. Anything which you might need to invoke at the command line lives best in a non-space-containing path, IMO of course. --david From guido@python.org Wed Apr 5 02:26:12 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 21:26:12 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Your message of "Tue, 04 Apr 2000 18:20:44 PDT." References: Message-ID: <200004050126.VAA15836@eric.cnri.reston.va.us> I've pretty much made my mind up about this one. Mark's mention of LoadLibraryEx() solved the last puzzle. I'm making the changes to the installer and hope to release alpha 2 with these changes later this week. - Default install root is \Python1.6 on the same drive as the default Program Files - MSVC*RT.DLL and PYTHON16.DLL go into the system directory; the MSV*RT.DLL files are only replaced if we bring a newer or same version - I'm using Tcl/Tk 8.2.3 instead of 8.3.0; the latter often crashes when closing a window - The Tcl/Tk and expat DLLs go in the DLLs subdirectory of the install root Thanks a lot for your collective memory!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Wed Apr 5 03:19:18 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 4 Apr 2000 19:19:18 -0700 Subject: [Python-Dev] re: division In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson@nevex.com wrote: > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > but harder to track down, since you'll have to scrutinize values very > carefully to spot the difference. Haven't done any field tests, > though...) My favourite symbol for integer division is _/ (read it as "floor-divide"). It makes visually apparent what is going on. -- ?!ng "There's no point in being grown up if you can't be childish sometimes." -- Dr. Who --KAC01325.954869821/skuld.lfw.org-- --KAD01325.954869821/skuld.lfw.org-- From ping@lfw.org Wed Apr 5 03:19:09 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 4 Apr 2000 19:19:09 -0700 Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: Message-ID: On Sun, 2 Apr 2000, Peter Funk wrote: > > As I read this my first thoughts were: > "Huh? Is that really true? To me this sounds like a april fools joke. > But to be careful I checked first before I read on: My favourite part was the distutils section. 
The great thing about this announcement is that it would have been almost believable if we were talking about any language other than Python! -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From tim_one@email.msn.com Wed Apr 5 05:57:27 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: <000601bf9ebb$723f7ea0$3e2d153f@tim> [Guido] > ... > (PATH on Win9x is still a mystery to me. You're not alone.
This is also generally the same drive the default "Program > Files" is on too. Yes, "C:\" doesn't literally mean "C:\" any more than "Program Files" literally means "Program Files" <0.1 wink>. By "C:\" I meant "drive where the thing we conveniently but naively call 'Program Files' lives"; naming the registry key whose value is this thing is more accurate but less helpful; the installer will have some magic predefined name which would be most helpful to give, but without the installer docs here I can't guess what that is. > ... > [Interestingly, Windows 2000 has a system process that continually > monitors the system directory. If it detects that a "protected > file" has been changed, it promptly copies the original back over > the top! I believe the MSVC*.dlls are in the protected list, so can > only be changed with a service pack release anyway. Everything > _looks_ like it updates - Windows just copies it back!] Thanks for making my day . From tim_one@email.msn.com Wed Apr 5 05:57:36 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:36 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <000801bf9ebb$75c85740$3e2d153f@tim> [Guido] > ... > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. Yes, but for a different reason: Many sites still use older Novell file servers that screw up on non-8.3 names in a variety of unpleasant ways. Just went thru this at Dragon again, where the language modeling group created a new file format with a 4-letter extension; they had to back off to 3 letters because half the company couldn't get at the new files. BTW, two years ago it was much worse, and one group started using Python instead of Java partly because .java files didn't work over the network at all! From nascheme@enme.ucalgary.ca Wed Apr 5 06:19:45 2000 From: nascheme@enme.ucalgary.ca (Neil Schemenauer) Date: Tue, 4 Apr 2000 23:19:45 -0600 Subject: [Python-Dev] Re: A surprising case of cyclic trash Message-ID: <20000404231945.A16978@acs.ucalgary.ca> An even simpler example: >>> import sys >>> d = {} >>> print sys.getrefcount(d) 2 >>> exec("def f(): pass\n") in d >>> print sys.getrefcount(d) 3 >>> d.clear() >>> print sys.getrefcount(d) 2 exec adds the function to the dictionary. The function references the dictionary through globals. Neil -- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie From Moshe Zadka Wed Apr 5 07:44:10 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 5 Apr 2000 08:44:10 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson@nevex.com wrote: > Agreed. Perhaps non-native English speakers could pitch in and describe > how easy/difficult it is for them to (for example) put properly-accented > Spanish comments in code? As the only (I think?) person here who is a native right-to-left language native speaker let me put my 2cents in. I program in two places: at work, and at home. At work, we have WinNT machines, with *no Hebrew support*. We right everything in English, including internal Word documents. We figured that if 1000 programmers working on NT couldn't produce a stable system, the hacks of half-a-dozen programmers thrown in could only make it worse. 
At home, I have a Linux machine with no Hebrew support either -- it just didn't seem to be worth the hassle, considering that most of what I write is sent out to the world, so it needs to be in English anyway. My previous machine had some Esperanto support, and I intend to put some on my new machine. True, not many people know Esperanto, but at least its easy enough to learn. It was easy enough to write comments in Esperanto in "vim", but since I was thinking in English anyway while programming (if, while, StringIO etc.), it was more natural to write the comments in English too. The only non-English comments I've seen in sources I had to read were in French, and I won't repeat what I've said about French people then . -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From artcom0!pf@artcom-gmbh.de Wed Apr 5 07:39:56 2000 From: artcom0!pf@artcom-gmbh.de (artcom0!pf@artcom-gmbh.de) Date: Wed, 5 Apr 2000 08:39:56 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004042332.TAA15480@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 7:32:23 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > FixTk.py > Log Message: > Work the Tcl version number in the path we search for. [...] > ! import sys, os, _tkinter > ! ver = str(_tkinter.TCL_VERSION) > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > if os.path.exists(os.path.join(v, "init.tcl")): > os.environ["TCL_LIBRARY"] = v [...] Just a wild idea: Does it make sense to have several incarnations of the shared object file _tkinter.so (or _tkinter.pyd on WinXX)? Something like _tkint83.so, _tkint82.so and so on, so that Tkinter.py can do something like the following to find a available Tcl/Tk version: for tkversion in range(83,79,-1): try: _tkinter = __import__("_tkint"+str(tkversion)) break except ImportError: pass else: raise Of course this does only make sense on platforms with shared object loading and if preparing Python binary distributions without including a particular Tcl/Tk package into the Python package. This idea might be interesting for Red Hat, SuSE Linux distribution users to allow partial system upgrades with a binary python-1.6.rpm Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Moshe Zadka Wed Apr 5 07:46:23 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 5 Apr 2000 08:46:23 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson@nevex.com wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Not to mention what we have to do to get Americans to pronounce our name correctly. (I've learned to settle for not calling me Moshi) i18n-sucks-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Wed Apr 5 07:55:16 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 5 Apr 2000 08:55:16 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <1257244899-5853377@hypernet.com> Message-ID: On Tue, 4 Apr 2000, Gordon McMillan wrote: > despite emigrating to the Americas during the Industrial > Revolution, insisted that the proper spelling of McMillan > involved elevating the "c". Wonder if there's a unicode > character for that, so I can get righteously indignant whenever > people fail to use it. Hmmmm...I think the Python ACKS file should be moved to UTF-8, and write *my* name in Hebrew letters: mem, shin, hey, space, tsadi, aleph, dalet, kuf, hey. now-i-can-get-righteously-indignant-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From billtut@microsoft.com Wed Apr 5 07:18:49 2000 From: billtut@microsoft.com (Bill Tutt) Date: Tue, 4 Apr 2000 23:18:49 -0700 Subject: [Python-Dev] _PyUnicode_New/PyUnicode_Resize Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> should be exported as part of the unicode object API. Otherwise, external C codec developers have to jump through some useless and silly hoops in order to construct a PyUnicode object. Additionally, you mentioned to Andrew that the decoders don't have to return a tuple anymore. Thats currently incorrect with whats currently in CVS: Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer returned in the tuple. Should this be fixed, or must codecs return the integer as Misc\unicode.txt says? Thanks, Bill From mal@lemburg.com Wed Apr 5 10:40:56 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 11:40:56 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> Message-ID: <38EB0A28.8E8F6397@lemburg.com> Bill Tutt wrote: > > should be exported as part of the unicode object API. > > Otherwise, external C codec developers have to jump through some useless and > silly hoops in order to construct a PyUnicode object. Hmm, resize would be useful, agreed. The reason I haven't made these public is that the internal allocation logic could be changed in some future version to more elaborate and faster techniques. Having the _PyUnicode_* API private makes these changes possible without breaking external C code. E.g. say Unicode gets interned someday, then resize will need to watch out not resizing a Unicode object which is already stored in the interning dict. Perhaps a wrapper with additional checks around _PyUnicode_Resize() would be useful. Note that you don't really need _PyUnicode_New(): call PyUnicode_FromUnicode() with NULL argument and then fill in the buffer using PyUnicode_AS_UNICODE()... works just like PyString_FromStringAndSize() with NULL argument. > Additionally, you mentioned to Andrew that the decoders don't have to return > a tuple anymore. > Thats currently incorrect with whats currently in CVS: > Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer > returned in the tuple. > Should this be fixed, or must codecs return the integer as Misc\unicode.txt > says? That was a misunderstanding on my part: I was thinking of the .read()/.write() methods which are now in synch with the other file objects. .read() previously returned a tuple and .write() an integer. 
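[A rough sketch of the tuple shape being discussed -- a toy decoder, not the code in Python/codecs.c; the function name and the trivial Latin-1 mapping are invented for illustration, with the integer understood as the amount of input consumed:]

    def toy_latin1_decode(data, errors='strict'):
        # map each byte value straight to the Unicode ordinal 0-255
        result = u""
        for ch in data:
            result = result + unichr(ord(ch))
        # return (decoded object, length consumed), the tuple convention
        # referred to above
        return result, len(data)

    print toy_latin1_decode("abc")    # (u'abc', 3)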
.encode() and .decode() must return a tuple. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Wed Apr 5 11:42:37 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 12:42:37 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <38EB0BD5.66048804@lemburg.com> from "M.-A. Lemburg" at "Apr 5, 2000 11:48: 5 am" Message-ID: Hi! [me]: > > From my POV (using ISO Latin-1 all the time) it would be > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'äöü' in a > > Python source file so that (u'äöü' == 'äöü') == 1. This is what I see > > on *my* screen, whether there is a 'u' in Front of the string or not. M.-A. Lemburg: > u"äöü" is being interpreted as Latin-1. The problem is the > string 'äöü' to the right: during coercion this string is > being interpreted as UTF-8 and this causes the failure. > > You could say: ok, all my strings use Latin-1, but that would > introduce other problems... esp. when you take different > modules with different encoding assumptions and try to > integrate them into an application. Okay. This wouldn't occur here but we have deal with this possibility. > > In dist/src/Misc/unicode.txt you wrote: > > > > > Note that you should provide some hint to the encoding you used to > > > write your programs as pragma line in one the first few comment lines > > > of the source file (e.g. '# source file encoding: latin-1'). [me]: > > The upcoming 1.6 documentation should probably clarify whether > > the interpreter pays attention to "pragma"s or not. > > This is otherwise misleading. > > This "pragma" is nothing more than a hint for the source code > reader to switch his viewing encoding. The interpreter doesn't > treat the file differently. In fact, Python source code is > supposed to tbe 7-bit ASCII ! Sigh. In our company we use 'german' as our master language so we have string literals containing iso-8859-1 umlauts all over the place. Okay as long as we don't mix them with Unicode objects, this doesn't hurt anybody. What I would love to see, would be a well defined way to tell the interpreter to use 'latin-1' as default encoding instead of 'UTF-8' when dealing with string literals from our modules. The tokenizer in Python 1.6 already contains smart logic to get the size of TABs right (pasting from tokenizer.c): /* Skip comment, while looking for tab-setting magic */ if (c == '#') { static char *tabforms[] = { "tab-width:", /* Emacs */ ":tabstop=", /* vim, full form */ ":ts=", /* vim, abbreviated form */ "set tabsize=", /* will vi never die? */ /* more templates can be added here to support other editors */ }; .. It wouldn't be to hard to add something there to recognize other "pragma" comments like for example: #content-transfer-encoding: iso-8859-1 But what to do with it? May be adding a default encoding to every string object? Is this bloat? Just an idea. Regards, Peter From mal@lemburg.com Wed Apr 5 12:28:58 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 13:28:58 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: Message-ID: <38EB237A.5B16575B@lemburg.com> Peter Funk wrote: > > Hi! 
> > [me]: > > > From my POV (using ISO Latin-1 all the time) it would be > > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'äöü' in a > > > Python source file so that (u'äöü' == 'äöü') == 1. This is what I see > > > on *my* screen, whether there is a 'u' in Front of the string or not. > > M.-A. Lemburg: > > u"äöü" is being interpreted as Latin-1. The problem is the > > string 'äöü' to the right: during coercion this string is > > being interpreted as UTF-8 and this causes the failure. > > > > You could say: ok, all my strings use Latin-1, but that would > > introduce other problems... esp. when you take different > > modules with different encoding assumptions and try to > > integrate them into an application. > > Okay. This wouldn't occur here but we have deal with this possibility. > > > > In dist/src/Misc/unicode.txt you wrote: > > > > > > > Note that you should provide some hint to the encoding you used to > > > > write your programs as pragma line in one the first few comment lines > > > > of the source file (e.g. '# source file encoding: latin-1'). > > [me]: > > > The upcoming 1.6 documentation should probably clarify whether > > > the interpreter pays attention to "pragma"s or not. > > > This is otherwise misleading. > > > > This "pragma" is nothing more than a hint for the source code > > reader to switch his viewing encoding. The interpreter doesn't > > treat the file differently. In fact, Python source code is > > supposed to tbe 7-bit ASCII ! > > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. > > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. > > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. As I have already indicated above this would only solve the problem of string literals in Python source code. It would not however solve the problem with strings in general, since these can be built dynamically or from user input. The only way I can see for #pragma to work here is by auto- converting all static strings in the source code to Unicode and that would probably break more code than do good. Even worse, writing 'abc' in such a program would essentially mean the same thing as u'abc'. I'd suggest turning your Latin-1 strings into Unicode... this will hurt at first, but in the long rung, you win. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim@interet.com Wed Apr 5 14:33:29 2000 From: jim@interet.com (James C. 
Ahlstrom) Date: Wed, 05 Apr 2000 09:33:29 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> <200004042058.QAA13437@eric.cnri.reston.va.us> Message-ID: <38EB40A9.32A60EA2@interet.com> Guido van Rossum wrote: > OK, you go argue with the Tcl folks. They create a vaguely unix-like > structure under c:\Program Files\Tcl: subdirectories lib, bin, > include, and then they dump their .exe and their .dll files in the bin > directory. They also try to munge PATH to include their bin > directory, but that often doesn't work (not on Windows 95/98 anyway). That is even worse than I thought. Obviously they are incompetent in Windows. Mark's suggestion is a great one! JimA From bwarsaw@cnri.reston.va.us Wed Apr 5 14:34:39 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 5 Apr 2000 09:34:39 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case References: <1257244899-5853377@hypernet.com> Message-ID: <14571.16623.493822.231793@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, MZ> tsadi, aleph, dalet, kuf, hey. Shouldn't that be hey kuf dalet aleph tsadi space hey shin mem? :) lamed-alef-vav-mem-shin-ly y'rs, -Barry From Moshe Zadka Wed Apr 5 14:44:15 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 5 Apr 2000 15:44:15 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14571.16623.493822.231793@anthem.cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Barry A. Warsaw wrote: > MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, > MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, > MZ> tsadi, aleph, dalet, kuf, hey. > > Shouldn't that be > > hey kuf dalet aleph tsadi space hey shin mem? No, just stick the unicode directional shifting characters around it. now-you-see-why-i18n-is-a-pain-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward@cnri.reston.va.us Wed Apr 5 14:48:24 2000 From: gward@cnri.reston.va.us (Greg Ward) Date: Wed, 5 Apr 2000 09:48:24 -0400 Subject: [Python-Dev] re: division In-Reply-To: ; from ping@lfw.org on Tue, Apr 04, 2000 at 10:25:07AM -0700 References: Message-ID: <20000405094823.A11890@cnri.reston.va.us> On 04 April 2000, Ka-Ping Yee said: > On Tue, 4 Apr 2000 gvwilson@nevex.com wrote: > > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > > but harder to track down, since you'll have to scrutinize values very > > carefully to spot the difference. Haven't done any field tests, > > though...) > > My favourite symbol for integer division is _/ > (read it as "floor-divide"). It makes visually > apparent what is going on. Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 years ago: / is what you learned in grade school (1/2 = 0.5), div is what you learn in first-year undergrad CS (1/2 = 0). Either add a "div" operator or a "div()" builtin to Python and you take care of the spelling issue. (The fixing-old-code issue is another problem entirely.) I think that means I favour keeping operator.div and the __div__() method as-is, and adding operator.fdiv (?) and __fdiv__ for "floating-point" division. 
In other words: 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 (where I have used artistic license in applying __div__ to actual numbers -- you know what I mean). -1 on adding any non-7-bit-ASCII characters to the character set required to express Python; +0 on allowing any (alphanumeric) Unicode character in identifiers (all for Py3k). Not sure what "alphanumeric" means in Unicode, but I'm sure someone has worried about this. Greg From guido@python.org Wed Apr 5 15:04:53 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:04:53 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 08:39:56 +0200." References: Message-ID: <200004051404.KAA16039@eric.cnri.reston.va.us> > Guido van Rossum: > > Modified Files: > > FixTk.py > > Log Message: > > Work the Tcl version number in the path we search for. > [...] > > ! import sys, os, _tkinter > > ! ver = str(_tkinter.TCL_VERSION) > > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > > if os.path.exists(os.path.join(v, "init.tcl")): > > os.environ["TCL_LIBRARY"] = v > [...] Note that this is only used on Windows, where Python is distributed with a particular version of Tk. I decided I needed to back down from 8.3 to 8.2 (8.3 sometimes crashes on close) so I decided to make the FixTk module independent of the version. > Just a wild idea: > > Does it make sense to have several incarnations of the shared object file > _tkinter.so (or _tkinter.pyd on WinXX)? > > Something like _tkint83.so, _tkint82.so and so on, so that > Tkinter.py can do something like the following to find a > available Tcl/Tk version: > > for tkversion in range(83,79,-1): > try: > _tkinter = __import__("_tkint"+str(tkversion)) > break > except ImportError: > pass > else: > raise > > Of course this does only make sense on platforms with shared object loading > and if preparing Python binary distributions without including a > particular Tcl/Tk package into the Python package. This idea might be > interesting for Red Hat, SuSE Linux distribution users to allow partial > system upgrades with a binary python-1.6.rpm Can you tell me what problem you are trying to solve here? It makes no sense to me, but maybe I'm missing something. Typically Python is built to match the Tcl/Tk version you have installed, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 5 15:11:02 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:11:02 -0400 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize In-Reply-To: Your message of "Wed, 05 Apr 2000 11:40:56 +0200." <38EB0A28.8E8F6397@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> Message-ID: <200004051411.KAA16095@eric.cnri.reston.va.us> > E.g. say Unicode gets interned someday, then resize will > need to watch out not resizing a Unicode object which is > already stored in the interning dict. Note that string objects deal with this by requiring that the reference count is 1 when a string is resized. This effectively enforces that resizes are only used when the original creator is still working on the string. 
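[To see the refcount reasoning from the Python side, here is a minimal sketch using sys.getrefcount, as in Neil's example earlier in this digest; the exact counts are CPython implementation details and may differ:]

    import sys

    s = "spam" * 5              # freshly built string: only our name (plus the
                                # temporary argument reference) points at it
    print sys.getrefcount(s)    # typically 2 -- a safe candidate for in-place resize

    t = s                       # share it, the way an interning dict would
    print sys.getrefcount(s)    # now 3 -- an in-place resize would mutate t as well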
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 5 15:16:15 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:16:15 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 12:42:37 +0200." References: Message-ID: <200004051416.KAA16112@eric.cnri.reston.va.us> > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. It would be better if this was supported for u"..." literals, so that it was taken care of at the source code level completely. The running program shouldn't have to worry about what encoding its source code was! For 8-bit literals, this would mean that if you had source code using Latin-1, the literals would be translated from Latin-1 to UTF-8 by the code generator. This would mean that len('ç') would return 2. I'm not sure this is a great idea -- but then I'm not sure that using Latin-1 in source code is a great idea either. > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. > > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. Before we go any further we should design pragmas. The current approach is inefficient and only designed to accommodate editor-specific magical commands. I say it's a Python 1.7 issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From Moshe Zadka Wed Apr 5 15:08:53 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 5 Apr 2000 16:08:53 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Greg Ward wrote: > Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 > years ago: / is what you learned in grade school (1/2 = 0.5) Greg, here's an easy way for you to make money: sue your grade school . I learned that 1/2 is 1/2. Rationals are a much more natural entities then decimals (just think 1/3). FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change. > Not sure what "alphanumeric" > means in Unicode, but I'm sure someone has worried about this. I think Unicode has a clear definition of a letter and a number. How do you feel about letting arbitrary Unicode whitespace into Python? (Other then the indentation of non-empty lines ) -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer@tismer.com Wed Apr 5 15:29:03 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 16:29:03 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? Message-ID: <38EB4DAE.2F538F9F@tismer.com> Hi, while fixing my design flaws after Just's Stackless Mac port, I was dealing with some overflow conditions and tracebacks. When there is a recursion depth overflow condition, we create a lot of new structure for the tracebacks. This usually happens in a situation where memory is quite exhausted. Even worse if we crash because of a memory error: The system will not have enough memory to build the traceback structure, to report the error. Puh :-) When I look into tracebacks, it turns out to be just a chain like the frame chain, but upward down. It holds references to the frames in a 1-to-1 manner, and it keeps copies of f->f_lasti and f->f_lineno. I don't see why this is needed. I'm thinking to replace the tracebacks by a single pointer in the frames for this purpose. It appears further to be possible to do that without any extra memory, since all the frames have extra temporary fields for exception info, and that isn't used in this context. Traceback objects exist each for one and only one frame, and they could be embedded into their frame. Does this make sense? Do I miss something? I'm considering this for Stackless and would like to know if I should prepare it for orthodox Python as well? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@python.org Wed Apr 5 15:32:05 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:32:05 -0400 Subject: [Python-Dev] re: division In-Reply-To: Your message of "Wed, 05 Apr 2000 16:08:53 +0200." References: Message-ID: <200004051432.KAA16210@eric.cnri.reston.va.us> > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. Forget it. ABC did this, and the problem is that where you *think* you are doing something simple like calculating interest rates, you are actually manipulating rational numbers with 1000s of digits in their numerator and denumerator. If you want to change it, consider emulating what kids currently use in school: a decimal floating point calculator with N digits of precision. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 5 15:33:18 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:33:18 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Wed, 05 Apr 2000 16:29:03 +0200." <38EB4DAE.2F538F9F@tismer.com> References: <38EB4DAE.2F538F9F@tismer.com> Message-ID: <200004051433.KAA16229@eric.cnri.reston.va.us> > When I look into tracebacks, it turns out to be just a chain > like the frame chain, but upward down. It holds references > to the frames in a 1-to-1 manner, and it keeps copies of > f->f_lasti and f->f_lineno. I don't see why this is needed. > > I'm thinking to replace the tracebacks by a single pointer > in the frames for this purpose. 
It appears further to be > possible to do that without any extra memory, since all the > frames have extra temporary fields for exception info, and > that isn't used in this context. Traceback objects exist > each for one and only one frame, and they could be embedded > into their frame. > > Does this make sense? Do I miss something? Yes. It is quite possible to have multiple stack traces lingering around that all point to the same stack frames. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Apr 5 16:04:31 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 17:04:31 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> Message-ID: <38EB55FF.C900CF8A@lemburg.com> Guido van Rossum wrote: > > > Sigh. In our company we use 'german' as our master language so > > we have string literals containing iso-8859-1 umlauts all over the place. > > Okay as long as we don't mix them with Unicode objects, this doesn't > > hurt anybody. > > > > What I would love to see, would be a well defined way to tell the > > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > > when dealing with string literals from our modules. > > It would be better if this was supported for u"..." literals, so that > it was taken care of at the source code level completely. The running > program shouldn't have to worry about what encoding its source code > was! u"..." currently interprets the characters it finds as Latin-1 (this is by design, since the first 256 Unicode ordinals map to the Latin-1 characters). > For 8-bit literals, this would mean that if you had source code using > Latin-1, the literals would be translated from Latin-1 to UTF-8 by the > code generator. This would mean that len('ç') would return 2. I'm > not sure this is a great idea -- but then I'm not sure that using > Latin-1 in source code is a great idea either. > > > The tokenizer in Python 1.6 already contains smart logic to get the > > size of TABs right (pasting from tokenizer.c): ... > > Before we go any further we should design pragmas. The current > approach is inefficient and only designed to accommodate > editor-specific magical commands. > > I say it's a Python 1.7 issue. Good idea :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Wed Apr 5 16:01:24 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 17:01:24 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <38EB4DAE.2F538F9F@tismer.com> <200004051433.KAA16229@eric.cnri.reston.va.us> Message-ID: <38EB5544.5D428C01@tismer.com> Guido van Rossum wrote: [me, about embedding tracebacks into frames] > > Does this make sense? Do I miss something? > > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. Oh, I see. This is a Standard Python specific thing, which I was about to forget. In my version, this can happen, too, unless you are in a continuation-protected context already. There (and that was what I looked at while debugging), this situation can never happen, since an exception creates continuation-copies of all the frames while it crawls up. Since the traceback causes refcount increase, all the frames protect themselves. Thank you. I see it is a stackless feature. 
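[For reference, the chain being talked about can be inspected from plain Python -- a small sketch, nothing Stackless-specific; output details vary by version:]

    import sys

    def inner():
        raise ValueError, "boom"

    def outer():
        inner()

    try:
        outer()
    except ValueError:
        tb = sys.exc_info()[2]
        # one traceback object per frame, each holding a reference to its
        # frame plus its own copy of the line number
        while tb is not None:
            print tb.tb_frame.f_code.co_name, tb.tb_lineno
            tb = tb.tb_next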
I can implement it if I put protection into the core, not just the co-extension. Frames can carry the tracebacks under the condition that they are protected (copied) if the traceback fields are occupied. Great, since this is a rare condition. Thanks again for the enlightment - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From pf@artcom-gmbh.de Wed Apr 5 16:08:35 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:08:35 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004051404.KAA16039@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 10: 4:53 am" Message-ID: Hi! [me]: [...] > > particular Tcl/Tk package into the Python package. This idea might be > > interesting for Red Hat, SuSE Linux distribution users to allow partial > > system upgrades with a binary python-1.6.rpm > > Can you tell me what problem you are trying to solve here? It makes > no sense to me, but maybe I'm missing something. Typically Python is > built to match the Tcl/Tk version you have installed, right? If you build from source this is true. But the Linux world is now different: The two major Linux distributions (RedHat, SuSE) both use the RPM format to distribute precompiled binary packages. Tcl/Tk usually lives in a separate package. (BTW.: SuSE in their perverse mood has splitted Python 1.5.2 itself into more than half a dozen separate packages, but that's another story). If someone wants to prebuild a Python 1.6 binary RPM for installation on any RPM based Linux system it is unknown, which version of Tcl/Tk is installed on the destination system. So either you can build a monster RPM, which includes the Tcl/Tk shared libs or use the RPM Spec file to force the user to install a specific version of Tcl/Tk (for example 8.2.3) or implement something like I suggested above. Of course this places a lot of burden on the RPM builder: he has to install at least all the four major versions of Tcl/Tk (8.0 - 8.3) on his machine and has to build _tkinter four times against each particular shared library and header files... but this would be possible. Currently the situation with SuSE Python 1.5.2 RPMs is even more dangerous, since the SPEC files used by SuSE simply contains the following 'Requires'-definitions: %package -n pyth_tk Requires: python tk tix blt This makes RPM believe that *any* version of Tcl/Tk would fit. Luckily SuSE 6.4 (released last week) still ships with the old Tcl/Tk 8.0.5, so this will not break until SuSE decides to upgrade their Tcl/Tk. But I guess that Red Hat comes with a newer version of Tcl/Tk. Hopefully they have got their SPEC file right (they invented RPM in the first place) RPM can be a really powerful tool protecting people from breaking their system with binary updates --- if used the right way... :-( May be I should go ahead and write a RPM Python.SPEC file? Would that have a chance to get included into src/Misc? 
Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido@python.org Wed Apr 5 16:25:38 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 11:25:38 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 17:04:31 +0200." <38EB55FF.C900CF8A@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> Message-ID: <200004051525.LAA16345@eric.cnri.reston.va.us> > u"..." currently interprets the characters it finds as Latin-1 > (this is by design, since the first 256 Unicode ordinals map to > the Latin-1 characters). Nice, except that now we seem to be ambiguous about the source character encoding: it's Latin-1 for Unicode strings and UTF-8 for 8-bit strings...! --Guido van Rossum (home page: http://www.python.org/~guido/) From pf@artcom-gmbh.de Wed Apr 5 16:54:12 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:54:12 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <200004051525.LAA16345@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 11:25:38 am" Message-ID: Guido van Rossum: > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! This is a little bit difficult to understand and will make the task of writing the upcoming 1.6 documentation even more challenging. ;-) But I agree: Changing this should go into 1.7 BTW: Our umlaut strings are sooner or later passed through one central function. All modules usually contain something like this: try: import fintl _ = fintl.gettext except ImportError: def _(msg): return msg ... MenuEntry(_("Öffnen"), self.open), MenuEntry(_("Schließen"), self.close) .... you get the picture. It would be easy to change the implementation of 'fintl.gettext' to coerce the resulting strings into Unicode or do whatever is required. But we currently use GNU gettext to produce the messages files that are translated into english, french and italian. AFAIK GNU gettext handles only 8 bit strings anyway. Our customers in far east currently live with the english version but this has more financial than technical reasons. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido@python.org Wed Apr 5 19:01:29 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 14:01:29 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 17:08:35 +0200." References: Message-ID: <200004051801.OAA16736@eric.cnri.reston.va.us> > RPM can be a really powerful tool protecting people from breaking their > system with binary updates --- if used the right way... :-( > > May be I should go ahead and write a RPM Python.SPEC file? > Would that have a chance to get included into src/Misc? I'd say yes! But check with Oliver Andrich first, who's maintaining Python RPMs already.
--Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Apr 5 19:32:26 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 20:32:26 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> Message-ID: <38EB86BA.5225C381@lemburg.com> Guido van Rossum wrote: > > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! Noo... there is no definition for non-ASCII 8-bit strings in Python source code using the ordinal range 127-255. If you were to define Latin-1 as source code encoding, then we would have to change auto-coercion to make a Latin-1 assumption instead, but... I see the picture: people are getting pretty confused about what is going on. If you write u"xyz" then the ordinals of those characters are taken and stored directly as Unicode characters. If you live in a Latin-1 world, then you happen to be lucky: the Unicode characters match your input. If not, some totally different characters are likely to show if the string were written to a file and displayed using a Unicode aware editor. The same will happen to your normal 8-bit string literals. Nothing unusual so far... if you use Latin-1 strings and write them to a file, you get Latin-1. If you happen to program on DOS, you'll get the DOS ANSI encoding for the German umlauts. Now the key point where all this started was that u'ä' in 'äöü' will raise an error due to 'äöü' being *interpreted* as UTF-8 -- this doesn't mean that 'äöü' will be interpreted as UTF-8 elsewhere in your application. The UTF-8 assumption had to be made in order to get the two worlds to interoperate. We could have just as well chosen Latin-1, but then people currently using say a Russian encoding would get upset for the same reason. One way or another somebody is not going to like whatever we choose, I'm afraid... the simplest solution is to use Unicode for all strings which contain non-ASCII characters and then call .encode() as necessary. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer now that's an interesting error message. I think the old one was better ;-) From Fredrik Lundh" I wrote: > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object with the latest version, I get: >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer is this really an improvement? looks like writing code that works with any kind of strings will be harder than I thought... From guido@python.org Wed Apr 5 22:46:47 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 17:46:47 -0400 Subject: [Python-Dev] Re: unicode: strange exception (part 2) In-Reply-To: Your message of "Wed, 05 Apr 2000 23:38:10 +0200." 
<000e01bf9f47$7e47eac0$34aab5d4@hagrid> References: <000e01bf9f47$7e47eac0$34aab5d4@hagrid> Message-ID: <200004052146.RAA22187@eric.cnri.reston.va.us> > with the latest version, I get: > > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > is this really an improvement? > > looks like writing code that works with any kind of strings > will be harder than I thought... Are you totally up-to-date? I get >>> u"!" in ("a", None) 0 >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 5 23:37:24 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 18:37:24 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 20:32:26 +0200." <38EB86BA.5225C381@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <200004052237.SAA22215@eric.cnri.reston.va.us> [MAL] > > > u"..." currently interprets the characters it finds as Latin-1 > > > (this is by design, since the first 256 Unicode ordinals map to > > > the Latin-1 characters). [GvR] > > Nice, except that now we seem to be ambiguous about the source > > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > > 8-bit strings...! [MAL] > Noo... there is no definition for non-ASCII 8-bit strings in > Python source code using the ordinal range 127-255. If you were > to define Latin-1 as source code encoding, then we would have > to change auto-coercion to make a Latin-1 assumption instead, but... > I see the picture: people are getting pretty confused about what > is going on. > > If you write u"xyz" then the ordinals of those characters are > taken and stored directly as Unicode characters. If you live > in a Latin-1 world, then you happen to be lucky: the Unicode > characters match your input. If not, some totally different > characters are likely to show if the string were written > to a file and displayed using a Unicode aware editor. > > The same will happen to your normal 8-bit string literals. > Nothing unusual so far... if you use Latin-1 strings and > write them to a file, you get Latin-1. If you happen to > program on DOS, you'll get the DOS ANSI encoding for the > German umlauts. > > Now the key point where all this started was that > u'ä' in 'äöü' will raise an error due to 'äöü' being > *interpreted* as UTF-8 -- this doesn't mean that 'äöü' > will be interpreted as UTF-8 elsewhere in your application. > > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. > > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. I have a different view on this (except that I agree that it's pretty confusing :-). In my definition of a "source character encoding", string literals, whether Unicode or 8-bit strings, are translated from the source encoding to the corresponding run-time values. 
If I had a C compiler that read its source in EBCDIC but cross-compiled to a machine that used ASCII, I would expect that 'a' in the source would have the integer value 97 (ASCII 'a'), regardless of the EBCDIC value for 'a'. If I type a non-ASCII Latin-1 character in a Unicode literal, it generates the corresponding Unicode character. This means to me that the source character encoding is Latin-1. But when I type the same character in an 8-bit character literal, that literal is interpreted as UTF-8 (e.g. when converting to Unicode using the default conversions). Thus, even though you can do whatever you want with 8-bit literals in your program, the most defensible view is that they are UTF-8 encoded. I would be much happier if all source code was encoded in the same encoding, because otherwise there's no good way to view such code in a general Unicode-aware text viewer! My preference would be to always use UTF-8. This would mean no change for 8-bit literals, but a big change for Unicode literals... And a break with everyone who's currently typing Latin-1 source code and using strings as Latin-1. (Or Latin-7, or whatever.) My next preference would be a pragma to define the source encoding, but that's a 1.7 issue. Maybe the whole thing is... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Wed Apr 5 23:51:51 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 00:51:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> Message-ID: <38EBC387.FAB08D61@lemburg.com> Fredrik Lundh wrote: > > >>> None in "abc" > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > now that's an interesting error message. I think the old one > was better ;-) How come you're always faster on this than I am with my patches ;-) The above is already fixed in my local version (together with some other minor stuff I found in the codec error handling) with the next patch set. It will then again produce this output: >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: string member test needs char left operand BTW, my little "don't use tabs use spaces" in C code extravaganza was a complete nightmare... diff just doesn't like it and the Python source code is full of places where tabs and spaces are mixed in many different ways... I'm back to tabs-indent-mode again :-/ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond@skippinet.com.au Thu Apr 6 01:19:30 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 10:19:30 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: > I just downloaded and installed it. I've never seen an > installer like > this -- they definitely put a lot of effort in it. hehe - guess who "encouraged" that :-) > Annoying nit: they > tell you to install "MS Windows Installer" first that should be a good clue :-) > and of > course, being > a MS tool, it requires a reboot. :-( Actually, MSI is very cool. Now MSI is installed, most future MSI installs should proceed without reboot. In Win2k it is finally close to perfect. I dont think an installer has ever wanted to reboot my PC since Win2k. > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. 
So > there. It also didn't change PATH for me, even though the docs > mention that it does -- maybe only on NT? In another mail you asked David to look into how Active State handle their DLLs. Well, Trent Mick started the ball rolling... The answer is that Perl extensions never import data from the core DLL. They always import functions. In many cases, they can hide this fact with the pre-processor. In the Python world, this qould be equivilent to never accessing Py_None directly - always via a "PyGetNone()" type function. As mentioned, this could possibly be hidden so that code still uses "Py_None". One advantage they mentioned a number of times is avoiding dependencies on differing Perl versions. By avoiding the import of data, they have far more possibilities, including the use of LoadLibrary(), and a new VC6 linker feature called "delay loading". To my mind, it would be quite difficult to make this work for Python. There are a a large number of data items we import, and adding a function call indirection to each one sounds a pain. [As a semi-related issue: This "delay loading" feature is very cool - basically, the EXE loader will not resolve external DLL references until actually used. This is the same trick mentioned on comp.lang.python, where they saw _huge_ startup increases (although the tool used there was a third-party tool). The thread in question on c.l.py resolved that, for some reason, the initialization of the Windows winsock library was taking many seconds on that particular PC. Guido - are you on VC6 yet? If so, I could look into this linker option, and see how it improves startup performance on Windows. Note - this feature only works if no data is imported - hence, we could use it in Python16.dll, as most of its imports are indeed functions. Python extension modules can not use it against Python16 itself as they import data.] Mark. From tim_one@email.msn.com Thu Apr 6 04:10:39 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:39 -0400 Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: <000401bf9f75$afb18520$ab2d153f@tim> [Greg Ward] > ... > In other words: > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > (where I have used artistic license in applying __div__ to actual > numbers -- you know what I mean). +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. From tim_one@email.msn.com Thu Apr 6 04:10:35 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:35 -0400 Subject: [Python-Dev] re: division In-Reply-To: <200004051432.KAA16210@eric.cnri.reston.va.us> Message-ID: <000301bf9f75$ada190e0$ab2d153f@tim> [Moshe] > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. [Guido] > Forget it. ABC did this, and the problem is that where you *think* > you are doing something simple like calculating interest rates, you > are actually manipulating rational numbers with 1000s of digits in > their numerator and denumerator. Let's not be too hasty about this, cuz I doubt we'll get to change it twice . You (Guido) & I agreed that ABC's rationals didn't work out well way back when, but a) That has not been my experience in other languages -- ABC was unique. b) Presumably ABC did usability studies that concluded rationals were least surprising. c) TeachScheme! 
seems delighted with their use of rationals (indeed, one of TeachScheme!'s primary authors beat up on me in email for Python not doing this). d) I'd much rather saddle newbies with time & space surprises than correctness surprises. Last week I took some time to stare at the ABC manual again, & suspect I hit on the cause: ABC was *aggressively* rational. That is, ABC had no notation for floating point (ABC "approximate") literals; even 6.02e23 was taken to mean "exact rational". In my experience ABC was unique this way, and uniquely surprising for it: it's hard to be surprised by 2/3 returning a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Give it some thought. > If you want to change it, consider emulating what kids currently use > in school: a decimal floating point calculator with N digits of > precision. This is what REXX does, and is very powerful even for experts (assuming the user can, as in REXX, specify N; but that means writing a whole slew of arbitrary-precision math libraries too -- btw, that is doable! e.g., I worked w/ Dave Gillespie on some of the algorithms for his amazing Emacs calc). It will run at best 10x slower than native fp of comparable precision, though, so experts will hate it in the cases they don't love it <0.5 wink>. one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim From petrilli@amber.org Thu Apr 6 04:16:28 2000 From: petrilli@amber.org (Christopher Petrilli) Date: Wed, 5 Apr 2000 23:16:28 -0400 Subject: [Python-Dev] re: division In-Reply-To: <000401bf9f75$afb18520$ab2d153f@tim>; from tim_one@email.msn.com on Wed, Apr 05, 2000 at 11:10:39PM -0400 References: <20000405094823.A11890@cnri.reston.va.us> <000401bf9f75$afb18520$ab2d153f@tim> Message-ID: <20000405231628.A24968@trump.amber.org> Tim Peters [tim_one@email.msn.com] wrote: > [Greg Ward] > > ... > > In other words: > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > > > (where I have used artistic license in applying __div__ to actual > > numbers -- you know what I mean). > > +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. +1 from me as well. I spent a little time going through all my code, and looking through Zope as well, and I couldn't find any place I used 'div' as a variable, much less any place I depended on this behaviour, so I don't think my code would break in any odd ways. The only thing I can imagine is some printed text formatting issues. Chris -- | Christopher Petrilli | petrilli@amber.org From Moshe Zadka Thu Apr 6 07:30:44 2000 From: Moshe Zadka (Moshe Zadka) Date: Thu, 6 Apr 2000 08:30:44 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <000301bf9f75$ada190e0$ab2d153f@tim> Message-ID: On Wed, 5 Apr 2000, Tim Peters wrote: > Last week I took some time to stare at the ABC manual again, & suspect I hit > on the cause: ABC was *aggressively* rational. That is, ABC had no > notation for floating point (ABC "approximate") literals; even 6.02e23 was > taken to mean "exact rational". In my experience ABC was unique this way, > and uniquely surprising for it: it's hard to be surprised by 2/3 returning > a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Ouch. There is definitely place for floats in the numeric tower. It's just that those shouldn't be reached accidentally <0.3 wink> > one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim but-in-this-case-two-sizes-do-seem-enough-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Thu Apr 6 09:50:47 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 10:50:47 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> <200004051411.KAA16095@eric.cnri.reston.va.us> Message-ID: <38EC4FE7.94F862D7@lemburg.com> Guido van Rossum wrote: > > > E.g. say Unicode gets interned someday, then resize will > > need to watch out not resizing a Unicode object which is > > already stored in the interning dict. > > Note that string objects deal with this by requiring that the > reference count is 1 when a string is resized. This effectively > enforces that resizes are only used when the original creator is still > working on the string. Nice trick ;-) The new PyUnicode_Resize() will have the same interface as _PyString_Resize() since this seems to be the most flexible way to implement it without giving away possibilities for future optimizations... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gvwilson@nevex.com Thu Apr 6 12:31:26 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 6 Apr 2000 07:31:26 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <20000405231628.A24968@trump.amber.org> Message-ID: > > [Greg Ward] > > > In other words: > > > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 +1. Should 'mod' be made a synonym for '%' for symmetry's sake? Greg From guido@python.org Thu Apr 6 14:33:51 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:33:51 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 10:19:30 +1000." References: Message-ID: <200004061333.JAA23880@eric.cnri.reston.va.us> > > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So > > there. It also didn't change PATH for me, even though the docs > > mention that it does -- maybe only on NT? > > In another mail you asked David to look into how Active State handle > their DLLs. Well, Trent Mick started the ball rolling... > > The answer is that Perl extensions never import data from the core > DLL. They always import functions. In many cases, they can hide > this fact with the pre-processor. This doesn't answer my question. My question is how they support COM without having a DLL in the system directory. Or at least I don't understand how not importing data makes a difference. > By avoiding the import of data, they have far more possibilities, > including the use of LoadLibrary(), For what do they use LoadLibrary()? What is it? We use LoadLibraryEx() -- isn't that just as good? > and a new VC6 linker feature called "delay loading". > To my mind, it would be quite difficult to make this work for > Python. There are a a large number of data items we import, and > adding a function call indirection to each one sounds a pain. Agreed. > [As a semi-related issue: This "delay loading" feature is very > cool - basically, the EXE loader will not resolve external DLL > references until actually used. This is the same trick mentioned on > comp.lang.python, where they saw _huge_ startup increases (although > the tool used there was a third-party tool). 
The thread in question > on c.l.py resolved that, for some reason, the initialization of the > Windows winsock library was taking many seconds on that particular > PC. > > Guido - are you on VC6 yet? Yes -- I promised myself I'd start using VC6 for the 1.6 release cycle, and I did. > If so, I could look into this linker > option, and see how it improves startup performance on Windows. > Note - this feature only works if no data is imported - hence, we > could use it in Python16.dll, as most of its imports are indeed > functions. Python extension modules can not use it against Python16 > itself as they import data.] But what DLLs does python16 use that could conceivably be delay-loaded? Note that I have a feeling that there are a few standard extensions that should become separate PYDs -- e.g. socket (for the above reason) and unicodedata. This would greatly reduce the size of python16.dll. Since this way we manage our own DLL loading anyway, what's the point of delay-loading? --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Thu Apr 6 14:43:00 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:43:00 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <200004041957.PAA13168@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 04, 2000 03:57:47 PM Message-ID: <200004061343.PAA20218@python.inrialpes.fr> [Guido] > > If it ain't broken, don't "fix" it. > This also explains why socket.connect() generated so much resistance... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond@skippinet.com.au Thu Apr 6 14:53:10 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 23:53:10 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061333.JAA23880@eric.cnri.reston.va.us> Message-ID: > > The answer is that Perl extensions never import data > from the core > > DLL. They always import functions. In many cases, > they can hide > > this fact with the pre-processor. > > This doesn't answer my question. My question is how they > support COM > without having a DLL in the system directory. Or at least I don't > understand how not importing data makes a difference. By not using data, they can use either "delay load", or fully dynamic loading. Fully dynamic loading really just involves getting every API function via GetProcAddress() rather than having implicit linking via external references. GetProcAddress() can retrieve data items, but only their address, leaving us still in a position where "Py_None" doesnt work without magic. Delay Loading involves not loading the DLL until the first reference is used. This also lets you define code that locates the DLL to be used. This code is special in a "DllMain" kinda way, but does allow runtime binding to a statically linked DLL. However, it still has the "no data" limitation. > But what DLLs does python16 use that could conceivably be > delay-loaded? > > Note that I have a feeling that there are a few standard > extensions > that should become separate PYDs -- e.g. socket (for the > above reason) > and unicodedata. This would greatly reduce the size of > python16.dll. Agreed - these were my motivation. If these are moving to external modules then I am happy. I may have a quick look for other preloaded DLLs we can avoid - worth a look for the sake of a linker option :-) Mark. 
From guido@python.org Thu Apr 6 14:52:47 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:52:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Thu, 06 Apr 2000 15:43:00 +0200." <200004061343.PAA20218@python.inrialpes.fr> References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <200004061352.JAA24034@eric.cnri.reston.va.us> [GvR] > > If it ain't broken, don't "fix" it. [VM] > This also explains why socket.connect() generated so much resistance... Yes -- people are naturally conservative. I am too, myself, so I should have known... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Thu Apr 6 14:51:41 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:51:41 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004051433.KAA16229@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 05, 2000 10:33:18 AM Message-ID: <200004061351.PAA20261@python.inrialpes.fr> [Christian] > > When I look into tracebacks, it turns out to be just a chain > > like the frame chain, but upward down. It holds references > > to the frames in a 1-to-1 manner, and it keeps copies of > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > ... > > Does this make sense? Do I miss something? > [Guido] > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. This reminds me that some time ago I made an experimental patch for removing SET_LINENO. There was the problem of generating callbacks for pdb (which I think I solved somehow but I don't remember the details). I do remember that I had to look at pdb again for some reason. Is there any interest in reviving this idea? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido@python.org Thu Apr 6 14:57:27 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:53:10 +1000." References: Message-ID: <200004061357.JAA24071@eric.cnri.reston.va.us> > > > The answer is that Perl extensions never import data from the core > > > DLL. They always import functions. In many cases, they can hide > > > this fact with the pre-processor. > > > > This doesn't answer my question. My question is how they support COM > > without having a DLL in the system directory. Or at least I don't > > understand how not importing data makes a difference. > > By not using data, they can use either "delay load", or fully > dynamic loading. > > Fully dynamic loading really just involves getting every API > function via GetProcAddress() rather than having implicit linking > via external references. GetProcAddress() can retrieve data items, > but only their address, leaving us still in a position where > "Py_None" doesnt work without magic. Actually, Py_None is just a macro that expands to the address of some data -- isn't that exactly what we need? > Delay Loading involves not loading the DLL until the first reference > is used. This also lets you define code that locates the DLL to be > used. This code is special in a "DllMain" kinda way, but does allow > runtime binding to a statically linked DLL. However, it still has > the "no data" limitation. > > > But what DLLs does python16 use that could conceivably be > > delay-loaded? 
> > > > Note that I have a feeling that there are a few standard > > extensions > > that should become separate PYDs -- e.g. socket (for the > > above reason) > > and unicodedata. This would greatly reduce the size of > > python16.dll. > > Agreed - these were my motivation. If these are moving to external > modules then I am happy. I may have a quick look for other > preloaded DLLs we can avoid - worth a look for the sake of a linker > option :-) OK, I'll look into moving socket and unicodedata out of python16.dll. But, I still don't understand why Perl/COM doesn't need a DLL in the system directory. Or is it just because they change PATH? (I don't know zit about COM, so that may be it. I understand that a COM object is registered (in the registry) as an entry point of a DLL. Couldn't that DLL be specified by absolute pathname??? Then no search path would be necessary.) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Thu Apr 6 15:07:38 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 00:07:38 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: > But, I still don't understand why Perl/COM doesn't need a > DLL in the > system directory. Or is it just because they change PATH? > > (I don't know zit about COM, so that may be it. I > understand that a > COM object is registered (in the registry) as an entry point of a > DLL. Couldn't that DLL be specified by absolute > pathname??? Then no > search path would be necessary.) Yes - but it all gets back to the exact same problem that got us here in the first place: * COM object points to \Python1.6\PythonCOM16.dll * PythonCOM16.dll has link-time reference to Python16.dll * As COM just uses LoadLibrary(), the path of PythonCOM16.dll is not used to resolve its references - only the path of the host .EXE, the system path, etc. End result is Python16.dll is not found, even though it is in the same directory. So, if you have the opportunity to intercept the link-time reference to a DLL (or, obviously, use LoadLibrary()/GetProcAddress() to reference the DLL), you can avoid override the search path. Thus, if PythonCOM16.dll could intercept its references to Python16.dll, it could locate the correct Python16.dll with runtime code. However, as we import data from Python16.dll rather then purely addresses, we can't use any of these interception solutions. If we could hide all data references behind macros, then we could possibly arrange it. Perl _does_ use such techniques, so can arrange for the runtime type resolution. (Its not clear if Perl uses "dynamic loading" via GetProcAddress(), or delayed loading via the new VC6 feature - I believe the former, but the relevant point is that they definately hide data references behind magic...) Mark. From skip@mojam.com (Skip Montanaro) Thu Apr 6 14:08:14 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 6 Apr 2000 08:08:14 -0500 (CDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004061351.PAA20261@python.inrialpes.fr> References: <200004051433.KAA16229@eric.cnri.reston.va.us> <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <14572.35902.781258.448592@beluga.mojam.com> Vladimir> This reminds me that some time ago I made an experimental Vladimir> patch for removing SET_LINENO. 
There was the problem of Vladimir> generating callbacks for pdb (which I think I solved somehow Vladimir> but I don't remember the details). I do remember that I had to Vladimir> look at pdb again for some reason. Is there any interest in Vladimir> reviving this idea? I believe you can get line number information from a code object's co_lnotab attribute, though I don't know the format. I think this should be sufficient to allow SET_LINENO to be eliminated altogether. It's just that there are places in various modules that predate the appearance of co_lnotab. Whoops, wait a minute. I just tried >>> def foo(): pass ... >>> foo.func_code.co_lnotab with both "python" and "python -O". co_lnotab is empty for python -O. I thought it was supposed to always be generated? -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From tismer@tismer.com Thu Apr 6 16:09:51 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:09:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> <38EBC387.FAB08D61@lemburg.com> Message-ID: <38ECA8BF.5C47F700@tismer.com> "M.-A. Lemburg" wrote: > BTW, my little "don't use tabs use spaces" in C code extravaganza > was a complete nightmare... diff just doesn't like it and the > Python source code is full of places where tabs and spaces > are mixed in many different ways... I'm back to tabs-indent-mode > again :-/ Isn't this ignorable with the diff -b switch? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake@acm.org Thu Apr 6 16:12:11 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 6 Apr 2000 11:12:11 -0400 (EDT) Subject: [Python-Dev] Unicode documentation Message-ID: <14572.43339.472062.364098@seahag.cnri.reston.va.us> I've added Marc-Andre's documentation updates for Unicode to the Python CVS repository; I don't think I've done any damage. Marc-Andre, please review and let me know if I've missed anything! Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tismer@tismer.com Thu Apr 6 16:16:16 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:16:16 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <38ECAA40.456F9919@tismer.com> Vladimir Marangozov wrote: > > [Christian] > > > When I look into tracebacks, it turns out to be just a chain > > > like the frame chain, but upward down. It holds references > > > to the frames in a 1-to-1 manner, and it keeps copies of > > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > > ... > > > Does this make sense? Do I miss something? > > > > [Guido] > > Yes. It is quite possible to have multiple stack traces lingering > > around that all point to the same stack frames. > > This reminds me that some time ago I made an experimental patch for > removing SET_LINENO. There was the problem of generating callbacks > for pdb (which I think I solved somehow but I don't remember the > details). I do remember that I had to look at pdb again for some > reason. Is there any interest in reviving this idea? This is a very cheap opcode (at least in my version). What does it buy? 
Can you drop the f_lineno field from frames, and calculate it for the frame's f_lineno attribute? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From thomas.heller@ion-tof.com Thu Apr 6 16:40:38 2000 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Thu, 6 Apr 2000 17:40:38 +0200 Subject: [Python-Dev] DLL in the system directory on Windows Message-ID: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> > However, as we import data from Python16.dll rather then purely > addresses, we can't use any of these interception solutions. What's wrong with: #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) I have only looked at PythonCOM15.dll, and it seems that there are only references to a handfull of exported data items: some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, _PyZero_Struct. Thomas Heller From jim@interet.com Thu Apr 6 16:48:50 2000 From: jim@interet.com (James C. Ahlstrom) Date: Thu, 06 Apr 2000 11:48:50 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: <38ECB1E2.AD1BAF5C@interet.com> Guido van Rossum wrote: > But, I still don't understand why Perl/COM doesn't need a DLL in the > system directory. Or is it just because they change PATH? Here is some generic info which may help, or perhaps you already know it. If you have a DLL head.dll or EXE head.exe which needs another DLL needed.dll, you can link needed.dll with head, and the system will find all data and module names automatically (well, almost). When head is loaded, needed.dll must be available, or head will fail to load. This can be confusing. For example, I once tried to port PIL to my new Python mini-GUI model, and my DLL failed. Only after some confusion did I realize that PIL is linked with Tk libs, and would fail to load if they were not present, even though I was not using them. I think what Mark is saying is that Microsoft now has an option to do delayed DLL loading. The load of needed.dll is delayed until a function in needed.dll is called. This would have meant that PIL would have worked provided I never called a Tk function. I think he is also saying that this feature can only trap function calls, not pointer access to data, so it won't work in the context of data access (maybe it would if a function call came first). Of course, if you access all data through a function call GetMyData(), it all works. As an alternative, head.[exe|dll] would not be linked with needed.dll, and so needed.dll need not be present. To access functions by name in needed.dll, you call LoadLibrary or LoadLibraryEx to open needed.dll, and then call GetProcAddress() to get a pointer to named functions. In the case of data items, the pointer is dereferenced twice, that is, data = **pt. Python uses this strategy to load PYD's, and accesses the sole function initmodule(). Then the rest of the data is available through Python mechanisms which effectively substitute for normal DLL access. The alternative search path available in LoadLibraryEx only affects head.dll, and causes the system to look in the directory of needed.dll instead of the directory of the ultimate executable for finding other needed DLL's. 
So on Windows, Python needs PYTHONPATH to find PYD's, and if the PYD's need further DLL's those DLL's can be in the directory of the PYD, or on the usual DLL search path provided the "alternate search path" is used. Probably you alread know this, but maybe it will help the Windozly-challenged follow along. JimA From tismer@tismer.com Thu Apr 6 22:22:30 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:22:30 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: <38ED0016.E1C4A26C@tismer.com> Hi, asa side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch. Is this changed for 1.6, or might it be my bug? D:\python\spc>python Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 >>> ^Z D:\python>python Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1416 >>> ^Z ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer@tismer.com Thu Apr 6 22:31:03 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:31:03 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. Message-ID: <38ED0217.7C44A24F@tismer.com> Yikes! No, it is computatively commutative, just not in terms of computation time. :-)) The following factorial loops differ by a remarkable factor of 1.8, and we can gain this speed by changing long_mult to always put the lower multiplicand into the left. This was reported to me by Lenny Kneler, who thought he had found a Stackless bug, but he was actually testing long math. :-) This buddy... >>> def ifact3(n) : ... p = 1L ... for i in range(1,n+1) : ... p = i*p ... return p performs better by a factor of 1.8 than this one: >>> def ifact1(n) : ... p = 1L ... for i in range(1,n+1) : ... p = p*i ... return p The analysis of this behavior is quite simple if you look at the implementation of long_mult. If the left operand is big and the right is small, there are much more carry operations performed, together with more loop overhead. Swapping the multiplicands would be a 5 line patch. Should I submit it? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From jeremy@cnri.reston.va.us Thu Apr 6 22:29:13 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 6 Apr 2000 17:29:13 -0400 (EDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <38ECAA40.456F9919@tismer.com> References: <200004061351.PAA20261@python.inrialpes.fr> <38ECAA40.456F9919@tismer.com> Message-ID: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> >> Vladimir Marangozov wrote: >> This reminds me that some time ago I made an experimental patch >> for removing SET_LINENO. 
There was the problem of generating >> callbacks for pdb (which I think I solved somehow but I don't >> remember the details). I do remember that I had to look at pdb >> again for some reason. Is there any interest in reviving this >> idea? I think the details are important. The only thing the SET_LINENO opcode does is to call a trace function if one is installed. It's necessary to have some way to invoke the trace function when the line number changes (or it will be relatively difficult to execute code line-by-line in the debugger ). Off the top of my head, the only other way I see to invoke the trace function would be to add code at the head of the mainloop that computed the line number for each instruction (from lnotab) and called the trace function if the current line number is different than the previous time through the loop. That doesn't sound faster or simpler. Jeremy From guido@python.org Thu Apr 6 22:30:21 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:30:21 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Thu, 06 Apr 2000 23:22:30 +0200." <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <200004062130.RAA26273@eric.cnri.reston.va.us> > asa side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? > > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z > > D:\python>python > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1416 > >>> ^Z This is because repr() now uses full precision for floating point numbers. round() does what it can, but 3.1416 just can't be represented exactly, and "%.17g" gives 3.1415999999999999. This is definitely the right thing to do for repr() -- ask Tim. However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Thu Apr 6 21:31:02 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 6 Apr 2000 15:31:02 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <14572.62470.804145.677372@beluga.mojam.com> Chris> I happened to observe the following rounding bug. It happens in Chris> Stackless Python, which is built against the pre-unicode CVS Chris> branch. Chris> Is this changed for 1.6, or might it be my bug? I doubt it's your problem. I see it too with 1.6a2 (no stackless): % ./python Python 1.6a2 (#2, Apr 6 2000, 15:27:22) [GCC pgcc-2.91.66 19990314 (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 Same behavior whether compiled with -O2 or -g. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From guido@python.org Thu Apr 6 22:32:36 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:32:36 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:31:03 +0200." 
<38ED0217.7C44A24F@tismer.com> References: <38ED0217.7C44A24F@tismer.com> Message-ID: <200004062132.RAA26296@eric.cnri.reston.va.us> > This buddy... > > >>> def ifact3(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... p = i*p > ... return p > > performs better by a factor of 1.8 than this one: > > >>> def ifact1(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... p = p*i > ... return p > > The analysis of this behavior is quite simple if you look at the > implementation of long_mult. If the left operand is big and the > right is small, there are much more carry operations performed, > together with more loop overhead. > Swapping the multiplicands would be a 5 line patch. > Should I submit it? Yes, go for it. I would appreciate a bunch of new test cases that exercise the new path through the code, too... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Thu Apr 6 23:43:16 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 00:43:16 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 06, 2000 05:29:13 PM Message-ID: <200004062243.AAA21491@python.inrialpes.fr> Jeremy Hylton wrote: > > >> Vladimir Marangozov wrote: > >> This reminds me that some time ago I made an experimental patch > >> for removing SET_LINENO. There was the problem of generating > >> callbacks for pdb (which I think I solved somehow but I don't > >> remember the details). I do remember that I had to look at pdb > >> again for some reason. Is there any interest in reviving this > >> idea? > > I think the details are important. The only thing the SET_LINENO > opcode does is to call a trace function if one is installed. It's > necessary to have some way to invoke the trace function when the line > number changes (or it will be relatively difficult to execute code > line-by-line in the debugger ). Looking back at the discussion and the patch I ended up with at that time, I think the callback issue was solved rather elegantly. I'm not positive that it does not have side effects, though... For an overview of the approach and the corresponding patch, go back to: http://www.python.org/pipermail/python-dev/1999-August/002252.html http://sirac.inrialpes.fr/~marangoz/python/lineno/ What happens is that in tracing mode, a copy of the original code stream is created, a new CALL_TRACE opcode is stored in it at the addresses corresponding to each source line number, then the instruction pointer is redirected to execute the modified code string. Whenever a CALL_TRACE opcode is reached, the callback is triggered. On a successful return, the original opcode at the current address is fetched from the original code string, then directly goto the dispatch code. This code string duplication & conditional break-point setting occurs only when a trace function is set; in the "normal case", the interpreter executes a code string without SET_LINENO. 
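(For readers wondering where those per-line addresses come from: they can be recovered from co_lnotab -- the (address delta, line delta) byte pairs Skip asked about above. The following is only a rough, untested sketch against the 1.5/1.6 code object API, not part of the patch itself; the helper name line_starts and the example function are invented for illustration, and the continuation encoding used for deltas larger than 255 is ignored.)

    def line_starts(code):
        # co_lnotab is a string of (addr increment, line increment) byte
        # pairs, starting from offset 0 and co_firstlineno
        tab = code.co_lnotab
        addr, line = 0, code.co_firstlineno
        starts = [(addr, line)]
        for i in range(0, len(tab), 2):
            addr = addr + ord(tab[i])
            line = line + ord(tab[i + 1])
            starts.append((addr, line))
        return starts

    def example(x):
        y = x + 1
        return y * 2

    # the offsets where SET_LINENO sits today, i.e. where the CALL_TRACE
    # breakpoints would be planted in the copied code string
    print line_starts(example.func_code)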
-- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond@skippinet.com.au Fri Apr 7 01:47:06 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 10:47:06 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> Message-ID: > > However, as we import data from Python16.dll rather then purely > > addresses, we can't use any of these interception solutions. > > What's wrong with: > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) My only objection is that this is a PITA. It becomes a maintenance nightmare for Guido as the code gets significantly larger and uglier. > I have only looked at PythonCOM15.dll, and it seems that > there are only references to a handfull of exported data items: > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > _PyZero_Struct. Yep - these structs, all the error objects and all the type objects. However, to do this properly, we must do _every_ exported data item, not just ones that satisfy COM (otherwise the next poor soul will have the exact same issue, and require patches to the core before they can work...) Im really not convinced it is worth it to save one, well-named DLL in the system directory. Mark. From guido@python.org Fri Apr 7 02:25:35 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 21:25:35 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Fri, 07 Apr 2000 00:43:16 +0200." <200004062243.AAA21491@python.inrialpes.fr> References: <200004062243.AAA21491@python.inrialpes.fr> Message-ID: <200004070125.VAA26776@eric.cnri.reston.va.us> > What happens is that in tracing mode, a copy of the original code stream > is created, a new CALL_TRACE opcode is stored in it at the addresses > corresponding to each source line number, then the instruction pointer > is redirected to execute the modified code string. Whenever a CALL_TRACE > opcode is reached, the callback is triggered. On a successful return, > the original opcode at the current address is fetched from the original > code string, then directly goto the dispatch code. > > This code string duplication & conditional break-point setting occurs > only when a trace function is set; in the "normal case", the interpreter > executes a code string without SET_LINENO. Ai! This really sounds like a hack. It may be a standard trick in the repertoire of virtual machine implementers, but it is still a hack, and makes my heart cry. I really wonder if it makes enough of a difference to warrant all that code, and the risk that that code isn't quite correct. (Is it thread-safe?) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Fri Apr 7 02:36:30 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 11:36:30 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: Message-ID: [I wrote] > My only objection is that this is a PITA. It becomes a ... > However, to do this properly, we must do _every_ exported ... > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. ie, lots of good reasons _not_ to do this. However, it is worth pointing out that there is one good - possibly compelling - reason to consider this. Not only would we drop the dependency from the system directory, we could also drop the dependency to the Python version. 
That is, any C extension compiled for 1.6 would be able to automatically and without recompilation work with Python 1.7, so long as we kept all the same public names. It is too late for Python 1.5, but it would be a nice feature if an upgrade to Python 1.7 did not require waiting for every extension author to catch up. OTOH, if Python 1.7 is really the final in the 1.x family, is it worth it for a single version? Just-musing-ly, Mark. From ping@lfw.org Fri Apr 7 02:47:36 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 6 Apr 2000 20:47:36 -0500 (CDT) Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful Message-ID: So, has anyone not seen Doctor Fun today yet? http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg :) :) -- ?!ng From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 03:02:22 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:02:22 +0200 (CEST) Subject: [Python-Dev] python -O weirdness Message-ID: <200004070202.EAA22307@python.inrialpes.fr> Strange. Can somebody confirm/refute, explain this behavior? -------------[ bug.py ]------------ def f(): pass def g(): a = 1 b = 2 def h(): pass def show(func): c = func.func_code print "(%d) %s: %d -> %s" % \ (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) show(f) show(g) show(h) ----------------------------------- ~> python bug.py (1) f: 2 -> '\003\001' (4) g: 4 -> '\003\001\011\001' (8) h: 2 -> '\003\000' ~> python -O bug.py (1) f: 2 -> '\000\001' (4) g: 4 -> '\000\001\006\001' (1) f: 2 -> '\000\001' <=== ??? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Fri Apr 7 03:19:02 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:02 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: <200004062132.RAA26296@eric.cnri.reston.va.us> Message-ID: <000701bfa037$a4545960$6c2d153f@tim> > Yes, go for it. I would appreciate a bunch of new test cases that > exercise the new path through the code, too... FYI, a suitable test would be to add a line to function test_division_2 in test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and y are already generated by the framework. From tim_one@email.msn.com Fri Apr 7 03:19:00 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:00 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> Message-ID: <000601bfa037$a2c18460$6c2d153f@tim> [posted & mailed] [Christian Tismer] > as a side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? It's a 1.6 thing, and is not a bug. > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z The best possible IEEE-754 double approximation to 3.1416 is (exactly) 3.141599999999999948130380289512686431407928466796875 so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature. 1.6 boosted the number of decimal digits repr(float) produces so that eval(repr(x)) == x for every finite float on every platform with an IEEE-754-conforming libc. It was actually rare for that equality to hold pre-1.6. 
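(The effect is easy to see at the prompt; the output below is what the 1.6 rules described above should give on an IEEE-754 double platform, not a captured session:)

    >>> x = round(3.1415926585, 4)
    >>> repr(x)                # 17 significant digits, so eval(repr(x)) == x holds
    '3.1415999999999999'
    >>> str(x)                 # still produces the illusion of 3.1416
    '3.1416'
    >>> eval(repr(x)) == x
    1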
repr() cannot produce fewer digits than this without allowing the equality to fail in some cases. The 1.6 str() still produces the *illusion* that the result is 3.1416 (as repr() also did pre-1.6). IMO it would be better if Python stopped using repr() (at least by default) for formatting expressions at the interactive prompt (for much more on this, see DejaNews). the-two-things-you-can-do-about-it-are-nothing-and-love-it-ly y'rs - tim From guido@python.org Fri Apr 7 03:23:11 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 22:23:11 -0400 Subject: [Python-Dev] python -O weirdness In-Reply-To: Your message of "Fri, 07 Apr 2000 04:02:22 +0200." <200004070202.EAA22307@python.inrialpes.fr> References: <200004070202.EAA22307@python.inrialpes.fr> Message-ID: <200004070223.WAA26916@eric.cnri.reston.va.us> > Strange. Can somebody confirm/refute, explain this behavior? > > -------------[ bug.py ]------------ > def f(): > pass > > def g(): > a = 1 > b = 2 > > def h(): pass > > def show(func): > c = func.func_code > print "(%d) %s: %d -> %s" % \ > (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) > > show(f) > show(g) > show(h) > ----------------------------------- > > ~> python bug.py > (1) f: 2 -> '\003\001' > (4) g: 4 -> '\003\001\011\001' > (8) h: 2 -> '\003\000' > > ~> python -O bug.py > (1) f: 2 -> '\000\001' > (4) g: 4 -> '\000\001\006\001' > (1) f: 2 -> '\000\001' <=== ??? > > -- Yes. I can confirm and explain it. The functions f and h are sufficiently similar that their code objects actually compare equal. A little-known optimization is that two constants in a const array that compare equal (and have the same type!) are replaced by a single copy. This happens in the module's code object: f's and h's code are the same, so only one copy is kept. The function name is not taken into account for the comparison. Maybe it should? On the other hand, the name is a pretty inessential part of the function, and it's not going to change the semantics of the program... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 03:47:15 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:47:15 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004070125.VAA26776@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 06, 2000 09:25:35 PM Message-ID: <200004070247.EAA22442@python.inrialpes.fr> Guido van Rossum wrote: > > > What happens is that in tracing mode, a copy of the original code stream > > is created, a new CALL_TRACE opcode is stored in it at the addresses > > corresponding to each source line number, then the instruction pointer > > is redirected to execute the modified code string. Whenever a CALL_TRACE > > opcode is reached, the callback is triggered. On a successful return, > > the original opcode at the current address is fetched from the original > > code string, then directly goto the dispatch code. > > > > This code string duplication & conditional break-point setting occurs > > only when a trace function is set; in the "normal case", the interpreter > > executes a code string without SET_LINENO. > > Ai! This really sounds like a hack. It may be a standard trick in > the repertoire of virtual machine implementers, but it is still a > hack, and makes my heart cry. The implementation sounds tricky, yes. But there's nothing hackish in the principle of setting breakpoints. 
The modified code string is in fact the stripped code stream (without LINENO), reverted back to a standard code stream with LINENO. However, to simplify things, the LINENO (aka CALL_TRACE) are not inserted between the instructions for every source line. They overwrite the original opcodes in the copy whenever a trace function is set (i.e. we set all conditional breakpoints (LINENO) at once). And since we overwrite for simplicity, at runtime, we read the ovewritten opcodes from the original stream, after the callback returns. All this magic occurs before the main loop, with finalization on exit of eval_code2. A tricky implementation of the principle of having a set of conditional breakpoints for every source line (these cond. bp are currently the SET_LINENO opcodes, in a more redundant version). > I really wonder if it makes enough of a difference to warrant all > that code, and the risk that that code isn't quite correct. Well, all this business is internal to ceval.c and doesn't seem to affect the rest of the world. I can see only two benefits (if this idea doesn't hide other mysteries -- so anyone interested may want check it out): 1) Some tiny speedup -- we'll reach -O in a standard setup 2) The .pyc files become smaller. (Lib/*.pyc is reduced by ~80K for 1.5.2) No other benefits (hmmm, maybe the pdb code will be simplified wrt linenos) I originally developped this idea because of the redundant, consecutive SET_LINENO in a code object. > (Is it thread-safe?) I think so. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From thomas.heller@ion-tof.com Fri Apr 7 08:10:41 2000 From: thomas.heller@ion-tof.com (Thomas Heller) Date: Fri, 7 Apr 2000 09:10:41 +0200 Subject: [Python-Dev] Re: DLL in the system directory on Windows References: Message-ID: <03fe01bfa060$626a2010$4500a8c0@thomasnotebook> > > > However, as we import data from Python16.dll rather then purely > > > addresses, we can't use any of these interception solutions. > > > > What's wrong with: > > > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) > > My only objection is that this is a PITA. It becomes a maintenance > nightmare for Guido as the code gets significantly larger and > uglier. Why is it a nightmare for Guido? It can be done by the extension writer: You in the case for PythonCOM.dll. > > > I have only looked at PythonCOM15.dll, and it seems that > > there are only references to a handfull of exported data items: > > > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > > _PyZero_Struct. > > Yep - these structs, all the error objects and all the type objects. > > However, to do this properly, we must do _every_ exported data item, > not just ones that satisfy COM (otherwise the next poor soul will > have the exact same issue, and require patches to the core before > they can work...) IMHO it is not a problem of exporting, but a question how *you* import these. > > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. As long as no one else installs a modified version there (which *should* have a different name, but...) > > Mark. 
> Thomas Heller From fredrik@pythonware.com Fri Apr 7 09:47:37 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 10:47:37 +0200 Subject: [Python-Dev] SRE: regex.set_syntax References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > [Guido] > > If it ain't broken, don't "fix" it. >=20 > This also explains why socket.connect() generated so much = resistance... I'm not sure I see the connection -- the 'regex' module is already declared obsolete... so Guido probably meant "if it's not even in there, don't waste time on it" imo, the main reasons for supporting 'regex' are 1) that lots of people are still using it, often for performance reasons 2) while the import error should be easy to spot, actually changing from 'regex' to 're' requires some quite extensive core restructuring, especially com- pared to what it takes to fix a broken 'append' or 'connect' call, and 3) it's fairly easy to do, since the engines use the same semantics, and 'sre' supports pluggable front-ends. but alright, I think the consensus here is "(1) get rid of it completely". in 1.6a2, perhaps? From fredrik@pythonware.com Fri Apr 7 10:13:16 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:16 +0200 Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful References: Message-ID: <00cd01bfa071$838c6cb0$0500a8c0@secret.pythonware.com> > So, has anyone not seen Doctor Fun today yet? >=20 > http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg >=20 > :) :) the daily python-url features this link ages ago (in internet time, at least): http://hem.passagen.se/eff/url.htm (everyone should read the daily python URL ;-) From fredrik@pythonware.com Fri Apr 7 10:13:23 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:23 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> M.-A. Lemburg wrote: > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. >=20 > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. just a brief head's up: I've been playing with this a bit, and my current view is that the current unicode design is horridly broken when it comes to mixing 8-bit and 16-bit strings. basically, if you pass a uni- code string to a function slicing and dicing 8-bit strings, it will probably not work. and you will probably not under- stand why. I'm working on a proposal that I think will make things simpler and less magic, and far easier to understand. to appear on sunday. 
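(To make the kind of surprise concrete, here is an untested sketch against the 1.6 alpha Unicode API, using the .encode() method mentioned earlier; exact lengths are left out since they depend on the text:)

    >>> u = u'Bienvenue \u00E0 Python'   # one accented character
    >>> s = u.encode('utf-8')            # what byte-oriented code sees under the UTF-8 assumption
    >>> len(u) == len(s)                 # 0 -- the accented character takes two bytes in UTF-8
    0
    >>> s[:11]                           # slicing by character count can cut that character in half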
From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 10:53:19 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 11:53:19 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> from "Fredrik Lundh" at Apr 07, 2000 10:47:37 AM Message-ID: <200004070953.LAA25788@python.inrialpes.fr> Fredrik Lundh wrote: > > Vladimir Marangozov wrote: > > [Guido] > > > If it ain't broken, don't "fix" it. > > > > This also explains why socket.connect() generated so much resistance... > > I'm not sure I see the connection -- the 'regex' module is > already declared obsolete... Don't look further -- there's no connection with the re/sre code. It was just a thought about the above citation vs. the connect change. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Fri Apr 7 11:55:30 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 12:55:30 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> Message-ID: <38EDBEA2.8C843E49@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The UTF-8 assumption had to be made in order to get the two > > worlds to interoperate. We could have just as well chosen > > Latin-1, but then people currently using say a Russian > > encoding would get upset for the same reason. > > > > One way or another somebody is not going to like whatever > > we choose, I'm afraid... the simplest solution is to use > > Unicode for all strings which contain non-ASCII characters > > and then call .encode() as necessary. > > just a brief head's up: > > I've been playing with this a bit, and my current view is that > the current unicode design is horridly broken when it comes > to mixing 8-bit and 16-bit strings. Why "horribly" ? String and Unicode mix pretty well, IMHO. The magic auto-conversion of Unicode to UTF-8 in C APIs using "s" or "s#" does not always do what the user expects, but it's still better than not having Unicode objects work with these APIs at all. > basically, if you pass a uni- > code string to a function slicing and dicing 8-bit strings, it > will probably not work. and you will probably not under- > stand why. > > I'm working on a proposal that I think will make things simpler > and less magic, and far easier to understand. to appear on > sunday. Looking forward to it, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 12:47:07 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 13:47:07 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14572.35902.781258.448592@beluga.mojam.com> from "Skip Montanaro" at Apr 06, 2000 08:08:14 AM Message-ID: <200004071147.NAA26437@python.inrialpes.fr> Skip Montanaro wrote: > > Whoops, wait a minute. I just tried > > >>> def foo(): pass > ... > >>> foo.func_code.co_lnotab > > with both "python" and "python -O". co_lnotab is empty for python -O. I > thought it was supposed to always be generated? 
It is always generated, but since co_lnotab contains only lineno increments starting from co_firstlineno (i.e. only deltas) and your function is a 1-liner (no lineno increments starting from the first line of the function), the table is empty. Move 'pass' to the next line and the table will contain 1-entry (of 2 bytes: delta_addr, delta_line). Generally speaking, the problem really boils down to the callbacks from C to Python when a tracefunc is set. My approach is not that bad in this regard. A decent processor nowadays has (an IRQ pin) a flag for generating interrupts on every processor instruction (trace flag). In Python, we have the same problem - we need to interrupt the (virtual) processor, implemented in eval_code2() on regular intervals. Actually, what we need (for pdb) is to interrupt the processor on every source line, but one could easily imagine a per instruction interrupt (with a callback installed with sys.settracei(). This is exactly what the patch does under the grounds. It interrupts the processor on every new source line (but interrupting it on every instruction would be a trivial extension -- all opcodes in the code stream would be set to CALL_TRACE!) And this is exactly what LINENO does (+ some processor state saving in the frame: f_lasti, f_lineno). Clearly, there are 2 differences with the existing code: a) The interrupting opcodes are installed dynamically, on demand, only when a trace function is set, for the current traced frame. Presently, these opcodes are SET_LINENO; I introduced a new one byte CALL_TRACE opcode which does the same thing (thus preserving backwards compatibility with old .pyc that contain SET_LINENO). b) f_lasti and f_lineno aren't updated when the frame is not traced :-( I wonder whether we really care about them, though. The other implementation details aren't so important. Yet, they look scary, but no more than the co_lnotab business. The problem with my patch is point b). I believe the approach is good, though -- if it weren't, I woudn't have taken the care to talk about it detail. :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Fri Apr 7 12:57:41 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 13:57:41 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings Message-ID: <38EDCD35.DDD5EB4B@lemburg.com> There has been a bug report about the treatment of Unicode objects together with 8-bit format strings. The current implementation converts the Unicode object to UTF-8 and then inserts this value in place of the %s.... I'm inclined to change this to have '...%s...' % u'abc' return u'...abc...' since this is just another case of coercing data to the "bigger" type to avoid information loss. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer@tismer.com Fri Apr 7 13:41:19 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 14:41:19 +0200 Subject: [Python-Dev] python -O weirdness References: <200004070202.EAA22307@python.inrialpes.fr> <200004070223.WAA26916@eric.cnri.reston.va.us> Message-ID: <38EDD76F.986D3C39@tismer.com> Guido van Rossum wrote: ... > The function name is not taken into account for the comparison. Maybe > it should? Absolutely, please! 
> On the other hand, the name is a pretty inessential part > of the function, and it's not going to change the semantics of the > program... If the name of the code object has any meaning, then it must be the name of the function that I meant, not just another function which happens to have the same body, IMHO. or the name should vanish completely. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From gward@mems-exchange.org Fri Apr 7 13:49:15 2000 From: gward@mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 08:49:15 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 06, 2000 at 05:30:21PM -0400 References: <38ED0016.E1C4A26C@tismer.com> <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: <20000407084914.A13606@mems-exchange.org> On 06 April 2000, Guido van Rossum said: > This is because repr() now uses full precision for floating point > numbers. round() does what it can, but 3.1416 just can't be > represented exactly, and "%.17g" gives 3.1415999999999999. > > This is definitely the right thing to do for repr() -- ask Tim. > > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... +1 on this: it's easier to change "foo" to "`foo`" than to "str(foo)" or "print foo". It just makes more sense to use str(). Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result! Greg From mikael@isy.liu.se Fri Apr 7 13:57:38 2000 From: mikael@isy.liu.se (Mikael Olofsson) Date: Fri, 07 Apr 2000 14:57:38 +0200 (MET DST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <20000407084914.A13606@mems-exchange.org> Message-ID: On 07-Apr-00 Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Just i case... I hope you haven't missed "print blah.__doc__". /Mikael ----------------------------------------------------------------------- E-Mail: Mikael Olofsson WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael Phone: +46 - (0)13 - 28 1343 Telefax: +46 - (0)13 - 28 1339 Date: 07-Apr-00 Time: 14:56:52 This message was sent by XF-Mail. ----------------------------------------------------------------------- From guido@python.org Fri Apr 7 14:01:45 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:01:45 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 13:57:41 +0200." <38EDCD35.DDD5EB4B@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> Message-ID: <200004071301.JAA27100@eric.cnri.reston.va.us> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Makes sense. But note that it's going to be difficult to catch all cases: you could have '...%d...%s...%s...' 
% (3, "abc", u"abc") and '...%(foo)s...' % {'foo': u'abc'} and even '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} (the latter should *not* convert to Unicode). --Guido van Rossum (home page: http://www.python.org/~guido/) From jack@oratrix.nl Fri Apr 7 14:06:51 2000 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 15:06:51 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading Message-ID: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Something that just struck me: couldn't we use a couple of bits in the PYTHON_API_VERSION to check various other things that make dynamic modules break? WITH_THREAD is the one I just ran in to, but there's a few others such as the object refcounting statistics and platform-dependent things like the debug/nodebug compilation on Windows. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido@python.org Fri Apr 7 14:13:21 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:13:21 -0400 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Your message of "Fri, 07 Apr 2000 15:06:51 +0200." <20000407130652.4D002370CF2@snelboot.oratrix.nl> References: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Message-ID: <200004071313.JAA27132@eric.cnri.reston.va.us> > Something that just struck me: couldn't we use a couple of bits in the > PYTHON_API_VERSION to check various other things that make dynamic modules > break? WITH_THREAD is the one I just ran in to, but there's a few others such > as the object refcounting statistics and platform-dependent things like the > debug/nodebug compilation on Windows. I'm curious what combination didn't work? The thread APIs are supposed to be designed so that all combinations work -- the APIs are always present, they just don't do anything in the unthreaded version. If an extension is compiled without threads, well, then it won't release the interpreter lock, of course, but otherwise there should be no bad effects. The debug issue on Windows is taken care of by a DLL naming convention: the debug versions are named spam_d.dll (or .pyd). --Guido van Rossum (home page: http://www.python.org/~guido/) From gward@mems-exchange.org Fri Apr 7 14:15:46 2000 From: gward@mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 09:15:46 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: ; from mikael@isy.liu.se on Fri, Apr 07, 2000 at 02:57:38PM +0200 References: <20000407084914.A13606@mems-exchange.org> Message-ID: <20000407091545.B13606@mems-exchange.org> On 07 April 2000, Mikael Olofsson said: > > On 07-Apr-00 Greg Ward wrote: > > Oh, joy! oh happiness! someday soon, I may be able to type > > "blah.__doc__" at the interactive prompt and get a readable result! > > Just i case... I hope you haven't missed "print blah.__doc__". Yeah, I know: my usual mode of operation is this: >>> blah.__doc__ ...repr of docstring... ...sound of me cursing... >>> print blah.__doc__ The real reason for using str() at the interactive prompt is not to save me keystrokes, but because it just seems like the sensible thing to do. People who understand the str/repr difference, and really want the repr version, can slap backquotes around whatever they're printing. 
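A minimal illustration of the trade-off Greg describes, using an arbitrary two-line
string (the variable name is only for the example; 1.5.2-era syntax):

    >>> s = "two\nlines"
    >>> print s          # the str() form: raw text, no quotes or escapes
    two
    lines
    >>> print `s`        # backquotes still give the repr() form
    'two\nlines'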
Greg From guido@python.org Fri Apr 7 14:18:39 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:18:39 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Fri, 07 Apr 2000 10:47:37 +0200." <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <200004071318.JAA27173@eric.cnri.reston.va.us> > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I don't think so... If people still use regex, why not keep it? It doesn't cost much to maintain... --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik@pythonware.com Fri Apr 7 14:43:03 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 15:43:03 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? References: <20000407084914.A13606@mems-exchange.org> <20000407091545.B13606@mems-exchange.org> Message-ID: <002801bfa097$33228770$0500a8c0@secret.pythonware.com> Greg wrote: > Yeah, I know: my usual mode of operation is this: >=20 > >>> blah.__doc__ > ...repr of docstring... > ...sound of me cursing... > >>> print blah.__doc__ on the other hand, I tend to do this now and then: >>> blah =3D foo() # returns chunk of binary data >>> blah which, if you use str instead of repr, can reprogram your terminal window in many interesting ways... but I think I'm +1 on this anyway. or at least +0.90000000000000002 From skip@mojam.com (Skip Montanaro) Fri Apr 7 14:04:39 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 7 Apr 2000 08:04:39 -0500 (CDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.56551.939560.375409@beluga.mojam.com> Fredrik> 1) that lots of people are still using it, often for Fredrik> performance reasons Speaking of which, how do sre, re and regex compare to one another these days? -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From jack@oratrix.nl Fri Apr 7 15:19:36 2000 From: jack@oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 16:19:36 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Message by Guido van Rossum , Fri, 07 Apr 2000 09:13:21 -0400 , <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: <20000407141937.3FBDE370CF2@snelboot.oratrix.nl> > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. Oops, the problem was mine: not only was the extension module compiled without threading, but also with the previous version of the I/O library used on the mac. Silly me. 
-- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen@oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake@acm.org Fri Apr 7 15:21:59 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:21:59 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.61191.486890.43591@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1) that lots of people are still using it, often for > performance reasons That's why I never converted Grail; the "re" layer around "pcre" was substantially more expensive to use, and the HTML parser was way too slow already. (Displaying the result was still the slowest part, but we were desparate for every little scrap!) > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I seem to recall a determination to toss it for Py3K (or Python 2, as it was called at the time). Note that Grail breaks completely as soon as the module can't be imported. I'll propose a compromise: keep it in the set of modules that get built by default, but remove the documentation sections from the manual. This will more strongly encourage migration for actively maintained code. I would be surprised if Grail is the only large application which uses "regex" for performance reasons, and we don't really *want* to break everything. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Fri Apr 7 15:48:31 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 16:48:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> Message-ID: <38EDF53F.94071785@lemburg.com> Guido van Rossum wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Makes sense. But note that it's going to be difficult to catch all > cases: you could have > > '...%d...%s...%s...' % (3, "abc", u"abc") > > and > > '...%(foo)s...' % {'foo': u'abc'} > > and even > > '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} > > (the latter should *not* convert to Unicode). No problem... :-) Its a simple fix: once %s in an 8-bit string sees a Unicode object it will stop processing the string and restart using the unicode formatting algorithm. This will cost performance, of course. Optimization is easy though: add a small "u" in front of the string ;-) A sample session: >>> '...%(foo)s...' % {'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",'def':123} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",u'def':123} u'...abc...' 
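The proposed rule can be approximated in pure Python for experimentation; this is
only a sketch of the coercion logic, not the actual patch, and the helper name
uformat is made up:

    def uformat(fmt, args):
        # Sketch only: if any argument is a Unicode object, redo the whole
        # operation with a Unicode format string so the result is Unicode.
        unicode_type = type(u"")
        if type(args) is type({}):
            values = args.values()
        elif type(args) is type(()):
            values = list(args)
        else:
            values = [args]
        for v in values:
            if type(v) is unicode_type:
                return unicode(fmt) % args
        return fmt % args

With that helper, uformat('...%(foo)s...', {'foo': u"abc"}) gives u'...abc...',
matching the first line of the session above.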
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake@acm.org Fri Apr 7 15:53:43 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:53:43 -0400 (EDT) Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <14573.63095.48171.721921@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. > > This will cost performance, of course. Optimization is easy though: > add a small "u" in front of the string ;-) Seems reasonable to me! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 18:14:03 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 19:14:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000601bfa037$a2c18460$6c2d153f@tim> from "Tim Peters" at Apr 06, 2000 10:19:00 PM Message-ID: <200004071714.TAA27347@python.inrialpes.fr> Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 > > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. I'm very respectful when I see a number with so many digits in a row. :-) I'm not sure that this will be of any interest to you, number crunchers, but a research team in computer arithmetics here reported some major results lately: they claim that they "solved" the Table Maker's Dilemma for most common functions in IEEE-754 double precision arithmetic. (and no, don't ask me what this means ;-) For more information, see: http://www.ens-lyon.fr/~jmmuller/Intro-to-TMD.htm -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 19:03:15 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:03:15 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EDD76F.986D3C39@tismer.com> from "Christian Tismer" at Apr 07, 2000 02:41:19 PM Message-ID: <200004071803.UAA27485@python.inrialpes.fr> Christian Tismer wrote: > > Guido van Rossum wrote: > ... > > The function name is not taken into account for the comparison. Maybe > > it should? > > Absolutely, please! Honestly, no. -O is used for speed, so showing the wrong symbols is okay. It's the same in C. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer@tismer.com Fri Apr 7 19:37:54 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:37:54 +0200 Subject: [Python-Dev] python -O weirdness References: <200004071803.UAA27485@python.inrialpes.fr> Message-ID: <38EE2B02.1E6F3CB8@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Guido van Rossum wrote: > > ... > > > The function name is not taken into account for the comparison. Maybe > > > it should? > > > > Absolutely, please! > > Honestly, no. -O is used for speed, so showing the wrong symbols is > okay. 
It's the same in C. Not ok, IMHO. If the name is not guaranteed to be valid, why should it be there at all? If I write code that relies on inspecting those things, then I'm hosed. I'm the last one who argues against optimization. But I'd use either no name at all, or a tuple with all folded names. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Vladimir.Marangozov@inrialpes.fr Fri Apr 7 19:40:03 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:40:03 +0200 (CEST) Subject: [Python-Dev] the regression test suite Message-ID: <200004071840.UAA27606@python.inrialpes.fr> My kitchen programs show that regrtest.py keeps requesting more and more memory until it finishes all tests. IOW, it doesn't finalize properly each test. It keeps importing modules, without deleting them after each test. I think that before a particular test is run, we need to save the value of sys.modules, then restore it after the test (before running the next one). In a module enabled interpreter, this reduces the memory consumption almost by half... Patch? Think about the number of new tests that will be added in the future. I don't want to tolerate a silently approaching useless disk swapping :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From ping@lfw.org Fri Apr 7 19:47:45 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 13:47:45 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: Tim Peters wrote: > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 Let's call this number 'A' for the sake of discussion. > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. Clearly there is something very wrong here: Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 3.1416 3.1415999999999999 >>> Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x, but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A. I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases. It's very jarring to type something in, and have the interpreter give you back something that looks very different. It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system. (What do you do then, start explaining the IEEE double representation to your CP4E beginner?) What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here. I haven't investigated how to do this best yet. 
I'll go off now and see if i can come up with an algorithm that's not quite so stupid as def smartrepr(x): p = 17 while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 return '%%.%df' % p % x -- ?!ng From tismer@tismer.com Fri Apr 7 19:51:09 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:51:09 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. References: <000701bfa037$a4545960$6c2d153f@tim> Message-ID: <38EE2E1D.6708B43D@tismer.com> Tim Peters wrote: > > > Yes, go for it. I would appreciate a bunch of new test cases that > > exercise the new path through the code, too... > > FYI, a suitable test would be to add a line to function test_division_2 in > test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and > y are already generated by the framework. Thanks - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Moshe Zadka Fri Apr 7 19:45:41 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 7 Apr 2000 20:45:41 +0200 (IST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: On Thu, 6 Apr 2000, Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you Trademark probably belong to Tim Peters. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido@python.org Fri Apr 7 20:18:40 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:18:40 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 20:45:41 +0200." References: Message-ID: <200004071918.PAA27474@eric.cnri.reston.va.us> > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you Except I'm not sure what kind of special-casing should be happening. Put quotes around it without worrying if that makes it a valid string literal is one thought that comes to mind. Another approach might be what Tk's text widget does -- pass through certain control characters (LF, TAB) and all (even non-ASCII) printing characters, but display other control characters as \x.. escapes rather than risk putting the terminal in a weird mode. No quotes though. Hm, I kind of like this: when used as intended, it will just display the text, with newlines and umlauts etc.; but when printing binary gibberish, it will do something friendly. There's also the issue of what to do with lists (or tuples, or dicts) containing strings. If we agree on this: >>> "hello\nworld\n\347" # octal 347 is a cedilla hello world ç >>> Then what should ("hello\nworld", "\347") show? I've got enough serious complaints that I don't want to propose that it use repr(): >>> ("hello\nworld", "\347") ('hello\nworld', '\347') >>> Other possibilities: >>> ("hello\nworld", "\347") ('hello world', 'ç') >>> or maybe >>> ("hello\nworld", "\347") ('''hello world''', 'ç') >>> Of course there's also the Unicode issue -- the above all assumes Latin-1 for stdout. Still no closure, I think... 
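A rough sketch of that display rule for 8-bit strings; the function name and the
exact set of pass-through characters are assumptions, the point is only the shape
of the filtering:

    def display(s):
        # Pass newlines, tabs and printing characters (including Latin-1)
        # straight through; show other control characters as \x.. escapes
        # so binary gibberish can't reprogram the terminal.  No quotes.
        out = []
        for ch in s:
            if ch in '\n\t' or ' ' <= ch <= '~' or ch >= '\xa0':
                out.append(ch)
            else:
                out.append('\\x%02x' % ord(ch))
        print ''.join(out)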
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 7 20:35:32 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:35:32 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 13:47:45 CDT." References: Message-ID: <200004071935.PAA27541@eric.cnri.reston.va.us> > Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > > > 3.141599999999999948130380289512686431407928466796875 > > Let's call this number 'A' for the sake of discussion. > > > so the output you got is correctly rounded to 17 significant digits. IOW, > > it's a feature. > > Clearly there is something very wrong here: > > Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> 3.1416 > 3.1415999999999999 > >>> > > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, but we surely know that 17 digits are > *not* required when x is A because i *just typed in* 3.1416 and > the best choice of double value was A. Ping has a point! > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It breaks a > fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. (What > do you do then, start explaining the IEEE double representation > to your CP4E beginner?) > > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. > > I haven't investigated how to do this best yet. I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x Have a look at what Java does; it seems to be doing this right: & jpython JPython 1.1 on java1.2 (JIT: sunwjit) Copyright (C) 1997-1999 Corporation for National Research Initiatives >>> import java.lang >>> x = java.lang.Float(3.1416) >>> x.toString() '3.1416' >>> ^D & Could it be as simple as converting x +/- one bit and seeing how many differing digits there were? (Not that +/- one bit is easy to calculate...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 7 20:37:26 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:37:26 -0400 Subject: [Python-Dev] the regression test suite In-Reply-To: Your message of "Fri, 07 Apr 2000 20:40:03 +0200." <200004071840.UAA27606@python.inrialpes.fr> References: <200004071840.UAA27606@python.inrialpes.fr> Message-ID: <200004071937.PAA27552@eric.cnri.reston.va.us> > My kitchen programs show that regrtest.py keeps requesting more and > more memory until it finishes all tests. IOW, it doesn't finalize > properly each test. It keeps importing modules, without deleting them > after each test. I think that before a particular test is run, we need to > save the value of sys.modules, then restore it after the test (before > running the next one). 
In a module enabled interpreter, this reduces > the memory consumption almost by half... > > Patch? > > Think about the number of new tests that will be added in the future. > I don't want to tolerate a silently approaching useless disk swapping :-) I'm not particularly concerned, but it does make some sense. (And is faster than starting a fresh interpreter for each test.) So why don't you give it a try! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 7 20:49:52 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:49:52 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 16:48:31 +0200." <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <200004071949.PAA27635@eric.cnri.reston.va.us> > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. But the earlier items might already have incurred side effects (e.g. when rendering user code)... Unless you save all the strings you got for reuse, which seems a pain as well. --Guido van Rossum (home page: http://www.python.org/~guido/) From ping@lfw.org Fri Apr 7 21:00:09 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 15:00:09 -0500 (CDT) Subject: [Python-Dev] str() for interpreter output Message-ID: Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... You do NOT want this. I'm against this change -- quite strongly, in fact. Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Have repr() use triple-quotes when strings contain newlines if you like, but do *not* hide the fact that the thing being displayed is a string. Imagine the confusion this would cause! (in a hypothetical Python-with-str()...) >>> a = 1 + 1 >>> b = '2' >>> c = [1, 2, 3] >>> d = '[1, 2, 3]' ...much later... >>> a 2 >>> b 2 >>> a + 5 7 >>> b + 5 Traceback (innermost last): File "", line 1, in ? TypeError: illegal argument type for built-in operation Huh?!? >>> c [1, 2, 3] >>> d [1, 2, 3] >>> c.append(4) >>> c [1, 2, 3, 4] >>> d.append(4) Traceback (innermost last): File "", line 1, in ? AttributeError: attribute-less object Huh?!?! >>> c[1] 2 >>> d[1] 1 What?! This is guaranteed to confuse! Things that look the same should be the same. Things that are different should look different. Getting the representation of objects from the interpreter provides a very important visual cue: you can usually tell just by looking at the first character what kind of animal you've got. A digit means it's a number; a quote means a string; "[" means a list; "(" means a tuple; "{" means a dictionary; "<" means an instance or a special kind of object. Switching to str() instead of repr() completely breaks this property so you have no idea what you are getting. Intuitions go out the window. Granted, repr() cannot always produce an exact reconstruction of an object. repr() is not a serialization mechanism! We have 'pickle' for that. 
But the nice thing about repr() is that, in general, you can *tell* whether the representation is accurate enough to re-type: once you see a "<...>" sort of thing, you know that there is extra magic that you can't type in. "<...>" was an excellent choice because it is very clearly syntactically illegal. As a corollary, here is an important property of repr() that i think ought to be documented and preserved: eval(repr(x)) should produce an object with the same value and state as x, or it should cause a SyntaxError. We should avoid ever having it *succeed* and produce the *wrong* x. * * * As Tim suggested, i did go back and read the comp.lang.python thread on "__str__ vs. __repr__". Honestly i'm really surprised that such a convoluted hack as the suggestion to "special-case the snot out of strings" would come from Tim, and more surprised that it actually got so much airtime. Doing this special-case mumbo-jumbo would be even worse! Look: (in a hypothetical Python-with-snotless-str()...) >>> a = '\\' >>> b = '\'' ...much later... >>> a '\' >>> '\' File "", line 1 '\' ^ SyntaxError: invalid token (at this point i am envisioning the user screaming, "But that's what YOU said!") >>> b ''' >>> ''' ... Wha...?!! Or, alternatively, if even more effort had been expended removing snot: >>> b "'" >>> "'" "'" >>> print b ' Okay... then: >>> c = '"\'" >>> c '"'' >>> '"'' File "", line 1 '"'' ^ SyntaxError: invalid token Oh, it should print as '"\'', you say? Well then what of: >>> c '"\'' >>> d = '"\\\'' '"\\'' >>> '"\\'' File "", line 1 '"\\'' ^ SyntaxError: invalid token Damned if you do, damned if you don't. Tim's snot-removal algorithm forces the user to *infer* the rules of snot removal, remember them, and tentatively apply them to everything they see (since they still can't be sure whether snot has been removed from what they are seeing). How are the user and the interpreter ever to get along if they can't talk to each other in the same language? * * * As for the suggestion to add an interpreter hook to __builtins__ such that you can supply your own display routine, i'm all for it. Great idea there. * * * I think Donn Cave put it best: there are THREE different kinds of convert-to-string, and we'll only confuse the issue if we try to ignore the distinctions. (a) accurate serialization (b) coerce to string (c) friendly display (a) is taken care of by 'pickle'. (b) is str(). Clearly, coercing a string to a string should not change anything -- thus str(x) is just x if x is already a string. (c) is repr(). repr() is for the human, not for the machine. (a) is for the machine. repr() is: "Please show me as much information as you reasonably can about this object in an accurate and unambiguous way, but if you can't readably show me everything, make it obvious that you're not." repr() must be unambiguous, because the interpreter must help people learn by example. -- ?!ng From gmcm@hypernet.com Fri Apr 7 21:12:29 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 7 Apr 2000 16:12:29 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <1256984142-21537727@hypernet.com> Ka-Ping Yee wrote: > repr() must be unambiguous, because the interpreter must help people > learn by example. Speaking of which: >>> class A: ... def m(self): ... pass ... 
>>> a = A() >>> a.m >>> m = a.m >>> m >>> m is a.m 0 >>> ambiguated-ly y'rs - Gordon From guido@python.org Fri Apr 7 21:14:53 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 16:14:53 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Your message of "Fri, 07 Apr 2000 15:00:09 CDT." References: Message-ID: <200004072014.QAA27700@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > However, it may be time to switch so that "immediate expression" > > values are printed as str() instead of as repr()... [Ping] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Thanks for reminding me of what my original motivation was for using repr(). I am also still annoyed at some extension writers who violate the rule, and design a repr() that is nice to look at but lies about the type. Note that xrange() commits this sin! (I didn't write xrange() and never liked it. ;-) We still have a dilemma though... People using the interactive interpreter to perform some specific task (e.g. NumPy users), rather than to learn about Python, want str(), and actually I agree with them there. How can we give everybody wht they want? > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Maybe this is the solution... --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Apr 7 22:03:31 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 23:03:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> Message-ID: <38EE4D22.CC43C664@lemburg.com> Guido van Rossum wrote: > > > No problem... :-) Its a simple fix: once %s in an 8-bit string > > sees a Unicode object it will stop processing the string and > > restart using the unicode formatting algorithm. > > But the earlier items might already have incurred side effects > (e.g. when rendering user code)... Unless you save all the strings > you got for reuse, which seems a pain as well. Oh well... I don't think it's worth getting this 100% right. We'd need quite a lot of code to store the intermediate results and then have them reused during the Unicode %-formatting -- just to catch the few cases where str(obj) does have side-effects: the code would have to pass the partially rendered string pasted together with the remaining format string to the Unicode coercion mechanism and then fiddle the arguments right. Which side-effects are you thinking about here ? Perhaps it would be better to simply raise an exception in case '%s' meets Unicode. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Apr 7 23:42:01 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 00:42:01 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> <38EE4D22.CC43C664@lemburg.com> Message-ID: <38EE6439.80847A06@lemburg.com> "M.-A. Lemburg" wrote: > > Guido van Rossum wrote: > > > > > No problem... 
:-) Its a simple fix: once %s in an 8-bit string > > > sees a Unicode object it will stop processing the string and > > > restart using the unicode formatting algorithm. > > > > But the earlier items might already have incurred side effects > > (e.g. when rendering user code)... Unless you save all the strings > > you got for reuse, which seems a pain as well. > > Oh well... I don't think it's worth getting this 100% right. Never mind -- I have a patch ready now, that doesn't restart, but instead uses what has already been formatted and then continues in Unicode mode. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one@email.msn.com Sat Apr 8 02:41:48 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:41:48 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000201bfa0fb$9af44b40$bc2d153f@tim> [Ka-Ping Yee] > ,,, > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, Yes. This was first proved in Jerome Coonen's doctoral dissertation, and is one of the few things IEEE-754 guarantees about fp I/O: that input(output(x)) == x for all finite double x provided that output() produces at least 17 significant decimal digits (and 17 is minimal). In particular, IEEE-754 does *not* guarantee that either I or O are properly rounded, which latter is needed for what *you* want to see here. The std doesn't require proper rounding in this case (despite that it requires it in all other cases) because no efficient method for doing properly rounded I/O was known at the time (and, alas, that's still true). > but we surely know that 17 digits are *not* required when x is A > because i *just typed in* 3.1416 and the best choice of double value > was A. Well, x = 1.0 provides a simpler case . > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It's in the very nature of binary floating-point that the numbers they type in are often not the numbers the system uses. > It breaks a fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality. > (What do you do then, start explaining the IEEE double representation > to your CP4E beginner?) As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be). > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose (and, indeed, I don't believe any libc other than Sun's does do proper rounding here). 
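For concreteness, the brute-force version of what Ping is after fits in a few
lines; as noted above, its output can only be trusted where the platform's
string<->float conversions round correctly, so treat it as a sketch rather than
a fix:

    def shortest_repr(x):
        # Use the fewest significant digits that still convert back to
        # exactly the same double; 17 always suffices on IEEE-754 boxes.
        for ndigits in range(1, 18):
            s = '%.*g' % (ndigits, x)
            if float(s) == x:
                return s
        return '%.17g' % x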
For background and code, track down "How To Print Floating-Point Numbers Accurately" by Steele & White, and its companion paper (s/Print/Read/) by Clinger. Steele & White were specifically concerned with printing the "shortest" fp representation possible such that proper input could later reconstruct the value exactly. Steele, White & Clinger give relatively simple code for this that relies on unbounded int arithmetic. Excruciatingly difficult and platform-#ifdef'ed "optimized" code for this was written & refined over several years by the numerical analyst David Gay, and is available from Netlib. > I haven't investigated how to do this best yet. I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x This merely exposes accidents in the libc on the specific platform you run it. That is, after print smartrepr(x) on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with. Both platforms have to do proper rounding to make this work; there's no way to do proper rounding by using libc; so Python has to do it itself; there's no efficient way to do it regardless; nevertheless, it's a noble goal, and at least a few languages in the Lisp family require it (most notably Scheme, from whence Steele, White & Clinger's interest in the subject). you're-in-over-your-head-before-the-water-touches-your-toes-ly y'rs - tim From billtut@microsoft.com Sat Apr 8 02:45:03 2000 From: billtut@microsoft.com (Bill Tutt) Date: Fri, 7 Apr 2000 18:45:03 -0700 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Suddenly returning a Unicode string from an operation that was an 8-bit string is likely to give some code exterme fits of despondency. Converting to UTF-8 didn't give you any data loss, however it certainly might be unexpected to now find UTF-8 characters in what the user originally thought was a binary string containing whatever they had wanted it to contain. Throwing an exception would at the very least force the user to make a decision one way or the other about what they want to do with the data. They might want to do a codepage translation, or something else. (aka Hey, here's a bug I just found for you!) In what other cases are you suddenly returning a Unicode string object from which previouslly returned a string object? Bill From tim_one@email.msn.com Sat Apr 8 02:49:03 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:49:03 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000301bfa0fc$9e452e80$bc2d153f@tim> [Guido] > Thanks for reminding me of what my original motivation was for using > repr(). I am also still annoyed at some extension writers who violate > the rule, and design a repr() that is nice to look at but lies about > the type. ... 
Back when this was a hot topic on c.l.py (there are no new topics <0.1 wink>),
it was very clear that many did this to class __repr__ on purpose, precisely
because they wanted to get back a readable string at the interactive prompt
(where a *correct* repr may yield a megabyte of info -- see my extended
examples from that thread with Rationals, and lists of Rationals, and dicts
w/ Rationals etc).

In fact, at least one Python old-timer argued strongly that the right thing
to do was to swap the descriptions of str() and repr() in the docs!

str()-should-also-"pass-str()-down"-ly y'rs - tim

From fredrik@pythonware.com Sat Apr 8 06:47:13 2000
From: fredrik@pythonware.com (Fredrik Lundh)
Date: Sat, 8 Apr 2000 07:47:13 +0200
Subject: [Python-Dev] re: Unicode as argument for 8-bit strings
References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50>
Message-ID: <002c01bfa11d$e4608ec0$0500a8c0@secret.pythonware.com>

Bill Tutt wrote:
> > There has been a bug report about the treatment of Unicode
> > objects together with 8-bit format strings. The current
> > implementation converts the Unicode object to UTF-8 and then
> > inserts this value in place of the %s....
> >
> > I'm inclined to change this to have '...%s...' % u'abc'
> > return u'...abc...' since this is just another case of
> > coercing data to the "bigger" type to avoid information loss.
> >
> > Thoughts ?
>
> Suddenly returning a Unicode string from an operation that was an 8-bit
> string is likely to give some code extreme fits of despondency.

why is this different from returning floating point values from
operations involving integers and floats?

> Converting to UTF-8 didn't give you any data loss, however it certainly
> might be unexpected to now find UTF-8 characters in what the user originally
> thought was a binary string containing whatever they had wanted it to contain.

the more I've played with this, the stronger my opinion that
the "now it's an ordinary string, now it's a UTF-8 string, now
it's an ordinary string again" approach doesn't work. more on
this in a later post.

(am I the only one here that has actually tried to write code
that handles both unicode strings and ordinary strings? if not,
can anyone tell me what I'm doing wrong?)

> Throwing an exception would at the very least force the user to make a
> decision one way or the other about what they want to do with the data.
> They might want to do a codepage translation, or something else. (aka Hey,
> here's a bug I just found for you!)

> In what other cases are you suddenly returning a Unicode string object from
> which previously returned a string object?

if unicode is ever to be a real string type in python, and not just a
nifty extension type, it must be okay to return a unicode string from
any operation that involves a unicode argument...

From billtut@microsoft.com Sat Apr 8 07:24:06 2000
From: billtut@microsoft.com (Bill Tutt)
Date: Fri, 7 Apr 2000 23:24:06 -0700
Subject: [Python-Dev] re: Unicode as argument for 8-bit strings
Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF04@RED-MSG-50>

> From: Fredrik Lundh [mailto:fredrik@pythonware.com]
>
> Bill Tutt wrote:
> > > There has been a bug report about the treatment of Unicode
> > > objects together with 8-bit format strings. The current
> > > implementation converts the Unicode object to UTF-8 and then
> > > inserts this value in place of the %s....
> > >
> > > I'm inclined to change this to have '...%s...' % u'abc'
> > > return u'...abc...' since this is just another case of
> > > coercing data to the "bigger" type to avoid information loss.
> > >
> > > Thoughts ?
> >
> > Suddenly returning a Unicode string from an operation that was an 8-bit
> > string is likely to give some code extreme fits of despondency.
>
> why is this different from returning floating point values from
> operations involving integers and floats?
>
> > Converting to UTF-8 didn't give you any data loss, however it certainly
> > might be unexpected to now find UTF-8 characters in what the user originally
> > thought was a binary string containing whatever they had wanted it to contain.
>
> the more I've played with this, the stronger my opinion that
> the "now it's an ordinary string, now it's a UTF-8 string, now
> it's an ordinary string again" approach doesn't work. more on
> this in a later post.

Well, unicode string/UTF-8 string, but I definitely agree with you.
Pick one or the other and make the user convert betwixt the two.

> (am I the only one here that has actually tried to write code
> that handles both unicode strings and ordinary strings? if not,
> can anyone tell me what I'm doing wrong?)

In C++, yes. :)
Autoconverting into or out of unicode is bound to lead to trouble for someone.
Look at the various messes that misused C++ operator overloading can get you
into. Whether it's the code that wasn't expecting UTF-8 in a normal string
type, or a formatting operation that used to return a normal string type now
returning a Unicode string.

> > Throwing an exception would at the very least force the user to make a
> > decision one way or the other about what they want to do with the data.
> > They might want to do a codepage translation, or something else. (aka Hey,
> > here's a bug I just found for you!)
>
> > In what other cases are you suddenly returning a Unicode string object from
> > which previously returned a string object?
>
> if unicode is ever to be a real string type in python, and not just a
> nifty extension type, it must be okay to return a unicode string from
> any operation that involves a unicode argument...

Err. I'm not sure what you're getting at here. If you're saying that it'd be
nice if we could ditch the current string type and just use the Unicode string
type, then I agree with you. However, that doesn't mean you should change the
semantics of an operation that existed before unicode came into the picture,
since it would break backward compatibility.

+1 for '%s' % u'\u1234' throwing a TypeError exception.

Bill

From tim_one@email.msn.com Sat Apr 8 08:23:16 2000
From: tim_one@email.msn.com (Tim Peters)
Date: Sat, 8 Apr 2000 03:23:16 -0400
Subject: [Python-Dev] Round Bug in Python 1.6?
In-Reply-To: <200004071935.PAA27541@eric.cnri.reston.va.us>
Message-ID: <000001bfa12b$4f5501e0$6b2d153f@tim>

[Guido]
> Have a look at what Java does; it seems to be doing this right:
>
> & jpython
> JPython 1.1 on java1.2 (JIT: sunwjit)
> Copyright (C) 1997-1999 Corporation for National Research Initiatives
> >>> import java.lang
> >>> x = java.lang.Float(3.1416)
> >>> x.toString()
> '3.1416'
> >>>

That Java does this is not an accident: Guy Steele pushed for the same rules
he got into Scheme, although

a) The Java rules are much tighter than Scheme's.

and

b) He didn't prevail on this point in Java until version 1.1 (before then
Java's double/float->string never produced more precision than ANSI C's
default %g format, so was inadequate to preserve equality under I/O).
I suspect there was more than a bit of internal politics behind the delay, as the 754 camp has never liked the "minimal width" gimmick(*), and Sun's C and Fortran numerics (incl. their properly-rounding libc I/O routines) were strongly influenced by 754 committee members. > Could it be as simple as converting x +/- one bit and seeing how many > differing digits there were? (Not that +/- one bit is easy to > calculate...) Sorry, it's much harder than that. See the papers (and/or David Gay's code) I referenced before. (*) Why the minimal-width gimmick is disliked: If you print a (32-bit) IEEE float with minimal width, then read it back in as a (64-bit) IEEE double, you may not get the same result as if you had converted the original float to a double directly. This is because "minimal width" here is *relative to* the universe of 32-bit floats, and you don't always get the same minimal width if you compute it relative to the universe of 64-bit doubles instead. In other words, "minimal width" can lose accuracy needlessly -- but this can't happen if you print the float to full precision instead. From mal@lemburg.com Sat Apr 8 10:51:32 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 11:51:32 +0200 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> Message-ID: <38EF0124.F5032CB2@lemburg.com> Bill Tutt wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Suddenly returning a Unicode string from an operation that was an 8-bit > string is likely to give some code exterme fits of despondency. > > Converting to UTF-8 didn't give you any data loss, however it certainly > might be unexpected to now find UTF-8 characters in what the user originally > thought was > a binary string containing whatever they had wanted it to contain. Well, the design is to always coerce to Unicode when 8-bit string objects and Unicode objects meet. This is done for all string methods and that's the reason I'm also implementing this for %-formatting (internally this is just another string method). > Throwing an exception would at the very least force the user to make a > decision one way or the other about what they want to do with the data. > They might want to do a codepage translation, or something else. (aka Hey, > here's a bug I just found for you!) True; but Guido's intention was to have strings and Unicode interoperate without too much user intervention. > In what other cases are you suddenly returning a Unicode string object from > which previouslly returned a string object? All string methods automatically coerce to Unicode when they see a Unicode argument, e.g. " ".join(("abc", u"def")) will return u"abc def". 
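The coercion analogy can be made concrete in a short session; the first two
lines are 1.6 behaviour, the third shows the behaviour this thread proposes for
8-bit format strings (not what 1.6a1 currently does):

    >>> 1 + 1.5                          # mixed int/float widens to float
    2.5
    >>> "abc" + u"def"                   # mixed 8-bit/Unicode widens to Unicode
    u'abcdef'
    >>> "%s and %s" % (u"abc", "def")    # proposed: widen here as well
    u'abc and def'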
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Sat Apr 8 12:01:00 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 8 Apr 2000 13:01:00 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EE2B02.1E6F3CB8@tismer.com> from "Christian Tismer" at Apr 07, 2000 08:37:54 PM Message-ID: <200004081101.NAA28756@python.inrialpes.fr> > > > [GvR] > > > ... > > > > The function name is not taken into account for the comparison. Maybe > > > > it should? > > > > > > [CT] > > > Absolutely, please! > > > > [VM] > > Honestly, no. -O is used for speed, so showing the wrong symbols is > > okay. It's the same in C. > > [CT] > Not ok, IMHO. If the name is not guaranteed to be valid, why > should it be there at all? If I write code that relies on > inspecting those things, then I'm hosed. I think that you don't want to rely on inspecting the symbol<->code bindings of an optimized program. In general. Python is different in this regard, though, because of the standard introspection facilities. One expects that f.func_code.co_name == 'f' is always true, although it's not for -O. A perfect example of a name `conflict' due to object sharing. The const array optimization is well known. It folds object constants which have the same value. In this particular case, however, they don't have the same value, because of the hardcoded function name. So in the end, it turns out that Chris is right (although not for the same reason ;-) and it would be nice to fix code_compare. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Sun Apr 9 02:26:23 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 8 Apr 2000 21:26:23 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071714.TAA27347@python.inrialpes.fr> Message-ID: <000001bfa1c2$9e403a80$18a2143f@tim> [Vladimir Marangozov] > I'm not sure that this will be of any interest to you, number crunchers, > but a research team in computer arithmetics here reported some major > results lately: they claim that they "solved" the Table Maker's Dilemma > for most common functions in IEEE-754 double precision arithmetic. > (and no, don't ask me what this means ;-) Back in the old days, some people spent decades making tables of various function values. A common way was to laboriously compute high-precision values over a sparse grid, using e.g. series expansions, then extend that to a fine grid via relatively simple interpolation formulas between the high-precision results. You have to compute the sparse grid to *some* "extra" precision in order to absorb roundoff errors in the interpolated values. The "dilemma" is figuring out how *much* extra precision: too much and it greatly slows the calculations, too little and the interpolated values are inaccurate. The "problem cases" for a function f(x) are those x such that the exact value of f(x) is very close to being exactly halfway between representable numbers. In order to round correctly, you have to figure out which representable number f(x) is closest to. How much extra precision do you need to use to resolve this correctly in all cases? Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. 
That's not enough to know whether it *should* round to 41 or 42. So you need to try again with more precision. But how much? You might try 5 digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? Might be a waste of effort. Try 20 next? Might *still* not be enough -- or could just as well be that 7 would have been enough and you did 10x the work you needed to do. Etc. It turns out that for most functions there's no general way known to answer the "how much?" question in advance: brute force is the best method known. For various IEEE double precision functions, so far it's turned out that you need in the ballpark of 40-60 extra accurate bits (beyond the native 53) in order to round back correctly to 53 in all cases, but there's no *theory* supporting that. It *could* require millions of extra bits. For those wondering "why bother?", the practical answer is this: if a std could require correct rounding, functions would be wholly portable across machines ("correctly rounded" is precisely defined by purely mathematical means). That's where IEEE-754 made its huge break with tradition, by requiring correct rounding for + - * / and sqrt. The places it left fuzzy (like string<->float, and all transcendental functions) are the places your program produces different results when you port it. Irritating one: MS VC++ on Intel platforms generates different code for exp() depending on the optimization level. They often differ in the last bit they compute. This wholly accounts for why Dragon's speech recognition software sometimes produces subtly (but very visibly!) different results depending on how it was compiled. Before I got tossed into this pit, it was assumed for a year to be either a -O bug or somebody fetching uninitialized storage. that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim From tim_one@email.msn.com Sun Apr 9 05:39:09 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:09 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000101bfa1dd$8c382f80$172d153f@tim> [Guido van Rossum] > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... [Ka-Ping Yee] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Relax, nobody wants that. The fact is that neither str() nor repr() is reasonable today for use at the interactive prompt. repr() *appears* adequate only so long as you stick to the builtin types, where the difference between repr() and str() is most often non-existent(!). But repr() has driven me (& not only me) mad for years at the interactive prompt in my own (and extension) types, since a *faithful* representation of a "large" object is exactly what you *don't* want to see scrolling by. You later say (echoing Donn Cave) > repr() is for the human, not for the machine but that contradicts the docs and the design. What you mean to say is "the thing that the interactive prompt uses by default *should* be for the human, not for the machine" -- which repr() is not. That's why repr() sucks here, despite that it's certainly "more for the human" than a pickle is. str() isn't suitable either, alas, despite that (by design and by the docs) it was *intended* to be, at least because str() on a container invokes repr() on the containees. 
Neither str() nor repr() can be used to get a human-friendly string form of nested objects today (unless, as is increasingly the *practice*, people misuse __repr__() to do what __str__() was *intended* to do -- c.f. Guido's complaint about that). > ... > Have repr() use triple-quotes when strings contain newlines > if you like, but do *not* hide the fact that the thing being > displayed is a string. Nobody wants to hide this (or, if someone does, set yourself up for merciless poking before it's too late). > ... > Getting the representation of objects from the interpreter provides > a very important visual cue: you can usually tell just by looking > at the first character what kind of animal you've got. A digit means > it's a number; a quote means a string; "[" means a list; "(" means a > tuple; "{" means a dictionary; "<" means an instance or a special > kind of object. Switching to str() instead of repr() completely > breaks this property so you have no idea what you are getting. > Intuitions go out the window. This is way oversold: str() also supplies "[" for lists, "(" for tuples, "{" for dicts, and "<" for instances of classes that don't override __str__. The only difference between repr() and str() in this listing of faux terror is when they're applied to strings. > Granted, repr() cannot always produce an exact reconstruction of an > object. repr() is not a serialization mechanism! To the contrary, many classes and types implement repr() for that very purpose. It's not universal but doesn't need to be. > We have 'pickle' for that. pickles are unreadable by humans; that's why repr() is often preferred. > ... > As a corollary, here is an important property of repr() that > i think ought to be documented and preserved: > > eval(repr(x)) should produce an object with the same value > and state as x, or it should cause a SyntaxError. > > We should avoid ever having it *succeed* and produce the *wrong* x. Fine by me. > ... > Honestly i'm really surprised that such a convoluted hack as the > suggestion to "special-case the snot out of strings" would come > from Tim, and more surprised that it actually got so much airtime. That thread tapped into real and widespread unhappiness with what's displayed at an interactive prompt today. That's why it got so much airtime -- no mystery there. As above, your objections to str() reduce to its behavior for strings specifically (I have more objections than just that -- str() should "get passed down" too), hence "str() special-casing the snot out of strings" was a direct hack to address that specific complaint. > Doing this special-case mumbo-jumbo would be even worse! Look: > > (in a hypothetical Python-with-snotless-str()...) > > >>> a = '\\' > >>> b = '\'' I'd actually like to use euroquotes for str(string) -- don't throw the Latin-1 away with your outrage . Whatever, examples with backslashes are non-starters, since newbies can't make any sense out of their doubling under repr() today either (if it's not a FAQ, it should be -- I've certainly had to explain it often enough!). > ...much later... > > >>> a > '\' > >>> '\' > File "", line 1 > '\' > ^ > SyntaxError: invalid token > > (at this point i am envisioning the user screaming, "But that's > what YOU said!") Nobody ever promised that eval(str(x)) == x -- if they want that, they should use repr() or backticks. Today they get >>> a '\\' and scream "Huh?! I thought that was only supposed to be ONE backslash!". 
Or someone in Europe tries to look at a list of strings, or a simple dict keyed by names, and gets back a god-awful mish-mash of octal backslash escapes (and str() can't be used today to stop that either, since str() "isn't passed down"). Compared to that, confusion over explicit backslashes strikes me as trivial. > [various examples of ambiguous output] That's why it's called a hack . Last time I corresponded with Guido about it, he was leaning toward using angle brackets (<>) instead. That would take away the temptation to believe you should be able to type the same thing back in and have it do something reasonable. > Tim's snot-removal algorithm forces the user to *infer* the rules > of snot removal, remember them, and tentatively apply them to > everything they see (since they still can't be sure whether snot > has been removed from what they are seeing). Not at all. "Tim's snot-removal algorithm" didn't remove anything ("removal" is an adjective I don't believe I've seen applied to it before). At the time it simply did str() and stuck a pair of quotes around the result. The (passed down) str() was the important part; how it's decorated to say "and, btw, it's a string" is the teensy tail of a flea that's killing the whole dog <0.9 wink>. If we had Latin-1, we could use euroquotes for this. If we had control over the display, we could use a different color or font. If we stick to 7-bit ASCII, we have to do *something* irritating. So here's a different idea for SSCTSOOS: escape quote chars and backslashes (like repr()) as needed, but leave everything else alone (like str()). Then you can have fun stringing N adjacent backslashes together , and other people can use non-ASCII characters without going mad. What I want *most*, though, is for ssctsoos() to get passed down (from container to containee), and for it to be the default action. > ... > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Same here! But I reject going on from there to say "and since Python lets you do it yourself, Python isn't obligated to try harder itself". anything-to-keep-octal-escapes-out-of-a-unicode-world-ly y'rs - tim From tim_one@email.msn.com Sun Apr 9 05:39:17 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:17 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000201bfa1dd$90581800$172d153f@tim> [Guido] > ... > We still have a dilemma though... People using the interactive > interpreter to perform some specific task (e.g. NumPy users), rather > than to learn about Python, want str(), and actually I agree with them > there. And if they're using something fancier than NumPy arrays, they want str() to get passed down from containers to containees too. BTW, boosting the number of digits repr displays is likely to make NumPy users even unhappier so long as repr() is used at the prompt (they'll be very happy to be able to transport doubles exactly across machines via repr(), but won't want to see all the noise digits all the time). > How can we give everybody what they want? More than one display function, user-definable and user-settable, + a change in the default setting. 
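A rough sketch of what "user-definable and user-settable" could mean in practice (purely illustrative -- no such hook exists in 1.6; a hook along these lines later appeared as sys.displayhook):

    import sys, pprint

    def friendly_display(value):
        # called with the result of each interactive expression
        if value is None:
            return
        if type(value) is type(0.0):
            print str(value)         # compact, human-oriented float form
        else:
            pprint.pprint(value)     # pretty-print containers and everything else

    sys.displayhook = friendly_display   # assumes an interpreter exposing such a hook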
From gstein@lyra.org Sun Apr 9 10:28:18 2000 From: gstein@lyra.org (Greg Stein) Date: Sun, 9 Apr 2000 02:28:18 -0700 (PDT) Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: On Fri, 7 Apr 2000, Guido van Rossum wrote: > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. If an extension is compiled without threads, well, then it > won't release the interpreter lock, of course, but otherwise there > should be no bad effects. But if you enable "free threading" or "trace refcounts", then the combinations will not work. This is because these two options modify very basic things like Py_INCREF/DECREF. To help prevent mismatches, they do some monkey work with redefining a Python symbol (the InitModule thingy). Jack's idea of using PYTHON_API_VERSION is a cleaner approach to preventing imcompatibilities. > The debug issue on Windows is taken care of by a DLL naming > convention: the debug versions are named spam_d.dll (or .pyd). It would be nice to have it at the code level, too. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping@lfw.org Sun Apr 9 11:46:41 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:46:41 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000201bfa0fb$9af44b40$bc2d153f@tim> Message-ID: In a previous message, i wrote: > > It's very jarring to type something in, and have the interpreter > > give you back something that looks very different. [...] > > It breaks a fundamental rule of consistency, and that damages the user's > > trust in the system or their understanding of the system. Then on Fri, 7 Apr 2000, Tim Peters replied: > If they're surprised by this, they indeed don't understand the arithmetic at > all! This is an argument for using a different form of arithmetic, not for > lying about reality. This is not lying! If you type in "3.1416" and Python says "3.1416", then indeed it is the case that "3.1416" is a correct way to type in the floating-point number being expressed. So "3.1415999999999999" is not any more truthful than "3.1416" -- it's just more annoying. I just tried this in Python 1.5.2+: >>> .1 0.10000000000000001 >>> .2 0.20000000000000001 >>> .3 0.29999999999999999 >>> .4 0.40000000000000002 >>> .5 0.5 >>> .6 0.59999999999999998 >>> .7 0.69999999999999996 >>> .8 0.80000000000000004 >>> .9 0.90000000000000002 Ouch. I wrote: > > (What do you do then, start explaining the IEEE double representation > > to your CP4E beginner?) Tim replied: > As above. repr() shouldn't be used at the interactive prompt anyway (but > note that I did not say str() should be). What, then? Introduce a third conversion routine and further complicate the issue? I don't see why it's necessary. 
I wrote: > > What should really happen is that floats intelligently print in > > the shortest and simplest manner possible Tim replied: > This can be done, but only if Python does all fp I/O conversions entirely on > its own -- 754-conforming libc routines are inadequate for this purpose Not "all fp I/O conversions", right? Only repr(float) needs to be implemented for this particular purpose. Other conversions like "%f" and "%g" can be left to libc, as they are now. I suppose for convenience's sake it may be nice to add another format spec so that one can ask for this behaviour from the "%" operator as well, but that's a separate issue (perhaps "%r" to insert the repr() of an argument of any type?). > For background and code, track down "How To Print Floating-Point Numbers > Accurately" by Steele & White, and its companion paper (s/Print/Read/) Thanks! I found 'em. Will read... I suggested: > > def smartrepr(x): > > p = 17 > > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > > return '%%.%df' % p % x Tim replied: > This merely exposes accidents in the libc on the specific platform you run > it. That is, after > > print smartrepr(x) > > on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not > yield the same number platform A started with. That is not repr()'s job. Once again: repr() is not for the machine. It is not part of repr()'s contract to ensure the kind of platform-independent conversion you're talking about. It prints out the number in a way that upholds the eval(repr(x)) == x contract for the system you are currently interacting with, and that's good enough. If you wanted platform-independent serialization, you would use something else. As long as the language reference says "These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture and C implementation for the accepted range and handling of overflow." and until Python specifies the exact sizes and behaviours of its floating-point numbers, you can't expect these kinds of cross-platform guarantees anyway. Here are the expectations i've come to have: str()'s contract: - if x is a string, str(x) == x - otherwise, str(x) is a reasonable string coercion from x repr()'s contract: - if repr(x) is syntactically valid, eval(repr(x)) == x - repr(x) displays x in a safe and readable way - for objects composed of basic types, repr(x) reflects what the user would have to say to produce x pickle's contract: - pickle.dumps(x) is a platform-independent serialization of the value and state of object x -- ?!ng From ping@lfw.org Sun Apr 9 11:33:00 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:33:00 -0700 (PDT) Subject: [Python-Dev] str() for interpreter output In-Reply-To: <000101bfa1dd$8c382f80$172d153f@tim> Message-ID: On Sun, 9 Apr 2000, Tim Peters wrote: > You later say (echoing Donn Cave) > > > repr() is for the human, not for the machine > > but that contradicts the docs and the design. What you mean to say > is "the thing that the interactive prompt uses by default *should* be for > the human, not for the machine" -- which repr() is not. No, what i said is what i said. Let's try this again: repr() is not for the machine. The documentation for __repr__ says: __repr__(self) Called by the repr() built-in function and by string conversions (reverse quotes) to compute the "official" string representation of an object. This should normally look like a valid Python expression that can be used to recreate an object with the same value. 
It only suggests that the output "normally look like a valid Python expression". It doesn't require it, and certainly doesn't imply that __repr__ should be the standard way to turn an object into a platform-independent serialization. > This is way oversold: str() also supplies "[" for lists, "(" for tuples, > "{" for dicts, and "<" for instances of classes that don't override __str__. > The only difference between repr() and str() in this listing of faux terror > is when they're applied to strings. Right, and that is exactly the one thing that breaks everything: because strings are the most dangerous things to display raw, they can appear like anything, and break all the rules in one fell swoop. > > Granted, repr() cannot always produce an exact reconstruction of an > > object. repr() is not a serialization mechanism! > > To the contrary, many classes and types implement repr() for that very > purpose. It's not universal but doesn't need to be. If they want to, that's fine. In general, however, repr() is not for the machine. If you are using repr(), it's because you are expecting a human to look at the thing at some point. > > We have 'pickle' for that. > > pickles are unreadable by humans; that's why repr() is often preferred. Precisely. You just said it yourself: repr() is for humans. That is why repr() cannot be mandated as a serialization mechanism. There are two goals at odds here: readability and serialization. You can't have both, so you must prioritize. Pickles are more about serialization than about readability; repr is more about readability than about serialization. repr() is the interpreter's way of communicating with the human. It makes sense that e.g. the repr() of a string that you see printed by the interpreter looks just like what you would type in to produce the same string, because the interpreter and the human should speak and understand the same language as much as possible. > > >>> a = '\\' > > >>> b = '\'' > > I'd actually like to use euroquotes for str(string) -- don't throw the > Latin-1 away with your outrage . And no, even if you argue that we need to have something else, whatever you want to call it, it's not called 'str'. 'str' is "coerce to string". If you coerce an object into the type it's already in, it must not change. So, if x is a string, then str(x) must == x. > Whatever, examples with backslashes > are non-starters, since newbies can't make any sense out of their doubling > under repr() today either (if it's not a FAQ, it should be -- I've certainly > had to explain it often enough!). It may not be easy, but at least it's *consistent*. Eventually, you can't avoid the problem of escaping characters, and you just have to learn how that works, and that's that. Introducing yet a different way of escaping things won't help. Or, to put it another way: to write Python, it is required that you understand how to read and write escaped strings. Either you learn just that, or you learn that plus another, different way to read escaped-strings-as-printed-by-the-interpreter. The second case clearly requires you to learn and remember more. > Nobody ever promised that eval(str(x)) == x -- if they want that, they > should use repr() or backticks. Today they get > > >>> a > '\\' > > and scream "Huh?! I thought that was only supposed to be ONE backslash!". You have to understand this at some point. You can't get around it. Changing the way the interpreter prints things won't save anyone the trouble of learning it. 
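For what it's worth, the whole lesson fits in a few lines at the prompt:

    >>> a = '\\'        # typed with an escape, but it's a single character
    >>> len(a)
    1
    >>> a               # repr() echoes it back with the escape
    '\\'
    >>> print a         # str() shows the raw character
    \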
> Or someone in Europe tries to look at a list of strings, or a simple dict > keyed by names, and gets back a god-awful mish-mash of octal backslash > escapes (and str() can't be used today to stop that either, since str() > "isn't passed down"). This is a pretty sensible complaint to me. I don't use characters beyond 0x7f often, but i can empathize with the hassle. As you suggested, this could be solved by having the built-in container types do something nicer with str(), such as repr without escaping characters beyond 0x7f. (However, characters below 0x20 are definitely dangerous to the terminal, and would have to be escaped regardless.) > Not at all. "Tim's snot-removal algorithm" didn't remove anything > ("removal" is an adjective I don't believe I've seen applied to it before). Well, if you "special-case the snot OUT of strings", then you're removing snot, aren't you? :) > What I want *most*, though, is for ssctsoos() to get passed down (from > container to containee), and for it to be the default action. Getting it passed down as str() seems okay to me. Making it the default action, in my (naturally) subjective opinion, is Right Out if it means that eval(what_the_interpreter_prints_for(x)) == x no longer holds for objects composed of the basic built-in types. -- ?!ng From tismer@tismer.com Sun Apr 9 14:07:53 2000 From: tismer@tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 15:07:53 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? References: Message-ID: <38F080A9.16DE05B8@tismer.com> Ok, just a word (carefully:) Ka-Ping Yee wrote: ... > I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 Agreed that this is not good. ... > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x This sounds reasonable. BTW my problem did not come up by typing something in, but I just rounded a number down to 3 digits past the dot. Then, as usual, I just let the result drop from the prompt, without prefixing it with "print". repr() was used, and the result was astonishing. Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this? And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not? Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers. Simple operations between exact numbers would keep exactness, higher level functions would probably not. I think we delved into a very difficult domain here. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping@lfw.org Sun Apr 9 18:24:07 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 10:24:07 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6?

In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: On Sun, 9 Apr 2000, Christian Tismer wrote: > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > But how should Python know that you typed it in? > Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? It's okay for the zero to go away, because it doesn't affect the value of the number. (Carrying around a significant-digit count or error range with numbers is another issue entirely, and a very thorny one at that.) I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees: - If you just type a number in and the interpreter prints it back, it will never respond with more junk digits than you typed. - If you type in what the interpreter displays for a float, you can be assured of getting the same value. > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. If you mean a decimal representation, yes, perhaps we need to explore that possibility a little more. -- ?!ng "All models are wrong; some models are useful." -- George Box From tismer@tismer.com Sun Apr 9 19:53:51 2000 From: tismer@tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 20:53:51 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? References: Message-ID: <38F0D1BF.E5ECA4E5@tismer.com> Ka-Ping Yee wrote: > > On Sun, 9 Apr 2000, Christian Tismer wrote: > > Here is the problem, as I see it: > > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? > > Same in my case: I just rounded to 3 digits, but how > > should Python know about this? > > > > And what do you expect when you type in 3.14160, do you want > > the trailing zero preserved or not? > > It's okay for the zero to go away, because it doesn't affect > the value of the number. (Carrying around a significant-digit > count or error range with numbers is another issue entirely, > and a very thorny one at that.) > > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: Hmm, I hope I understood. Oh, wait a minute! What is the method? What is the correct value? If I type >>> 0.1 0.10000000000000001 >>> 0.10000000000000001 0.10000000000000001 >>> There is only one value: The one which is in the machine. Would you think it is ok to get 0.1 back, when you actually *typed* 0.10000000000000001 ? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one@email.msn.com Sun Apr 9 20:42:11 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:11 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: <000101bfa25b$b39567e0$812d153f@tim> [Christian Tismer] > ... > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? 
> Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? > > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. > Simple operations between exact numbers would keep exactness, > higher level functions would probably not. > > I think we dlved into a very difficult domain here. "This kind of thing" is hopeless so long as Python uses binary floating point. Ping latched on to "shortest" conversion because it appeared to solve "the problem" in a specific case. But it doesn't really solve anything -- it just shuffles the surprises around. For example, >>> 3.1416 - 3.141 0.00059999999999993392 >>> Do "shorest conversion" (relative to the universe of IEEE doubles) instead, and it would print 0.0005999999999999339 Neither bears much syntactic resemblance to the 0.0006 the numerically naive "expect". Do anything less than the 16 significant digits shortest conversion happens to produce in this case, and eval'ing the string won't return the number you started with. So "0.0005999999999999339" is the "best possible" string repr can produce (assuming you think "best" == "shortest faithful, relative to the platform's universe of possibilities", which is itself highly debatable). If you don't want to see that at the interactive prompt, one of two things has to change: A) Give up on eval(repr(x)) == x for float x, even on a single machine. or B) Stop using repr by default. There is *no* advantage to #A over the long haul: lying always extracts a price, and unlike most of you , I appeared to be the lucky email recipient of the passionate gripes about repr(float)'s inadequacy in 1.5.2 and before. Giving a newbie an illusion of comfort at the cost of making it useless for experts is simply nuts. The desire for #B pops up from multiple sources: people trying to use native non-ASCII chars in strings; people just trying to display docstrings without embedded "\012" (newline) and "\011" (tab) escapes; and people using "big" types (like NumPy arrays or rationals) where repr() can produce unboundedly more info than the interactive user typically wants to see. It *so happens* that str() already "does the right thing" in all 3 of the last three points, and also happens to produce "0.0006" for the example above. This is why people leap to: C) Use str by default instead of repr. But str doesn't pass down to containees, and *partly* does a wrong thing when applied to strings, so it's not suitable either. It's *more* suitable than repr, though! trade-off-ing-ly y'rs - tim From tim_one@email.msn.com Sun Apr 9 20:42:19 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:19 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000201bfa25b$b7e7ab00$812d153f@tim> [Ping] > No, what i said is what i said. > > Let's try this again: > > repr() is not for the machine. Ping, believe me, I heard that the first 42 times . If it wasn't clear before, I'll spell it out: we don't agree on this, and I didn't agree with Donn Cave when he first went down this path. repr() is a noble attempt to be usable by both human and machine. 
> The documentation for __repr__ says: > > __repr__(self) Called by the repr() built-in function and by > string conversions (reverse quotes) to compute the "official" > string representation of an object. This should normally look > like a valid Python expression that can be used to recreate an > object with the same value. Additional docs are in the Built-in Functions section of the Library Ref (for repr() and str()). > It only suggests that the output "normally look like a valid > Python expression". It doesn't require it, and certainly doesn't > imply that __repr__ should be the standard way to turn an object > into a platform-independent serialization. Alas, the docs for repr and str are vague to the point of painfulness. Guido's *intent* is more evident in later c.l.py posts, and especially in what the implementation *does*: for at least all of ints, longs, floats, complex numbers and strings, and dicts, lists and tuples composed of those recursively, the 1.6 repr produces a faithful and platform-independent eval'able string composed of 7-bit ASCII printable characters. For floats and complex numbers, bit-for-bit reproducibility relies on the assumption that the platforms are IEEE-754, but all current Windows, Mac and Unix platforms (even Psion's EPOC32) *are*. So when you later say > There are two goals at odds here: readability and serialization. > You can't have both, sorry, but the 1.6 repr() implementation already meets both goals for a great many builtin types (as well as for dozens of classes & types I've implemented, and likely hundreds of classes & types others have implemented -- and there would be twice as many if people weren't abusing repr() to do what str() was intended to do so that the interactive prompt hehaves reasonably). > If you are using repr(), it's because you are expecting a human to > look at the thing at some point. Often, yes. More often it's because I expect a human to *edit* it (dump repr to a text file, fiddle it, then read it back in and eval it -- poor man's database), which they can't reasonably be expected to do with a pickle. Often also it's just a way to send a data structure in email, without needing to attach tedious instructions for how to use pickle to decipher it. >> pickles are unreadable by humans; that's why repr() is often preferred. > Precisely. You just said it yourself: repr() is for humans. *Partly*, yes. You assume an either/or here that I reject: repr() works best when it's designed for both == as Python itself does whenever possible. > That is why repr() cannot be mandated as a serialization mechanism. I haven't suggested to mandate it. It's a goal, and one which is often achievable, and appreciated when it is achieved. Nobody expects repr() to capture the state of an open file object -- but then they don't expect pickle to do that either . > There are two goals at odds here: readability and serialization. > You can't have both, so you must prioritize. Pickles are more > about serialization than about readability; repr is more about > readability than about serialization. Pickles are more about *efficient* machine serialization, sacrificing all readability to run as fast as possible. Sometimes that's the best choice; other times not. > repr() is the interpreter's way of communicating with the human. It is *a* way, sure, but for things like NumPy arrays and Rationals (and probably also for IEEE doubles) it's rarely the *best* way. > It makes sense that e.g. 
the repr() of a string that you see > printed by the interpreter looks just like what you would type > in to produce the same string, Yes, that's repr's job. But it's often *not* what the interactive user *wants*. You don't want it either! You later say > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. and that implies the shortest string the prompt can display for 3.1416 - 3.141 is 0.0005999999999999339 (see reply to Christian for details on that example). Do you really want to get that string at the prompt? If you have a NumPy array with a million elements, do you really want the interpreter to display all of them -- and in ~17 different widths? If you're using one of my Rational classes, do you really want to see a ratio of multi-thousand digit longs instead of a nice 12-digit floating approximation? I use the interactive prompt a *lot* -- the current behavior plain sucks, starting about 10 minutes after you finish the Python Tutorial <0.7 wink>. > And no, even if you argue that we need to have something else, > whatever you want to call it, it's not called 'str'. Yes, I've said repeatedly that both str() and repr() are unsuitable. That's where SSCTSOOS started, as str() is *more* suitable for more people more of the time than is repr() -- but still isn't enough. > ... > Or, to put it another way: to write Python, it is required that > you understand how to read and write escaped strings. Either > you learn just that, or you learn that plus another, different > way to read escaped-strings-as-printed-by-the-interpreter. The > second case clearly requires you to learn and remember more. You need to learn whatever it takes to get the job done. Since the current alternatives do not get the job done, yes, if anything is ever introduced that *does* get the job done, there's more to learn. Complexity isn't necessarily evil; gratuitous complexity is evil. > ... > (However, characters below 0x20 are definitely dangerous to the terminal, > and would have to be escaped regardless.) They're no danger on any platform I use, and at least in MS-DOS they're mapped to useful graphics characters. Python has no way to know what's dangerous, and gets in the way by trying to guess. Even if x does have control characters that are dangerous, the user will get screwed as soon as they do print x unless you want (the implied) str() to start escaping "dangerous" characters too. Safety and usefulness are definitely at odds here, and I favor usefulness. If they want saftey, let 'em use Java . > Getting it passed down as str() seems okay to me. Making it > the default action, in my (naturally) subjective opinion, is > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. Whereas in my daily use, this property is usually a *wrong* thing to shoot for at an interactive prompt (but is a great thing for repr() to shoot for). When I want eval'ability, it's just a pair of backticks away; by default, I'd rather see something *friendly*. If I type "ping" at the prompt, I don't want to see a second-by-second account of your entire life history . the-best-thing-to-do-with-most-info-is-to-suppress-it-ly y'rs - tim From tim_one@email.msn.com Sun Apr 9 21:14:17 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:17 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? 
In-Reply-To: Message-ID: <000301bfa260$2f161640$812d153f@tim> [Tim] >> If they're surprised by this, they indeed don't understand the >> arithmetic at all! This is an argument for using a different form of >> arithmetic, not for lying about reality. > This is not lying! Yes, I overstated that. It's not lying, but I defy anyone to explain the full truth of it in a way even Guido could understand <0.9 wink>. "Shortest conversion" is a subtle concept, requiring knowledge not only of the mathematical value, but of details of the HW representation. Plain old "correct rounding" is HW-independent, so is much easier to *fully* understand. And in things floating-point, what you don't fully understand will eventually burn you. Note that in a machine with 2-bit floating point, the "shortest conversion" for 0.75 is the string "0.8": this should suggest the sense in which "shortest conversion" can be actively misleading too. > If you type in "3.1416" and Python says "3.1416", then indeed it is the > case that "3.1416" is a correct way to type in the floating-point number > being expressed. So "3.1415999999999999" is not any more truthful than > "3.1416" -- it's just more annoying. Yes, shortest conversion is *defensible*. But Python has no code to implement that now, so it's not an option today. > I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 > >>> .4 > 0.40000000000000002 > >>> .5 > 0.5 > >>> .6 > 0.59999999999999998 > >>> .7 > 0.69999999999999996 > >>> .8 > 0.80000000000000004 > >>> .9 > 0.90000000000000002 > > Ouch. As shown in my reply to Christian, shortest conversion is not a cure for this "gosh, it printed so much more than I expected it to"; it only appears to "fix it" in the simplest examples. So long as you want eval(what's_diplayed) == what's_typed, this is unavoidable. The only ways to avoid that are to use a different arithmetic, or stop using repr() at the prompt. >> As above. repr() shouldn't be used at the interactive prompt >> anyway (but note that I did not say str() should be). > What, then? Introduce a third conversion routine and further > complicate the issue? I don't see why it's necessary. Because I almost never want current repr() or str() at the prompt, and even you don't want 3.1416-3.141 to display 0.0005999999999999339 (which is the least you can print and have eval return the true answer). >>> What should really happen is that floats intelligently print in >>> the shortest and simplest manner possible >> This can be done, but only if Python does all fp I/O conversions >> entirely on its own -- 754-conforming libc routines are inadequate >> for this purpose > Not "all fp I/O conversions", right? Only repr(float) needs to > be implemented for this particular purpose. Other conversions > like "%f" and "%g" can be left to libc, as they are now. No, all, else you risk %f and %g producing results that are inconsistent with repr(), which creates yet another set of incomprehensible surprises. This is not an area that rewards half-assed hacks! I'm intimately familiar with just about every half-assed hack that's been tried here over the last 20 years -- they never work in the end. The only approach that ever bore fruit was 754's "there is *a* mathematically correct answer, and *that's* the one you return". Unfortunately, they dropped the ball here on float<->string conversions (and very publicly regret that today). 
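For a feel of the consistency at stake, compare the two conversions Python already exposes for the same double (digits from an IEEE-754 box running 1.6a1; other platforms may differ):

    >>> x = 0.1
    >>> repr(x)          # 17 significant digits: enough to round-trip the bits
    '0.10000000000000001'
    >>> str(x)           # 12 significant digits: nicer to read, doesn't round-trip
    '0.1'
    >>> "%.17g" % x      # roughly what repr() does under the covers
    '0.10000000000000001'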
> I suppose for convenience's sake it may be nice to add another > format spec so that one can ask for this behaviour from the "%" > operator as well, but that's a separate issue (perhaps "%r" to > insert the repr() of an argument of any type?). %r is cool! I like that. >>> def smartrepr(x): >>> p = 17 >>> while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 >>> return '%%.%df' % p % x >> This merely exposes accidents in the libc on the specific >> platform you run it. That is, after >> >> print smartrepr(x) >> >> on IEEE-754 platform A, reading that back in on IEEE-754 ?> platform B may not yield the same number platform A started with. > That is not repr()'s job. Once again: > > repr() is not for the machine. And once again, I didn't and don't agree with that, and, to save the next seven msgs, never will . > It is not part of repr()'s contract to ensure the kind of > platform-independent conversion you're talking about. It > prints out the number in a way that upholds the eval(repr(x)) == x > contract for the system you are currently interacting with, and > that's good enough. It's not good enough for Java and Scheme, and *shouldn't* be good enough for Python. The 1.6 repr(float) is already platform-independent across IEEE-754 machines (it's not correctly rounded on most platforms, but *does* print enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all Python platforms are IEEE-754 (I don't know of an exception -- perhaps Python is running on some ancient VAX?). The std has been around for 15+ years, virtually all platforms support it fully now, and it's about time languages caught up. BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats either, even on a single machine. It is now -- but thanks to the change in repr. > If you wanted platform-independent serialization, you would > use something else. There is nothing else. In 1.5.2 and before, people mucked around with binary dumps hoping they didn't screw up endianness. > As long as the language reference says > > "These represent machine-level double precision floating > point numbers. You are at the mercy of the underlying > machine architecture and C implementation for the accepted > range and handling of overflow." > > and until Python specifies the exact sizes and behaviours of > its floating-point numbers, you can't expect these kinds of > cross-platform guarantees anyway. There's nothing wrong with exceeding expectations . Despite what the reference manual says, virtually all machines use identical fp representations today (this wasn't true when the text above was written). > str()'s contract: > - if x is a string, str(x) == x > - otherwise, str(x) is a reasonable string coercion from x The last is so vague as to say nothing. My counterpart-- at least equally vague --is - otherwise, str(x) is a string that's easy to read and contains a compact summary indicating x's nature and value in general terms > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way I would say instead: - every character c in repr(x) has ord(c) in range(32, 128) - repr(x) should strive to be easily readable by humans > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x Given your first point, does this say something other than "for basic types, repr(x) is syntactically valid"? Also unclear what "basic types" means. 
> pickle's contract: > - pickle.dumps(x) is a platform-independent serialization > of the value and state of object x Since pickle can't handle all objects, this exaggerates the difference between it and repr. Give a fuller description, like - If pickle.dumps(x) is defined, pickle.loads(pickle.dumps(x)) == x and it's the same as the first line of your repr() contract, modulo s/syntactically valid/is defined/ s/eval/pickle.loads/ s/repr/pickle.dumps/ The differences among all these guys remain fuzzy to me. but-not-surprising-when-talking-about-what-people-like-to-look-at-ly y'rs - tim From tim_one@email.msn.com Sun Apr 9 21:14:25 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:25 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000401bfa260$33e6ff40$812d153f@tim> [Ping] > ... > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: > > - If you just type a number in and the interpreter > prints it back, it will never respond with more > junk digits than you typed. Note the example from another reply of a machine with 2-bit floats. There the user would see: >>> 0.75 # happens to be exactly representable on this machine 0.8 # because that's the shortest string needed on this machine # to get back 0.75 internally >> This kind of surprise is inherent in the approach, not specific to 2-bit machines . BTW, I don't know that it will never print more digits than you type: did you prove that? It's plausible, but many plausible claims about fp turn out to be false. > - If you type in what the interpreter displays for a > float, you can be assured of getting the same value. This isn't of value for most interactive use -- in general you want to see the range of a number, not enough to get 53 bits exactly (that's beyond the limits of human "number sense"). It also has one clearly bad aspect: when printing containers full of floats, the number of digits printed for each will vary wildly from float to float. Makes for an unfriendly display. If the prompt's display function were settable, I'd probably plug in pprint! From tim_one@email.msn.com Sun Apr 9 21:25:19 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:25:19 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F0D1BF.E5ECA4E5@tismer.com> Message-ID: <000501bfa261$b9b5f3a0$812d153f@tim> [Christian] > Hmm, I hope I understood. > Oh, wait a minute! What is the method? What is the correct value? > > If I type > >>> 0.1 > 0.10000000000000001 > >>> 0.10000000000000001 > 0.10000000000000001 > >>> > > There is only one value: The one which is in the machine. > Would you think it is ok to get 0.1 back, when you > actually *typed* 0.10000000000000001 ? Yes, this is the kind of surprise I sketched with the "2-bit machine" example. It can get more surprising than the above (where, as you suspect, "shortest conversion" yields "0.1" for both -- which, btw, is why reading it back in to a float type with more precision loses accuracy needlessly, which in turn is why 754 True Believers dislike it). 
repetitively y'rs - tim From akuchlin@mems-exchange.org Sun Apr 9 23:00:24 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Sun, 9 Apr 2000 18:00:24 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <14573.61191.486890.43591@seahag.cnri.reston.va.us> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> <14573.61191.486890.43591@seahag.cnri.reston.va.us> Message-ID: <14576.64888.59263.386826@newcnri.cnri.reston.va.us> Fred L. Drake, Jr. writes: >maintained code. I would be surprised if Grail is the only large >application which uses "regex" for performance reasons, and we don't Zope is another, and there's even a ts_regex module hiding in Zope which tries to provide thread-safety on top of regex. --amk From tim_one@email.msn.com Mon Apr 10 03:40:03 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 22:40:03 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071918.PAA27474@eric.cnri.reston.va.us> Message-ID: <000401bfa296$13876e20$7da0143f@tim> [Moshe Zadka] > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you [Guido] > Except I'm not sure what kind of special-casing should be happening. Welcome to the club. > Put quotes around it without worrying if that makes it a valid string > literal is one thought that comes to mind. If nothing else, Ping convinced me the temptation to type that back in will prove overwhelming. > Another approach might be what Tk's text widget does -- pass through > certain control characters (LF, TAB) and all (even non-ASCII) printing > characters, but display other control characters as \x.. escapes > rather than risk putting the terminal in a weird mode. This must be platform-dependent? Just tried this loop in Win95 IDLE, using Courier: >>> for i in range(256): print i, chr(i), Across the whole range, it just showed what Windows always shows in the Courier font (which is usually a (empty or filled) rectangle for most "control characters"). No \x escapes at all. BTW, note that Tk unhelpfully translates a request for "Courier New" into a request for "Courier", which aren't the same fonts under Windows! So if anyone tries this with the IDLE Windows defaults, and doesn't see all the special characters Windows assigns to the range 128-159 in Courier New, that's why -- most of them aren't assigned under Courier. > No quotes though. Hm, I kind of like this: when used as intended, it will > just display the text, with newlines and umlauts etc.; but when printing > binary gibberish, it will do something friendly. Can't be worse than what happens now. > There's also the issue of what to do with lists (or tuples, or dicts) > containing strings. If we agree on this: > > >>> "hello\nworld\n\347" # octal 347 is a cedilla > hello > world > ç > >>> I don't think there is agreement on this, because nothing in the output says "btw, this thing was a string". Is that worth preserving? "It depends" is the only answer I've got to that. > Then what should ("hello\nworld", "\347") show? I've got enough serious > complaints that I don't want to propose that it use repr(): > > >>> ("hello\nworld", "\347") > ('hello\nworld', '\347') > >>> > > Other possibilities: > > >>> ("hello\nworld", "\347") > ('hello > world', 'ç') > >>> > > or maybe > > >>> ("hello\nworld", "\347") > ('''hello > world''', 'ç') > >>> I like the last best.
> Of course there's also the Unicode issue -- the above all assumes > Latin-1 for stdout. > > Still no closure, I think... It's curious how you invoke "closure" when and only when you don't know what *you* want to do . a-guido-divided-against-himself-cannot-stand-ly y'rs - tim From mhammond@skippinet.com.au Mon Apr 10 05:32:53 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Mon, 10 Apr 2000 14:32:53 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. Message-ID: [Im re-sending as the attachment caused this to be held up for administrative approval. Ive forwarded the attachement to Chris - anyone else just mail me for it] Ive struck a crash in the new trashcan mechanism (so I guess Chris is gunna pay the most attention here). Although I can only provoke this reliably in debug builds, I believe it also exists in release builds, but is just far more insidious. Unfortunately, I also can not create a simple crash case. But I _can_ provide info on how you can reliably cause the crash. Obviously only tested on Windows... * Go to http://lima.mudlib.org/~rassilon/p2c/, and grab the download, and unzip. * Replace "transformer.py" with the attached version (multi-arg append bites :-) * Ensure you have a Windows "debug" build available, built from CVS. * From the p2c directory, Run "python_d.exe gencode.py gencode.py" You will get a crash, and the debugger will show you are destructing a list, with an invalid object. The crash occurs about 1000 times after this code is first hit, and I can't narrow the crash condition down :-( If you open object.h, and disable the trashcan mechanism (by changing the "xx", as the comments suggest) then it runs fine. Hope this helps someone - Im afraid I havent a clue :-( Mark. From gstein@lyra.org Mon Apr 10 09:14:59 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:14:59 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils sysconfig.py In-Reply-To: <200004100117.VAA16514@kaluha.cnri.reston.va.us> Message-ID: Why aren't we getting diffs on these things? Is it because of the "distutils" root instead of the Python root? Just curious... thx, -g On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16499 > > Modified Files: > sysconfig.py > Log Message: > Added optional 'prefix' arguments to 'get_python_inc()' and > 'get_python_lib()'. > > > > > _______________________________________________ > Python-checkins mailing list > Python-checkins@python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Apr 10 09:18:20 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:18:20 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils cmd.py In-Reply-To: <200004100130.VAA16590@kaluha.cnri.reston.va.us> Message-ID: [ damn... can't see the code... went and checked it out... ] On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16575 > > Modified Files: > cmd.py > Log Message: > Added a check for the 'force' attribute in '__getattr__()' -- better than > crashing when self.force not defined. This seems a bit silly. Why don't you simply define .force in the __init__ method? Better yet: make the other guys crash -- the logic is bad if they are using something that isn't supposed to be defined on that particular Command object. 
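In code, the two styles at issue look something like this (a bare-bones sketch with made-up class names; the real distutils Command class does far more):

    class LazyCommand:
        # the checked-in style: invent .force on demand in __getattr__
        def __getattr__(self, name):
            if name == 'force':
                return None
            raise AttributeError, name

    class ExplicitCommand:
        # the suggested style: define .force up front, in __init__
        def __init__(self):
            self.force = None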
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Mon Apr 10 10:25:03 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 10 Apr 2000 11:25:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000001bfa1c2$9e403a80$18a2143f@tim> from "Tim Peters" at Apr 08, 2000 09:26:23 PM Message-ID: <200004100925.LAA03689@python.inrialpes.fr> Tim Peters wrote: > > Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit > arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. > That's not enough to know whether it *should* round to 41 or 42. So you > need to try again with more precision. But how much? You might try 5 > digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? > Might be a waste of effort. Try 20 next? Might *still* not be enough -- or > could just as well be that 7 would have been enough and you did 10x the work > you needed to do. Right. From what I understand, the dilemma is this: In order to round correctly, how much extra precision do we need, so that the range of uncertainity (+-3 in your example) does not contain the middle of two consecutive representable numbers (say 41.49 and 41.501). "Solving" the dilemma is predicting this extra precision so that the ranges of uncertainity does not contain the middle of two consecutive floats. Which in turn equals to calculating the min distance between the image of a number and the middle of two consecutive machine numbers. And that's what these guys have calculated for common functions in IEEE-754 double precision, with brute force, using an apparently original algorithm they have proposed. > > that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim > I haven't asked for anything. It was just passive echoing with a good level of uncertainity :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein@lyra.org Mon Apr 10 10:53:48 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 02:53:48 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 In-Reply-To: <38F1A430.D70DF89@lemburg.com> Message-ID: On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > The attached patch includes the following fixes and additions: >... > * '...%s...' % u"abc" now coerces to Unicode just like > string methods. Care is taken not to reevaluate already formatted > arguments -- only the first Unicode object appearing in the > argument mapping is looked up twice. Added test cases for > this to test_unicode.py. >... I missed a chance to bring this up on the first round of discussion, but is this really the right thing to do? We never coerce the string on the left based on operands. For example: if the operands are class instances, we call __str__ -- we don't call __coerce__. It seems a bit weird to magically revise the left operand. In many cases, a Unicode used as a string is used as a UTF-8 value. Why is that different in this case? Seems like a wierd special case. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Apr 10 11:55:50 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 12:55:50 +0200 Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 References: Message-ID: <38F1B336.12B6707@lemburg.com> Greg Stein wrote: > > On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > > The attached patch includes the following fixes and additions: > >... > > * '...%s...' 
% u"abc" now coerces to Unicode just like > > string methods. Care is taken not to reevaluate already formatted > > arguments -- only the first Unicode object appearing in the > > argument mapping is looked up twice. Added test cases for > > this to test_unicode.py. > >... > > I missed a chance to bring this up on the first round of discussion, but > is this really the right thing to do? We never coerce the string on the > left based on operands. For example: if the operands are class instances, > we call __str__ -- we don't call __coerce__. > > It seems a bit weird to magically revise the left operand. > > In many cases, a Unicode used as a string is used as a UTF-8 value. Why is > that different in this case? Seems like a wierd special case. It's not a special case: % works just like a method call and all string methods auto-coerce to Unicode in case a Unicode object is found among the arguments. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Mon Apr 10 12:19:51 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Mon, 10 Apr 2000 13:19:51 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: Message-ID: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Greg Stein wrote: > In many cases, a Unicode used as a string is used as a UTF-8 value. = Why is > that different in this case? Seems like a wierd special case. the whole "sometimes it's UTF-8, sometimes it's not" concept is one big mess (try using some existing string crunching code with unicode strings if you don't believe me -- using non-US input strings, of course). among other things, it's very hard to get things to work properly when string slicing and indexing no longer works as expected... I see two possible ways to solve this; rough proposals follow: ----------------------------------------------------------------------- 1. a java-like approach ----------------------------------------------------------------------- a) define *character* in python to be a unicode character b) provide two character containers: 8-bit strings and unicode strings. the former can only hold unicode characters in the range 0-255, the latter can hold characters from the full unicode character set (not entirely true for the current implementation, but that's not relevant here) given a string "s" of any string type, s[i] is *always* the i'th = character. len(s) is always the number of characters in the string. len(s[i]) is = 1. etc. c) string operations involving mixed types use the larger type for the return value. d) they raise TypeError if (c) doesn't make any sense. e) as before, 8-bit strings can also be used to store binary data, hold- ing *bytes* instead of characters. given an 8-bit string "b" used as a buffer, b[i] is always the i'th byte. len(b) is always the number of = bytes in the buffer. binary buffers can be used to hold any external unicode encodings (utf-8, utf-16, etc), as well as non-unicode 8-bit encodings = (iso-8859-x, cyrillic, far east, etc). there are no implicit conversions from = buffers to strings; it's up to the programmer to spell that out when necessary. f) it's up to the programmer to keep track of what a given 8-bit string actually contains (strings, encoded characters, or some other kind of binary data). 
From: "Mark Hammond"
Date: Mon, 10 Apr 2000 10:55:39 +1000
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
Message-ID:

I've struck a crash in the new trashcan mechanism (so I guess Chris is gunna pay the most attention here). Although I can only provoke this reliably in debug builds, I believe it also exists in release builds, but is just far more insidious.

Unfortunately, I also cannot create a simple crash case. But I _can_ provide info on how you can reliably cause the crash. Obviously only tested on Windows...

* Go to http://lima.mudlib.org/~rassilon/p2c/, grab the download, and unzip.
* Replace "transformer.py" with the attached version (multi-arg append bites :-)
* Ensure you have a Windows "debug" build available, built from CVS.
* From the p2c directory, run "python_d.exe gencode.py gencode.py"

You will get a crash, and the debugger will show you are destructing a list, with an invalid object. The crash occurs roughly the 1000th time this code is hit, and I can't narrow the crash condition down :-(

If you open object.h, and disable the trashcan mechanism (by changing the "xx", as the comments suggest) then it runs fine.

Hope this helps someone - I'm afraid I haven't a clue :-(

Mark.

[uuencoded attachment: transformer.py]
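For readers who have not met the trashcan mechanism: it exists because deallocating deeply nested containers recurses once per nesting level and can overflow the C stack, so the deallocator parks pending deletions in a list once the nesting gets deep (Christian describes this further down). A minimal sketch of the kind of structure involved; the depth is arbitrary, and Mark's crash comes from p2c's real parse trees, not from this snippet:

nested = []
for i in range(50000):
    nested = [nested]      # each new list wraps the previous one
del nested                 # teardown has to unwind all 50000 levels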
From: gward@mems-exchange.org (Greg Ward)
Subject: [Python-Dev] CVS: distutils/distutils cmd.py
In-Reply-To: ; from gstein@lyra.org on Mon, Apr 10, 2000 at 01:18:20AM -0700
References: <200004100130.VAA16590@kaluha.cnri.reston.va.us>
Message-ID: <20000410091101.B406@mems-exchange.org>

On 10 April 2000, Greg Stein said:
> On Sun, 9 Apr 2000, Greg Ward wrote:
> > Modified Files:
> > cmd.py
> > Log Message:
> > Added a check for the 'force' attribute in '__getattr__()' -- better than
> > crashing when self.force not defined.
>
> This seems a bit silly. Why don't you simply define .force in the __init__
> method?

Duhh, 'cause I'm stupid? No, that's not it. 'Cause I was doing this on a lazy Sunday evening and not really thinking about it? Yeah, I think that's it.

There, I now define self.force in the Command class constructor. A wee bit cheesy (not all Distutils command classes need or use self.force, and it wouldn't always mean the same thing), but it means minimal code upheaval for now.

> [ damn... can't see the code... went and checked it out... ]

Oops, that was a CVS config thing. Fixed now -- I'll go check in that change and we'll all see if it worked. Just as well it was off though -- I checked in a couple of big documentation updates this weekend, and who wants to see 30k of LaTeX patches in their inbox on Monday morning? ;-)

Greg
--
Greg Ward - software developer gward@mems-exchange.org
MEMS Exchange / CNRI voice: +1-703-262-5376
Reston, Virginia, USA fax: +1-703-262-5367

From guido@python.org Mon Apr 10 15:01:58 2000
From: guido@python.org (Guido van Rossum)
Date: Mon, 10 Apr 2000 10:01:58 -0400
Subject: [Python-Dev] "takeuchi": a unicode string on IDLE shell
Message-ID: <200004101401.KAA00238@eric.cnri.reston.va.us>

Can anyone answer this? I can reproduce the output side of this, and I believe he's right about the input side. Where should Python migrate with respect to Unicode input? I think that what Takeuchi is getting is actually better than in Pythonwin or the command line (where he gets Shift-JIS)...

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message
Date: Mon, 10 Apr 2000 22:49:45 +0900
From: "takeuchi"
To:
Subject: a unicode string on IDLE shell

Dear Guido,

I played your latest CPython (Python 1.6a1) on Win98 Japanese version and found a strange IDLE shell behavior. I'm not sure whether this is a bug or a feature, so I report my story anyway.

When typing a Japanese string in the IDLE shell with the IME, Tk 8.3 seems to convert it to a UTF-8 representation. Unfortunately Python does not know this, so it is treated as an ordinary string.

>>> s = raw_input(">>>")
Type Japanese characters with the IME, for example あ (the first character of the Japanese hiragana alphabet)
>>> s
'\343\201\202' # UTF-8 encoded
>>> print s
あ # the proper glyph appears on the screen

The print statement in the IDLE shell works fine with a UTF-8 encoded string; however, slicing or len() does not work (I know this is the right result). So I have to convert this string with unicode().

>>> u = unicode(s)
>>> u
u'\u3042'
>>> print u
あ # the proper glyph appears on the screen

Do you think this extra conversion is awkward? I think this behavior is inconsistent with command-line Python and PythonWin. If I want the same result in the command-line Python shell or the PythonWin shell, I have to code as follows:

>>> s = raw_input(">>>")
Type Japanese characters with the IME, for example あ
>>> s
'\202\240' # Shift-JIS encoded
>>> print s
あ # the proper glyph appears on the screen
>>> u = unicode(s, "mbcs") # if I use unicode(s), a UnicodeError is raised!
>>> print u.encode("mbcs") # if I use print u, the wrong glyph appears
あ # the proper glyph appears on the screen

This difference is confusing! I do not have the best solution for this annoyance; I hope that at least the IDLE shell and the PythonWin shell will have the same behavior.

Thank you for reading.

Best Regards, takeuchi

------- End of Forwarded Message
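A compact restatement of the workaround shown in the forwarded transcript, for reference; it assumes, as the transcript itself demonstrates, that unicode() defaults to UTF-8 in the 1.6 alphas:

>>> s = '\343\201\202'       # the UTF-8 bytes IDLE/Tk hands back for one hiragana character
>>> len(s)                   # byte length, which is why len() and slicing mislead
3
>>> u = unicode(s)           # equivalent to unicode(s, 'utf-8') under that default
>>> len(u)
1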
From tismer@tismer.com Mon Apr 10 15:24:24 2000
From: tismer@tismer.com (Christian Tismer)
Date: Mon, 10 Apr 2000 16:24:24 +0200
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
References:
Message-ID: <38F1E418.FF191AEE@tismer.com>

About extensions and Trashcan.

Mark Hammond wrote:
...
> I've struck a crash in the new trashcan mechanism (so I guess Chris
> is gunna pay the most attention here). Although I can only provoke
> this reliably in debug builds, I believe it also exists in release
> builds, but is just far more insidious.
>
> Unfortunately, I also cannot create a simple crash case. But I
> _can_ provide info on how you can reliably cause the crash.
> Obviously only tested on Windows...
...
> You will get a crash, and the debugger will show you are destructing
> a list, with an invalid object. The crash occurs roughly the 1000th time
> this code is hit, and I can't narrow the crash condition down :-(

The trashcan is built in a quite simple manner. It uses a list to delay deletions when the nesting level is deep. The list operations are not thread safe.

One special case is handled: it *could* happen on destruction of the session that the trashcan cannot handle errors, since the thread state is already undefined. But the general case of no interpreter lock is undefined and forbidden.

In a discussion with Guido, we first thought that we would need some thread-safe object for the delay. Later on it turned out that it must be generally *forbidden* to destroy an object when the interpreter lock is not held. Reason: an instance destruction might call __del__, and that would run an interpreter without the lock. Forbidden. For that reason, I kept the list in place.

I think it is fine that it crashed. There are obviously extension modules left where the interpreter lock rule is violated. The builtin Python code has been checked; there are most probably no holes, including Tkinter. Or, I made a mistake in this little code:

void
_PyTrash_deposit_object(op)
	PyObject *op;
{
	PyObject *error_type, *error_value, *error_traceback;

	if (PyThreadState_GET() != NULL)
		PyErr_Fetch(&error_type, &error_value, &error_traceback);
	if (!_PyTrash_delete_later)
		_PyTrash_delete_later = PyList_New(0);
	if (_PyTrash_delete_later)
		PyList_Append(_PyTrash_delete_later, (PyObject *)op);
	if (PyThreadState_GET() != NULL)
		PyErr_Restore(error_type, error_value, error_traceback);
}

void
_PyTrash_destroy_list()
{
	while (_PyTrash_delete_later) {
		PyObject *shredder = _PyTrash_delete_later;
		_PyTrash_delete_later = NULL;
		++_PyTrash_delete_nesting;
		Py_DECREF(shredder);
		--_PyTrash_delete_nesting;
	}
}

ciao - chris
--
Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@python.org Mon Apr 10 15:40:19 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 10:40:19 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 10:20:34 EDT." <200004101420.KAA00291@eric.cnri.reston.va.us> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> Message-ID: <200004101440.KAA00324@eric.cnri.reston.va.us> Thinking about entering Japanese into raw_input() in IDLE more, I thought I figured a way to give Takeuchi a Unicode string when he enters Japanese characters. I added an experimental patch to the readline method of the PyShell class: if the line just read, when converted to Unicode, has fewer characters but still compares equal (and no exceptions happen during this test) then return the Unicode version. This doesn't currently work because the built-in raw_input() function requires that the readline() call it makes internally returns an 8-bit string. Should I relax that requirement in general? (I could also just replace __builtin__.[raw_]input with more liberal versions supplied by IDLE.) I also discovered that the built-in unicode() function is not idempotent: unicode(unicode('a')) returns u'\000a'. I think it should special-case this and return u'a' ! Finally, I believe we need a way to discover the encoding used by stdin or stdout. I have to admit I know very little about the file wrappers that Marc wrote -- is it easy to get the encoding out of them? IDLE should probably emulate this, as it's encoding is clearly UTF-8 (at least when using Tcl 8.1 or newer). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 10 16:16:58 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:16:58 -0400 Subject: [Python-Dev] int division proposal in idle-dev Message-ID: <200004101516.LAA00442@eric.cnri.reston.va.us> David Scherer posted an interesting proposal to the idle-dev list for dealing with the incompatibility issues around int division. Bruce Sherwood also posted an interesting discussion there on how to deal with incompatibilities in general (culminating in a recommendation of David's solution). In brief, David abuses the "global" statement at the module level to implement a pragma. Not ideal, but kind of cute and backwards compatible -- this can be added to Python 1.5 or even 1.4 code without breaking! He proposes that you put "global olddivision" at the top of any file that relies on int/int yielding an int; a newer Python can then default to new division semantics. (He does this by generating a different opcode, which is also smart.) It's time to start thinking about a transition path -- Bruce's discussion and David's proposal are a fine starting point, I think. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Mon Apr 10 16:32:17 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Mon, 10 Apr 2000 17:32:17 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> Message-ID: <38F1F401.45535C23@lemburg.com> Guido van Rossum wrote: > > Thinking about entering Japanese into raw_input() in IDLE more, I > thought I figured a way to give Takeuchi a Unicode string when he > enters Japanese characters. > > I added an experimental patch to the readline method of the PyShell > class: if the line just read, when converted to Unicode, has fewer > characters but still compares equal (and no exceptions happen during > this test) then return the Unicode version. > > This doesn't currently work because the built-in raw_input() function > requires that the readline() call it makes internally returns an 8-bit > string. Should I relax that requirement in general? (I could also > just replace __builtin__.[raw_]input with more liberal versions > supplied by IDLE.) > > I also discovered that the built-in unicode() function is not > idempotent: unicode(unicode('a')) returns u'\000a'. I think it should > special-case this and return u'a' ! Good idea. I'll fix this in the next round. > Finally, I believe we need a way to discover the encoding used by > stdin or stdout. I have to admit I know very little about the file > wrappers that Marc wrote -- is it easy to get the encoding out of > them? I'm not sure what you mean: the name of the input encoding ? Currently, only the names of the encoding and decoding functions are available to be queried. > IDLE should probably emulate this, as it's encoding is clearly > UTF-8 (at least when using Tcl 8.1 or newer). It should be possible to redirect sys.stdin/stdout using the codecs.EncodedFile wrapper. Some tests show that raw_input() doesn't seem to use the redirected sys.stdin though... >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() äöü >>> s '\344\366\374' >>> s = sys.stdin.read() äöü >>> s '\303\244\303\266\303\274\012' -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Apr 10 16:38:58 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:38:58 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 17:32:17 +0200." <38F1F401.45535C23@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> Message-ID: <200004101538.LAA00486@eric.cnri.reston.va.us> > > Finally, I believe we need a way to discover the encoding used by > > stdin or stdout. I have to admit I know very little about the file > > wrappers that Marc wrote -- is it easy to get the encoding out of > > them? > > I'm not sure what you mean: the name of the input encoding ? > Currently, only the names of the encoding and decoding functions > are available to be queried. Whatever is helpful for a module or program that wants to know what kind of encoding is used. > > IDLE should probably emulate this, as it's encoding is clearly > > UTF-8 (at least when using Tcl 8.1 or newer). > > It should be possible to redirect sys.stdin/stdout using > the codecs.EncodedFile wrapper. 
Some tests show that raw_input() > doesn't seem to use the redirected sys.stdin though... > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > >>> s = raw_input() > äöü > >>> s > '\344\366\374' > >>> s = sys.stdin.read() > äöü > >>> s > '\303\244\303\266\303\274\012' This deserves more looking into. The code for raw_input() in bltinmodule.c certainly *tries* to use sys.stdin. (I think that because your EncodedFile object is not a real stdio file object, it will take the second branch, near the end of the function; this calls PyFile_GetLine() which attempts to call readline().) Aha! It actually seems that your read() and readline() are inconsistent! I don't know your API well enough to know which string is "correct" (\344\366\374 or \303\244\303\266\303\274) but when I call sys.stdin.readline() I get the same as raw_input() returns: >>> from codecs import * >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() äöü >>> s '\344\366\374' >>> s = sys.stdin.read() äöü >>> >>> s '\303\244\303\266\303\274\012' >>> unicode(s) u'\344\366\374\012' >>> s = sys.stdin.readline() äöü >>> s '\344\366\374\012' >>> Didn't you say that your wrapper only wraps read()? Maybe you need to revise that decision! (Note that PyShell doesn't even define read() -- it only defines readline().) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Mon Apr 10 16:45:29 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Mon, 10 Apr 2000 11:45:29 -0400 (EDT) Subject: [Python-Dev] test_fork1 on Linux Message-ID: <14577.63257.956728.228174@seahag.cnri.reston.va.us> I've just checked in changes to test_fork1.py make the test a little more sensible on Linux (where the assumption that the thread pids are the same as the controlling process doesn't hold). However, I'm still observing some serious weirdness with this test. As far as I've been able to tell, the os.fork() call always succeeds, but sometimes the parent process segfaults, and sometimes it locks up. It does seem to get to the os.waitpid() call, which isi appearantly where the failure actually occurs. (And sometimes everything works as expected!) If anyone here is particularly familiar with threading on Linux, I'd appreciate a little help, or even a pointer to someone who understands enough of the low-level aspects of threading on Linux that I can communicate with them to figure this out. Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw@python.org Mon Apr 10 16:52:43 2000 From: bwarsaw@python.org (Barry Warsaw) Date: Mon, 10 Apr 2000 11:52:43 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods Message-ID: <14577.63691.561040.281577@anthem.cnri.reston.va.us> A number of people have played FAST and loose with function and method docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are handy because they are the one attribute on funcs and methods that are easily writable. But as more people overload the semantics for docstrings, we'll get collisions. I've had a number of discussions with folks about adding attribute dictionaries to functions and methods so that you can essentially add any attribute. Namespaces are one honking great idea -- let's do more of those! Below is a very raw set of patches to add an attribute dictionary to funcs and methods. It's only been minimally tested, but if y'all like the idea, I'll clean it up, sanity check the memory management, and post the changes to patches@python.org. 
Here's some things you can do: -------------------- snip snip -------------------- Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def a(): pass ... >>> a.publish = 1 >>> a.publish 1 >>> a.__doc__ >>> a.__doc__ = 'a doc string' >>> a.__doc__ 'a doc string' >>> a.magic_string = a.__doc__ >>> a.magic_string 'a doc string' >>> dir(a) ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] >>> class F: ... def a(self): pass ... >>> f = F() >>> f.a.publish Traceback (most recent call last): File "", line 1, in ? AttributeError: publish >>> f.a.publish = 1 >>> f.a.publish 1 >>> f.a.__doc__ >>> f.a.__doc__ = 'another doc string' >>> f.a.__doc__ 'another doc string' >>> f.a.magic_string = f.a.__doc__ >>> f.a.magic_string 'another doc string' >>> dir(f.a) ['__dict__', '__doc__', '__name__', 'im_class', 'im_func', 'im_self', 'magic_string', 'publish'] >>> -------------------- snip snip -------------------- -Barry [1] Aycock, "Compiling Little Languages in Python", http://www.foretec.com/python/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html [2] http://classic.zope.org:8080/Documentation/Reference/ORB P.S. I promised to add a little note about setattr and getattr vs. setattro and getattro. There's very little documentation about the differences, and searching on python.org doesn't seem to turn up anything. The differences are simple. setattr/getattr take a char* argument naming the attribute to change, while setattro/getattro take a PyObject* (hence the trailing `o' -- for Object). This stuff should get documented in the C API, but at least now, it'll turn up in a SIG search. :) -------------------- snip snip -------------------- Index: funcobject.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/funcobject.h,v retrieving revision 2.16 diff -c -r2.16 funcobject.h *** funcobject.h 1998/12/04 18:48:02 2.16 --- funcobject.h 2000/04/07 21:30:40 *************** *** 44,49 **** --- 44,50 ---- PyObject *func_defaults; PyObject *func_doc; PyObject *func_name; + PyObject *func_dict; } PyFunctionObject; extern DL_IMPORT(PyTypeObject) PyFunction_Type; Index: classobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/classobject.c,v retrieving revision 2.84 diff -c -r2.84 classobject.c *** classobject.c 2000/04/10 13:03:19 2.84 --- classobject.c 2000/04/10 15:27:15 *************** *** 1550,1577 **** /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, {NULL} /* Sentinel */ }; static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! PyObject *name; { ! char *sname = PyString_AsString(name); ! if (sname[0] == '_') { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! if (strcmp(sname, "__name__") == 0 || ! strcmp(sname, "__doc__") == 0) ! return PyObject_GetAttr(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; } ! 
return PyMember_Get((char *)im, instancemethod_memberlist, sname); } static void --- 1550,1608 ---- /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, + {"__dict__", T_INT, 0}, {NULL} /* Sentinel */ }; + static int + instancemethod_setattr(im, name, v) + register PyMethodObject *im; + char *name; + PyObject *v; + { + int rtn; + + if (PyEval_GetRestricted() || + strcmp(name, "im_func") == 0 || + strcmp(name, "im_self") == 0 || + strcmp(name, "im_class") == 0) + { + PyErr_Format(PyExc_TypeError, "read-only attribute: %s", name); + return -1; + } + return PyObject_SetAttrString(im->im_func, name, v); + } + + static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! char *name; { ! PyObject *rtn; ! ! if (strcmp(name, "__name__") == 0 || ! strcmp(name, "__doc__") == 0) { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! return PyObject_GetAttrString(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return PyObject_GetAttrString(im->im_func, name); + + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyObject_GetAttrString(im->im_func, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static void *************** *** 1662,1669 **** 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! 0, /*tp_getattr*/ ! 0, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ --- 1693,1700 ---- 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! (getattrfunc)instancemethod_getattr, /*tp_getattr*/ ! (setattrfunc)instancemethod_setattr, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ *************** *** 1672,1678 **** (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! (getattrofunc)instancemethod_getattr, /*tp_getattro*/ 0, /*tp_setattro*/ }; --- 1703,1709 ---- (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! 0, /*tp_getattro*/ 0, /*tp_setattro*/ }; Index: funcobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/funcobject.c,v retrieving revision 2.18 diff -c -r2.18 funcobject.c *** funcobject.c 1998/05/22 00:55:34 2.18 --- funcobject.c 2000/04/07 22:15:33 *************** *** 62,67 **** --- 62,68 ---- doc = Py_None; Py_INCREF(doc); op->func_doc = doc; + op->func_dict = PyDict_New(); } return (PyObject *)op; } *************** *** 133,138 **** --- 134,140 ---- {"__name__", T_OBJECT, OFF(func_name), READONLY}, {"func_defaults",T_OBJECT, OFF(func_defaults)}, {"func_doc", T_OBJECT, OFF(func_doc)}, + {"func_dict", T_OBJECT, OFF(func_dict)}, {"__doc__", T_OBJECT, OFF(func_doc)}, {NULL} /* Sentinel */ }; *************** *** 142,153 **** PyFunctionObject *op; char *name; { if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; } ! 
return PyMember_Get((char *)op, func_memberlist, name); } static int --- 144,167 ---- PyFunctionObject *op; char *name; { + PyObject* rtn; + if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return op->func_dict; + + rtn = PyMember_Get((char *)op, func_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyDict_GetItemString(op->func_dict, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static int *************** *** 156,161 **** --- 170,177 ---- char *name; PyObject *value; { + int rtn; + if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not settable in restricted mode"); *************** *** 178,185 **** } if (value == Py_None) value = NULL; } ! return PyMember_Set((char *)op, func_memberlist, name, value); } static void --- 194,214 ---- } if (value == Py_None) value = NULL; + } + else if (strcmp(name, "func_dict") == 0) { + if (value == NULL || !PyDict_Check(value)) { + PyErr_SetString( + PyExc_TypeError, + "func_dict must be set to a dict object"); + return -1; + } + } + rtn = PyMember_Set((char *)op, func_memberlist, name, value); + if (rtn < 0) { + PyErr_Clear(); + rtn = PyDict_SetItemString(op->func_dict, name, value); } ! return rtn; } static void *************** *** 191,196 **** --- 220,226 ---- Py_DECREF(op->func_name); Py_XDECREF(op->func_defaults); Py_XDECREF(op->func_doc); + Py_XDECREF(op->func_dict); PyMem_DEL(op); } From mal@lemburg.com Mon Apr 10 17:01:52 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 18:01:52 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F1FAF0.4821AE6C@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? > > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. > > > > IDLE should probably emulate this, as it's encoding is clearly > > > UTF-8 (at least when using Tcl 8.1 or newer). > > > > It should be possible to redirect sys.stdin/stdout using > > the codecs.EncodedFile wrapper. Some tests show that raw_input() > > doesn't seem to use the redirected sys.stdin though... > > > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > > >>> s = raw_input() > > äöü > > >>> s > > '\344\366\374' > > >>> s = sys.stdin.read() > > äöü > > >>> s > > '\303\244\303\266\303\274\012' The latter is the "correct" output, BTW. > This deserves more looking into. The code for raw_input() in > bltinmodule.c certainly *tries* to use sys.stdin. (I think that > because your EncodedFile object is not a real stdio file object, it > will take the second branch, near the end of the function; this calls > PyFile_GetLine() which attempts to call readline().) > > Aha! It actually seems that your read() and readline() are > inconsistent! 
They are because I haven't yet found a way to implement readline() without buffering read-ahead data. The only way I can think of to implement it without buffering would be to read one char at a time which is much too slow. Buffering is hard to implement right when assuming that streams are stacked... every level would have its own buffering scheme and mixing .read() and .readline() wouldn't work too well. Anyway, I'll give it try... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Mon Apr 10 16:56:26 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:56:26 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:01:52 +0200." <38F1FAF0.4821AE6C@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> Message-ID: <200004101556.LAA00578@eric.cnri.reston.va.us> > > Aha! It actually seems that your read() and readline() are > > inconsistent! > > They are because I haven't yet found a way to implement > readline() without buffering read-ahead data. The only way > I can think of to implement it without buffering would be > to read one char at a time which is much too slow. > > Buffering is hard to implement right when assuming that > streams are stacked... every level would have its own > buffering scheme and mixing .read() and .readline() > wouldn't work too well. Anyway, I'll give it try... Since you're calling methods on the underlying file object anyway, can't you avoid buffering by calling the *corresponding* underlying method and doing the conversion on that? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 10 17:02:36 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 12:02:36 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: Your message of "Mon, 10 Apr 2000 11:52:43 EDT." <14577.63691.561040.281577@anthem.cnri.reston.va.us> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <200004101602.MAA00590@eric.cnri.reston.va.us> > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! > > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, I'll clean it up, sanity check the memory management, and > post the changes to patches@python.org. Here's some things you can > do: > > -------------------- snip snip -------------------- > Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> def a(): pass > ... 
> >>> a.publish = 1 > >>> a.publish > 1 > >>> a.__doc__ > >>> a.__doc__ = 'a doc string' > >>> a.__doc__ > 'a doc string' > >>> a.magic_string = a.__doc__ > >>> a.magic_string > 'a doc string' > >>> dir(a) > ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] > >>> class F: > ... def a(self): pass > ... > >>> f = F() > >>> f.a.publish > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: publish > >>> f.a.publish = 1 > >>> f.a.publish > 1 Here I have a question. Should this really change F.a, or should it change the method bound to f only? You implement the former, but I'm not sure if those semantics are right -- if I have two instances, f1 and f2, and you change f2.a.spam, I'd be surprised if f1.a.spam got changed as well (since f1.a and f2.a are *not* the same thing -- they are not shared. f1.a.im_func and f2.a.im_func are the same thing, but f1.a and f2.a are distinct! I would suggest that you only allow setting attributes via the class or via a function. (This means that you must still implement the pass-through on method objects, but reject it if the method is bound to an instance.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy@cnri.reston.va.us Mon Apr 10 17:05:14 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 10 Apr 2000 12:05:14 -0400 (EDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F1E418.FF191AEE@tismer.com> References: <38F1E418.FF191AEE@tismer.com> Message-ID: <14577.64442.47034.907133@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> I think it is fine that it crashed. There are obviously CT> extension modules left where the interpreter lock rule is CT> violated. The builtin Python code has been checked, there are CT> most probably no holes, including tkinter. Or, I made a mistake CT> in this little code: I think have misunderstood at least one of Mark's bug report and your response. Does the problem Mark reported rely on extension code? I thought the bug was triggered by running pure Python code. If that is the case, then it can never be fine that it crashed. If the problem relies on extension code, then there ought to be a way to write the extension so that it doesn't cause a crash. Jeremy PS Mark: Is the transformer.py you attached different from the one in the nondist/src/Compiler tree? It looks like the only differences are with the whitespace. From dscherer@cmu.edu (David Scherer), python-dev@python.org Mon Apr 10 17:54:09 2000 From: dscherer@cmu.edu (David Scherer), python-dev@python.org (Peter Funk) Date: Mon, 10 Apr 2000 18:54:09 +0200 (MEST) Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: from David Scherer at "Apr 10, 2000 9:54:35 am" Message-ID: Hi! David Scherer on idle-dev@python.org: [...] > in the interpreter* is fast. In principle, one could put THREE operators in > the language: one with the new "float division" semantics, one that divided > only integers, and a "backward compatibility" operator with EXACTLY the old > semantics: [...] > An outline of what I did: [...] Yes, this really clever. I like the ideas. [me]: > > 2. What should the new Interpreter do, if he sees a source file without a > > pragma defining the language level? There are two possibilities: [...] > > 2. Assume, it is a new source file and apply language level 2 to it. > > This has the disadvantage, that it will break any existing code. 
> I think the answer is 2. A high-quality script for adding the pragma to > existing files, with CLI and GUI interfaces, should be packaged with Python. > Running it on your existing modules would be part of the installation > process. Okay. But what is with the Python packages available on the Internet? May be the upcoming dist-utils should handle this? Or should the Python core distribution contain a clever installer program, which handles this? > Long-lived modules should always have a language level, since it makes them > more robust against changes and also serves as documentation. A version > statement could be encouraged at the top of any nontrivial script, e.g: > > python 1.6 [...] global python_1_5 #implies global old_division or global python_1_6 #implies global old_division or global python_1_7 #may be implies global new_division may be we can solve another issue just discussed on python_dev with global source_iso8859_1 or global source_utf_8 Cute idea... but we should keep the list of such pragmas short. > Personally, I think that it makes more sense to talk about ways to > gracefully migrate individual changes into the language than to put off > every backward-incompatible change to a giant future "flag day" that will > break all existing scripts. Versioning of some sort should be encouraged > starting *now*, and incorporated into 1.6 before it goes final. Yes. > Indeed, but Guido has spoken: > > > Great ideas there, Bruce! I hope you will post these to an > > appropriate mailing list (perhaps idle-dev, as there's no official SIG > > to discuss the Python 3000 transition yet, and python-dev is closed). May be someone can invite you into 'python-dev'? However the archives are open to anyone and writing to the list is also open to anybody. Only subscription is closed. I don't know why. Regards, Peter P.S.: Redirected Reply-To: to David and python-dev@python.org ! -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal@lemburg.com Mon Apr 10 17:39:45 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 18:39:45 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> Message-ID: <38F203D1.4A0038F@lemburg.com> Guido van Rossum wrote: > > > > Aha! It actually seems that your read() and readline() are > > > inconsistent! > > > > They are because I haven't yet found a way to implement > > readline() without buffering read-ahead data. The only way > > I can think of to implement it without buffering would be > > to read one char at a time which is much too slow. > > > > Buffering is hard to implement right when assuming that > > streams are stacked... every level would have its own > > buffering scheme and mixing .read() and .readline() > > wouldn't work too well. Anyway, I'll give it try... > > Since you're calling methods on the underlying file object anyway, > can't you avoid buffering by calling the *corresponding* underlying > method and doing the conversion on that? The problem here is that Unicode has far more line break characters than plain ASCII. 
The underlying API would break on ASCII lines (or even worse on those CRLF sequences defined by the C lib), not the ones I need for Unicode. BTW, I think that we may need a new Codec class layer here: .readline() et al. are all text based methods, while the Codec base classes clearly work on all kinds of binary and text data. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Mon Apr 10 19:04:31 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:04:31 -0700 (PDT) Subject: [Python-Dev] CVS: distutils/distutils cmd.py In-Reply-To: <20000410091101.B406@mems-exchange.org> Message-ID: On Mon, 10 Apr 2000, Greg Ward wrote: > On 10 April 2000, Greg Stein said: >... > > [ damn... can't see the code... went and checked it out... ] > > Oops, that was a CVS config thing. Fixed now -- I'll go checkin that > change and we'll all see if it worked. Just as well it was off though > -- I checked in a couple of big documentation updates this weekend, and > who wants to see 30k of LaTeX patches in their inbox on Monday morning? > ;-) Cool. The CVS diffs appear to work quite fine now! Note: you might not get a 30k patch since the system elides giant diffs. Of course, if you patch 10 files, each with 3k diffs... :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Mon Apr 10 19:13:08 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:13:08 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Barry Warsaw wrote: >... > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, +1 on concept, -1 on the patch :-) >... > P.S. I promised to add a little note about setattr and getattr > vs. setattro and getattro. There's very little documentation about > the differences, and searching on python.org doesn't seem to turn up > anything. The differences are simple. setattr/getattr take a char* > argument naming the attribute to change, while setattro/getattro take > a PyObject* (hence the trailing `o' -- for Object). This stuff should > get documented in the C API, but at least now, it'll turn up in a SIG > search. :) And note that the getattro/setattro is preferred. It is easy to extract the char* from them; the other direction requires construction of an object. >... > + static int > + instancemethod_setattr(im, name, v) > + register PyMethodObject *im; > + char *name; > + PyObject *v; IMO, this should be instancemethod_setattro() and take a PyObject *name. In the function, you can extract the string for comparison. >... > + { > + int rtn; This variable isn't used. >... > static PyObject * > instancemethod_getattr(im, name) > register PyMethodObject *im; > ! char *name; IMO, this should remain a getattro function. (and fix the name) In your update, note how many GetAttrString calls there are. The plain GetAttr is typically faster. >... > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Why do you mask this second error with the AttributeError? Seems that you should just leave whatever is there (typically an AttributeError, but maybe not!). 
>... > --- 144,167 ---- > PyFunctionObject *op; > char *name; > { > + PyObject* rtn; > + > if (name[0] != '_' && PyEval_GetRestricted()) { > PyErr_SetString(PyExc_RuntimeError, > "function attributes not accessible in restricted mode"); > return NULL; > + } > + if (strcmp(name, "__dict__") == 0) > + return op->func_dict; This is superfluous. The PyMember_Get will do this. > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Again, with the masking... >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); This raises an interesting thought. Why not just require the mapping protocol? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 19:11:29 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:11:29 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:39:45 +0200." <38F203D1.4A0038F@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> <38F203D1.4A0038F@lemburg.com> Message-ID: <200004101811.OAA02323@eric.cnri.reston.va.us> > > Since you're calling methods on the underlying file object anyway, > > can't you avoid buffering by calling the *corresponding* underlying > > method and doing the conversion on that? > > The problem here is that Unicode has far more line > break characters than plain ASCII. The underlying API would > break on ASCII lines (or even worse on those CRLF sequences > defined by the C lib), not the ones I need for Unicode. Hm, can't we just use \n for now? > BTW, I think that we may need a new Codec class layer > here: .readline() et al. are all text based methods, > while the Codec base classes clearly work on all kinds of > binary and text data. Not sure what you mean here. Can you explain through an example? --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Mon Apr 10 19:27:03 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:27:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: <200004101702.NAA01141@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Lib > In directory eric:/projects/python/develop/guido/src/Lib > > Modified Files: > urlparse.py > Log Message: > Some cleanup -- don't use splitfields/joinfields, standardize > indentation (tabs only), rationalize some code in urljoin... Why not use string methods? (the patch still imports from string) Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 19:22:26 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:22:26 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: Your message of "Mon, 10 Apr 2000 11:27:03 PDT." References: Message-ID: <200004101822.OAA02423@eric.cnri.reston.va.us> > Why not use string methods? 
(the patch still imports from string) I had the patch sitting in my directory for who knows how long -- I just wanted to flush it to the CVS repository. I didn't really want to thing about all the great changes I *could* make to the code... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 10 19:44:01 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:44:01 -0400 Subject: [Python-Dev] Getting ready for 1.6 alpha 2 Message-ID: <200004101844.OAA02610@eric.cnri.reston.va.us> I'm getting ready for the release of alpha 2. Tomorrow afternoon (around 5:30pm east coast time) I'm going on vacation for the rest of the week, followed by a business trip most of the week after. Obviously, I'm anxious to release a solid alpha tomorrow. Please, send only simple or essential patches between now and the release date! --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Mon Apr 10 19:57:01 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:57:01 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101844.OAA02610@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > I'm getting ready for the release of alpha 2. Tomorrow afternoon > (around 5:30pm east coast time) I'm going on vacation for the rest of > the week, followed by a business trip most of the week after. > > Obviously, I'm anxious to release a solid alpha tomorrow. > > Please, send only simple or essential patches between now and the > release date! Jeremy reminded me that my new httplib.py is still pending integration. There are two possibilities: 1) My httplib.py uses a new name, or goes into a "net" package. We check it in today, and I follow up with patches to fold in post-1.5.2 compatibility items (such as the SSL stuff). 2) httplib.py will remain in the same place, so the compat changes must happen first. In both cases, I will also need to follow up with test and doc. IMO, we go with "net.httplib" and check it in today. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 20:00:08 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:00:08 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 11:57:01 PDT." References: Message-ID: <200004101900.PAA02692@eric.cnri.reston.va.us> > > Please, send only simple or essential patches between now and the > > release date! > > Jeremy reminded me that my new httplib.py is still pending integration. There will be another alpha release after I'm back -- I think this isn't that urgent. (Plus, just because you're you, you'd have to mail me a wet signature. :-) I am opposed to a net.* package until the reorganization discussion has resulted in a solid design. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Mon Apr 10 20:19:57 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:19:57 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101900.PAA02692@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > > Please, send only simple or essential patches between now and the > > > release date! > > > > Jeremy reminded me that my new httplib.py is still pending integration. > > There will be another alpha release after I'm back -- I think this > isn't that urgent. 
True, but depending on location, it also has zero impact on the release. In other words: added functionality for testing, with no potential for breakage. > (Plus, just because you're you, you'd have to mail > me a wet signature. :-) You've got one on file already :-) [ I sent it back in December; was it misplaced, and I need to resend? ] > I am opposed to a net.* package until the reorganization discussion > has resulted in a solid design. Not a problem. Mine easily replaces httplib.py in its current location. It is entirely backwards compat. A new class is used to get the new functionality, and a compat "HTTP" class is provided (leveraging the new HTTPConnection class). Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 20:20:31 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:20:31 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 12:19:57 PDT." References: Message-ID: <200004101920.PAA02957@eric.cnri.reston.va.us> > > > Jeremy reminded me that my new httplib.py is still pending integration. > > > > There will be another alpha release after I'm back -- I think this > > isn't that urgent. > > True, but depending on location, it also has zero impact on the release. > In other words: added functionality for testing, with no potential for > breakage. You're just asking for exposure. But unless it's installed as httplib.py, it won't get much more exposure than if you put it on your website and post an announcement to c.l.py, I bet. > > (Plus, just because you're you, you'd have to mail > > me a wet signature. :-) > > You've got one on file already :-) > > [ I sent it back in December; was it misplaced, and I need to resend? ] I was just teasing. Our lawyer believes that you cannot send in a signature for code that you will contribute in the future; but I really don't care enough to force you to send another one... > > I am opposed to a net.* package until the reorganization discussion > > has resulted in a solid design. > > Not a problem. Mine easily replaces httplib.py in its current location. It > is entirely backwards compat. A new class is used to get the new > functionality, and a compat "HTTP" class is provided (leveraging the new > HTTPConnection class). I thought you said there was some additional work on compat changes? I quote: | 2) httplib.py will remain in the same place, so the compat changes must | happen first. Oh well, send it to Jeremy and he'll check it in if it's ready. But not without a test suite and documentation. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@tismer.com Mon Apr 10 20:47:12 2000 From: tismer@tismer.com (Christian Tismer) Date: Mon, 10 Apr 2000 21:47:12 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <38F1E418.FF191AEE@tismer.com> <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: <38F22FC0.C975290C@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> I think it is fine that it crashed. There are obviously > CT> extension modules left where the interpreter lock rule is > CT> violated. The builtin Python code has been checked, there are > CT> most probably no holes, including tkinter. Or, I made a mistake > CT> in this little code: > > I think have misunderstood at least one of Mark's bug report and your > response. Does the problem Mark reported rely on extension code? I > thought the bug was triggered by running pure Python code. 
If that is > the case, then it can never be fine that it crashed. If the problem > relies on extension code, then there ought to be a way to write the > extension so that it doesn't cause a crash. Oh! If it is so, then there is in fact a problem left in the Kernel. Mark, did you use an extension? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From andy@reportlab.com Mon Apr 10 20:46:25 2000 From: andy@reportlab.com (Andy Robinson) Date: Mon, 10 Apr 2000 20:46:25 +0100 Subject: [Python-Dev] Re: [I18n-sig] "takeuchi": a unicode string on IDLE shell References: <200004101401.KAA00238@eric.cnri.reston.va.us> Message-ID: <008a01bfa325$79b92f00$01ac2ac0@boulder> ----- Original Message ----- From: Guido van Rossum To: Cc: Sent: 10 April 2000 15:01 Subject: [I18n-sig] "takeuchi": a unicode string on IDLE shell > Can anyone answer this? I can reproduce the output side of this, and > I believe he's right about the input side. Where should Python > migrate with respect to Unicode input? I think that what Takeuchi is > getting is actually better than in Pythonwin or command line (where he > gets Shift-JIS)... > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think what he wants, as you hinted, is to be able to specify a 'system wide' default encoding of Shift-JIS rather than UTF8. UTF-8 has a certain purity in that it equally annoys every nation, and is nobody's default encoding. What a non-ASCII user needs is a site-wide way of setting the default encoding used for standard input and output. I think this could be done with something (config file? registry key) which site.py looks at, and wraps stream encoders around stdin, stdout and stderr. To illustrate why it matters, I often used to parse data files and do queries on a Japanese name and address database; I could print my lists and tuples in interactive mode and check they worked, or initialise functions with correct data, since the OS uses Shift-JIS as its native encoding and I was manipulating Shift-JIS strings. I've lost that ability now due to the Unicode stuff and would need to do >>> for thing in mylist: >>> ....print mylist.encode('shift_jis') to see the contents of a database row, rather than just >>> mylist BTW, Pythonwin stopped working in this regard when Scintilla came along; it prints a byte at a time now, although kanji input is fine, as is kanji pasted into a source file, as long as you specify a Japanese font. However, this is fixable - I just need to find a spare box to run Japanese windows on and find out where the printing goes wrong. Andy Robinson ReportLab From gstein@lyra.org Mon Apr 10 20:53:22 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:53:22 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004101920.PAA02957@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: >... > You're just asking for exposure. But unless it's installed as > httplib.py, it won't get much more exposure than if you put it on your > website and post an announcement to c.l.py, I bet. Hmm. Good point :-) >... > > > (Plus, just because you're you, you'd have to mail > > > me a wet signature. 
:-) > > > > You've got one on file already :-) > > > > [ I sent it back in December; was it misplaced, and I need to resend? ] > > I was just teasing. :-) >... > > > I am opposed to a net.* package until the reorganization discussion > > > has resulted in a solid design. > > > > Not a problem. Mine easily replaces httplib.py in its current location. It > > is entirely backwards compat. A new class is used to get the new > > functionality, and a compat "HTTP" class is provided (leveraging the new > > HTTPConnection class). > > I thought you said there was some additional work on compat changes? Oops. Yah. It would become option (2) (add compat stuff first) by dropping it over the current one. Mostly, I'm concerned about the SSL stuff that was added, but there may be other things (need to check the CVS logs). For example, there was all that stuff dealing with the errors (which never went in, I believe?). >... > Oh well, send it to Jeremy and he'll check it in if it's ready. But > not without a test suite and documentation. Ah. Well, then it definitely won't go in now :-). It'll take a bit to set up the tests and docco. Well... thanx for the replies. When I get the stuff ready, I'll speak up again. And yes, I do intend to ensure this stuff is ready in time for 1.6. Cheers, -g p.s. and I retract my request for inclusion of davlib. I think there is still some design work to do on that guy. -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 21:01:16 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 16:01:16 -0400 Subject: [Python-Dev] httplib again In-Reply-To: Your message of "Mon, 10 Apr 2000 12:53:22 PDT." References: Message-ID: <200004102001.QAA03201@eric.cnri.reston.va.us> > p.s. and I retract my request for inclusion of davlib. I think there is > still some design work to do on that guy. But it should at least be available outside the distro! The Vaults of Parnassus don't list it -- so it don't exist! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Mon Apr 10 21:50:26 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 13:50:26 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004102001.QAA03201@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > p.s. and I retract my request for inclusion of davlib. I think there is > > still some design work to do on that guy. > > But it should at least be available outside the distro! The Vaults of > Parnassus don't list it -- so it don't exist! :-) D'oh! I forgot to bring it over from my alternate plane of reality. ... Okay. I've synchronized the universes. Parnassus now contains a number of records for my Python stuff (well, submitted at least). Thanx for the nag :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Mon Apr 10 21:34:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 22:34:12 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F23AC4.12CBE187@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? 
> > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. Hmm, you mean something like file.encoding ? I'll add some additional attributes holding the encoding names to the wrapper classes (they will then be set by the wrapper constructor functions). BTW, I've just added .readline() et al. to the codecs... all except .readline() are easy to do. For .readline() I simply delegated line breaking to the underlying stream's .readline() method. This is far from optimal, but better than not having the method at all. I also adjusted the interfaces of the .splitlines() methods: they now take a different optional argument: """ S.splitlines([keepends]) -> list of strings Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. """ This made implementing the above methods very simple and also allows writing codecs working with other basic storage types (UserString.py anyone ;-).
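For concreteness, a minimal sketch of the keepends semantics quoted above, assuming the string-method form of splitlines (a "\r\n" pair counts as a single line break):

>>> "one\ntwo\r\nthree".splitlines()
['one', 'two', 'three']
>>> "one\ntwo\r\nthree".splitlines(1)   # keepends given and true
['one\n', 'two\r\n', 'three']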
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Mon Apr 10 22:34:07 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 14:34:07 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: <200004102114.RAA07027@eric.cnri.reston.va.us> Message-ID: Euh... this is the incorrect fix. The 0 is wrong to begin with. Mark Favas submitted a proper patch for this. See his "Revised Patches for bug report 258" posted to patches@python.org on April 4th. Cheers, -g On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Modules > In directory eric:/projects/python/develop/guido/src/Modules > > Modified Files: > mmapmodule.c > Log Message: > I've had complaints about the comparison "where >= 0" before -- on > IRIX, it doesn't even compile. Added a cast: "where >= (char *)0". > > > Index: mmapmodule.c > =================================================================== > RCS file: /projects/cvsroot/python/dist/src/Modules/mmapmodule.c,v > retrieving revision 2.5 > retrieving revision 2.6 > diff -C2 -r2.5 -r2.6 > *** mmapmodule.c 2000/04/05 14:15:31 2.5 > --- mmapmodule.c 2000/04/10 21:14:05 2.6 > *************** > *** 2,6 **** > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.5 2000/04/05 14:15:31 fdrake Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > --- 2,6 ---- > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.6 2000/04/10 21:14:05 guido Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > *************** > *** 119,123 **** > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= 0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > --- 119,123 ---- > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= (char *)0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > > > _______________________________________________ > Python-checkins mailing list > Python-checkins@python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From guido@python.org Mon Apr 10 22:43:03 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 17:43:03 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: Your message of "Mon, 10 Apr 2000 14:34:07 PDT." References: Message-ID: <200004102143.RAA07181@eric.cnri.reston.va.us> > Euh... this is the incorrect fix. The 0 is wrong to begin with. > > Mark Favas submitted a proper patch for this. See his "Revised Patches for > bug report 258" posted to patches@python.org on April 4th. Sigh. You're right. I've seen two patches to mmapmodule.c since he posted that patch, and no comments on his patch, so I thought his patch was already incorporated. I was wrong. Note that this module still gives 6 warnings on VC6.0, all C4018: '>' or '>=' signed/unsigned mismatch. I wish someone gave me a patch for that too. Unrelated: _sre.c also has a bunch of VC6 warnings -- all C4761, integral size mismatch in argument: conversion supplied. This is all about the calls to SRE_IS_DIGIT and SRE_IS_SPACE. 
The error occurs 8 times on 4 different lines, and is reported in a cyclic fashion: 106, 108, 110, 112, 106, 108, ..., etc., probably due to sre's recursive self-include tricks? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Mon Apr 10 23:11:26 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 18:11:26 -0400 Subject: [Python-Dev] 1.6a2 prerelease for Windows Message-ID: <200004102211.SAA07363@eric.cnri.reston.va.us> I've made a prerelease of the Windows installer available through the python.org/1.6 webpage (the link is in the paragraph *below* the a1 downloads). This is mostly to give Mark Hammond an opportunity to prepare win32all build 131, to deal with the changed location of the python16.dll file. Hey, it's still alpha software! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond@skippinet.com.au Tue Apr 11 00:00:48 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:00:48 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F22FC0.C975290C@tismer.com> Message-ID: > If it is so, then there is in fact a problem left > in the Kernel. > Mark, did you use an extension? I tried to explain this in private email: This is pure Python code. The parser module is the only extension being used. The crash _always_ occurs as a frame object is being de-allocated, and _always_ happens as a builtin list object (a local variable) is de-alloced by the frame. Always the same line of Python code, always the same line of C code, always the exact same failure. Mark. From mhammond@skippinet.com.au Tue Apr 11 00:41:16 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:41:16 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: [Sorry - missed this bit] > PS Mark: Is the transformer.py you attached different > from the one in > the nondist/src/Compiler tree? It looks like the only > differences are > with the whitespace. The attached version is simply the "release" P2C transformer.py with .append args fixed. I imagine it is very close to the CVS version (and indeed I know for a fact that the CVS version also crashes). My initial testing showed the CVS compiler did _not_ trigger this bug (even though code that uses an identical transformer.py does), so I just dropped back to P2C and stopped when I saw it :-) Mark. From bwarsaw@python.org Tue Apr 11 00:48:51 2000 From: bwarsaw@python.org (bwarsaw@python.org) Date: Mon, 10 Apr 2000 19:48:51 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <14578.26723.857270.63150@anthem.cnri.reston.va.us> > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, >>>>> "GS" == Greg Stein writes: GS> +1 on concept, -1 on the patch :-) Well, that's good, because I /knew/ the patch was a quick hack (which is why I posted it to python-dev and not patches :). Since there's been generally positive feedback on the idea, I think I'll flesh it out a bit. GS> And note that the getattro/setattro is preferred. It is easy GS> to extract the char* from them; the other direction requires GS> construction of an object. Good point. >... 
> + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Why do you mask this second error with the AttributeError? GS> Seems that you should just leave whatever is there (typically GS> an AttributeError, but maybe not!). Good point here, but... > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Again, with the masking... ...here I don't want the KeyError to leak through the getattr() call. If you do "print func.non_existent_attr" wouldn't you want an AttributeError instead of a KeyError? Maybe it should explicitly test for KeyError rather than masking any error coming back from PyDict_GetItemString()? Or better yet (based on your suggestion below), it should do a PyMapping_HasKey() test, raise an AttributeError if not, then just return PyMapping_GetItemString(). >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); GS> This raises an interesting thought. Why not just require the GS> mapping protocol? Good point again. -Barry From gstein@lyra.org Tue Apr 11 02:37:45 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:37:45 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw@python.org wrote: >... > GS> And note that the getattro/setattro is preferred. It is easy > GS> to extract the char* from them; the other direction requires > GS> construction of an object. > > Good point. Oh. Also, I noticed that you removed a handy optimization from the getattr function. Testing a character for '_' *before* calling strcmp() will save a good chunk of time, especially considering how often this function is used. Basically, review whether a quick test can save a strmp() call (and can be easily integrated). Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Apr 11 02:12:10 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:12:10 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw@python.org wrote: >... > >... > > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyObject_GetAttrString(im->im_func, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Why do you mask this second error with the AttributeError? > GS> Seems that you should just leave whatever is there (typically > GS> an AttributeError, but maybe not!). > > Good point here, but... > > > + rtn = PyMember_Get((char *)op, func_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyDict_GetItemString(op->func_dict, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Again, with the masking... > > ...here I don't want the KeyError to leak through the getattr() call. Ah! Subtle difference in the code there :-) I agree with you, on remapping the second one. 
I don't think the first needs to be remapped, however. > If you do "print func.non_existent_attr" wouldn't you want an > AttributeError instead of a KeyError? Maybe it should explicitly test > for KeyError rather than masking any error coming back from > PyDict_GetItemString()? Or better yet (based on your suggestion > below), it should do a PyMapping_HasKey() test, raise an > AttributeError if not, then just return PyMapping_GetItemString(). Seems that you could just do the PyMapping_GetItemString() and remap the error *if* it occurs. Presumably, the exception is the infrequent case and can stand to be a bit slower. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Tue Apr 11 01:58:39 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 17:58:39 -0700 (PDT) Subject: [Python-Dev] transformer.py changes? (was: Crash in new "trashcan" mechanism) In-Reply-To: Message-ID: On Tue, 11 Apr 2000, Mark Hammond wrote: > [Sorry - missed this bit] > > > PS Mark: Is the transformer.py you attached different > > from the one in > > the nondist/src/Compiler tree? It looks like the only > > differences are > > with the whitespace. > > The attached version is simply the "release" P2C transformer.py with > .append args fixed. I imagine it is very close to the CVS version > (and indeed I know for a fact that the CVS version also crashes). > > My initial testing showed the CVS compiler did _not_ trigger this > bug (even though code that uses an identical transformer.py does), > so I just dropped back to P2C and stopped when I saw it :-) Hrm. I fixed those things in the P2C CVS version. Guess I'll have to do a diff to see if there are any other changes... Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw@python.org Tue Apr 11 06:08:49 2000 From: bwarsaw@python.org (bwarsaw@python.org) Date: Tue, 11 Apr 2000 01:08:49 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004101602.MAA00590@eric.cnri.reston.va.us> Message-ID: <14578.45921.289078.190085@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Here I have a question. Should this really change F.a, or GvR> should it change the method bound to f only? You implement GvR> the former, but I'm not sure if those semantics are right -- GvR> if I have two instances, f1 and f2, and you change f2.a.spam, GvR> I'd be surprised if f1.a.spam got changed as well (since f1.a GvR> and f2.a are *not* the same thing -- they are not shared. GvR> f1.a.im_func and f2.a.im_func are the same thing, but f1.a GvR> and f2.a are distinct! As are f1.a and f1.a! :) GvR> I would suggest that you only allow setting attributes via GvR> the class or via a function. (This means that you must still GvR> implement the pass-through on method objects, but reject it GvR> if the method is bound to an instance.) Given that, Python should probably raise a TypeError if an attempt is made to set an attribute on a bound method object. However, it should definitely succeed to /get/ an attribute on a bound method object. I'm not 100% sure that setting bound-method-attributes should be illegal, but we can be strict about it now and see if it makes sense to loosen the restriction later. Here's a candidate for Lib/test/test_methattr.py which should print a bunch of `1's. I'll post the revised diffs (taking into account GvR's and GS's suggestions) tomorrow after I've had a night to sleep on it. 
-Barry -------------------- snip snip -------------------- from test_support import verbose class F: def a(self): pass def b(): pass # setting attributes on functions try: b.blah except AttributeError: pass else: print 'did not get expected AttributeError' b.blah = 1 print b.blah == 1 print 'blah' in dir(b) # setting attributes on unbound methods try: F.a.blah except AttributeError: pass else: print 'did not get expected AttributeError' F.a.blah = 1 print F.a.blah == 1 print 'blah' in dir(F.a) # setting attributes on bound methods is illegal f1 = F() try: f1.a.snerp = 1 except TypeError: pass else: print 'did not get expected TypeError' # but accessing attributes on bound methods is fine print f1.a.blah print 'blah' in dir(f1.a) f2 = F() print f1.a.blah == f2.a.blah F.a.wazoo = F f1.a.wazoo is f2.a.wazoo # try setting __dict__ illegally try: F.a.__dict__ = (1, 2, 3) except TypeError: pass else: print 'did not get expected TypeError' F.a.__dict__ = {'one': 111, 'two': 222, 'three': 333} print f1.a.two == 222 from UserDict import UserDict d = UserDict({'four': 444, 'five': 555}) F.a.__dict__ = d try: f2.a.two except AttributeError: pass else: print 'did not get expected AttributeError' print f2.a.four is f1.a.four is F.a.four From tim_one@email.msn.com Tue Apr 11 07:01:15 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:15 -0400 Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <001f01bfa37b$58df5740$27a2143f@tim> [Peter Funk] > ... > May be someone can invite you into 'python-dev'? However the archives > are open to anyone and writing to the list is also open to anybody. > Only subscription is closed. I don't know why. The explanation is to be found at the very start of the list -- before it became public . The idea was to have a much smaller group than c.l.py, and composed of people who had contributed non-trivial stuff to Python's implementation. Also a group that felt comfortable arguing with each other (any heat you may perceive on this list is purely illusory ). So the idea was definitely to discourage participation(!), but never to do things in secret. Keeping subscription closed has served its purposes pretty well, despite that the only mechanism enforcing civility here is the lack of an invitation. Elitist social manipulation at its finest . From tim_one@email.msn.com Tue Apr 11 07:01:19 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:19 -0400 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <002101bfa37b$5b2acde0$27a2143f@tim> [Peter Funk] > ... > 2. What should the new Interpreter do, if he sees a source file without a > pragma defining the language level? Python has tens of thousands of users now -- if it doesn't default to "Python 1.5.2" (however that's spelled), approximately 79.681% of them will scream. Had the language done this earlier, it would have been much more sellable to default to the current version. However, a default is *just* "a default", and platform-appropriate config mechanism (from Windows registry to cmdline flag) could be introduced to change the default. That is, 1.7 comes out and all my code runs fine without changing a thing. Then I set a global config option to "pretend every module that doesn't say otherwise has a 1.7 pragma in it", and run things again to see what breaks. 
As part of the process of editing the files that need to be fixed, I can take that natural opportunity to dump in a 1.7 pragma in the modules I've changed, or a 1.6 pragma in the broken modules I can't (for whatever reason) alter just yet. Two pleasant minutes later, I'll have 6,834 .py files all saying "1.7" at the top. Hmm! So when 1.8 comes out, not a one of them will use any incompatible 1.8 features. So I'll also need a global config option that says "pretend every module has a 1.8 pragma in it, *regardless* of whether it has some other pragma in it already". But that will also screw up the one .py file I forgot that had a 1.5.2 pragma in it. Iterate this process a half dozen times, and I'm afraid the end result is intractable. Seems it would be much more tractable over the long haul to default to the current version. Then every incompatible change will require changing every file that relied on the old behavior (to dump in a "no, I can't use the current version's semantics" pragma) -- but that's the situation today too. The difference is that the minimal change required to get unstuck would be trivial. A nice user (like me ) would devote their life to keeping up with incompatible changes, so would never ever have a version pragma in any file. So I vote "default to current version" -- but, *wow*, that's going to be hard to sell. Tech note: Python's front end is not structured today in such a way that it's feasible to have the parser deal with a change in the set of keywords keying off a pragma -- any given identifier today is either always or never a keyword, and that choice is hardwired into the generated parse tables. Not a reason to avoid starting this process with 1.6, just a reason to avoid adding new keywords in 1.6 (it will take some real work to overcome the front end's limitations here). go-for-it!-ly y'rs - tim From pf@artcom-gmbh.de Tue Apr 11 11:15:20 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 12:15:20 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function Message-ID: Hi! Currently the wrapper classes UserList und UserString contain the following method: def __repr__(self): return repr(self.data) I wonder about the following alternatives: def __repr__(self): return self.__class__.__name__ + "(" + repr(self.data) + ")" or even more radical (here only for lists as an example): def __repr__(self): result = [self.__class__.__name__, "("] for item in self.data: result.append(repr(item)) result.append(", ") result.append(")") return "".join(result) Just a thought which jumped into my mind during the recent discussion about the purpose of the 'repr' function (float representation). Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mhammond@skippinet.com.au Tue Apr 11 16:15:16 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 01:15:16 +1000 Subject: [Python-Dev] 1.6a2 prerelease for Windows In-Reply-To: <200004102211.SAA07363@eric.cnri.reston.va.us> Message-ID: [Guido wrote] > downloads). This is mostly to give Mark Hammond an opportunity to > prepare win32all build 131, to deal with the changed > location of the > python16.dll file. Thanks! After consideration like that, how could I do anything other than get it out straight away (and if starship wasnt down it would have been a few hours ago :-) 131 is up on starship now. 
Actually, it looks like starship is down again (or at least under serious stress!) so the pages may not reflect this. It should be at http://starship.python.net/crew/mhammond/downloads/win32all-131.exe Mark. From guido@python.org Tue Apr 11 16:33:15 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 11:33:15 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Your message of "Tue, 11 Apr 2000 12:15:20 +0200." References: Message-ID: <200004111533.LAA08163@eric.cnri.reston.va.us> > Currently the wrapper classes UserList und UserString contain the > following method: > > def __repr__(self): return repr(self.data) > > I wonder about the following alternatives: > > def __repr__(self): > return self.__class__.__name__ + "(" + repr(self.data) + ")" Yes and no. It would make them behave less like their "theoretical" base class, but you're right that it's better to be honest in repr(). Their str() could still look like self.data. > or even more radical (here only for lists as an example): > > def __repr__(self): > result = [self.__class__.__name__, "("] > for item in self.data: > result.append(repr(item)) > result.append(", ") > result.append(")") > return "".join(result) What's the advantage of this? It seems designed to be faster, but I doubt that it really is -- have you timed it? I'd go for simple -- how time-critical can repr() be...? --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" Message-ID: <01ed01bfa3cd$6d324f20$34aab5d4@hagrid> > Changed PyUnicode_Splitlines() maxsplit argument to keepends. shouldn't that be "PyUnicode_SplitLines" ? (and TailMatch, IsLineBreak, etc.) From Fredrik Lundh" <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Message-ID: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> > comments? (for obvious reasons, I'm especially interested in comments > from people using non-ASCII characters on a daily basis...) nobody? maybe all problems are gone after the last round of checkins? oh well, I'll rebuild again, and see what happens if I remove all kludges in my test code... From tismer@tismer.com Tue Apr 11 17:12:32 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 18:12:32 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F34EF0.73099769@tismer.com> Mark Hammond wrote: > > > Can you perhaps tell me what the call stack says? > > Is it somewhere, or are we in finalization code of the > > interpreter? > > The crash is in _Py_Dealloc - op is a pointer, but all fields > (ob_type, ob_refcnt, etc) are all 0 - hence the crash. > > Next up is list_dealloc - op is also trashed - ob_item != NULL > (hence we are in the if condition, and calling Py_XDECREF() (which > triggers the _Py_Dealloc) - ob_size ==9, but all other fields are 0. > > Next up is Py_Dealloc() > > Next up is _PyTrash_Destroy() > > Next up is frame_dealloc() > > _Py_Dealloc() > > Next up is eval_code2() - the second last line - Py_DECREF(f) to > cleanup the frame it just finished executing. > > Up the stack are lots more eval_code2() - we are just running the > code - not shutting down or anything. And you do obviously not have any threads, right? And you are in the middle of a simple, heavy computing application. Nothing with GUI events happening? That can only mean there is a bug in the Python core or in the parser module. That happens to be exposed by trashcan, but isn't trashcan's fault. Well. Trashcan might change the order of destruction a little. 
This *should* not be a problem. But here is a constructed situation where I can think of a problem, if we have buggy code somewhere: Assume you have something like a tuple that holds other elements. Suppose there is a bug, like someone dereferencing an argument in an arg tuple, which is always an error. Such an error can hide for a million years: a contains (b, c, d) The C function decref's a first, and then erroneously also one of the contained elements. If b is already deallocated by decref'ing a, it has refcount zero, but that doesn't hurt, since the dead object is still there, and no mallocs have taken place (unless there is a __del__ triggered of course). This error would never be detected. With trashcan, it could happen that destruction of a is deferred, but by chance now the delayed erroneous decref of b might happen before a's decref, and there may be mallocs in between, since I have a growing list. If my code is valid (and it appears so), then I guess we have such a situation somewhere in the core code. I-smell-some-long-nightshifts-again - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From akuchlin@mems-exchange.org Tue Apr 11 17:19:21 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 11 Apr 2000 12:19:21 -0400 (EDT) Subject: [Python-Dev] Extensible library packages Message-ID: <200004111619.MAA05881@amarok.cnri.reston.va.us> For 1.6, the XML-SIG wants to submit a few more things, mostly a small SAX implementation. This currently lives in xml.sax.*. There are other subpackages around such as xml.dom, xml.utils, and so forth, but those aren't being proposed for inclusion (too large, too specialized, or whatever reason). The problem is that, if the Python standard library includes a package named 'xml', that package name can't be extended by add-on modules (unless they install themselves into Python's library directory, which is evil). Let's say Sean McGrath or whoever creates a new subpackage; how can he install it so that the code is accessible as xml.pyxie? One option that comes to mind is to have the xml package in the standard library automatically import all the names and modules from some other package ('xml_ext'? 'xml2') in site-packages. This means that all the third-party products install on top of the same location, $(prefix)/site-packages/xml/, which is only slightly less evil. I can't think of a good way to loop through everything in site-packages/* and detect some set of the available packages as XML-related, short of importing every single package, which isn't going to fly. Can anyone suggest a good solution? Fixing this may not require changing the core in any way, but the cleanest solution isn't obvious. -- A.M. Kuchling http://starship.python.net/crew/amk/ The mind of man, though perhaps the most splendid achievement of evolution, is not, surely, that answer to every problem of the universe. Hamlet suffers, but the Gravediggers go right on with their silly quibbles. -- Robertson Davies, "Opera and Humour" From mal@lemburg.com Tue Apr 11 17:35:23 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:35:23 +0200 Subject: [Python-Dev] UTF-8 is no fun...
References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> Message-ID: <38F3544B.57AF8C42@lemburg.com> Fredrik Lundh wrote: > > > comments? (for obvious reasons, I'm especially interested in comments > > from people using non-ASCII characters on a daily basis...) > > nobody? FYI, there currently is a discussion emerging about this on the i18n-sig list. > maybe all problems are gone after the last round of checkins? Probably not :-/ ... the last round only fixed some minor things. > oh well, I'll rebuild again, and see what happens if I remove all > kludges in my test code... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Tue Apr 11 17:41:26 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:41:26 +0200 Subject: [Python-Dev] Extensible library packages References: <200004111619.MAA05881@amarok.cnri.reston.va.us> Message-ID: <38F355B6.DD1FD387@lemburg.com> "Andrew M. Kuchling" wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. This currently lives in xml.sax.*. There are > other subpackages around such as xml.dom, xml.utils, and so forth, but > those aren't being proposed for inclusion (too large, too specialized, > or whatever reason). > > The problem is that, if the Python standard library includes a package > named 'xml', that package name can't be extended by add-on modules > (unless they install themselves into Python's library directory, which > is evil). Let's say Sean McGrath or whoever creates a new subpackage; > how can he install it so that the code is accessible as xml.pyxie? You could make use of the __path__ trick in packages and then redirect the imports of subpackages to look in some predefined other areas as well (e.g. a non-package dir .../site-packages/xml-addons/). Here is how I do this in the compatibility packages for my mx series: DateTime/__init__.py: # Redirect all imports to the corresponding mx package def _redirect(mx_subpackage): global __path__ import os,mx __path__ = [os.path.join(mx.__path__[0],mx_subpackage)] _redirect('DateTime') ... Greg won't like this, but __path__ does have its merrits ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" Message-ID: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Andrew M. Kuchling wrote: > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. saxlib.py ? (yes, I'm serious) From Vladimir.Marangozov@inrialpes.fr Tue Apr 11 17:37:42 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 11 Apr 2000 18:37:42 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F1E418.FF191AEE@tismer.com> from "Christian Tismer" at Apr 10, 2000 04:24:24 PM Message-ID: <200004111637.SAA01941@python.inrialpes.fr> Christian Tismer wrote: > > About extensions and Trashcan. > ... 
> Or, I made a mistake in this little code: > > void > _PyTrash_deposit_object(op) > PyObject *op; > { > PyObject *error_type, *error_value, *error_traceback; > > if (PyThreadState_GET() != NULL) > PyErr_Fetch(&error_type, &error_value, &error_traceback); > > if (!_PyTrash_delete_later) > _PyTrash_delete_later = PyList_New(0); > if (_PyTrash_delete_later) > PyList_Append(_PyTrash_delete_later, (PyObject *)op); > > if (PyThreadState_GET() != NULL) > PyErr_Restore(error_type, error_value, error_traceback); > } Maybe unrelated, but this code does not handle the case when PyList_Append fails. If it fails, the object needs to be deallocated as usual. Looking at the macros, I don't see how you can do that because Py_TRASHCAN_SAFE_END, which calls the above function, occurs after the finalization code... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From pf@artcom-gmbh.de Tue Apr 11 17:39:45 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 18:39:45 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: <200004111533.LAA08163@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 11, 2000 11:33:15 am" Message-ID: Hi! [me:] > > or even more radical (here only for lists as an example): > > > > def __repr__(self): > > result = [self.__class__.__name__, "("] > > for item in self.data: > > result.append(repr(item)) > > result.append(", ") > > result.append(")") > > return "".join(result) Guido van Rossum: > What's the advantage of this? It seems designed to be faster, but I > doubt that it really is -- have you timed it? I'd go for simple -- > how time-critical can repr() be...? I feel sorry: The example above was nonsense. I confused 'str' with 'repr' as I quickly hacked the function above in. I erroneously thought 'repr(some_list)' calls 'str()' on the items. If I only had checked more carefully before, I would have remembered that indeed the opposite is true: Currently lists don't have '__str__' and so fall back to 'repr' on the items when 'str([....])' is used. All this is related to the recent discussion about the new annoying behaviour of Python 1.6 when (mis?)used as a Desktop calculator: Python 1.6a1 (#6, Apr 3 2000, 10:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> print [0.1, 0.2] [0.10000000000000001, 0.20000000000000001] >>> print 0.1 0.1 >>> print (0.1, 0.2) (0.10000000000000001, 0.20000000000000001) >>> print (0.1, 0.2)[0] 0.1 >>> print (0.1, 0.2)[1] 0.2 So if default behaviour of the interactive interpreter would be changed not to use 'repr()' for objects typed at the prompt (I believe Tim Peters suggested that), this wouldn't help to make lists, tuples and dictionaries containing floats more readable. I don't know how to fix this, though. :-( Regards, Peter From tismer@tismer.com Tue Apr 11 17:57:09 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 18:57:09 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004111637.SAA01941@python.inrialpes.fr> Message-ID: <38F35965.CA28C845@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > About extensions and Trashcan. > > ... > > Or, I made a mistake in this little code: > Maybe unrelated, but this code does not handle the case when > PyList_Append fails. If it fails, the object needs to be deallocated > as usual. 
Looking at the macros, I don't see how you can do that > because Py_TRASHCAN_SAFE_END, which calls the above function, > occurs after the finalization code... Yes, it does not handle this case for the following reasons: Reason 1) If the append does not work, then the system is apparently in an incredibly bad state, most probably broken! Note that these actions only take place when we have a recursion depth of 50 or so. That means we have already freed some memory, and now we have trouble with this probably little list. I won't touch broken memory management. Reason 2) If the append does not work, then we are not allowed to deallocate the element at all. Trashcan was written in order to avoid crashes for too deeply nested objects. The current nesting level of 20 or 50 is of course very low, but generally I would assume that the limit is chosen for good reasons, and any deeper recursion might cause a machine crash. Under this assumption, the only thing you can do is to forget about the object. Remark ad 1): I had once changed the strategy to use a tuple construct instead. Thinking of memory problems when the shredder list must be grown, this could give an advantage. The optimum would be if the destructor data structure is never bigger than the smallest nested object. This would even allow me to recycle these for the destruction, without any malloc at all. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Vladimir.Marangozov@inrialpes.fr Tue Apr 11 17:59:07 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 11 Apr 2000 18:59:07 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35965.CA28C845@tismer.com> from "Christian Tismer" at Apr 11, 2000 06:57:09 PM Message-ID: <200004111659.SAA02051@python.inrialpes.fr> Christian Tismer wrote: > > Vladimir Marangozov wrote: > > > > Maybe unrelated, but this code does not handle the case when > > > PyList_Append fails. If it fails, the object needs to be deallocated > > > as usual.
> > > > Yes, it does not handle this case for the following reasons: > > ... > > Not enough good reasons to segfault. I suggest you move the > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > the condition there. Sorry, I don't see what you are suggesting, I'm distracted. Maybe you want to submit a patch, and a few more words on what you mean and why you prefer to core dump with stack overflow? I'm busy seeking a bug in the core, not in that ridiculous code. Somewhere is a real bug, probably the one which I was seeking many time before, when I got weird crashes in the small block heap of Windows. It was never solved, and never clear if it was Python or Windows memory management. Maybe we just found another entrance to this. It smells so very familiar: many many small tuples and we crash. busy-ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Fredrik Lundh" <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> Message-ID: <004d01bfa3da$820c37a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > > nobody? >=20 > FYI, there currently is a discussion emerging about this on the > i18n-sig list. okay, I'll catch up with that one later. > > maybe all problems are gone after the last round of checkins? >=20 > Probably not :-/ ... the last round only fixed some minor > things. hey, aren't you supposed to say "don't worry, the design is rock solid"? ;-) From mal@lemburg.com Tue Apr 11 20:25:28 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 21:25:28 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> <004d01bfa3da$820c37a0$34aab5d4@hagrid> Message-ID: <38F37C28.4E6D99F@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > nobody? > > > > FYI, there currently is a discussion emerging about this on the > > i18n-sig list. > > okay, I'll catch up with that one later. > > > > maybe all problems are gone after the last round of checkins? > > > > Probably not :-/ ... the last round only fixed some minor > > things. > > hey, aren't you supposed to say "don't worry, the design > is rock solid"? ;-) Things are hard to get right when you have to deal with backward *and* forward compatibility, interoperability and user-friendliness all at the same time... but we'll keep trying ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Jeffery.D.Collins@aero.org Tue Apr 11 21:03:47 2000 From: Jeffery.D.Collins@aero.org (Jeff Collins) Date: Tue, 11 Apr 2000 13:03:47 -0700 Subject: [Python-Dev] Python for small platforms Message-ID: <14579.25763.434844.257544@malibu.aero.org> I've just had the chance to examine the unicode implementation and was surprised by the size of the code introduced - not just by the size of the database extension module (which I understand Christian Tismer is optimizing and which I assume can be configured away), but in particular by the size of the additional objects (unicodeobject.c, unicodetype.c). 
These additional objects alone contribute approximately 100K to the resulting executable. On desktop systems, this is not of much concern and suggestions have been made previously to reduce this if necessary (shared extension modules and possibly a shared VM - libpython.so). However, on small embedded systems (eg, PalmIII), this additional code is tremendous. The current size of the python-1.5.2-pre-unicode VM (after removal of float and complex objects with more reductions to come) on the PalmIII is 240K (already huge by Palm standards). (For reference, the size of python-1.5.1 on the PalmIII is 160K, after removal of the compiler, parser, float/long/complex objects.) With the unicode additions, this value jumps to 340K. The upshot of this is that for small platforms on which I am working, unicode support will have to be removed. My immediated concern is that unicode is getting so embedded in python that it will be difficult to extract. The approach I've taken for removing "features" (like float objects): 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, where XXX denotes the removable feature (configurable in config.h) 2) preserves the python API: builtin functions, C API, PyArg_Parse, print format specifiers, etc., raise MissingFeatureError if attempts are made to use them. Of course, the API associated with the removed feature is no longer present. 3) protects the reduced VM: all reads (via marshal, compile, etc.) involving source/compiled python code will fail with a MissingFeatureError if the reduced VM doesn't support it. 4) does not yet support a MissingFeatureError in the tokenizer if, say, 2.2 (for removed floats) is entered on the python command line. This instead results in a SyntaxError indicating a problem with the decimal point. It appears that another error token would have to be added to support this error. Of course, I may have missed something, but if the above appears to be a reasonable approach, I can supply patches (at least for floats and complexes) for further discussion. In the longer term, it would be helpful if developers would follow this (or a similar agreed upon approach) when adding new features. This would reduce the burden of maintaining python for small embedded platforms. Thanks, Jeff From guido@python.org Tue Apr 11 21:29:16 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 16:29:16 -0400 Subject: [Python-Dev] ANNOUNCE: Python 1.6 alpha 2 Message-ID: <200004112029.QAA09762@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 2 to the Python website: http://www.python.org/1.6/ If you missed the announcement for 1.6a1, probably the biggest news is Unicode support. More news is on the above webpage; Unicode is being discussed in the i18n-sig. Most changes since 1.6a1 affect either details of the Unicode support, or details of what the Windows installer installs where. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in an 1.6 final release before June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! 
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Tue Apr 11 22:18:02 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 17:18:02 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils README.txt,1.9,1.10 In-Reply-To: Your message of "Tue, 11 Apr 2000 17:17:01 EDT." <200004112117.RAA02446@thrak.cnri.reston.va.us> References: <200004112117.RAA02446@thrak.cnri.reston.va.us> Message-ID: <200004112118.RAA09957@eric.cnri.reston.va.us> You realize that that README didn't make it into 1.6a2, right? Shouldn't be a problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Tue Apr 11 22:31:46 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 23:31:46 +0200 Subject: [Python-Dev] Python for small platforms References: <14579.25763.434844.257544@malibu.aero.org> Message-ID: <38F399C2.642C1127@lemburg.com> Jeff Collins wrote: > > The approach I've taken for removing "features" (like float objects): > 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, > where XXX denotes the removable feature (configurable in config.h) > 2) preserves the python API: builtin functions, C API, PyArg_Parse, > print format specifiers, etc., raise MissingFeatureError if > attempts are made to use them. Of course, the API associated > with the removed feature is no longer present. > 3) protects the reduced VM: all reads (via marshal, compile, etc.) > involving source/compiled python code will fail with > a MissingFeatureError if the reduced VM doesn't support it. > 4) does not yet support a MissingFeatureError in the tokenizer > if, say, 2.2 (for removed floats) is entered on the python > command line. This instead results in a SyntaxError > indicating a problem with the decimal point. It appears that > another error token would have to be added to support > this error. Wouldn't it be simpler to replace the parts in question with dummy replacements ? The dummies could then raise appropriate exceptions as needed. This would work for float, complex and Unicode objects which all have a defined API. The advantage of this approach is that you don't need to maintain separate patches for these parts (which is a pain) and that you can provide drop-in archives which are easy to install: simply unzip over the full source tree and recompile. > Of course, I may have missed something, but if the above appears to be > a reasonable approach, I can supply patches (at least for floats and > complexes) for further discussion. In the longer term, it would be > helpful if developers would follow this (or a similar agreed upon > approach) when adding new features. This would reduce the burden of > maintaining python for small embedded platforms. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Wed Apr 12 00:28:01 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 16:28:01 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] add string precisions to PyErr_Format calls In-Reply-To: <14579.11701.733010.789688@amarok.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Andrew M. Kuchling wrote: > Greg Stein writes: > >Wouldn't it be best to simply fix PyErr_Format so that we don't have to > >continue to worry about buffer overruns? 
> > A while ago I suggested using nsprintf() in PyErr_Format, but that > means stealing the implementation from Apache for those platforms > where libc doesn't include nsprintf(). Haven't done it yet... Seems like it would be cake to write one that took *only* the %d and %s (unadorned) modifiers. We wouldn't need anything else, would we? [ ... grep'ing the source ... ] I see the following format codes which would need to change: %.###s -- switch to %s %i -- switch to %d %c -- hrm. probably need to support this (in stringobject.c) %x -- maybe switch to %d? (in stringobject.c) The last two are used once, both in stringobject.c. I could see a case for revising that call use just %s and %d. One pass to count the length, alloc, then one pass to fill in. The second pass could actually be handled by vsprintf() since we know the buffer is large enough. The only tricky part would be determining the max length for %d. For a 32-bit value, it is 10 digits; for 64-bit value, it is 20 digits. I'd say allocate room for 20 digits regardless of platform and be done with it. Maybe support %%, but I didn't see that anywhere. Somebody could add support when the need arises. Last problem: backwards compat for third-party modules using PyErr_Format. IMO, leave PyErr_Format for them (they're already responsible for buffer overruns (or not) since PyErr_Format isn't helping them). The new one would be PyErr_SafeFormat. Recommend the Safe version, deprecate the unsafe one. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw@cnri.reston.va.us Wed Apr 12 00:22:14 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 19:22:14 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes Message-ID: <14579.45990.603625.434317@anthem.cnri.reston.va.us> --HXjrLbAr5v Content-Type: text/plain; charset=us-ascii Content-Description: message body text Content-Transfer-Encoding: 7bit Here's the second go at adding arbitrary attribute support to function and method objects. Note that this time it's illegal (TypeError) to set an attribute on a bound method object; getting an attribute on a bound method object returns the value on the underlying function object. First the diffs, then the test case and test output. 
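In a nutshell, the intended behaviour looks like this (a condensed sketch of the attached test case, with made-up attribute names; it only runs with the patch applied):

def f():
    pass

f.author = 'barry'            # plain functions: attributes are settable
print f.author                # -> barry

class C:
    def meth(self):
        pass

C.meth.author = 'barry'       # unbound methods: settable, stored on the function

c = C()
print c.meth.author           # bound methods: readable, delegates to the function

try:
    c.meth.author = 'guido'   # bound methods: setting raises TypeError
except TypeError:
    print 'TypeError, as intended'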
Enjoy, -Barry --HXjrLbAr5v Content-Type: text/plain Content-Description: Diff -u to add arbitrary attrs to funcs and meths Content-Disposition: inline; filename="methdiff.txt" Content-Transfer-Encoding: 7bit Index: Include/funcobject.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/funcobject.h,v retrieving revision 2.16 diff -u -r2.16 funcobject.h --- funcobject.h 1998/12/04 18:48:02 2.16 +++ funcobject.h 2000/04/07 21:30:40 @@ -44,6 +44,7 @@ PyObject *func_defaults; PyObject *func_doc; PyObject *func_name; + PyObject *func_dict; } PyFunctionObject; extern DL_IMPORT(PyTypeObject) PyFunction_Type; Index: Objects/classobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/classobject.c,v retrieving revision 2.84 diff -u -r2.84 classobject.c --- classobject.c 2000/04/10 13:03:19 2.84 +++ classobject.c 2000/04/11 22:05:08 @@ -1550,28 +1550,75 @@ /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, + {"__dict__", T_INT, 0}, {NULL} /* Sentinel */ }; +static int +instancemethod_setattro(im, name, v) + register PyMethodObject *im; + PyObject *name; + PyObject *v; +{ + char* sname = PyString_AsString(name); + if (sname == NULL) + return -1; + + if (PyEval_GetRestricted() || + strcmp(sname, "im_func") == 0 || + strcmp(sname, "im_self") == 0 || + strcmp(sname, "im_class") == 0) + { + PyErr_Format(PyExc_TypeError, "read-only attribute: %s", sname); + return -1; + } + if (im->im_self != NULL) { + PyErr_Format(PyExc_TypeError, + "cannot set bound instance-method attribute: %s", + sname); + return -1; + } + return PyObject_SetAttr(im->im_func, name, v); +} + + static PyObject * -instancemethod_getattr(im, name) +instancemethod_getattro(im, name) register PyMethodObject *im; PyObject *name; { - char *sname = PyString_AsString(name); + PyObject *rtn; + char* sname = PyString_AsString(name); + + if (sname == NULL) + return NULL; + if (sname[0] == '_') { /* Inherit __name__ and __doc__ from the callable object - implementing the method */ - if (strcmp(sname, "__name__") == 0 || - strcmp(sname, "__doc__") == 0) + implementing the method. Can't allow access to __dict__ + here because it should not be readable in restricted + execution mode. 
+ */ + if (strcmp(sname, "__name__") == 0 || + strcmp(sname, "__doc__") == 0) { return PyObject_GetAttr(im->im_func, name); + } } if (PyEval_GetRestricted()) { - PyErr_SetString(PyExc_RuntimeError, - "instance-method attributes not accessible in restricted mode"); + PyErr_Format(PyExc_RuntimeError, + "instance-method attributes not accessible in restricted mode: %s", + sname); return NULL; + } + if (sname[0] == '_' && strcmp(sname, "__dict__") == 0) + return PyObject_GetAttr(im->im_func, name); + + rtn = PyMember_Get((char *)im, instancemethod_memberlist, sname); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyObject_GetAttr(im->im_func, name); } - return PyMember_Get((char *)im, instancemethod_memberlist, sname); + return rtn; } static void @@ -1672,8 +1719,8 @@ (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ - (getattrofunc)instancemethod_getattr, /*tp_getattro*/ - 0, /*tp_setattro*/ + (getattrofunc)instancemethod_getattro, /*tp_getattro*/ + (setattrofunc)instancemethod_setattro, /*tp_setattro*/ }; /* Clear out the free list */ Index: Objects/funcobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/funcobject.c,v retrieving revision 2.18 diff -u -r2.18 funcobject.c --- funcobject.c 1998/05/22 00:55:34 2.18 +++ funcobject.c 2000/04/11 22:06:12 @@ -62,6 +62,7 @@ doc = Py_None; Py_INCREF(doc); op->func_doc = doc; + op->func_dict = PyDict_New(); } return (PyObject *)op; } @@ -133,6 +134,8 @@ {"__name__", T_OBJECT, OFF(func_name), READONLY}, {"func_defaults",T_OBJECT, OFF(func_defaults)}, {"func_doc", T_OBJECT, OFF(func_doc)}, + {"func_dict", T_OBJECT, OFF(func_dict)}, + {"__dict__", T_OBJECT, OFF(func_dict)}, {"__doc__", T_OBJECT, OFF(func_doc)}, {NULL} /* Sentinel */ }; @@ -142,12 +145,21 @@ PyFunctionObject *op; char *name; { + PyObject* rtn; + if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; + } + rtn = PyMember_Get((char *)op, func_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyMapping_GetItemString(op->func_dict, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } - return PyMember_Get((char *)op, func_memberlist, name); + return rtn; } static int @@ -156,6 +168,8 @@ char *name; PyObject *value; { + int rtn; + if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not settable in restricted mode"); @@ -178,8 +192,23 @@ } if (value == Py_None) value = NULL; + } + else if (strcmp(name, "func_dict") == 0 || + strcmp(name, "__dict__") == 0) + { + if (value == NULL || !PyMapping_Check(value)) { + PyErr_SetString( + PyExc_TypeError, + "must set func_dict to a mapping object"); + return -1; + } + } + rtn = PyMember_Set((char *)op, func_memberlist, name, value); + if (rtn < 0) { + PyErr_Clear(); + rtn = PyMapping_SetItemString(op->func_dict, name, value); } - return PyMember_Set((char *)op, func_memberlist, name, value); + return rtn; } static void @@ -191,6 +220,7 @@ Py_DECREF(op->func_name); Py_XDECREF(op->func_defaults); Py_XDECREF(op->func_doc); + Py_XDECREF(op->func_dict); PyMem_DEL(op); } --HXjrLbAr5v Content-Type: text/plain Content-Description: Test of func/meth attrs Content-Disposition: inline; filename="test_funcattrs.py" Content-Transfer-Encoding: 7bit from test_support import verbose class F: def a(self): pass def b(): pass # setting attributes on functions try: b.blah except AttributeError: pass else: 
print 'did not get expected AttributeError' b.blah = 1 print b.blah == 1 print 'blah' in dir(b) # setting attributes on unbound methods try: F.a.blah except AttributeError: pass else: print 'did not get expected AttributeError' F.a.blah = 1 print F.a.blah == 1 print 'blah' in dir(F.a) # setting attributes on bound methods is illegal f1 = F() try: f1.a.snerp = 1 except TypeError: pass else: print 'did not get expected TypeError' # but accessing attributes on bound methods is fine print f1.a.blah print 'blah' in dir(f1.a) f2 = F() print f1.a.blah == f2.a.blah F.a.wazoo = F f1.a.wazoo is f2.a.wazoo # try setting __dict__ illegally try: F.a.__dict__ = (1, 2, 3) except TypeError: pass else: print 'did not get expected TypeError' F.a.__dict__ = {'one': 111, 'two': 222, 'three': 333} print f1.a.two == 222 from UserDict import UserDict d = UserDict({'four': 444, 'five': 555}) F.a.__dict__ = d try: f2.a.two except AttributeError: pass else: print 'did not get expected AttributeError' print f2.a.four is f1.a.four is F.a.four --HXjrLbAr5v Content-Type: text/plain Content-Description: Output of test of func/meth attrs Content-Disposition: inline; filename="test_funcattrs" Content-Transfer-Encoding: 7bit test_funcattrs 1 1 1 1 1 1 1 1 1 --HXjrLbAr5v-- From mhammond@skippinet.com.au Wed Apr 12 00:54:50 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 09:54:50 +1000 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> Message-ID: > > comments? (for obvious reasons, I'm especially > interested in comments > > from people using non-ASCII characters on a daily basis...) > > nobody? Almost certainly not. a) Unicode objects are very new and not everyone has the time to fiddle with them, and b) many of us only speak English. So we need _you_ to tell us what the problems were/are. Dont wait for us to find them - explain them to us. At least we than have a change of sympathizing, even if we can not directly relate the experiences... > maybe all problems are gone after the last round of checkins? > oh well, I'll rebuild again, and see what happens if I remove all > kludges in my test code... OK - but be sure to let us know :-) Mark. From mhammond@skippinet.com.au Wed Apr 12 01:04:22 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 10:04:22 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35EE4.7E741801@tismer.com> Message-ID: To answer Chris' earlier question: No threads, no gui, no events. The "parser" module is the only builtin module (apart from the obvious - ntpath etc) Greg and/or Bill can correct me if I am wrong - it is just P2C, and it is just console based, mainline procedural code. It _is_ highly recursive tho (and I believe this will turn out to be the key factor in the crash) > Somewhere is a real bug, probably the one which I was > seeking many time before, when I got weird crashes in the small > block heap of Windows. It was never solved, and never clear if > it was Python or Windows memory management. I am confident that this problem was my fault, in that I was releasing a different version of the MFC DLLs than I had actually built with. At least everyone with a test case couldnt repro it after the DLL update. This new crash is so predictable and always with the same data that I seriously doubt the problem is in any way related. > Maybe we just found another entrance to this. > It smells so very familiar: many many small tuples and we crash. 
Lists this time, but I take your point. Ive got a busy next few days, so it is still exists after that I will put some more effort into it. Mark. From mhammond@skippinet.com.au Wed Apr 12 01:07:43 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 10:07:43 +1000 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <38F37C28.4E6D99F@lemburg.com> Message-ID: [Marc] > Things are hard to get right when you have to deal with > backward *and* forward compatibility, interoperability and > user-friendliness all at the same time... but we'll keep > trying ;-) Let me say publically that I think you have done a fine job, and obviously have put lots of thought and effort into it. If parts of the design turn out to be less than ideal (and subsequently changed before 1.6 is real) then this will not detract from your excellent work. Well done! [And also to Fredrik, whose code was the basis for the Unicode object itself - that was a nice piece of code too!] Aww-heck-I-love-all-you-guys--ly, Mark. From gward@mems-exchange.org Wed Apr 12 03:10:18 2000 From: gward@mems-exchange.org (Greg Ward) Date: Tue, 11 Apr 2000 22:10:18 -0400 Subject: [Python-Dev] How *does* Python determine sys.prefix? Message-ID: <20000411221018.A2587@mems-exchange.org> Ooh, here's a yucky problem. Last night, I installed Oliver Andrich's Python 1.5.2 RPM on my Linux box at home, so now I have two Python installations there: * my build, in /usr/local/python and /usr/local/python.i86-linux (I need to test Distutils in the prefix != exec_prefix case) * Oliver's RPM, in /usr I have a symlink /usr/local/bin/python pointing to ../../python.i86-linux/bin/python, and /usr/local/bin is first in my path: $ ls -lF `which python` lrwxrwxrwx 1 root root 30 Aug 28 1999 /usr/local/bin/python -> ../python.i86-linux/bin/python* Since I installed the RPM, /usr/local/bin/python reports an incorrect prefix: $ /usr/local/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/bin/../python.i86-linux') Essentially the same thing if I run it directly, not through the symlink: $ /usr/local/python.i86-linux/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/python.i86-linux') /usr/bin/python gets it right, though: $ /usr/bin/python Python 1.5.2 (#1, Apr 18 1999, 16:03:16) [GCC pgcc-2.91.60 19981201 (egcs-1.1.1 on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr') This strikes me as a pretty reasonable and straightforward way to have multiple Python installations; if Python is fooled into getting the wrong sys.prefix, then the Distutils are going to have a much tougher job! Don't tell me I have to write my own prefix-finding code now... (And no, I have not tried this under 1.6 yet.) Damn and blast my last-minute pre-release testing... I should have just released the bloody thing and let the bugs fly. Oh hell, I think I will anyways. 
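A rough sketch of the landmark search involved (simplified: the real logic is calculate_path() in Modules/getpath.c, which also honours $PYTHONHOME and the compiled-in defaults; the helper below is purely illustrative):

import os

def guess_prefix(binary, version='1.5'):
    # Walk upwards from the directory holding the binary until a
    # directory containing lib/python<version>/string.py is found;
    # that directory is what gets reported as sys.prefix.
    landmark = os.path.join('lib', 'python' + version, 'string.py')
    dir = os.path.dirname(os.path.abspath(binary))
    while dir:
        if os.path.isfile(os.path.join(dir, landmark)):
            return dir
        parent = os.path.dirname(dir)
        if parent == dir:            # reached the filesystem root
            return None              # caller falls back to the built-in prefix
        dir = parent

print guess_prefix('/usr/local/python.i86-linux/bin/python')

If that is roughly right, then with the RPM's /usr/lib/python1.5/string.py installed, a walk starting under /usr/local/python.i86-linux never sees a landmark under /usr/local/python (which isn't an ancestor of the binary's path) and ends up at /usr -- which would at least be consistent with the '/usr' shown above.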
Greg -- Greg Ward - software developer gward@mems-exchange.org MEMS Exchange / CNRI voice: +1-703-262-5376 Reston, Virginia, USA fax: +1-703-262-5367 From janssen@parc.xerox.com Wed Apr 12 03:17:38 2000 From: janssen@parc.xerox.com (Bill Janssen) Date: Tue, 11 Apr 2000 19:17:38 PDT Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: Your message of "Tue, 11 Apr 2000 13:29:51 PDT." <200004112029.QAA09762@eric.cnri.reston.va.us> Message-ID: <00Apr11.191729pdt."3438"@watson.parc.xerox.com> ILU seems to work fine with it. Bill From bwarsaw@cnri.reston.va.us Wed Apr 12 03:34:04 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 22:34:04 -0400 (EDT) Subject: [Python-Dev] How *does* Python determine sys.prefix? References: <20000411221018.A2587@mems-exchange.org> Message-ID: <14579.57500.195708.720145@anthem.cnri.reston.va.us> >>>>> "GW" == Greg Ward writes: GW> Ooh, here's a yucky problem. Last night, I installed Oliver GW> Andrich's Python 1.5.2 RPM on my Linux box at home, so now I GW> have two Python installations there: Greg, I don't know why it's finding the wrong landmark. Perhaps the first test for running out of the build directory is tripping up? What happens if you remove /usr/lib/python$VERSION/string.py? If possible you should step through calculate_path() in getpath.c -- this implements the search through the file system for the landmarks. -Barry From gward@python.net Wed Apr 12 03:34:12 2000 From: gward@python.net (Greg Ward) Date: Tue, 11 Apr 2000 22:34:12 -0400 Subject: [Python-Dev] ANNOUNCE: Distutils 0.8 released Message-ID: <20000411223412.A643@beelzebub> Python Distribution Utilities release 0.8 April 11, 2000 The Python Distribution Utilities, or Distutils for short, are a collection of modules that aid in the development, distribution, and installation of Python modules. (It is intended that ultimately the Distutils will grow up into a system for distributing and installing whole Python applications, but for now their scope is limited to module distributions.) The Distutils are a standard part of Python 1.6; if you are running 1.6, you don't need to install the Distutils separately. This release is primarily so that you can add the Distutils to a Python 1.5.2 installation -- you will then be able to install modules that require the Distutils, or use the Distutils to distribute your own modules. More information is available at the Distutils web page: http://www.python.org/sigs/distutils-sig/ and in the README.txt included in the Distutils source distribution. You can download the Distutils from http://www.python.org/sigs/distutils-sig/download.html Trivial patches can be sent to me (Greg Ward) at gward@python.net. Larger patches should be discussed on the Distutils mailing list: distutils-sig@python.org. 
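For anyone who hasn't written a Distutils setup script yet, a minimal one looks roughly like this (the module name and metadata are made up for illustration, and keyword details may differ slightly from release to release):

#!/usr/bin/env python
# setup.py -- minimal setup script for a pure-Python module distribution
from distutils.core import setup

setup(name = "mymodule",
      version = "1.0",
      description = "Example pure-Python module distribution",
      py_modules = ["mymodule"])

With that in place, "python setup.py sdist" builds a source distribution and "python setup.py install" installs the module; both commands are touched on in the changes below.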
Here are the changes in release 0.8, if you're curious: * some incompatible naming changes in the command classes -- both the classes themselves and some key class attributes were renamed (this will break some old setup scripts -- see README.txt) * half-hearted, unfinished moves towards backwards compatibility with Python 1.5.1 (the 0.1.4 and 0.1.5 releases were done independently, and I still have to fold those code changes in to the current code) * added ability to search the Windows registry to find MSVC++ (thanks to Robin Becker and Thomas Heller) * renamed the "dist" command to "sdist" and introduced the "manifest template" file (MANIFEST.in), used to generate the actual manifest * added "build_clib" command to build static C libraries needed by Python extensions * fixed the "install" command -- we now have a sane, usable, flexible, intelligent scheme for doing standard, alternate, and custom installations (and it's even documented!) (thanks to Fred Drake and Guido van Rossum for design help) * straightened out the incompatibilities between the UnixCCompiler and MSVCCompiler classes, and cleaned up the whole mechanism for compiling C code in the process * reorganized the build directories: now build to either "build/lib" or "build/lib.", with temporary files (eg. compiler turds) in "build/temp." * merged the "install_py" and "install_ext" commands into "install_lib" -- no longer any sense in keeping them apart, since pure Python modules and extension modules build to the same place * added --debug (-g) flag to "build_*" commands, and make that carry through to compiler switches, names of extensions on Windows, etc. * fixed many portability bugs on Windows (thanks to many people) * beginnings of support for Mac OS (I'm told that it's enough for the Distutils to install itself) (thanks to Corran Webster) * actually pay attention to the "--rpath" option to "build_ext" (thanks to Joe Van Andel for spotting this lapse) * added "clean" command (thanks to Bastien Kleineidam) * beginnings of support for creating built distributions: changes to the various build and install commands to support it, and added the "bdist" and "bdist_dumb" commands * code reorganization: split core.py up into dist.py and cmd.py, util.py into *_util.py * removed global "--force" option -- it's now up to individual commands to define this if it makes sense for them * better error-handling (fewer extravagant tracebacks for errors that really aren't the Distutils' fault -- Greg Ward - just another Python hacker gward@python.net http://starship.python.net/~gward/ All the world's a stage and most of us are desperately unrehearsed. From jon@dgs.monash.edu.au Wed Apr 12 03:40:23 2000 From: jon@dgs.monash.edu.au (Jonathan Giddy) Date: Wed, 12 Apr 2000 12:40:23 +1000 (EST) Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: <"00Apr11.191729pdt.3438"@watson.parc.xerox.com> from "Bill Janssen" at Apr 11, 2000 07:17:38 PM Message-ID: <200004120240.MAA11342@nexus.csse.monash.edu.au> Bill Janssen declared: > >ILU seems to work fine with it. > >Bill Without wishing to jinx this good news, isn't the release of 1.6 the appropriate time to remove the redundant thread.h file? Jon. From Vladimir.Marangozov@inrialpes.fr Wed Apr 12 04:18:26 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:18:26 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. 
In-Reply-To: <38F35EE4.7E741801@tismer.com> from "Christian Tismer" at Apr 11, 2000 07:20:36 PM Message-ID: <200004120318.FAA06750@python.inrialpes.fr> Christian Tismer wrote: > > Vladimir Marangozov wrote: > > > > Not enough good reasons to segfault. I suggest you move the > > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > > the condition there. > > Sorry, I don't see what you are suggesting, I'm distracted. I was thinking about the following. Change the macros in object.h from: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ else \ _PyTrash_deposit_object((PyObject*)op);\ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ to: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ where _PyTrash_deposit_object returns 0 on success, -1 on failure. This gives another last chance to the system to finalize the object, hoping that the stack won't overflow. :-) My point is that it is better to control whether _PyTrash_deposit_object succeeds or not (and it may fail because of PyList_Append). If this doesn't sound acceptable (because of the possible stack overflow) it would still be better to abort in _PyTrash_deposit_object with an exception "stack overflow on recursive finalization" when PyList_Append fails. Leaving it unchecked is not nice -- especially in such extreme situations. Currently, if something fails, the object is not finalized (leaking memory). Ok, so be it. What's not nice is that this happens silently which is not the kind of tolerance I would accept from the Python runtime. As to the bug: it's curious that, as Mark reported, without the trashcan logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) some erroneous situation. I'd expect that if the trashcan macros are implemented as above, the crash will go away (which wouldn't solve the problem and would obviate the trashcan in the first place :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov@inrialpes.fr Wed Apr 12 04:34:48 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:34:48 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. 
In-Reply-To: <200004120318.FAA06750@python.inrialpes.fr> from "Vladimir Marangozov" at Apr 12, 2000 05:18:26 AM Message-ID: <200004120334.FAA06784@python.inrialpes.fr> Of course, this Vladimir Marangozov wrote: > > to: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > was meant to be this: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov@inrialpes.fr Wed Apr 12 04:54:13 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:54:13 +0200 (CEST) Subject: [Python-Dev] trashcan and PR#7 Message-ID: <200004120354.FAA06834@python.inrialpes.fr> While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein@lyra.org Wed Apr 12 05:09:19 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 21:09:19 -0700 (PDT) Subject: [Python-Dev] Extensible library packages In-Reply-To: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Message-ID: On Tue, 11 Apr 2000, Fredrik Lundh wrote: > Andrew M. Kuchling wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > > SAX implementation. > > > Can anyone suggest a good solution? Fixing this may not require > > changing the core in any way, but the cleanest solution isn't obvious. > > saxlib.py ? > > (yes, I'm serious) +1 When we solve the problem of installing items into "core" Python packages, then we can move saxlib.py (along with the rest of the modules in the standard library). Cheers, -g -- Greg Stein, http://www.lyra.org/ From pf@artcom-gmbh.de Wed Apr 12 06:43:59 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 07:43:59 +0200 (MEST) Subject: [Python-Dev] Extensible library packages In-Reply-To: <200004111619.MAA05881@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Apr 11, 2000 12:19:21 pm" Message-ID: Hi! Andrew M. Kuchling: [...] > The problem is that, if the Python standard library includes a package > named 'xml', ... [...] > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. I dislike the idea of having user visible packages in the standard library too. As Fredrik already suggested, putting a file 'saxlib.py' into the lib, which exposes all what a user needs to know about 'sax' seems to be the best solution. Regards, Peter From tim_one@email.msn.com Wed Apr 12 08:52:01 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 03:52:01 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Message-ID: <000401bfa453$fca9f1e0$ae2d153f@tim> [Peter Funk] > ... > So if default behaviour of the interactive interpreter would be changed > not to use 'repr()' for objects typed at the prompt (I believe Tim > Peters suggested that), this wouldn't help to make lists, tuples and > dictionaries containing floats more readable. 
Or lists, tuples and dicts of anything else either: that's what I'm getting at when I keep saying containers should "pass str() down" to containees. That it doesn't has frustrated me for years; newbies aren't bothered by it because before 1.6 str == repr for almost all builtin types, and newbies (by definition ) don't have any classes of their own overriding __str__ or __repr__. But I do, and their repr is rarely what I want to see in the shell. This is a different issue than (but related to) what the interactive prompt should use by default to format expression results. They have one key conundrum in common, though: if str() is simply passed down with no other change, then e.g. print str({"a:": "b, c", "a, b": "c"}) and (same thing in disguise) print {"a:": "b, c", "a, b": "c"} would display {a:: b, c, a, b: c} and that's darned unreadable. As far as I can tell, the only reason str(container) invokes repr on the containees today is simply to get some string quotes in output like this. That's fine so far as it goes, but leads to miserably bloated displays for containees of many types *other* than the builtin ones -- and even for string containees leads to embedded octal escape sequences all over the place. > I don't know how to fix this, though. :-( Sure you do! And we look forward to your patch . From gstein@lyra.org Wed Apr 12 09:09:30 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 01:09:30 -0700 (PDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes In-Reply-To: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Barry A. Warsaw wrote: > Here's the second go at adding arbitrary attribute support to function > and method objects. Note that this time it's illegal (TypeError) to > set an attribute on a bound method object; getting an attribute on a > bound method object returns the value on the underlying function > object. First the diffs, then the test case and test output. In the instancemethod_setattro function, it might be nice to do the speed optimization and test for sname[0] == 'i' before hitting the strcmp() calls. Oh: policy question: I would think that these attributes *should* be available in restricted mode. They aren't "sneaky" like the builtin attributes. Rather than PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should be used. They apply to mappings and will be faster. Note that (internally) the PyMapping_Get/SetItemString use the latter forms (after constructing a string object(!)). ... whoops. I see that the function object doesn't use the ?etattro() variants. hrm. The stuff is looking really good! Cheers, -g -- Greg Stein, http://www.lyra.org/ From andy@reportlab.com Wed Apr 12 09:18:40 2000 From: andy@reportlab.com (Andy Robinson) Date: Wed, 12 Apr 2000 09:18:40 +0100 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <20000412035101.F38D71CE29@dinsdale.python.org> Message-ID: > > Things are hard to get right when you have to deal with > > backward *and* forward compatibility, interoperability and > > user-friendliness all at the same time... but we'll keep > > trying ;-) > > Let me say publically that I think you have done a fine job, and > obviously have put lots of thought and effort into it. If parts of > the design turn out to be less than ideal (and subsequently changed > before 1.6 is real) then this will not detract from your excellent > work. > > Well done! 
> > [And also to Fredrik, whose code was the basis for the Unicode > object itself - that was a nice piece of code too!] > Mark I've spent a fair bit of time converting strings and files the last few days, and I'd add that what we have now seems both rock solid and very easy to use. The remaining issues are entirely a matter of us end users trying to figure out what we should have asked for in the first place. Whether we achieve that finally before 1.6 is our problem; Marc-Andr\u00C9 and Fredrik have done a great job, and I think we are on track for providing something much more useful and extensible than (say) Java. As proof of this, someone has already contributed Japanese codecs based on the spec. - Andy Robinson From pf@artcom-gmbh.de Wed Apr 12 09:11:23 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 10:11:23 +0200 (MEST) Subject: [Python-Dev] Improving readability of interpreter expression output (was The purpose of 'repr'...) In-Reply-To: <000401bfa453$fca9f1e0$ae2d153f@tim> from Tim Peters at "Apr 12, 2000 3:52: 1 am" Message-ID: Hi! Tim Peters: [...] > This is a different issue than (but related to) what the interactive prompt > should use by default to format expression results. They have one key > conundrum in common, though: if str() is simply passed down with no other > change, then e.g. > > print str({"a:": "b, c", "a, b": "c"}) > and (same thing in disguise) > print {"a:": "b, c", "a, b": "c"} > > would display > > {a:: b, c, a, b: c} > > and that's darned unreadable. Would you please elaborate a bit more, what you have in mind with "other change" in your sentence above? > As far as I can tell, the only reason > str(container) invokes repr on the containees today is simply to get some > string quotes in output like this. That's fine so far as it goes, but leads > to miserably bloated displays for containees of many types *other* than the > builtin ones -- and even for string containees leads to embedded octal > escape sequences all over the place. > > > I don't know how to fix this, though. :-( > > Sure you do! And we look forward to your patch . No. Serious. I don't see how to fix the 'darned unreadable' output. passing 'str' down seems to be simple. But how to fix the problem above isn't obvious to me. Regards, Peter From mal@lemburg.com Wed Apr 12 09:17:02 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 10:17:02 +0200 Subject: [Python-Dev] #pragmas in Python source code Message-ID: <38F430FE.BAF40AB8@lemburg.com> There currently is a discussion about how to write Python source code in different encodings on i18n. The (experimental) solution so far has been to add a command line switch to Python which tells the compiler which encoding to expect for u"...strings..." ("...8-bit strings..." will still be used as is -- it's the user's responsibility to use the right encoding; the Unicode implementation will still assume them to be UTF-8 encoded in automatic conversions). In the end, a #pragma should be usable to tell the compiler which encoding to use for decoding the u"..." strings. What we need now, is a good proposal for handling these #pragmas... does anyone have experience with these ? Any ideas ? 
Here's a simple strawman for the syntax: # pragma key: value parser = re.compile( '^#\s*pragma\s+' '([a-zA-Z_][a-zA-Z0-9_]*):\s*' '(.+)' ) For the encoding this would be something like: # pragma encoding: unicode-escape The compiler would scan these pragma defs, add them to an internal temporary dictionary and use them for all subsequent code it finds during the compilation process. The dictionary would have to stay around until the original compile() call has completed (spanning recursive calls). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Wed Apr 12 10:24:09 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 02:24:09 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: Sorry, i'm a little behind on this. I'll try to catch up over the next day or two. On Sun, 9 Apr 2000, Tim Peters wrote: > > Note the example from another reply of a machine with 2-bit floats. There > the user would see: > > >>> 0.75 # happens to be exactly representable on this machine > 0.8 # because that's the shortest string needed on this machine > # to get back 0.75 internally > >> > > This kind of surprise is inherent in the approach, not specific to 2-bit > machines . Okay, okay. But on a 2-bit machine you ought to be no more surprised by the above than by >>> 0.1 + 0.1 0.0 >>> 0.4 + 0.4 1.0 In fact, i suppose one could argue that 0.8 is just as honest as 0.75, as you could get 0.8 from anything in (0.625, 0.825)... or even *more* honest than 0.75, since "0.75" shows more significant digits than the precision of machine would justify. It could be argued either way. I don't see this as a fatal flaw of the 'smartrepr' method, though. After looking at the spec for java.lang.Float.toString() and the Clinger paper you mentioned, it appears to me that both essentially describe 'smartrepr', which seems encouraging. > BTW, I don't know that it will never print more digits than you type: did > you prove that? It's plausible, but many plausible claims about fp turn out > to be false. Indeed, fp *is* tricky, but i think in this case the proof actually is pretty evident -- The 'smartrepr' routine i suggested prints the representation with the fewest number of digits which converts back to the actual value. Since the thing that you originally typed converted to that value the first time around, certainly no *more* digits than what you typed are necessary to produce that value again. QED. > > - If you type in what the interpreter displays for a > > float, you can be assured of getting the same value. > > This isn't of value for most interactive use -- in general you want to see > the range of a number, not enough to get 53 bits exactly (that's beyond the > limits of human "number sense"). What do you mean by "the range of a number"? > It also has one clearly bad aspect: when > printing containers full of floats, the number of digits printed for each > will vary wildly from float to float. Makes for an unfriendly display. Yes, this is something you want to be able to control -- read on. > If the prompt's display function were settable, I'd probably plug in pprint! Since i've managed to convince Guido that such a hook might be nice, i seem to have worked myself into the position of being responsible for putting together a patch to do so... Configurability is good. 
It won't solve everything, but at least the flexibility provided by a "display" hook will let everybody have the ability to play whatever tricks they want. (Or, equivalently: to anyone who complains about the interpreter display, at least we have plausible grounds on which to tell them to go fix it themselves.) :) Here is what i have in mind: provide two hooks __builtins__.display(object) and __builtins__.displaytb(traceback, exception) that are called when the interpreter needs to display a result or when the top level catches an exception. Protocol is simple: 'display' gets one argument, an object, and can do whatever the heck it wants. 'displaytb' gets a traceback and an exception, and can do whatever the heck it wants. -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." -- Roople Unia From fredrik@pythonware.com Wed Apr 12 10:39:03 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 11:39:03 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: Message-ID: <007801bfa462$f0f399f0$0500a8c0@secret.pythonware.com> Andy Robinson wrote: > I've spent a fair bit of time converting strings and files the=20 > last few days, and I'd add that what we have now seems both rock solid > and very easy to use. =20 I'm not worried about the core string types or the conversion machinery; what disturbs me is mostly the use of automagic conversions to UTF-8, which breaks the fundamental assumption that a string is a sequence of len(string) characters. "The items of a string are characters. There is no separate character type; a character is represented by a string of one item" (from the language reference) I still think the "all strings are sequences of unicode characters" strawman I posted earlier would simplify things for everyone in- volved (programmers, users, and the interpreter itself). more on this later. gotta ship some code first. From Vladimir.Marangozov@inrialpes.fr Wed Apr 12 10:47:56 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 11:47:56 +0200 (CEST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> from "Barry Warsaw" at Apr 10, 2000 11:52:43 AM Message-ID: <200004120947.LAA02067@python.inrialpes.fr> Barry Warsaw wrote: > > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! Barry, I wonder what for... Just because there's a Python entity implemented as a C structure in which we can easily include a dict + access functions? I don't see the purpose of attaching state (vars) to an algorithm (i.e. a function). What are the benefits compared to class instances? And these special assignment rules because of the real overlap with real instances... Grrr, all this is pretty dark, conceptually. Okay, I inderstood: modules become classes, functions become instances, module variables are class variables, and classes become ... 2-nd order instances of modules. 
The only missing piece of the puzzle is a legal way to instantiate modules for obtaining functions and classes dynamically, because using eval(), the `new' module or The Hook is perceived as very hackish and definitely not OO. Once the puzzle would be solved, we'll discover that there would be only one additional little step towards inheritance for modules. How weird! Sounds like we're going to metaclass again... -1 until P3K. This is no so cute as it is dangerous. It opens the way to mind abuse. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik@pythonware.com Wed Apr 12 11:04:32 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 12:04:32 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <009601bfa466$807da2c0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). What are the benefits compared to class instances? >=20 > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. > > -1 until P3K. I agree. From tismer@tismer.com Wed Apr 12 13:08:51 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:08:51 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> Message-ID: <38F46753.3759A7B6@tismer.com> Vladimir Marangozov wrote: > > While I'm at it, maybe the same recursion control logic could be > used to remedy (most probably in PyObject_Compare) PR#7: > "comparisons of recursive objects" reported by David Asher? Hey, what a good idea. You know what's happening? We are moving towards tail recursion. If we do this everywhere, Python converges towards Stackless Python. and-most-probably-a-better-one-than-mince - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul@prescod.net Wed Apr 12 13:20:18 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 07:20:18 -0500 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <38F46A02.3AB10147@prescod.net> Vladimir Marangozov wrote: > > ... > > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). A function is also an object. > What are the benefits compared to class instances? If I follow you, you are saying that whenever you need to associate information with a function, you should wrap up the function and object into a class. But the end result of this transformation could be a program in which every single function is a class. That would be incredibly annoying, especially with Python's scoping rules. In general, it may not even be possible. Consider the following cases: * I need to associate a Java-style type declaration with a method so that it can be recognized based on its type during Java method dispatch. How would you do that with instances? * I need to associate a "grammar rule" with a Python method so that the method is invoked when the parser recognizes a syntactic construct in the input data. 
* I need to associate an IDL declaration with a method so that a COM interface definition can be generated from the source file. * I need to associate an XPath "pattern string" with a Python method so that the method can be invoked when a tree walker discovers a particular pattern in an XML DOM. * I need to associate multiple forms of documentation with a method. They are optimized for different IDEs, environments or languages. > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. I don't understand what you are saying here. > Once the puzzle would be solved, we'll discover that there would be only > one additional little step towards inheritance for modules. How weird! > Sounds like we're going to metaclass again... I don't see what any of this has to do with Barry's extremely simple idea. Functions *are objects* in Python. It's too late to change that. Objects can have properties. Barry is just allowing arbitrary properties to be associated with functions. I don't see where there is anything mysterious here. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From akuchlin@mems-exchange.org Wed Apr 12 13:22:26 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 08:22:26 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <200004120947.LAA02067@python.inrialpes.fr> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <14580.27266.683908.216344@newcnri.cnri.reston.va.us> Vladimir Marangozov writes: >Barry, I wonder what for... In the two quoted examples, docstrings are used to store additional info about a function. SPARK uses them to contain grammar rules and the regular expressions for matching tokens. The object publisher in Zope uses the presence of a docstring to indicate whether a function or method is publicly accessible. As a third example, the optional type info being thrashed over in the Types-SIG would be another annotation for a function (though doing def f(): ... f.type = 'void' would be really clunky. >Once the puzzle would be solved, we'll discover that there would be only >one additional little step towards inheritance for modules. How weird! >Sounds like we're going to metaclass again... No, that isn't why Barry is experimenting with this -- instead, it's simply because annotating functions seems useful, but everyone uses the docstring because it's the only option. --amk From tismer@tismer.com Wed Apr 12 13:43:40 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:43:40 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120318.FAA06750@python.inrialpes.fr> Message-ID: <38F46F7C.94D29561@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Vladimir Marangozov wrote: [yup, good looking patch] > where _PyTrash_deposit_object returns 0 on success, -1 on failure. This > gives another last chance to the system to finalize the object, hoping > that the stack won't overflow. :-) > > My point is that it is better to control whether _PyTrash_deposit_object > succeeds or not (and it may fail because of PyList_Append). 
> If this doesn't sound acceptable (because of the possible stack overflow) > it would still be better to abort in _PyTrash_deposit_object with an > exception "stack overflow on recursive finalization" when PyList_Append > fails. Leaving it unchecked is not nice -- especially in such extreme > situations. You bet that I *would* raise an exception if I could. Unfortunately the destructors have no way to report an error, and they are always called in a context where no error is expected (Py_DECREF macro). I believe this *was* quite ok, until __del__ was introduced. After that, it looks to me like a design flaw. IMHO there should not be a single function in a system that needs heap memory, and cannot report an error. > Currently, if something fails, the object is not finalized (leaking > memory). Ok, so be it. What's not nice is that this happens silently > which is not the kind of tolerance I would accept from the Python runtime. Yes but what can I do? This isn't worse than before. deletion errors die silently, this is the current concept. I don't agree with it, but I'm not the one to change policy. In that sense, trashcan was just compliant to a concept, without saying this is a good concept. :-) > As to the bug: it's curious that, as Mark reported, without the trashcan > logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) > some erroneous situation. I'd expect that if the trashcan macros are > implemented as above, the crash will go away (which wouldn't solve the > problem and would obviate the trashcan in the first place :-) I think trashcan can be made *way* smarter: Much much more better would be to avoid memory allocation in trashcan at all. I'm wondering if that would be possible. The idea is to catch a couple of objects in an earlier recursion level, and use them as containers for later objects-to-be-deleted. Not using memory at all, that's what I want. And it would avoid all messing with errors in this context. I hate Java dieing silently, since it has not enough memory to tell me that it has not enough memory :-) but-before-implementing-this-*I*-will-need-to-become-*way*-smarter - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fredrik@pythonware.com Wed Apr 12 13:50:21 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 14:50:21 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> Message-ID: <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Paul Prescod wrote: > * I need to associate a Java-style type declaration with a method so > that it can be recognized based on its type during Java method = dispatch. class foo: typemap =3D {} def myfunc(self): pass typemap[myfunc] =3D typeinfo > * I need to associate a "grammar rule" with a Python method so that = the > method is invoked when the parser recognizes a syntactic construct in > the input data. class foo: rules =3D [] def myfunc(self): pass rules.append(pattern, myfunc) > * I need to associate an IDL declaration with a method so that a COM > interface definition can be generated from the source file. 
class foo: idl =3D {} def myfunc(self): pass idl[myfunc] =3D "declaration" > * I need to associate an XPath "pattern string" with a Python method = so > that the method can be invoked when a tree walker discovers a = particular > pattern in an XML DOM. class foo: xpath =3D []=20 def myfunc(self): pass xpath.append("pattern", myfunc) From tismer@tismer.com Wed Apr 12 14:00:39 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 15:00:39 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: <38F47377.91306DA1@tismer.com> Mark, I know you are very busy. But I have no chance to build a debug version, and probably there are more differences. Can you perhaps try Vlad's patch? and tell me if the outcome changes? This would give me much more insight. The change affects the macros and the function _PyTrash_deposit_object which now must report an error via the return value. The macro code should be: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ And the _PyTrash_deposit_object code should be (untested): int _PyTrash_deposit_object(op) PyObject *op; { PyObject *error_type, *error_value, *error_traceback; if (PyThreadState_GET() != NULL) PyErr_Fetch(&error_type, &error_value, &error_traceback); if (!_PyTrash_delete_later) _PyTrash_delete_later = PyList_New(0); if (_PyTrash_delete_later) return PyList_Append(_PyTrash_delete_later, (PyObject *)op); else return -1; if (PyThreadState_GET() != NULL) PyErr_Restore(error_type, error_value, error_traceback); return 0; } The result of this would be really enlighting :-) ciao - chris Vladimir Marangozov wrote: > > Of course, this > > Vladimir Marangozov wrote: > > > > to: > > > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > > { \ > > ++_PyTrash_delete_nesting; \ > > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > > > was meant to be this: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > -- > Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr > http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From tismer@tismer.com Wed Apr 12 15:43:30 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 16:43:30 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <38F48B92.94477DE9@tismer.com> Fredrik Lundh wrote: > > Paul Prescod wrote: > > * I need to associate a Java-style type declaration with a method so > > that it can be recognized based on its type during Java method dispatch. > > class foo: > typemap = {} > def myfunc(self): > pass > typemap[myfunc] = typeinfo Yes, I know that nearly everything is possible to be emulated via classes. But what is so bad about an arbitrary function attribute? ciao - chris p.s.: Paul, did you know that you can use *anything* for __doc__? You could use a class instance instead which still serves as a __doc__ but has your attributes and more. Yes I know this is ugly :-)) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake@acm.org Wed Apr 12 15:47:22 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 10:47:22 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F430FE.BAF40AB8@lemburg.com> References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <14580.35962.86559.128123@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Here's a simple strawman for the syntax: ... > The compiler would scan these pragma defs, add them to an > internal temporary dictionary and use them for all subsequent > code it finds during the compilation process. The dictionary > would have to stay around until the original compile() call has > completed (spanning recursive calls). Marc-Andre, The problem with this proposal is that the pragmas are embedded in the comments; I'd rather see a new keyword and statement. It could be defined something like: pragma_atom: NAME | NUMBER | STRING+ pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* The biggest problem with embedding it in comments is that it is no longer part of the syntax tree generated by the parser. The pragmas become global to the module on a de-facto basis. While this is probably reasonable for the sorts of pragmas we've thought of so far, this seems an unnecessary restriction; future tools may support scoped pragmas to help out with selection of optimization strategies, for instance, or other applications. If we were to go with a strictly global view of pragmas, we'd need to expose the dictionary created by the parser. The parser module would need to be able to expose the dictionary and accept a dictionary when receiving a parse tree for compilation. The internals just can't be *too* internal! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson@nevex.com Wed Apr 12 15:55:55 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 12 Apr 2000 10:55:55 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: Is there any way to unify Barry's proposal for enriching doc strings with Marc-Andre's proposal for pragmas? 
I.e., can pragmas be doc dictionary entries on modules that have particular keys? This would make them part of the parse tree (as per Fred Drake's comments), but not require (extra) syntax changes. Greg From bwarsaw@cnri.reston.va.us Wed Apr 12 16:37:06 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 11:37:06 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Functions and methods are first class objects, and they already have attributes, some of which are writable. Why should __doc__ be special? Just because it was the first such attribute to have syntactic support for easily defining? Think about my proposal this way: it actual removes a restriction. What I don't like about /F's approach is that if you were building a framework, you'd now have two conventions you'd have to describe: where to find the mapping, and what keys to use in that mapping. With attributes, you've already got the former: getattr(). Plus, let's say you're handed a method object `x', would you rather do: if x.im_class.typemap[x.im_func] == 'int': ... or if x.__type__ == 'int': ... And what about function objects (as opposed to unbound methods). Where do you stash the typemap? In the module, I supposed. And if you can be passed either type of object, do you now have to do this? if hasattr(x, 'im_class'): if hasattr(x.im_class, 'typemap'): if x.im_class.typemap[x.im_func] == 'int': ... elif hasattr(x, 'func_globals'): if x.func_globals.has_key('typemap'): if x.func_globals['typemap'][x] == 'int': ... instead of the polymorphic elegance of if x.__type__ == 'int': ... Finally, think of this proposal as an evolutionary step toward enabling all kinds of future frameworks. At some point, there may be some kind of optional static type system. There will likely be some syntactic support for easily specifying the contents of the __type__ attribute. With the addition of func/meth attrs now, we can start to play with prototypes of this system, define conventions and standards, and then later when there is compiler support, simplify the definitions, but not have to change code that uses them. -Barry From akuchlin@mems-exchange.org Wed Apr 12 16:39:54 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Wed, 12 Apr 2000 11:39:54 -0400 (EDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: References: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Ka-Ping Yee writes: >Here is what i have in mind: provide two hooks > __builtins__.display(object) >and > __builtins__.displaytb(traceback, exception) Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't want to add new display() and displaytb() built-ins, do we? --amk From pf@artcom-gmbh.de Wed Apr 12 16:37:05 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 17:37:05 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Apr 12, 2000 10:47:22 am" Message-ID: Hi! Fred L. Drake, Jr.: > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... 
> > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* This would defeat an important goal: backward compatibility: You can't add 'pragma division: old' or something like this to a source file, which should be able to run with both Python 1.5.2 and Py3k. This would make this mechanism useless for several important applications of pragmas. Here comes David Scherers idea into play. The relevant emails of this thread are in the archive at: > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. [...] IMO this is overkill. For all real applications that have been discussed so far, global pragmas are sufficient: - source file character encoding - language level - generated division operator byte codes - generated comparision operators byte codes (comparing strings and numbers) I really like Davids idea to use 'global' at module level for the purpose of pragmas. And this idea has also the advantage that Guido already wrote the idea is "kind of cute and backwards compatible". Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From bwarsaw@cnri.reston.va.us Wed Apr 12 16:56:18 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 11:56:18 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes References: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: <14580.40098.690512.903519@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> In the instancemethod_setattro function, it might be nice to GS> do the speed optimization and test for sname[0] == 'i' before GS> hitting the strcmp() calls. Yeah, you could do that, but it complicates the code and the win seems negligable. GS> Oh: policy question: I would think that these attributes GS> *should* be available in restricted mode. They aren't "sneaky" GS> like the builtin attributes. Hmm, good point. That does simplify the code too. I wonder if the __dict__ itself should be restricted, but that doesn't seem like it would buy you much. We don't need to restrict them in classobject anyway, because they are already restricted in funcobject (which ends up getting the call anyway). It might be reasonable to relax that for arbitrary func attrs. GS> Rather than GS> PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should GS> be used. They apply to mappings and will be faster. Note that GS> (internally) the PyMapping_Get/SetItemString use the latter GS> forms (after constructing a string object(!)). ... whoops. 
I GS> see that the function object doesn't use the ?etattro() GS> variants. hrm. Okay cool. Made these changes and `attro'd 'em too. GS> The stuff is looking really good! Thanks! -Barry From mal@lemburg.com Wed Apr 12 16:52:34 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 17:52:34 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F430FE.BAF40AB8@lemburg.com> <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <38F49BC2.9C192C63@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... > > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. Fine with me, but this probably isn't going to make it into 1.7 and I don't want to wait until Py3K... perhaps there is another way to implement this without adding a new keyword, e.g. we could first use some kind of hack to implement "# pragma ..." and then later on allow dropping the "#" to make full use of the new mechanism. > If we were to go with a strictly global view of pragmas, we'd need > to expose the dictionary created by the parser. The parser module > would need to be able to expose the dictionary and accept a dictionary > when receiving a parse tree for compilation. The internals just can't > be *too* internal! ;) True :-) BTW, while poking around in the tokenizer/compiler I found a serious bug in the way concatenated strings are implemented: right now the compiler expects to always find string objects, yet it could just as well receive Unicode objects or even mixed string and Unicode objects. Try it: u = (u"abc" u"abc") dumps core ! I'll fix this with the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Wed Apr 12 17:12:59 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 18:12:59 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4A08B.A855E69D@lemburg.com> Peter Funk wrote: > > Fred L. Drake, Jr.: > > M.-A. Lemburg writes: > > > Here's a simple strawman for the syntax: > > ... > > > The compiler would scan these pragma defs, add them to an > > > internal temporary dictionary and use them for all subsequent > > > code it finds during the compilation process. The dictionary > > > would have to stay around until the original compile() call has > > > completed (spanning recursive calls). 
> > > > Marc-Andre, > > The problem with this proposal is that the pragmas are embedded in > > the comments; I'd rather see a new keyword and statement. It could be > > defined something like: > > > > pragma_atom: NAME | NUMBER | STRING+ > > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > This would defeat an important goal: backward compatibility: You > can't add 'pragma division: old' or something like this to a source > file, which should be able to run with both Python 1.5.2 and Py3k. > This would make this mechanism useless for several important > applications of pragmas. Hmm, I don't get it: these pragmas would set variabels which make Python behave in a different way -- how do you plan to achieve backward compatibility here ? I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy@cnri.reston.va.us Wed Apr 12 17:37:20 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 12:37:20 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.42560.713427.885436@goon.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. Why should BAW> __doc__ be special? Just because it was the first such BAW> attribute to have syntactic support for easily defining? I don't have a principled argument about why doc strings should be special, but I think that they should be. I think it's weird that you can change __doc__ at runtime; I would prefer that it be constant. BAW> Think about my proposal this way: it actually removes a BAW> restriction. I think this is really the crux of the matter! The proposal removes a useful restriction. The alternatives /F suggested seem clearer to me that sticking new attributes on functions and methods. Three things I like about the approach: It affords an opportunity to be very clear about how the attributes are intended to be used. I suspect it would be easier to describe with a static type system. It prevents confusion and errors that might result from unprincipled use of function attributes. Jeremy From gmcm@hypernet.com Wed Apr 12 17:56:24 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 12:56:24 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.42560.713427.885436@goon.cnri.reston.va.us> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <1256563909-46814536@hypernet.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. > > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. 
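For concreteness, the explicit class-level mapping being referred to (Fredrik's
typemap example) amounts to something like the sketch below; the Handlers class
and the dispatch helper are invented names for illustration, not part of any
real framework:

    # The "explicit mapping" style: the class keeps a side dictionary
    # instead of hanging extra attributes off each method.
    class Handlers:
        typemap = {}

        def on_int(self, value):
            return value * 2
        typemap[on_int] = 'int'

        def on_str(self, value):
            return value + value
        typemap[on_str] = 'string'

    def dispatch(obj, meth, value):
        # The framework has to know both where the mapping lives (typemap)
        # and what its keys are (the underlying function objects).
        kind = obj.__class__.typemap[meth.im_func]
        print "dispatching", kind, "handler"
        return meth(value)

    h = Handlers()
    print dispatch(h, h.on_int, 21)     # prints "dispatching int handler", then 42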
Having to be explicit about the method <-> regex / rule would severely damage SPARK's elegance. It would make Tim's doctest useless. > It prevents confusion and errors > that might result from unprincipled use of function attributes. While I'm sure I will be properly shocked and horrified when you come up with an example, in my naivety, I can't imagine what it will look like ;-). - Gordon From skip@mojam.com (Skip Montanaro) Wed Apr 12 18:28:04 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 12 Apr 2000 12:28:04 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.45604.756928.858721@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. (Trying to read Fredrik's mind...) By extension, we should allow writable attributes to work for other objects. To pollute this discussion with an example from another one: i = 3.1416 i.__precision__ = 4 I haven't actually got anything against adding attributes to functions (or numbers, if it's appropriate). Just wondering out loud and playing a bit of a devil's advocate. Skip From ping@lfw.org Wed Apr 12 18:35:59 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:35:59 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* Wa-wa-wa-wa-wait... i thought the whole point of pragmas was that they were supposed to control the operation of the parser itself (you know, set the source character encoding and so on). So by definition they would have to happen at a different level, above the parsing. Or do we need to separate out two categories of pragmas -- pre-parse and post-parse pragmas? -- ?!ng From tismer@tismer.com Wed Apr 12 18:39:34 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 19:39:34 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <38F4B4D6.6F954CDF@tismer.com> Skip Montanaro wrote: > > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. > > (Trying to read Fredrik's mind...) takes too long since it isn't countable infinite... > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. 
please let me join your hexensabbat (de|en)lighted-ly -y'rs - rapunzel -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake@acm.org Wed Apr 12 18:38:26 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 13:38:26 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Wa-wa-wa-wa-wait... i thought the whole point of pragmas was > that they were supposed to control the operation of the parser > itself (you know, set the source character encoding and so on). > So by definition they would have to happen at a different level, > above the parsing. Hmm. That's one proposed use, which doesn't seem to fit well with my proposal. But I don't know that I'd think of that as a "pragma" in the general sense. I'll think about this one. I think encoding is a very special case, and I'm not sure I like dealing with it as a pragma. Are there any other (programming) languages that attempt to deal with multiple encodings? Perhaps I missed a message about it. > Or do we need to separate out two categories of pragmas -- > pre-parse and post-parse pragmas? Eeeks! We don't need too many special forms! That's ugly! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Moshe Zadka Wed Apr 12 18:36:14 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 12 Apr 2000 19:36:14 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > And voila! Numbers are no longer immutable. Using any numbers as keys in dicts? Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Wed Apr 12 18:45:15 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:45:15 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng From Fredrik Lundh" Message-ID: <002401bfa4a6$778fc360$34aab5d4@hagrid> Moshe Zadka wrote: > > To pollute this discussion with an example from another one: > >=20 > > i =3D 3.1416 > > i.__precision__ =3D 4 >=20 > And voila! Numbers are no longer immutable. Using any > numbers as keys in dicts? so? you can use methods as keys today, you know... 
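The same point can be made with any class instance: it is mutable, yet it
hashes by identity and so keeps working as a dictionary key after it has been
modified. A minimal illustration (Thing and precision are arbitrary names):

    class Thing:
        pass

    t = Thing()
    d = {t: "value"}
    t.precision = 4          # mutate the object after it became a key
    print d[t]               # still prints 'value'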
From skip@mojam.com (Skip Montanaro) Wed Apr 12 18:47:01 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 12 Apr 2000 12:47:01 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.46741.757469.645439@beluga.mojam.com> Moshe> On Wed, 12 Apr 2000, Skip Montanaro wrote: >> To pollute this discussion with an example from another one: >> >> i = 3.1416 >> i.__precision__ = 4 >> Moshe> And voila! Numbers are no longer immutable. Using any numbers as Moshe> keys in dicts? Yes, and I use functions on occasion as dict keys as well. >>> def foo(): pass ... >>> d = {foo: 1} >>> print d[foo] 1 I suspect adding methods to functions won't invalidate their use in that context, nor would adding attributes to numbers. At any rate, it was just an example. Skip From Moshe Zadka Wed Apr 12 18:44:50 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 12 Apr 2000 19:44:50 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > so? you can use methods as keys today, you know... Actually, I didn't know. What hapens if you use a method as a key, and then change it's doc string? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy@cnri.reston.va.us Wed Apr 12 18:51:32 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 13:51:32 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> Message-ID: <14580.47012.646862.615623@goon.cnri.reston.va.us> >>>>> "GMcM" == Gordon McMillan writes: [please imagine that the c is raised] BAW> Think about my proposal this way: it actually removes a BAW> restriction. [Jeremy Hylton wrote:] >> I think this is really the crux of the matter! The proposal >> removes a useful restriction. >> >> The alternatives /F suggested seem clearer to me that sticking >> new attributes on functions and methods. Three things I like >> about the approach: It affords an opportunity to be very clear >> about how the attributes are intended to be used. I suspect it >> would be easier to describe with a static type system. GMcM> Having to be explicit about the method <-> regex / rule would GMcM> severely damage SPARK's elegance. It would make Tim's doctest GMcM> useless. Do either of these examples modify the __doc__ attribute? I am happy to think of both of them as elegant abuses of the doc string. (Not sure what semantics I mean for "elegant abuse" but not pejorative.) I'm not arguing that we should change the language to prevent them from using doc strings. Fred and I were just talking, and he observed that a variant of Python that included a syntactic mechanism to specify more than one attribute (effectively, a multiple doc string syntax) might be less objectionable than setting arbitrary attributes at runtime. Neither of us could imagine just what that syntax would be. >> It prevents confusion and errors that might result from >> unprincipled use of function attributes. GMcM> While I'm sure I will be properly shocked and horrified when GMcM> you come up with an example, in my naivety, I can't imagine GMcM> what it will look like ;-). It would look really, really bad ;-). 
I couldn't think of a good example, so I guess this is a FUD argument. A rough sketch, though, would be a program that assigned attribute X to all functions that were to be used in a certain way. If the assignment is a runtime operation, rather than a syntactic construct that defines a static attribute, it would be possible to accidentally assign attribute X to a function that was not intended to be used that way. This connection between a group of functions and a particular behavior would depend entirely on some runtime magic with settable attributes. Jeremy From mal@lemburg.com Wed Apr 12 18:55:19 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 19:55:19 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.42560.713427.885436@goon.cnri.reston.va.us> Message-ID: <38F4B887.2C16FF03@lemburg.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. Not sure... I wouldn't mind having the ability to add attributes to all Python objects at my own liking. Ok, maybe a bit far fetched, but the idea would certainly be useful in some cases, e.g. to add new methods to built-in types or to add encoding name information to strings... > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. It prevents confusion and errors > that might result from unprincipled use of function attributes. The nice side-effect of having these function/method instance dictionaries is that they follow class inheritance. Something which is hard to do right with Fredrik's approach. I suspect that in Py3K we'll only have one type of class system: everything inherits from one global base class -- seen in that light, method attributes are really nothing unusual, since all instances would have instance dictionaries anyway (well maybe only on demand, but that's another story). Anyway, more power to you Barry :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm@hypernet.com Wed Apr 12 18:56:18 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 13:56:18 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4B4D6.6F954CDF@tismer.com> Message-ID: <1256560314-47031192@hypernet.com> Christian Tismer wrote: > > > Skip Montanaro wrote: > > (Trying to read Fredrik's mind...) > > takes too long since it isn't countable infinite... Bounded, however. And therefore, um, dense ... - Gordon From paul@prescod.net Wed Apr 12 18:57:01 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 12:57:01 -0500 Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <38F4B8ED.8BC64F69@prescod.net> About a month ago I wrote (but did not publish) a proposal that combined #pragmas and method attributes. 
The reason I combined them is that in a lot of cases method "attributes" are supposed to be available in the parse tree, before the program begins to run. Here is my rough draft. ---- We've been discussing method attributes for a long time and I think that it might be worth hashing out in more detail, especially for type declaration experimentation. I'm proposing a generalization of the "decl" keyword that hasbeen kicked around in the types-sig. Other applications include Spark grammar strings, XML pattern-trigger strings, multiple language doc-strings, IDE "hints", optimization hints, associated multimedia (down with glass ttys!), IDL definitions, thread locking declarations, method visibility declarations, ... Of course some subset of attributes might migrate into Python's "core language". Decl gives us a place to experiment and get them right before we do that migration. Declarations would always be associated with functions, classes or modules. They would be simple string-keyed values in a dictionary attached to the function, class or module called __decls__. The syntax would be decl { :"value", :"value" } Key would be a Python name. Value would be any Python string. In the case of a type declaration it might be: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() No string interpolation or other runtime-ish evaluation is done by the compiler on those strings. Neither the keys nor the values are evaluated as Python expressions. We could have a feature that would allow values to be dictionary-ish strings themselves: decl {type:"def(myint: int) returns bar", doc : "Bonjour", languages:{ french: "Hello"} } That would presumably be rare (if we allow it at all). Again, there would be no evaluation or interpolation. The left hand must be a name. The right must be a Code which depended on the declaration can do whatever it wants...if it has some idea of "execution context" and it wants to (e.g.) do interpolation with things that have percent signs, nobody would stop it. A decl that applies to a function or class immediately precedes the funtion or class. A decl that applies to a module precedes all other statements other than the docstring (which can be before or after). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From Moshe Zadka Wed Apr 12 18:55:54 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 12 Apr 2000 19:55:54 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256560314-47031192@hypernet.com> Message-ID: On Wed, 12 Apr 2000, Gordon McMillan wrote: > Bounded, however. And therefore, um, dense ... I sorta imagined it more like the Cantor set. Nowhere dense, but perfect sorry-but-he-started-with-the-maths-ly y'rs, Z. -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Wed Apr 12 19:00:52 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:00:52 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.47572.794837.109290@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Functions and methods are first class objects, and they BAW> already have attributes, some of which are writable. SM> (Trying to read Fredrik's mind...) SM> By extension, we should allow writable attributes to work for SM> other objects. To pollute this discussion with an example SM> from another one: | i = 3.1416 | i.__precision__ = 4 SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering SM> out loud and playing a bit of a devil's advocate. Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> i = 3.1416 >>> dir(i) [] Floats don't currently have attributes. -Barry From Moshe Zadka Wed Apr 12 19:01:13 2000 From: Moshe Zadka (Moshe Zadka) Date: Wed, 12 Apr 2000 20:01:13 +0200 (IST) Subject: [Python-Dev] #pragmas and method attributes In-Reply-To: <38F4B8ED.8BC64F69@prescod.net> Message-ID: On Wed, 12 Apr 2000, Paul Prescod wrote: > About a month ago I wrote (but did not publish) a proposal that combined > #pragmas and method attributes. The reason I combined them is that in a > lot of cases method "attributes" are supposed to be available in the > parse tree, before the program begins to run. Here is my rough draft. FWIW, I really really like this. def func(...): decl {zorb: 'visible', spark: 'some grammar rule'} pass Right on! But maybe even def func(...): decl zorb='visible' decl spark='some grammar rule' pass BTW: Why force the value to be a string? Any immutable basic type should do fine, no?? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy@cnri.reston.va.us Wed Apr 12 19:08:29 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 14:08:29 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <38F46753.3759A7B6@tismer.com> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> Message-ID: <14580.48029.512656.911718@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Vladimir Marangozov wrote: >> While I'm at it, maybe the same recursion control logic could be >> used to remedy (most probably in PyObject_Compare) PR#7: >> "comparisons of recursive objects" reported by David Asher? CT> Hey, what a good idea. CT> You know what's happening? We are moving towards tail recursion. CT> If we do this everywhere, Python converges towards Stackless CT> Python. It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one. 
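The failing case from PR#7 is easy to reproduce at the prompt; repr, which
already carries the guard described below, copes with the same structure:

    # Two distinct lists, each containing itself.
    a = []
    a.append(a)
    b = []
    b.append(b)

    print repr(a)    # the repr guard substitutes '...' and prints [[...]]
    # cmp(a, b)      # this is the comparison that recursed without bound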
I have tried to come up with a solution in the same style as the repr solution. repr maintains a list of objects currently being repred. If it encounters a recursive request to repr the same object, it just prints "...". (There are better solutions, but this one is fairly simple.) I always get hung up on a cmp that works this way because at some point you discover a recursive cmp of two objects and you need to decide what to do. You can't just print "..." :-). So the real problem is defining some reasonable semantics for comparison of recursive objects. I checked what Scheme and Common Lisp, thinking that these languages must have dealt with the issue before. The answer, at least in Scheme, is infinite loop. R5RS notes: "'Equal?' may fail to terminate if its arguments are circular data structures. " http://www-swiss.ai.mit.edu/~jaffer/r5rs_8.html#SEC49 For eq? and eqv?, the answer is #f. The issue was also discussed in some detail by the ANSI commitee X3J13. A summary of the discussion is at here: http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss143-writeup.html The result was to "Clarify that EQUAL and EQUALP do not descend any structures or data types other than the ones explicitly specified here:" [both descend for cons, bit-vectors, and strings; equalp has some special rules for hashtables and arrays] I believe this means that Common Lisp behaves the same way that Scheme does: comparison of circular data structures does not terminate. I don't think an infinite loop is any better than a core dump. At least with the core dump, you can inspect the core file and figure out what went wrong. In the infinite loop case, you'd wonder for a while why your program doesn't terminate, then kill it and inspect the core file anway :-). I think the comparison ought to return false or raise a ValueError. I'm not sure which is right. It seems odd to me that comparing two builtin lists could ever raise an exception, but it may be more Pythonic to raise an exception in the face of ambiguity. As the X3J13 committee noted: Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. So, in the end, I propose ValueError. Jeremy From bwarsaw@cnri.reston.va.us Wed Apr 12 19:19:47 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:19:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: <14580.48707.373146.936232@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: >> so? you can use methods as keys today, you know... MZ> Actually, I didn't know. What hapens if you use a method as a MZ> key, and then change it's doc string? Nothing. Python 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): ... 'a doc string' ... >>> d = {} >>> d[foo] = foo >>> foo.__doc__ = 'norwegian blue' >>> d[foo].__doc__ 'norwegian blue' The hash of a function object is hash(func_code) ^ id(func_globals): Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ... >>> hash(foo) 557536160 >>> hash(foo.func_code) 557215928 >>> id(foo.func_globals) 860952 >>> hash(foo.func_code) ^ id(foo.func_globals) 557536160 So in the words of Mr. Praline: The plumage don't enter into it. 
:) But you can still get quite evil: Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ... >>> def bar(): print 1 ... >>> d = {} >>> d[foo] = foo >>> d[foo] >>> foo.func_code = bar.func_code >>> d[foo] Traceback (most recent call last): File "", line 1, in ? KeyError: Mwah, ha, ha! Gimme-lists-as-keys-and-who-really-/does/-need-tuples-after-all?-ly y'rs, -Barry From gvwilson@nevex.com Wed Apr 12 19:19:52 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 12 Apr 2000 14:19:52 -0400 (EDT) Subject: [Python-Dev] Processing XML with Perl (interesting article) (fwd) Message-ID: http://www.xml.com/pub/2000/04/05/feature/index.html is Michael Rodriguez' summary of XML processing modules for Perl. It opens with: "Perl is one of the most powerful (and even the most devout Python zealots will agree here) and widely used text processing languages." Greg From bwarsaw@cnri.reston.va.us Wed Apr 12 19:20:40 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:20:40 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: <14580.48760.957536.805522@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> Fred and I were just talking, and he observed that a variant JH> of Python that included a syntactic mechanism to specify more JH> than one attribute (effectively, a multiple doc string syntax) JH> might be less objectionable than setting arbitrary attributes JH> at runtime. Neither of us could imagine just what that syntax JH> would be. So it's the writability of the attributes that bothers you? Maybe we need WORM-attrs? :) -Barry From skip@mojam.com (Skip Montanaro) Wed Apr 12 19:27:38 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 12 Apr 2000 13:27:38 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47572.794837.109290@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> Message-ID: <14580.49178.341131.766028@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering out SM> loud and playing a bit of a devil's advocate. BAW> Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 BAW> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>>> i = 3.1416 >>>> dir(i) BAW> [] BAW> Floats don't currently have attributes. True enough, but why can't they? I see no reason that your writable function attributes proposal requires that functions already have attributes. Modifying my example, how about: >>> l = [1,2,3] >>> l.__type__ = "int" Like functions, lists do have (readonly) attributes. Why not allow them to have writable attributes as well? 
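Short of a language change, the closest approximation today is to wrap the
builtin type in a class, since class instances already carry a writable
__dict__. A small sketch using the standard UserList module (TypedList is an
illustrative name):

    from UserList import UserList

    class TypedList(UserList):
        # behaves like a list, but can carry extra attributes
        pass

    l = TypedList([1, 2, 3])
    l.__type__ = "int"            # allowed on the wrapper, not on a raw list
    print l[1], l.__type__        # prints: 2 int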
Awhile ago, Paul Prescod proposed something I think he called a super tuple, which allowed you to address tuple elements using attribute names: >>> t = ("x": 1, "y": 2, "z": 3) >>> print t.x 1 >>> print t[1] 2 (or something like that). I'm sure Paul or others will chime in if they think it's relevant. Your observation was that functions have a __doc__ attribute that is being abused in multiple, conflicting ways because it's the only function attribute people have to play with. I have absolutely no quibble with that. See: http://www.python.org/pipermail/doc-sig/1999-December/001671.html (Note that it apparently fell on completely deaf ears... ;-) I like your proposal. I was just wondering out loud if it should be more general. Skip From bwarsaw@cnri.reston.va.us Wed Apr 12 19:31:27 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:31:27 -0400 (EDT) Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> <38F4B8ED.8BC64F69@prescod.net> Message-ID: <14580.49407.807617.750146@anthem.cnri.reston.va.us> >>>>> "PP" == Paul Prescod writes: PP> About a month ago I wrote (but did not publish) a proposal PP> that combined #pragmas and method attributes. The reason I PP> combined them is that in a lot of cases method "attributes" PP> are supposed to be available in the parse tree, before the PP> program begins to run. Here is my rough draft. Very cool. Combine them with Greg Wilson's approach and you've got my +1 on the idea. I still think it's fine that the func attr dictionary is writable. -Barry From mal@lemburg.com Wed Apr 12 19:31:16 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 20:31:16 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4C0F4.E0A8B01@lemburg.com> Ka-Ping Yee wrote: > > On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > > Or do we need to separate out two categories of pragmas -- > > > pre-parse and post-parse pragmas? > > > > Eeeks! We don't need too many special forms! That's ugly! > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). But you're right, i've never > heard of another language that can handle configurable encodings > right in the source code. Is it really necessary to tackle that here? Yes. > Gak, what do Japanese programmers do? Has anyone seen any of that > kind of source code? It's not intended for use by Asian programmers, it must be seen as a way to equally support all those different languages and scripts for which Python provides codecs. Note that Fred's argument is not far fetched: if you look closely at the way the compiler works it seems that adding a new keyword would indeed be the simplest solution. If done right, we could add some nifty lookup optimizations to the byte code compiler, e.g. a module might declare all globals as being constant or have all could allow the compiler to assume that all global lookups return constants allowing it to cache them or even rewrite the byte code at run-time... But the concepts are still not 100% right -- if we want to add scope to pragmas, we ought to follow the usual Python lookup scheme: locals, globals, built-ins. This would introduce the need to pass locals and globals to all APIs compiling Python source code. 
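To make the comment-based stopgap concrete: a module using it might start out
as below. The directive names are invented for illustration (taken from the
applications mentioned in this thread); to any current interpreter they are
ordinary comments and have no effect.

    # pragma encoding: "latin-1"
    # pragma division: "old"

    def half(n):
        return n / 2        # would keep classic integer division under such a pragma

    print half(5)           # prints 2 today (integer division)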
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf@artcom-gmbh.de Wed Apr 12 19:17:25 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 20:17:25 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4A08B.A855E69D@lemburg.com> from "M.-A. Lemburg" at "Apr 12, 2000 6:12:59 pm" Message-ID: Hi! [me:] > > This would defeat an important goal: backward compatibility: You > > can't add 'pragma division: old' or something like this to a source > > file, which should be able to run with both Python 1.5.2 and Py3k. > > This would make this mechanism useless for several important > > applications of pragmas. M.-A. Lemburg: > Hmm, I don't get it: these pragmas would set variabels which > make Python behave in a different way -- how do you plan to > achieve backward compatibility here ? > > I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... Okay. What I mean is for example changing the behaviour of the division operator: if 1/2 becomes 0.5 instead of 0 in some future version of Python, it is a must to be able to put in a pragma with the meaning "use the old style division in this module" into a source file without breaking the usability of this source file on older versions of Python. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Mike.Da.Silva@uk.fid-intl.com Wed Apr 12 19:37:56 2000 From: Mike.Da.Silva@uk.fid-intl.com (Da Silva, Mike) Date: Wed, 12 Apr 2000 19:37:56 +0100 Subject: [Python-Dev] #pragmas in Python source code Message-ID: Java uses ResourceBundles, which are identified by basename + 2 character locale id (eg "en", "fr" etc). The content of the resource bundle is essentially a dictionary of name value pairs. MS Visual C++ uses pragma code_page(windows_code_page_id) in resource files to indicate what code page was used to generate the subsequent text. In both cases, an application would rely on a fixed (7 bit ASCII) subset to give the well-known key to find the localized text for the current locale. Any "hardcoded" string literals would be mangled when attempting to display them using an alternate locale. So essentially, one could take the view that correct support for localization is a runtime issue affecting the user of an application, not the developer. Hence, myfile.py may contain 8 bit string literals encoded in my current windows encoding (1252) but my user may be using Japanese Windows in code page 932. All I can guarantee is that the first 128 characters (notwithstanding BACKSLASH) will be rendered correctly - other characters will be interpreted as half width Katakana or worse. Any literal strings one embeds in code should be purely for the benefit of the code, not for the end user, who should be seeing properly localized text, pulled back from a localized text resource file _NOT_ python code, and automatically pumped through the appropriate native <--> unicode translations as required by the code. So to sum up, 1 Hardcoded strings are evil in source code unless they use the invariant ASCII (and by extension UTF8) character set. 2 A proper localized resource loading mechanism is required to fetch genuine localized text from a static resource file (ie not myfile.py). 
3 All transformations of 8 bit strings to and from unicode should explicitly specify the 8 bit encoding for the source/target of the conversion, as appropriate. 4 Assume that a Japanese / Chinese programmer will find it easier to code using the invariant ASCII subset than a Western European / American will be able to read hanzi in source code. Regards, Mike da Silva -----Original Message----- From: Ka-Ping Yee [mailto:ping@lfw.org] Sent: Wednesday, April 12, 2000 6:45 PM To: Fred L. Drake, Jr. Cc: Python Developers @ python.org Subject: Re: [Python-Dev] #pragmas in Python source code On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://www.python.org/mailman/listinfo/python-dev From bwarsaw@cnri.reston.va.us Wed Apr 12 19:43:01 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:43:01 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.50101.747669.794035@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Floats don't currently have attributes. SM> True enough, but why can't they? Skip, didn't you realize I was setting you up to ask that question? :) I don't necessarily think other objects shouldn't have such attributes, but I thought it might be easier to shove this one tiny little pill down peoples' throats first. Once they realize it tastes good, /they'll/ want more :) SM> Awhile ago, Paul Prescod proposed something I think he called SM> a super tuple, which allowed you to address tuple elements SM> using attribute names: >> t = ("x": 1, "y": 2, "z": 3) print t.x | 1 | >>> print t[1] | 2 SM> (or something like that). I'm sure Paul or others will chime SM> in if they think it's relevant. Might be. I thought that was a cool idea too at the time. SM> Your observation was that functions have a __doc__ attribute SM> that is being abused in multiple, conflicting ways because SM> it's the only function attribute people have to play with. I SM> have absolutely no quibble with that. See: SM> SM> http://www.python.org/pipermail/doc-sig/1999-December/001671.html SM> (Note that it apparently fell on completely deaf ears... ;-) I SM> like your proposal. I was just wondering out loud if it SM> should be more general. Perhaps so. 
-Barry From Fredrik Lundh" Message-ID: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Mike wrote: > Any literal strings one embeds in code should be purely for the = benefit of > the code, not for the end user, who should be seeing properly = localized > text, pulled back from a localized text resource file _NOT_ python = code, and > automatically pumped through the appropriate native <--> unicode > translations as required by the code. that's hardly a CP4E compatible solution, is it? Ping wrote: > > But you're right, i've never heard of another language that can = handle > > configurable encodings right in the source code. XML? From glyph@twistedmatrix.com Wed Apr 12 20:46:24 2000 From: glyph@twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 14:46:24 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: Language pragmas are all fine and good, BUT ... Here in the Real World(TM) we have to deal with version in compatibilities out the wazoo. I am currently writing a java application that has to run on JDK 1.1, and 1.2, and microsoft's half-way JDK 1.1+1/2 thingy. Python comes installed on many major linux distributions, and the installed base is likely to be even larger than it is now by the time Python 1.6 is ready for the big time. I'd like to tell people who still have RedHat 6.2 installed in six months that they can just download a 40k script and not a 5M interpreter source tarball (which will be incompatible with their previous python installation, which they need for other stuff) when I deploy an end-user application. (Sorry, I can't think of another way to say that, I'm still recovering from java-isms...) :-) What I'm saying is that it would be great if you could write an app that would still function with existing versions of the interpreter, but would be missing certain features that were easier to implement with the new language symantics or required a new core library feature. Backward compatibility is as important to me as forward compatibility, and I'd prefer not to achieve it by writing exclusively to python 1.5.2 for the rest of my life. The way I would like to see this happen is NOT with language pragmas ('global' strikes me as particularly inappropriate, since that already means something else...) but with file-extensions. For example, if you have a python file which uses 1.6 features, I call it 'foo.1_6.py'. I also have a version that will work with 1.5, albeit slightly slower/less featureful: so I call it 'foo.py'. 'import foo' will work correctly. Or, if I only have 'foo.1_6.py' it will break, which I gather would be the desired behavior. As long as we're talking about versioning issues, could we perhaps introduce a slightly more robust introspective facility than assert(sys.version[:3])=='1.5' ? And finally, I appreciate that some physics students may find it confusing that 1/2 yeilds 0 instead of 0.5, but I think it would be easier to just teach them to do 1./2 rather than changing the symantics of integer constants completely ... I use python to do a lot of GUI work right now (and it's BEAUTIFUL for interfacing with Gtk/Tk/Qt, so I'm looking forward to doing more of it) and when I divide *1* by *2*, that's what I mean. I want integers, because I'm talking about pixels. It would be a real drag to go through all of my code and insert int(1/2) because there's no way to do integer math in python anymore... 
(Besides, what about 100000000000000000000L/200000000000000000000L, which I believe will shortly be lacking the Ls...?) Maybe language features that are like this could be handled by a pseudo-module? I.E. import syntax syntax.floating_point_division() or somesuch ... I'm not sure how you'd implement this so it would be automatic in certain contexts (merging it into your 'site' module maybe? that has obvious problems though), but considering that such features may be NOT the behavior desired by everyone, it seems strange to move the language in that direction unilaterally. ______ __ __ _____ _ _ | ____ | \_/ |_____] |_____| |_____| |_____ | | | | @ t w i s t e d m a t r i x . c o m http://www.twistedmatrix.com/~glyph/ From Fredrik Lundh" <38F46A02.3AB10147@prescod.net><001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <002f01bfa4b0$39df5440$34aab5d4@hagrid> Barry A. Warsaw wrote: > Finally, think of this proposal as an evolutionary step toward > enabling all kinds of future frameworks. /.../ With the addition > of func/meth attrs now, we can start to play with prototypes > of this system, define conventions and standards /.../ does this mean that this feature will be labelled as "experimental" (and hopefully even "interpreter specific"). if so, +0. From bwarsaw@cnri.reston.va.us Wed Apr 12 19:56:32 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:56:32 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: <14580.50912.543239.347566@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: GL> As long as we're talking about versioning issues, could we GL> perhaps introduce a slightly more robust introspective GL> facility than GL> assert(sys.version[:3])=='1.5' sys.hexversion? Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys >>> sys.hexversion 17170593 >>> hex(sys.hexversion) '0x10600a1' From bwarsaw@cnri.reston.va.us Wed Apr 12 19:57:47 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:57:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <002f01bfa4b0$39df5440$34aab5d4@hagrid> Message-ID: <14580.50987.10065.518955@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> does this mean that this feature will be labelled as FL> "experimental" (and hopefully even "interpreter specific"). Do you mean "don't add it to JPython whenever I actually get around to making it compatible with CPython 1.6"? 
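sys.hexversion packs major, minor, micro, release level and serial into a
single integer, so a version guard is one comparison. A small sketch
(0x010502f0 is assumed to be 1.5.2 final under the same packing shown above):

    import sys

    if sys.hexversion >= 0x010502f0:     # 1.5.2 final or later
        print "new enough"

    # unpacking the fields by hand:
    major = (sys.hexversion >> 24) & 0xff
    minor = (sys.hexversion >> 16) & 0xff
    micro = (sys.hexversion >>  8) & 0xff
    print major, minor, micro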
-Barry From tismer@tismer.com Wed Apr 12 20:03:33 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:03:33 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <38F4C885.D75DABF2@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Vladimir Marangozov wrote: > >> While I'm at it, maybe the same recursion control logic could be > >> used to remedy (most probably in PyObject_Compare) PR#7: > >> "comparisons of recursive objects" reported by David Asher? > > CT> Hey, what a good idea. > > CT> You know what's happening? We are moving towards tail recursion. > CT> If we do this everywhere, Python converges towards Stackless > CT> Python. > > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. Well, I actually didn't read PR#7 before replying. Thought it was about comparing deeply nested structures. What about this? For one, we do an improved comparison, which is of course towards tail recursion, since we push part of the work after the "return". Second, we can guess the number of actually existing objects, and limit the number of comparisons by this. If we need more comparisons than we have objects, then we raise an exception. Might still take some time, but a bit less than infinite. ciao - chris (sub-cantor-set-minded) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer@tismer.com Wed Apr 12 20:06:00 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:06:00 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> <14580.48760.957536.805522@anthem.cnri.reston.va.us> Message-ID: <38F4C918.A1344D68@tismer.com> bwarsaw@cnri.reston.va.us wrote: > > >>>>> "JH" == Jeremy Hylton writes: > > JH> Fred and I were just talking, and he observed that a variant > JH> of Python that included a syntactic mechanism to specify more > JH> than one attribute (effectively, a multiple doc string syntax) > JH> might be less objectionable than setting arbitrary attributes > JH> at runtime. Neither of us could imagine just what that syntax > JH> would be. > > So it's the writability of the attributes that bothers you? Maybe we > need WORM-attrs? :) Why don't you just use WORM programming style. Write it once (into the CVS) and get many complaints :-) chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mal@lemburg.com Wed Apr 12 20:02:25 2000 From: mal@lemburg.com (M.-A. 
Lemburg) Date: Wed, 12 Apr 2000 21:02:25 +0200 Subject: [Python-Dev] #pragmas and method attributes References: Message-ID: <38F4C841.7CE3FB32@lemburg.com> Moshe Zadka wrote: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > > FWIW, I really really like this. > > def func(...): > decl {zorb: 'visible', spark: 'some grammar rule'} > pass > > Right on! > > But maybe even > > def func(...): > decl zorb='visible' > decl spark='some grammar rule' > pass Hmm, this is not so far away from simply letting function/method attribute use the compiled-in names of all locals as basis, e.g. def func(x): a = 3 print func.a func.a would look up 'a' in func.func_code.co_names and return the corresponding value found in func.func_code.co_consts. Note that subsequent other assignments to 'a' are not recognized by this technique, since co_consts and co_names are written sequentially. For the same reason, writing things like 'a = 2 + 3' will break this lookup technique. This would eliminate any need for added keywords and probably provide the best programming comfort and the attributes are immutable per se. We would still have to come up with a way to declare these attributes for builtin methods and modules... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy@cnri.reston.va.us Wed Apr 12 20:07:41 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 15:07:41 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <14580.51581.31775.233843@goon.cnri.reston.va.us> Just after I sent the previous message, I realized that the "trashcan" approach is needed in addition to some application-specific logic for what to do when recursive traversals of objects occur. This is true for repr and for a compare that fixes PR#7. Current recipe for repr coredump: original = l = [] for i in range(1000000): new = [] l.append(new) l = new l.append(original) repr(l) Jeremy From glyph@twistedmatrix.com Wed Apr 12 21:06:17 2000 From: glyph@twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 15:06:17 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Barry A. Warsaw wrote: > sys.hexversion? Thank you! I stand corrected (and embarrassed) but perhaps this could be a bit better documented? a search of Google comes up with only one hit for this on the entire web: http://www.python.org/1.5/NEWS-152b2.txt ... From gstein@lyra.org Wed Apr 12 20:20:55 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:20:55 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. 
> > (Trying to read Fredrik's mind...) > > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. Numbers have no attributes right now. Functions have mutable attributes (__doc__). Barry is allowing them to be annotated (without crushing the values into __doc__ in some nasty way). Paul gave some great examples. IMO, the Zope "visibility by use of __doc__" is the worst kind of hack :-) "Let me be a good person and doc all my functions. Oh, crap! Somebody hacked my system!" And the COM thing was great. Here is what we do today: class MyCOMServer: _public_methods_ = ['Hello'] def private_method(self, args): ... def Hello(self, args) ... The _public_methods_ thing is hacky. I'd rather see a "Hello.public = 1" in there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson@nevex.com Wed Apr 12 20:16:40 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 12 Apr 2000 15:16:40 -0400 (EDT) Subject: [Python-Dev] re: #pragmas and method attributes Message-ID: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > Moshe Zadka wrote: > BTW: Why force the value to be a string? Any immutable basic type > should do fine, no?? If attributes can be objects other than strings, then programmers can implement hierarchical nesting directly using: def func(...): decl { 'zorb' : 'visible', 'spark' : { 'rule' : 'some grammar rule', 'doc' : 'handle quoted expressions' } 'info' : { 'author' : ('Greg Wilson', 'Allun Smythee'), 'date' : '2000-04-12 14:08:20 EDT' } } pass instead of: def func(...): decl { 'zorb' : 'visible', 'spark-rule' : 'some grammar rule', 'spark-doc' : 'handle quoted expressions' 'info-author' : 'Greg Wilson, Allun Smythee', 'info-date' : '2000-04-12 14:08:20 EDT' } pass In my experience, every system for providing information has eventually wanted/needed to be hierarchical --- code blocks, HTML, the Windows registry, you name it. This can be faked up using some convention like semicolon-separated lists, but processing (and escaping insignificant uses of separator characters) quickly becomes painful. (Note that if Python supported multi-dicts, or if something *ML-ish was being used for decl's, the "author" tag in "info" could be listed twice, instead of requiring programmers to fall back on char-separated lists.) Just another random, Greg From bwarsaw@cnri.reston.va.us Wed Apr 12 20:21:16 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 15:21:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <14580.52396.923837.488505@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: BAW> sys.hexversion? GL> Thank you! GL> I stand corrected (and embarrassed) but perhaps this could be GL> a bit better documented? a search of Google comes up with GL> only one hit for this on the entire web: GL> http://www.python.org/1.5/NEWS-152b2.txt ... 
Yup, it looks like it's missing from Fred's 1.6 doc tree too. Do python-devers think we also need to make the other patchlevel.h constants available through sys? If so, and because sys.hexversion is currently undocumented, I'd propose making sys.hexversion a tuple of (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) or leaving sys.hexversion as is and crafting a new sys variable which is the [1:] of the tuple above. Prolly need to expose PY_RELEASE_LEVEL_{ALPHA,BETA,GAMMA,FINAL} as constants too. -Barry From Fredrik Lundh" <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <007001bfa4b4$6216e780$34aab5d4@hagrid> > sys.hexversion? >=20 > Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import sys > >>> sys.hexversion > 17170593 > >>> hex(sys.hexversion) > '0x10600a1' bitmasks!? (ouch. python is definitely not what it used to be. wonder if the right answer to this is "wouldn't a tuple be much more python-like?" or "I'm outta here...") From gstein@lyra.org Wed Apr 12 20:29:04 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:29:04 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: >... > BAW> Floats don't currently have attributes. > > True enough, but why can't they? I see no reason that your writable > function attributes proposal requires that functions already have > attributes. Modifying my example, how about: > > >>> l = [1,2,3] > >>> l.__type__ = "int" > > Like functions, lists do have (readonly) attributes. Why not allow them to > have writable attributes as well? Lists, floats, etc are *data*. There is plenty of opportunity for creating data structures that contain whatever you want, organized in any fashion. Functions are (typically) not data. Applying these attributes is a way to define program semantics, not record data. There are two entirely separate worlds here. Adding attributes makes great sense, as a way to enhance the definition of your program's semantics and operation. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Apr 12 20:33:18 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:33:18 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Jeremy Hylton wrote: >... > It would look really, really bad ;-). I couldn't think of a good > example, so I guess this is a FUD argument. A rough sketch, though, > would be a program that assigned attribute X to all functions that > were to be used in a certain way. If the assignment is a runtime > operation, rather than a syntactic construct that defines a static > attribute, it would be possible to accidentally assign attribute X to > a function that was not intended to be used that way. This connection > between a group of functions and a particular behavior would depend > entirely on some runtime magic with settable attributes. This is a FUD argument also. I could just as easily mis-label a function when using __doc__ strings, when using mappings in a class object, or using some runtime structures to record the attribute. Your "label" can be recorded in any number of ways. It can be made incorrect in all of them. 
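Concretely, the two recording styles from the COM-server example look like this; the attribute form is hypothetical and assumes the proposed feature, so it is shown as a comment rather than live code:

    class MyCOMServer:
        # status quo: the exposed methods are listed off to the side
        _public_methods_ = ['Hello']

        def Hello(self, args):
            return "hello"

    # With the proposed function attributes, the marker would live on the
    # method itself instead (not legal in 1.5.2/1.6a2):
    #
    #     def Hello(self, args):
    #         return "hello"
    #     Hello.public = 1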
There is nothing intrinsic to function attributes that makes them more prone to error. Being able to place them into function attributes means that you have a better *model* for how you record these values. Why place them into a separate mapping if your intent is to enhance the semantics of a function? If the semantics apply to a function, then bind it right there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw@cnri.reston.va.us Wed Apr 12 20:29:11 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 15:29:11 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> <14580.50912.543239.347566@anthem.cnri.reston.va.us> <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: <14580.52871.763195.168373@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> (ouch. python is definitely not what it used to be. wonder FL> if the right answer to this is "wouldn't a tuple be much more FL> python-like?" or "I'm outta here...") Yeah, pulling the micro version number out of sys.hexversion is ugly and undocumented, hence my subsequent message. The basically idea is pretty cool though, and I've adopted it to Mailman. It allows me to do this: previous_version = last_hex_version() this_version = mm_cfg.HEX_VERSION if previous_version < this_version: # I'm upgrading -Barry From tismer@tismer.com Wed Apr 12 20:37:27 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:37:27 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: Message-ID: <38F4D077.AEE37C@tismer.com> Greg Stein wrote: ... > Being able to place them into function attributes means that you have a > better *model* for how you record these values. Why place them into a > separate mapping if your intent is to enhance the semantics of a function? > If the semantics apply to a function, then bind it right there. BTW., is then there also a way for the function *itself* so look into its attributes? If it should be able to take special care about its attributes, it would be not nice if it had to know its own name for that? Some self-like surrogate? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Fredrik Lundh" <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> > If so, and because sys.hexversion is currently undocumented, I'd > propose making sys.hexversion a tuple of >=20 > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) thanks. I feel better now ;-) but wouldn't something like (1, 6, 0, "a1") be easier to understand and use? From fdrake@acm.org Wed Apr 12 20:46:07 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Wed, 12 Apr 2000 15:46:07 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.53887.525513.603276@seahag.cnri.reston.va.us> Fredrik Lundh writes: > but wouldn't something like (1, 6, 0, "a1") be easier > to understand and use? Yes! (But you knew that....) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping@lfw.org Wed Apr 12 21:06:03 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:06:03 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > Ping wrote: > > > But you're right, i've never heard of another language that can handle > > > configurable encodings right in the source code. > > XML? Don't get me started. XML is not a language. It's a serialization format for trees (isomorphic to s-expressions, but five times more verbose). It has no semantics. Anyone who tries to tell you otherwise is probably a marketing drone or has been brainwashed by the buzzword brigade. -- ?!ng From bwarsaw@cnri.reston.va.us Wed Apr 12 21:04:45 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Wed, 12 Apr 2000 16:04:45 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.55005.924001.146052@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> but wouldn't something like (1, 6, 0, "a1") be easier FL> to understand and use? I wasn't planning on splitting PY_VERSION, just in exposing the other #define ints in patchlevel.h -Barry From fdrake@acm.org Wed Apr 12 21:08:35 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 16:08:35 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <14580.55235.6235.662297@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Don't get me started. XML is not a language. It's a serialization And XML was exactly why I asked about *programming* languages. XML just doesn't qualify in any way I can think of as a language. Unless it's also called "Marketing-speak." ;) XML, as you point out, is a syntactic aspect of tree encoding. Harrumph. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" Message-ID: <00ad01bfa4bb$24129360$34aab5d4@hagrid> Ka-Ping Yee wrote: > > XML? >=20 > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). call it whatever you want -- my point was that their way of handling configurable encodings in the source code is good enough for python. (briefly, it's all unicode on the inside, and either ASCII/UTF-8 or something compatible enough to allow the parser to find the "en- coding" attribute without too much trouble... except for the de- fault encoding, the same approach should work for python) From Fredrik Lundh" <14580.55235.6235.662297@seahag.cnri.reston.va.us> Message-ID: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Fred L. Drake, Jr. 
wrote: > > Don't get me started. XML is not a language. It's a serialization > > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. oh, come on. in what way is "Python source code" more expressive than XML, if you don't have anything that interprets it? does the Python parser create "better" trees than an XML parser? > XML, as you point out, is a syntactic aspect of tree encoding. just like a Python source file is a syntactic aspect of a Python (parse) tree encoding, right? ;-) ... but back to the real issue -- the point is that XML provides a mechanism for going from an external representation to an internal (unicode) token stream, and that mechanism is good enough for python source code. why invent yet another python-specific wheel? From Fredrik Lundh" <14580.52396.923837.488505@anthem.cnri.reston.va.us><008a01bfa4b6$2baca0c0$34aab5d4@hagrid> <14580.55005.924001.146052@anthem.cnri.reston.va.us> Message-ID: <00c901bfa4bd$4ff82560$34aab5d4@hagrid> Barry wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> but wouldn't something like (1, 6, 0, "a1") be easier > FL> to understand and use? > > I wasn't planning on splitting PY_VERSION, just in exposing the other > #define ints in patchlevel.h neither was I. I just want Python to return those values in a form suitable for a Python programmer, not a C preprocessor. in other words: char release[2+1]; sprintf(release, "%c%c", PY_RELEASE_LEVEL - 0x0A + 'a', PY_RELEASE_SERIAL + '0'); sys.longversion = BuildTuple("iiis", PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, release) (this assumes that the release serial will never exceed 9, but I think that's a reasonable restriction...) From skip@mojam.com (Skip Montanaro) Wed Apr 12 21:33:22 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Wed, 12 Apr 2000 15:33:22 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.56722.404953.614718@beluga.mojam.com> me> >>> l = [1,2,3] me> >>> l.__type__ = "int" Greg> Lists, floats, etc are *data*. There is plenty of opportunity for Greg> creating data structures that contain whatever you want, organized Greg> in any fashion. Yeah, but there's no reason you wouldn't want to reason about them. They are, after all, first-class objects. If you consider these other attributes as meta-data, allowing data attributes to hang off lists, tuples, ints or regex objects makes perfect sense to me. I believe someone else during this thread suggested that one use of function attributes might be to record the function's return type. My example above is not really any different. Simpleminded, yes. Part of the value of l, no. Skip From ping@lfw.org Wed Apr 12 21:54:49 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:54:49 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Message-ID: Fred L. Drake, Jr. wrote: > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. I'm harumphing right along with you, Fred. :) On Wed, 12 Apr 2000, Fredrik Lundh wrote: > oh, come on. in what way is "Python source code" more > expressive than XML, if you don't have anything that interprets it? does the Python parser create "better" trees than > an XML parser? Python isn't just a parse tree. It has semantics.
XML has no semantics. It's content-free content. :) > but back to the real issue -- the point is that XML provides a > mechanism for going from an external representation to an in- > ternal (unicode) token stream, and that mechanism is good > enough for python source code. You have a point. I'll go look at what they do. -- ?!ng From gvwilson@nevex.com Wed Apr 12 22:01:04 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 12 Apr 2000 17:01:04 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: > Ka-Ping Yee wrote: > Python isn't just a parse tree. It has semantics. > XML has no semantics. It's content-free content. :) Python doesn't even have a parse tree (never mind semantics) unless you have a Python parser handy. XML gives my application a way to parse your information, even if I can't understand it, which is a big step over (for example) comments or strings embedded in Python/Perl/Java source files, colon (or is semi-colon?) separated lists in .ini and .rc files, etc. (I say this having wrestled with large Fortran programs in which a sizeable fraction of the semantics was hidden in comment-style pragmas. Having seen the demands this style of coding places on compilers, and compiler writers, I'm willing to walk barefoot through the tundra to get something more structured. Hanging one of Barry's doc dict's off a module ensures that key information is part of the parse tree, and that anyone who wants to extend the mechanism can do so in a structured way. I'd still rather have direct embedding of XML, but I think doc dicts are still a big step forward.) Greg p.s. This has come up as a major issue in the Software Carpentry competition. On the one hand, you want (the equivalent of) makefiles to be language neutral, so that (for example) you can write processors in Perl and Java as well as Python. On the other hand, you want to have functions, lists, and all the other goodies associated with a language. From DavidA@ActiveState.com Wed Apr 12 22:10:49 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 14:10:49 -0700 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: > The basically idea is pretty cool though, and I've adopted it to > Mailman. It allows me to do this: > > previous_version = last_hex_version() > this_version = mm_cfg.HEX_VERSION > > if previous_version < this_version: > # I'm upgrading Why can't you do that with tuples? --david From bwarsaw@cnri.reston.va.us Wed Apr 12 22:44:16 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 17:44:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: <14580.60976.200757.562690@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: >> The basically idea is pretty cool though, and I've adopted it >> to Mailman. It allows me to do this: previous_version = >> last_hex_version() this_version = mm_cfg.HEX_VERSION if >> previous_version < this_version: # I'm upgrading DA> Why can't you do that with tuples? How do you know they aren't tuples? 
:) (no, Moshe, you do not need to answer :) -Barry From DavidA@ActiveState.com Wed Apr 12 23:51:36 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 15:51:36 -0700 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: Gordon McMillan: > Jeremy Hylton wrote: > > It prevents confusion and errors > > that might result from unprincipled use of function attributes. > > While I'm sure I will be properly shocked and horrified when > you come up with an example, in my naivety, I can't imagine > what it will look like ;-). I'm w/ Gordon & Barry on this one. I've wanted method and function attributes in the past and had to rely on building completely new classes w/ __call__ methods just to 'fake it'. There's a performance cost to having to do that, but most importantly there's a big increase in code complexity, readability, maintanability, yaddability, etc. I'm surprised that Jeremy sees it as such a problem area -- if I wanted to play around with static typing, having a version of Python which let me store method metadata cleanly would make me jump with joy. FWIW, I'm perfectly willing to live in a world where 'unprincipled use of method and function attributes' means that my code can't get optimized, just like I don't expect my code which modifies __class__ to get optimized (as long as someone defines what those principles are!). --david From paul@prescod.net Wed Apr 12 20:33:14 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 14:33:14 -0500 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4CF7A.8F99562F@prescod.net> Ka-Ping Yee wrote: > >... > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). The XML rule is one encoding per file. One thing that I think that they did innovate in (I had nothing to do with that part) is that entities encoded in something other than UTF-8 or UTF-16 must start with the declaration: "". This has two benefits: By looking at the first four bytes of the file we can differentiate between several different encoding "families" (Shift-JIS-like, UTF-8-like, UTF-16-like, ...) and then we can tell the *precise* encoding by looking at the encoding attribute. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond@skippinet.com.au Thu Apr 13 01:15:08 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:15:08 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: > Do python-devers think we also need to make the other patchlevel.h > constants available through sys? Can't see why, but also can't see why not! > If so, and because sys.hexversion is currently undocumented, Since when has that ever stopped anyone :-) > I'd > propose making sys.hexversion a tuple of > > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) > > or leaving sys.hexversion as is and crafting a new sys > variable which > is the [1:] of the tuple above. My code already uses sys.hexversion to differentiate between 1.5 and 1.6, so if we do anything I would vote for a new name. 
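For reference, the packed number can be unpicked with a few shifts; the field layout below is inferred from the 0x10600a1 example quoted earlier (1.6.0 alpha 1), so treat it as a sketch rather than documented behaviour:

    import sys

    # 0x10600a1 -> major 1, minor 6, micro 0, release level 0xA ("alpha"), serial 1
    major  = (sys.hexversion >> 24) & 0xff
    minor  = (sys.hexversion >> 16) & 0xff
    micro  = (sys.hexversion >>  8) & 0xff
    level  = (sys.hexversion >>  4) & 0x0f
    serial =  sys.hexversion        & 0x0f
    print "%d.%d.%d (level %x, serial %d)" % (major, minor, micro, level, serial)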
Personally however, I think the hexversion gives all the information you need - ie, you either want a printable version - sys.version - or a machine comparable version - sys.hexversion. Can't really think of a reason you would want the other attributes... Mark. From mhammond@skippinet.com.au Thu Apr 13 01:20:12 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:20:12 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: > > >>> hex(sys.hexversion) > > '0x10600a1' > > bitmasks!? Nah - a comparable number :-) if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! Seems perfectly reasonable and understandable to me. And much cleaner than a tuple: if tuple_version[0] > 1 or tuple_version[0] == 1 and tuple_version[1] >= 6: etc Unless I'm missing the point - but I can't see any case other than version comparisons in which hexversion is useful - so it seems perfect to me. > (ouch. python is definitely not what it used to be. wonder > if the right answer to this is "wouldn't a tuple be much more > python-like?" or "I'm outta here...") Be sure to let us know. Mark. From akuchlin@mems-exchange.org Thu Apr 13 01:46:21 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 20:46:21 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: References: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <14581.6365.234022.976395@newcnri.cnri.reston.va.us> Mark Hammond quoted Barry Warsaw: >> I'd >> propose making sys.hexversion a tuple of >> (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, >> PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) If it's a tuple, the name "hexversion" makes absolutely no sense. Call it version_tuple or something like that. --amk From gstein@lyra.org Thu Apr 13 02:10:54 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:10:54 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4D077.AEE37C@tismer.com> Message-ID: On Wed, 12 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: > ... > > Being able to place them into function attributes means that you have a > > better *model* for how you record these values. Why place them into a > > separate mapping if your intent is to enhance the semantics of a function? > > If the semantics apply to a function, then bind it right there. > > BTW., is then there also a way for the function *itself* > so look into its attributes? If it should be able to take > special care about its attributes, it would be not nice > if it had to know its own name for that? > Some self-like surrogate? Separate problem. Functions can't do that today with their own __doc__ attribute. Feel free to solve this issue, but it is distinct from the attributes-on-functions issue being discussed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond@skippinet.com.au Thu Apr 13 02:07:45 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 11:07:45 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: The trashcan bug turns out to be trivial to describe, but not so trivial to fix. Put simply, the trashcan mechanism conflicts horribly with PY_TRACE_REFS :-( The problem stems from the fact that the trashcan resurrects objects.
An object added to the trashcan has its ref count as zero, but is then added to the trash list, transitioning its ref-count back to 1. Deleting the trashcan then does a second deallocation of the object, again taking the ref count back to zero, and this time actually doing the destruction. By pure fluke, this works without Py_DEBUG defined! With Py_DEBUG defined, this first causes problems due to ob_type being NULL. _Py_Dealloc() sets the ob_type element to NULL before it calls the object de-allocater. Thus, the trash object first hits a zero refcount, and its ob_type is zapped. It is then resurrected, but the ob_type value remains NULL. When the second deallocation for the object happens, this NULL type forces the crash. Changing the Py_DEBUG version of _Py_Dealloc() to not zap the type doesnt solve the problem. The whole _Py_ForgetReference() linked-list management also dies. Second time we attempt to deallocate the object the code that removes the object from the "alive objects" linked list fails - the object was already removed first time around. I see these possible solutions: * The trash mechanism is changed to keep a list of (address, deallocator) pairs. This is a "cleaner" solution, as the list is not considered holding PyObjects as such, just blocks of memory to be freed with a custom allocator. Thus, we never end up in a position where a Python objects are resurrected - we just defer the actual memory deallocation, rather than attempting a delayed object destruction. This may not be as trivial to implement as to describe :-) * Debug builds disable the trash mechanism. Not desired as the basic behaviour of the interpreter will change, making bug tracking with debug builds difficult! If we went this way, I would (try to :-) insist that the Windows debug builds dropped Py_DEBUG, as I really want to avoid the scenario that switching to a debug build changes the behaviour to this extent. * Perform further hacks, so that Py_ForgetReference() gracefully handles NULL linked-list elements etc. Any thoughts? Mark. From gstein@lyra.org Thu Apr 13 02:25:41 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:25:41 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Mark Hammond wrote: >... > I see these possible solutions: > > * The trash mechanism is changed to keep a list of (address, > deallocator) pairs. This is a "cleaner" solution, as the list is > not considered holding PyObjects as such, just blocks of memory to > be freed with a custom allocator. Thus, we never end up in a > position where a Python objects are resurrected - we just defer the > actual memory deallocation, rather than attempting a delayed object > destruction. This may not be as trivial to implement as to describe > :-) > > * Debug builds disable the trash mechanism. Not desired as the > basic behaviour of the interpreter will change, making bug tracking > with debug builds difficult! If we went this way, I would (try to > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > really want to avoid the scenario that switching to a debug build > changes the behaviour to this extent. > > * Perform further hacks, so that Py_ForgetReference() gracefully > handles NULL linked-list elements etc. > > Any thoughts? Option 4: lose the trashcan mechanism. I don't think the free-threading issue was ever resolved. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From esr@thyrsus.com Thu Apr 13 03:56:38 2000 From: esr@thyrsus.com (esr@thyrsus.com) Date: Wed, 12 Apr 2000 22:56:38 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <20000412225638.E9002@thyrsus.com> Ka-Ping Yee : > > XML? > > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Heh. What he said. Squared. Describing XML as a "language" around an old-time LISPer like me (or a new-time one like Ping) is a damn good way to get your eyebrows singed. -- Eric S. Raymond "...quemadmodum gladius neminem occidit, occidentis telum est." [...a sword never kills anybody; it's a tool in the killer's hand.] -- (Lucius Annaeus) Seneca "the Younger" (ca. 4 BC-65 AD), From tim_one@email.msn.com Thu Apr 13 04:54:15 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 23:54:15 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: <000001bfa4fb$f07e0340$3e2d153f@tim> Lisp systems for 40+ years traditionally supported user-muckable "property lists" on all symbols, which were basically arbitrary dicts w/ a clumsy syntax. No disaster ensued; to the contrary, it was often handy. So +0 from me on the add-attrs-to-funcs idea. The same idea applies to all objects, of course, but attrs-on-funcs has some bang for the buck (adding a dict to e.g. each int object would be a real new burden with little payback). -1 on any notion of restricting attr values to be immutable. [Gordon] > Having to be explicit about the method <-> regex / rule would > severely damage SPARK's elegance. That's why I'm only +0 instead of +1: SPARK won't switch to use the new method anyway, because the beauty of abusing docstrings is that it's syntactically *obvious*. There already exist any number of other ways to associate arbitrary info with arbitrary objects, and it's no mystery why SPARK and Zope avoided all of them in favor of docstring abuse. > It would make Tim's doctest useless. This one not so: doctest is *not* meant to warp docstrings toward testing purposes; it's intended that docstrings remain wholly for human-friendly documentation. What doctest does is give you a way to guarantee that the elucidating examples good docstrings *should have anyway* work exactly as advertised (btw, doctest examples found dozens of places in my modules that needed to be fixed to recover from 1.6 alpha no longer sticking a trailing "L" on str(long) -- if you're not using doctest every day, you're an idiot ). If I could add an attr to funcs, though, *then* I'd think about changing doctest to also run examples in any e.g. func.doctest attrs it could find, and that *new* mechanism would be warped toward testing purposes. Indeed, I think that would be an excellent use for the new facility. 
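A minimal sketch of that idea, assuming the proposed attribute assignment were allowed (it is not in 1.5.2/1.6a2) and that doctest were taught to look for a hypothetical 'doctest' attribute:

    def square(x):
        """Return x*x, documented for humans in the usual way."""
        return x * x

    # The testable examples live outside the docstring, in an attribute a
    # tool could scan for (attribute name purely illustrative).
    square.doctest = """
    >>> square(3)
    9
    >>> square(-2)
    4
    """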
namespaces-are-one-honking-great-etc-ly y'rs - tim From tim_one@email.msn.com Thu Apr 13 06:00:29 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 01:00:29 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <000701bfa505$31008380$4d2d153f@tim> [Jeremy Hylton]> > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. > > ... > > So the real problem is defining some reasonable semantics for > comparison of recursive objects. I think this is exactly a graph isomorphism problem, since Python always compares "by value" (so isomorphism is the natural generalization). This isn't hard (!= tedious, alas) to define or to implement naively, but a straightforward implementation would be very expensive at runtime compared to the status quo. That's why "real languages" would rather suffer an infinite loop. It's expensive because there's no cheap way to know whether you have a loop in an object. An anal compromise would be to run comparisons full speed without trying to detect loops, but if the recursion got "too deep" break out and start over with an expensive alternative that does check for loops. The latter requires machinery similar to copy.deepcopy's. > ... > I think the comparison ought to return false or raise a ValueError. After a = [] b = [] a.append(a) b.append(b) it certainly "ought to be" the case that a == b in Python. "false" makes no sense. ValueError makes no sense either unless we incur the expense of proving first that at least one object does contain a loop (as opposed to that it's just possibly nested thousands of levels deep) -- but then we may as well implement an isomorphism discriminator. > I'm not sure which is right. It seems odd to me that comparing two > builtin lists could ever raise an exception, but it may be more > Pythonic to raise an exception in the face of ambiguity. As the > X3J13 committee noted: Lisps have more notions of "equality" than Python 1.6 has flavors of strings . Python has only one notion of equality (conveniently ignoring that it actually has two ). The thing the Lisp people argue about is which of the three or four notions of equality to apply at varying levels when trying to compute one of their *other* notions of equality -- there *can't* be a universally "good" answer to that mess. Python's life is easier here. in-concept-if-not-in-implementation-ly y'rs - tim From Fredrik Lundh" Message-ID: <003101bfa511$06c89920$34aab5d4@hagrid> Mark Hammond wrote: > Nah - a comparable number :-) tuples can also be compared. > if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! if sys.versiontuple >= (1, 6, 1): ... From Moshe Zadka Thu Apr 13 08:10:30 2000 From: Moshe Zadka (Moshe Zadka) Date: Thu, 13 Apr 2000 09:10:30 +0200 (IST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: [Ping] > But you're right, i've never heard of another language that can handle > configurable encodings right in the source code. [The eff-bot] > XML? [Ping] > Don't get me started. XML is not a language.
It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Of coursem but "everything is a tree". If you put Python in XML by having the parse-tree serialized, then you can handle any encoding in the source file, by snarfing it from XML. not-in-favour-of-Python-in-XML-but-this-is-sure-to-encourage-Greg-Wilson-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer@tismer.com Thu Apr 13 11:50:05 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 12:50:05 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5A65D.5C2666B5@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Mark Hammond wrote: > >... > > I see these possible solutions: > > > > * The trash mechanism is changed to keep a list of (address, > > deallocator) pairs. This is a "cleaner" solution, as the list is > > not considered holding PyObjects as such, just blocks of memory to > > be freed with a custom allocator. Thus, we never end up in a > > position where a Python objects are resurrected - we just defer the > > actual memory deallocation, rather than attempting a delayed object > > destruction. This may not be as trivial to implement as to describe > > :-) This one sounds quite hard to implement. > > * Debug builds disable the trash mechanism. Not desired as the > > basic behaviour of the interpreter will change, making bug tracking > > with debug builds difficult! If we went this way, I would (try to > > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > > really want to avoid the scenario that switching to a debug build > > changes the behaviour to this extent. I vote for this one at the moment. > > * Perform further hacks, so that Py_ForgetReference() gracefully > > handles NULL linked-list elements etc. > > > > Any thoughts? > > Option 4: lose the trashcan mechanism. I don't think the free-threading > issue was ever resolved. Option 5: Forget about free threading, change trashcan in a way that it doesn't change the order of destruction, doesn't need memory at all, and therefore does not change anything if it is disabled in debug mode. cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping@lfw.org Thu Apr 13 12:22:56 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:22:56 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Andrew M. Kuchling wrote: > Ka-Ping Yee writes: > >Here is what i have in mind: provide two hooks > > __builtins__.display(object) > >and > > __builtins__.displaytb(traceback, exception) > > Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't > want to add new display() and displaytb() built-ins, do we? Yes, you're right, they belong in sys. For a while i was under the delusion that you could customize more than one sub-interpreter by giving each one a different modified __builtins__, but that's an rexec thing and completely the wrong approach. 
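A rough sketch of how such hooks might be installed, using the sys.display / sys.displaytb names from the proposal (no current interpreter looks for these attributes, so this is purely illustrative):

    import sys

    def display(obj):
        # would be called with the value of each interactive expression
        if obj is not None:
            print '->', repr(obj)

    def displaytb(tb, exc):
        # would be called with the traceback and exception of an uncaught error
        print 'uncaught exception:', exc

    sys.display = display          # hypothetical hook names
    sys.displaytb = displaytb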
Looks like the right approach to customizing sub-interpreters is to generalize the interface of code.InteractiveInterpreter and add more options to code.InteractiveConsole. sys.display and sys.displaytb would then be specifically for tweaking the main interactive interpreter only (just like sys.ps1 and sys.ps2). Still quite worth it, i believe, so i'll proceed. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From Fredrik Lundh" now that we have the sq_contains slot, would it make sense to add support for "key in dict" ? after all, if key in dict: ... is a bit more elegant than: if dict.has_key(key): ... and much faster than: if key in dict.keys(): ... (the drawback is that once we add this, some people might ex- pect dictionaries to behave like sequences in others ways too...) (and yes, this might break code that looks for tp_as_sequence before looking for tp_as_mapping. haven't found any code like that, but I might have missed something). whaddyathink? From gstein@lyra.org Thu Apr 13 12:14:56 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:14:56 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F5A65D.5C2666B5@tismer.com> Message-ID: On Thu, 13 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > issue was ever resolved. > > Option 5: Forget about free threading, change trashcan in a way > that it doesn't change the order of destruction, doesn't need > memory at all, and therefore does not change anything if it is > disabled in debug mode. hehe... :-) Definitely possible. Seems like you could just statically allocate an array of PyObject* and drop the pointers in there (without an INCREF or anything). Place them there, in order. Dunno about the debug stuff, and how that would affect it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Thu Apr 13 12:19:32 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:19:32 -0700 (PDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <014901bfa538$637c37e0$34aab5d4@hagrid> Message-ID: On Thu, 13 Apr 2000, Fredrik Lundh wrote: > now that we have the sq_contains slot, would it make > sense to add support for "key in dict" ? > > after all, > > if key in dict: > ... The counter has always been, "but couldn't that be read as 'if value in dict' ??" Or maybe 'if (key, value) in dict' ?? People have different impressions of what "in" should mean for a dict. And some people change their impression from one function to the next :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal@lemburg.com Thu Apr 13 10:22:27 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 11:22:27 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F591D3.32CD3B2A@lemburg.com> I think we should put the discussion back on track again... We were originally talking about proposals to integrate #pragmas into Python source. These pragmas are (for now) intended to provide information to the Python byte code compiler, so that it can make certain assumptions on a per file basis. So far, there have been numerous proposals for all kinds of declarations and decorations of files, functions, methods, etc. 
As usual in Python Space, things got generalized to a point where people forgot about the original intent ;-) The current need for #pragmas is really very simple: to tell the compiler which encoding to assume for the characters in u"...strings..." (*not* "...8-bit strings..."). The idea behind this is that programmers should be able to use other encodings here than the default "unicode-escape" one. Perhaps someone has a better idea on how to signify this to the compiler ? Could be that we don't need this pragma discussion at all if there is a different, more elegant solution to this... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping@lfw.org Thu Apr 13 12:50:02 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:50:02 -0700 (PDT) Subject: [Python-Dev] if key in dict? In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Greg Stein wrote: > On Thu, 13 Apr 2000, Fredrik Lundh wrote: > > now that we have the sq_contains slot, would it make > > sense to add support for "key in dict" ? > > > > after all, > > > > if key in dict: > > ... > > The counter has always been, "but couldn't that be read as 'if value in > dict' ??" I've been quite happy with "if key in dict". I forget if i already made this analogy when it came up in regard to the issue of supporting a "set" type, but if you think of it like a real dictionary -- when someone asks you if a particular word is "in the dictionary", you look it up in the keys of the dictionary, not in the definitions. And it does read much better than has_key, and makes it easier to use dicts like sets. So i think it would be nice, though i've seen this meet opposition before. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> Message-ID: <017b01bfa53e$748cc080$34aab5d4@hagrid> M.-A. Lemburg wrote: > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). why not? why keep on pretending that strings and strings are two different things? it's an artificial distinction, and it only causes problems all over the place. > Could be that we don't need this pragma discussion at all > if there is a different, more elegant solution to this... here's one way: 1. standardize on *unicode* as the internal character set. use an encoding marker to specify what *external* encoding you're using for the *entire* source file. output from the tokenizer is a stream of *unicode* strings. 2. if the user tries to store a unicode character larger than 255 in an 8-bit string, raise an OverflowError. 3. the default encoding is "none" (instead of XML's "utf-8"). in this case, treat the script as an ascii superset, and store each string literal as is (character-wise, not byte-wise). additional notes: -- item (3) is for backwards compatibility only. might be okay to change this in Py3K, but not before that. -- leave the implementation of (1) to 1.7. for now, assume that scripts have the default encoding, which means that (2) cannot happen. -- we still need an encoding marker for ascii supersets (how about ;-). however, it's up to the tokenizer to detect that one, not the parser. the parser only sees unicode strings. 
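A schematic sketch of point (1) plus the encoding marker, assuming a made-up first-line marker spelling (the real syntax is still open); the idea is that the tokenizer decodes the raw bytes once and the parser only ever sees unicode:

    import re

    _marker = re.compile(r'#.*?encoding[=:]\s*([-\w.]+)')   # hypothetical spelling

    def read_source(filename):
        # return the module source as one unicode string, decoded according
        # to a per-file marker on the first line (default: plain ASCII)
        f = open(filename, 'rb')
        raw = f.read()
        f.close()
        first = raw.split('\n', 1)[0]
        m = _marker.search(first)
        if m:
            return unicode(raw, m.group(1))
        return unicode(raw, 'ascii')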
From tismer@tismer.com Thu Apr 13 12:56:18 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 13:56:18 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5B5E2.40B20B53@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > > issue was ever resolved. > > > > Option 5: Forget about free threading, change trashcan in a way > > that it doesn't change the order of destruction, doesn't need > > memory at all, and therefore does not change anything if it is > > disabled in debug mode. > > hehe... :-) > > Definitely possible. Seems like you could just statically allocate an > array of PyObject* and drop the pointers in there (without an INCREF or > anything). Place them there, in order. Dunno about the debug stuff, and > how that would affect it. I could even better use the given objects-to-be-destroyed as an explicit stack. Similar to what the debug dealloc does, I may abuse the type pointer as a stack pointer. Since the refcount is zero, it can be abused to store a type code (we have only 5 types to distinguish here), and there is enough room for some state like a loop counter as well. Given that, I can build a destructor without recursion, but with an explicit stack and iteration. It would not interfere with anything, since it actually does the same thing, just in a different way, but in the same order, without mallocs. Should I try it? (say no and I'll do it anyway:) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip@mojam.com (Skip Montanaro) Thu Apr 13 14:34:53 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 08:34:53 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F591D3.32CD3B2A@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <14581.52477.70286.774494@beluga.mojam.com> Marc> We were originally talking about proposals to integrate #pragmas Marc> ... Minor nit... How about we lose the "#" during these discussions so we aren't all subliminally disposed to embed pragmas in comments or to add the C preprocessor to Python? ;-) -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From skip@mojam.com (Skip Montanaro) Thu Apr 13 14:39:47 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 08:39:47 -0500 (CDT) Subject: [Python-Dev] if key in dict? In-Reply-To: References: Message-ID: <14581.52771.512393.600949@beluga.mojam.com> Ping> I've been quite happy with "if key in dict". I forget if i Ping> already made this analogy when it came up in regard to the issue Ping> of supporting a "set" type, but if you think of it like a real Ping> dictionary -- when someone asks you if a particular word is "in Ping> the dictionary", you look it up in the keys of the dictionary, not Ping> in the definitions. Also, for many situations, "if value in dict" will be extraordinarily inefficient. If "in" semantics are added to dicts, a corollary move will be to extend this functionality to other non-dict mappings (e.g., file-based mapping objects like gdbm).
Implementing "in" for them would be excruciatingly slow if the LHS was "value". To not break the rule of least astonishment when people push large dicts to disk, the only feasible implementation is "if key in dict". -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From fdrake@acm.org Thu Apr 13 14:46:44 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 09:46:44 -0400 (EDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <14581.52771.512393.600949@beluga.mojam.com> References: <14581.52771.512393.600949@beluga.mojam.com> Message-ID: <14581.53188.587479.280569@seahag.cnri.reston.va.us> Skip Montanaro writes: > Also, for many situations, "if value in dict" will be extraordinarily > inefficient. In "in" semantics are added to dicts, a corollary move will be > to extend this functionality to other non-dict mappings (e.g., file-based > mapping objects like gdbm). Implementing "in" for them would be > excruciatingly slow if the LHS was "value". To not break the rule of least > astonishment when people push large dicts to disk, the only feasible > implementation is "if key in dict". Skip, Performance issues aside, I can see very valid reasons for the x in "x in dict" to be either the key or (key, value) pair. For this reason, I've come to consider "x in dict" a mis-feature, though I once pushed for it as well. It may be easy to explain that x is just the key, but it's not clearly the only reasonably desirable semantic. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Thu Apr 13 15:26:01 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 10:26:01 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <017b01bfa53e$748cc080$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <14581.55545.30446.471809@seahag.cnri.reston.va.us> Fredrik Lundh writes: > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. > > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. We shouldn't need to change it then; Unicode editing capabilities will be pervasive by then, right? Oh, heck, it might even be legacy support by then! ;) Seriously, I'd hesitate to change any interpretation of default encoding until Unicode support is pervasive and fully automatic in tools like Notepad, vi/vim, XEmacs, and BBedit/Alpha (or whatever people use on MacOS these days). If I can't use teco on it, we're being too pro-active! ;) > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Agreed here. But shouldn't that be: This is war, I tell you, war! ;) Now, just need to hack the exec(2) call on all the Unices so that is properly recognized and used to run the scripts properly, obviating the need for those nasty shbang lines! ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From Vladimir.Marangozov@inrialpes.fr Thu Apr 13 16:22:49 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 17:22:49 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) Message-ID: <200004131522.RAA05137@python.inrialpes.fr> Obviously, the user-attr proposal is not so "simple" as it looks like. I wish we all realize what's really going on here. In all cited use cases for this proposal, functions are no more perceived as functions per se, but as data structures (objects) which are the target of the computation. IOW, functions are just considered as *instances* of a class (inheriting from the builtin "PyFunction" class) with user-attributes, having a state, and eventually a set of operations bound to them. I guess that everybody realized that with this proposal, one could bind not only doc strings, but also functions to the function. def func(): pass def serialize(): ... func.pack = serialize func.pack() What is this? This is manual instance customization. Since nobody said that this would be done 'exceptionally', but rather on a regular basis for all functions (and generally, for all objects) in a program, the way to customize instances after the fact, makes those instances singletons of user-defined classes. You may say "so what?". Well, this is fine if it were part of the object model from the start. And there's no reason why only functions and methods can have this functionality. Stick the __dict__ slot in the object header and let me bind user-attributes to all objects. I have a favorite number, 7, so I want to label it Vlad's number. seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func to all numbers, n.is_prime(). I want to have a s.zip() method for a set of strings in my particular application, not only the builtin ones. Why is it not allowed to have this today? Think about it! How would you solve your app needs today? Through classes and instances. That's the prescribed `legal' way to do customized objects; no shortcuts. Saying that mucking with functions' __doc__ strings is the only way to implement some functionality is simply not true. In short, there's no way I can accept this proposal in its current state and make the distingo between functions/methods and other kinds of objects (including 3rd party ones). If we're to go down this road, treat all objects as equal citizens in this regard. None or all. The object model must remain consistent. This proposal opens a breach in it. And not the lightest! And this is only part of the reasons why I'm still firmly -1 until P3K. Glad to see that Barry exposed some of the truth about it, after preserving our throats, i.e. he understood that we understood that he fully understood the power of namespaces, but eventually decided to propose a fraction of a significant change reserved for the next major Python release... wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Fredrik Lundh" Message-ID: <007c01bfa55e$0faf0360$34aab5d4@hagrid> > Modified Files: > sysmodule.c=20 > Log Message: >=20 > Define version_info to be a tuple (major, minor, micro, level); level > is a string "a2", "b1", "c1", or '' for a final release. 
maybe level should be chosen so that version_info for a final release is larger than version_info for the corresponding beta ? From akuchlin@mems-exchange.org Thu Apr 13 16:39:43 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:39:43 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.59967.326442.73539@amarok.cnri.reston.va.us> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); level >> is a string "a2", "b1", "c1", or '' for a final release. >maybe level should be chosen so that version_info for a final >release is larger than version_info for the corresponding beta ? 'a2' < 'b1' < 'c1' < 'final' --amk From fdrake@acm.org Thu Apr 13 16:41:32 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:41:32 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.60076.525602.848031@seahag.cnri.reston.va.us> Fredrik Lundh writes: > maybe level should be chosen so that version_info for a final > release is larger than version_info for the corresponding beta ? I thought about that, but didn't like it; should it perhaps be 'final'? If the purpose is to simply make it increase monotonically like sys.hexversion, why not just use sys.hexversion? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin@mems-exchange.org Thu Apr 13 16:44:19 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:44:19 -0400 (EDT) Subject: [Python-Dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: Message-ID: <14581.60243.557955.192783@amarok.cnri.reston.va.us> [Cc'ed to python-dev from the zope-dev mailing list; trim your follow-ups appropriately] R. David Murray writes: >So it looks like there is a problem using Zope with a large database >no matter what the platform. Has anyone figured out how to fix this? ... >But given the number of people who have said "use FreeBSD if you want >big files", I'm really wondering about this. What if later I >have an application where I really need a >2GB database? Different system calls are used for large files, because you can no longer use 32-bit ints to store file position. There's a HAVE_LARGEFILE_SUPPORT #define that turns on the use of these alternate system calls; see Python's configure.in for the test used to detect when it should be turned on. You could just hack the generated config.h to turn on large file support and recompile your copy of Python, but if the configure.in test is incorrect, that should be fixed. The test is: AC_MSG_CHECKING(whether to enable large file support) if test "$have_long_long" = yes -a \ "$ac_cv_sizeof_off_t" -gt "$ac_cv_sizeof_long" -a \ "$ac_cv_sizeof_long_long" -ge "$ac_cv_sizeof_off_t"; then AC_DEFINE(HAVE_LARGEFILE_SUPPORT) AC_MSG_RESULT(yes) else AC_MSG_RESULT(no) fi I thought you have to use the loff_t type instead of off_t; maybe this test should check for it instead? Anyone know anything about large file support? -- A.M. Kuchling http://starship.python.net/crew/amk/ When I dream, sometimes I remember how to fly. 
You just lift one leg, then you lift the other leg, and you're not standing on anything, and you can fly. -- Chloe Russell, in SANDMAN #43: "Brief Lives:3" From Fredrik Lundh" <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> Message-ID: <008c01bfa55f$3af9f880$34aab5d4@hagrid> > Fredrik Lundh writes: > > maybe level should be chosen so that version_info for a final > > release is larger than version_info for the corresponding beta ? >=20 > I thought about that, but didn't like it; should it perhaps be > 'final'? If the purpose is to simply make it increase monotonically > like sys.hexversion, why not just use sys.hexversion? readability? the sys.hexversion stuff isn't exactly obvious: >>> dir(sys) ... 'hexversion' ... >>> sys.hexversion 17170594 eh? is that version 1.71, or what? "final" is okay, I think. better than "f0", at least ;-) From fdrake@acm.org Thu Apr 13 16:56:38 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:56:38 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <008c01bfa55f$3af9f880$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.60982.631891.629922@seahag.cnri.reston.va.us> Fredrik Lundh writes: > readability? But hexversion retains the advantage that it's been there longer, and that's just too hard to change at this point. (Guido didn't leave the keys to his time machine...) > the sys.hexversion stuff isn't exactly obvious: I didn't say hexversion was pretty or that anyone liked it! Writing the docs, version_info is a *lot* easier to explain. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal@lemburg.com Thu Apr 13 16:55:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 17:55:08 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <38F5EDDC.731E6740@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). > > why not? Because plain old 8-bit strings should work just as before, that is, existing scripts only using 8-bit strings should not break. > why keep on pretending that strings and strings are two > different things? it's an artificial distinction, and it only > causes problems all over the place. Sure. The point is that we can't just drop the old 8-bit strings... not until Py3K at least (and as Fred already said, all standard editors will have native Unicode support by then). So for now we're stuck with Unicode *and* 8-bit strings and have to make the two meet somehow -- which isn't all that easy, since 8-bit strings carry no encoding information. > > Could be that we don't need this pragma discussion at all > > if there is a different, more elegant solution to this... > > here's one way: > > 1. standardize on *unicode* as the internal character set. use > an encoding marker to specify what *external* encoding you're > using for the *entire* source file. output from the tokenizer is > a stream of *unicode* strings. Yep, that would work in Py3K... > 2. 
if the user tries to store a unicode character larger than 255 > in an 8-bit string, raise an OverflowError. There are no 8-bit strings in Py3K -- only 8-bit data buffers which don't have string methods ;-) > 3. the default encoding is "none" (instead of XML's "utf-8"). in > this case, treat the script as an ascii superset, and store each > string literal as is (character-wise, not byte-wise). Uhm. I think UTF-8 will be the standard for text file formats by then... so why not make it UTF-8 ? > additional notes: > > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. > > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. I'd say, leave all this to Py3K. > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Hmm, the tokenizer doesn't do any string -> object conversion. That's a task done by the parser. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Thu Apr 13 17:06:53 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 18:06:53 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> Message-ID: <38F5F09D.53E323EF@lemburg.com> Skip Montanaro wrote: > > Marc> We were originally talking about proposals to integrate #pragmas > Marc> ... > > Minor nit... How about we lose the "#" during these discussions so we > aren't all subliminally disposed to embed pragmas in comments or to add the > C preprocessor to Python? ;-) Hmm, anything else would introduce a new keyword, I guess. And new keywords cause new scripts to fail in old interpreters even when they don't use Unicode at all and only include per convention. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip@mojam.com (Skip Montanaro) Thu Apr 13 17:16:55 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 11:16:55 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.62199.122899.126940@beluga.mojam.com> Marc> Skip Montanaro wrote: >> Minor nit... How about we lose the "#" during these discussions so >> we aren't all subliminally disposed to embed pragmas in comments or >> to add the C preprocessor to Python? ;-) Marc> Hmm, anything else would introduce a new keyword, I guess. And new Marc> keywords cause new scripts to fail in old interpreters even when Marc> they don't use Unicode at all and only include is> per convention. My point was only that using "#pragma" (or even "pragma") sort of implies we have our eye on a solution, but I don't think we're far enough down the path of answering what we want to have any concrete ideas about how to implement it. I think this thread started (more-or-less) when Guido posted an idea that originally surfaced on the idle-dev list about using "global ..." to implement functionality like this. 
It's not clear to me at this point what the best course might be. Skip From fdrake@acm.org Thu Apr 13 17:31:50 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:31:50 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.63094.538920.187344@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Hmm, anything else would introduce a new keyword, I guess. And > new keywords cause new scripts to fail in old interpreters > even when they don't use Unicode at all and only include > per convention. Only if the new keyword is used in the script or anything it imports. This is exactly like using new syntax (u'...') or new library features (unicode('abc', 'iso-8859-1')). I can't think of anything that gets included "by convention" that breaks anything. I don't recall a proposal that we should casually add pragmas to our scripts if there's no need to do so. Adding pragmas to library modules is *not* part of the issue; they'd only be there if the version of Python they're part of supports the syntax. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Thu Apr 13 17:47:52 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:47:52 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4CF7A.8F99562F@prescod.net> References: <38F4CF7A.8F99562F@prescod.net> Message-ID: <14581.64056.727047.412805@seahag.cnri.reston.va.us> Paul Prescod writes: > The XML rule is one encoding per file. One thing that I think that they > did innovate in (I had nothing to do with that part) is that entities I think an important part of this is that the location of the encoding declaration is completely fixed; it can't start five lines down (after all, it might be hard to know what a line is!). If we say, "The first character of a Python source file must be '#', or assume native encoding.", we go a long way to figuring out what's a line (CR/LF/CRLF can be dealt with in a "universal" fashion), so we can deal with something a little farther down, but I'd hate to be so flexible that it became too tedious to implement. I'd be more accepting of encoding declarations embedded in comments than pragmas. (Not that I *like* abusing comments like that.) So perhaps a Python encoding declaration becomes: #?python encoding="iso-8859-7"?# ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" <38F4CF7A.8F99562F@prescod.net> <14581.64056.727047.412805@seahag.cnri.reston.va.us> Message-ID: <003901bfa568$890a5200$34aab5d4@hagrid> Fred wrote: > #?python encoding=3D"iso-8859-7"?# like in: #!/usr/bin/python #?python encoding=3D"utf-8" tabsize=3D5 if __name__ =3D=3D "__main__": print "hello!" I've seen worse... From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> Message-ID: <003a01bfa568$b190c560$34aab5d4@hagrid> M.-A. Lemburg wrote: > Fredrik Lundh wrote: > >=20 > > M.-A. Lemburg wrote: > > > The current need for #pragmas is really very simple: to tell > > > the compiler which encoding to assume for the characters > > > in u"...strings..." (*not* "...8-bit strings..."). > >=20 > > why not? >=20 > Because plain old 8-bit strings should work just as before, > that is, existing scripts only using 8-bit strings should not break. 
but they won't -- if you don't use an encoding directive, and don't use 8-bit characters in your string literals, everything works as before. (that's why the default is "none" and not "utf-8") if you use 8-bit characters in your source code and wish to add an encoding directive, you need to add the right encoding directive... > > why keep on pretending that strings and strings are two > > different things? it's an artificial distinction, and it only > > causes problems all over the place. >=20 > Sure. The point is that we can't just drop the old 8-bit > strings... not until Py3K at least (and as Fred already > said, all standard editors will have native Unicode support > by then). I discussed that in my original "all characters are unicode characters" proposal. in my proposal, the standard string type will have to roles: a string either contains unicode characters, or binary bytes. -- if it contains unicode characters, python guarantees that methods like strip, lower (etc), and regular expressions work as expected. -- if it contains binary data, you can still use indexing, slicing, find, split, etc. but they then work on bytes, not on chars. it's still up to the programmer to keep track of what a certain string object is (a real string, a chunk of binary data, an en- coded string, a jpeg image, etc). if the programmer wants to convert between a unicode string and an external encoding to use a certain unicode encoding, she needs to spell it out. the codecs are never called "under the hood". (note that if you encode a unicode string into some other encoding, the result is binary buffer. operations like strip, lower et al does *not* work on encoded strings). > So for now we're stuck with Unicode *and* 8-bit strings > and have to make the two meet somehow -- which isn't all > that easy, since 8-bit strings carry no encoding information. in my proposal, both string types hold unicode strings. they don't need to carry any encoding information, because they're not encoded. > > > Could be that we don't need this pragma discussion at all > > > if there is a different, more elegant solution to this... > >=20 > > here's one way: > >=20 > > 1. standardize on *unicode* as the internal character set. use > > an encoding marker to specify what *external* encoding you're > > using for the *entire* source file. output from the tokenizer is > > a stream of *unicode* strings. >=20 > Yep, that would work in Py3K... or 1.7 -- see below. > > 2. if the user tries to store a unicode character larger than 255 > > in an 8-bit string, raise an OverflowError. >=20 > There are no 8-bit strings in Py3K -- only 8-bit data > buffers which don't have string methods ;-) oh, you've seen the Py3K specification? > > 3. the default encoding is "none" (instead of XML's "utf-8"). in > > this case, treat the script as an ascii superset, and store each > > string literal as is (character-wise, not byte-wise). >=20 > Uhm. I think UTF-8 will be the standard for text file formats > by then... so why not make it UTF-8 ? in time for 1.6? or you mean Py3K? sure! I said that in my first "additional note", didn't I: > > additional notes: > >=20 > > -- item (3) is for backwards compatibility only. might be okay to > > change this in Py3K, but not before that. > >=20 > > -- leave the implementation of (1) to 1.7. for now, assume that > > scripts have the default encoding, which means that (2) cannot > > happen. >=20 > I'd say, leave all this to Py3K. do you mean it's okay to settle for a broken design in 1.6, since we can fix it in Py3K? 
that's scary. fixing the design is not that hard, and can be done now. implementing all parts of it is harder, and require extensive changes to the compiler/interpreter architecture. but iirc, such changes are already planned for 1.7... > > -- we still need an encoding marker for ascii supersets (how about > > ;-). however, it's up = to > > the tokenizer to detect that one, not the parser. the parser only > > sees unicode strings. >=20 > Hmm, the tokenizer doesn't do any string -> object conversion. > That's a task done by the parser. "unicode string" meant Py_UNICODE*, not PyUnicodeObject. if the tokenizer does the actual conversion doesn't really matter; the point is that once the code has passed through the tokenizer, it's unicode. From bwarsaw@cnri.reston.va.us Thu Apr 13 17:59:03 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 12:59:03 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.64727.928889.239985@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> maybe level should be chosen so that version_info for a final FL> release is larger than version_info for the corresponding beta FL> ? Yes, absolutely. Please don't break the comparability of version_info or the connection with the patchversion.h macros. -Barry From fdrake@acm.org Thu Apr 13 18:05:17 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:05:17 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.64727.928889.239985@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> Message-ID: <14581.65101.110813.343483@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > Yes, absolutely. Please don't break the comparability of version_info > or the connection with the patchversion.h macros. So I'm the only person here today who prefers the release level of a final version to be '' instead of 'final'? Or did I miss all the messages of enthusiastic support for '' from my screaming fans? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw@cnri.reston.va.us Thu Apr 13 18:04:40 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:04:40 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> Message-ID: <14581.65064.13261.43476@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); >> level is a string "a2", "b1", "c1", or '' for a final release. >> maybe level should be chosen so that version_info for a final >> release is larger than version_info for the corresponding beta >> ? AMK> 'a2' < 'b1' < 'c1' < 'final' Another reason I don't like the strings: 'b9' > 'b10' :( I can imagine a remote possibility of more than 9 pre-releases (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has to fit in 4 bits), so at the very least, make that string 'a02', 'a03', etc. 
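(The string-comparison trap being pointed at, checked at the prompt:)

    >>> 'b9' > 'b10'
    1
    >>> 'b09' < 'b10'
    1
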
-Barry From bwarsaw@cnri.reston.va.us Thu Apr 13 18:07:54 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:07:54 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.65258.431992.820885@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> readability? Yup. FL> "final" is okay, I think. better than "f0", at least ;-) And I think (but am not 100% positive) that once a final release comes out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead starts incrementing PY_MICRO_VERSION. If that's not the case, then it complicates things a bit. -Barry From bwarsaw@cnri.reston.va.us Thu Apr 13 18:08:51 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:08:51 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> Message-ID: <14581.65315.489980.275044@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I didn't say hexversion was pretty or that anyone liked Fred> it! Writing the docs, version_info is a *lot* easier to Fred> explain. So is it easier to explain that the empty string means a final release or that 'final' means a final release? :) -Barry From fdrake@acm.org Thu Apr 13 18:11:19 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:11:19 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65064.13261.43476@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <14581.65463.994272.442725@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits), so at the very least, make that string 'a02', > 'a03', etc. Doesn't this further damage the human readability of the value? I thought that was an important reason to break it up from sys.hexversion. (Note also that you're not just saying more than 9 pre-releases, but more than 9 at any one of alpha, beta, or release candidate stages. 1-9 at each stage is already 27 pre-release packages.) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gmcm@hypernet.com Thu Apr 13 18:11:14 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:11:14 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131522.RAA05137@python.inrialpes.fr> Message-ID: <1256476619-52065132@hypernet.com> Vladimir Marangozov wrote: > > > Obviously, the user-attr proposal is not so "simple" as it looks like. This is not obvious to me. Both the concept and implementation appear fairly simple to me. > I wish we all realize what's really going on here. 
> > In all cited use cases for this proposal, functions are no more > perceived as functions per se, but as data structures (objects) > which are the target of the computation. IOW, functions are just > considered as *instances* of a class (inheriting from the builtin > "PyFunction" class) with user-attributes, having a state, and > eventually a set of operations bound to them. I don't see why they aren't still functions. Putting a rack on my bicycle doesn't make it a pickup truck. I think it's a real stretch to say they would become "instances of a class". There's no inheritance, and the "state" isn't visible inside the function (probably unfortunately ). Just like today, they are objects of type PyFunction, and they get called the same old way. You'll be able to hang extra stuff off them, just like today you can hang extra stuff off a module object without the module's knowledge or cooperation. > I guess that everybody realized that with this proposal, one could > bind not only doc strings, but also functions to the function. > > def func(): pass > def serialize(): ... > func.pack = serialize > func.pack() > > What is this? This is manual instance customization. What is "def"? What is f.__doc__ = ... ? > Since nobody said that this would be done 'exceptionally', but rather > on a regular basis for all functions (and generally, for all objects) > in a program, the way to customize instances after the fact, makes > those instances singletons of user-defined classes. Only according to a very loose definition of "instance" and "user-defined class". More accurately, they are objects as they always have been (oops, Barry screwed up the time- machine again; please adjust the tenses of the above). > You may say "so what?". Well, this is fine if it were part of the > object model from the start. And there's no reason why only functions > and methods can have this functionality. Stick the __dict__ slot in > the object header and let me bind user-attributes to all objects. Perceived need is part of this. > I have a favorite number, 7, so I want to label it Vlad's number. > seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func > to all numbers, n.is_prime(). I want to have a s.zip() method for a > set of strings in my particular application, not only the builtin ones. > > Why is it not allowed to have this today? Think about it! This is apparently a reducto ad absurdum argument. It's got the absurdum, but not much reducto. I prefer this one: Adding attributes to functions is immoral. Therefore func.__doc__ is immoral and should be removed. For another thing, we'll need a couple generations to argue about what to do with those 100 singleton int objects . > How would you solve your app needs today? Through classes and instances. > That's the prescribed `legal' way to do customized objects; no shortcuts. > Saying that mucking with functions' __doc__ strings is the only way to > implement some functionality is simply not true. No, it's a matter of convenience. Remember, Pythonistas are from Yorkshire ("You had Python??... You had assembler??.. You had front-panel toggle switches??.. You had wire- wrapping tools??.."). > In short, there's no way I can accept this proposal in its current > state and make the distingo between functions/methods and other kinds > of objects (including 3rd party ones). If we're to go down this road, > treat all objects as equal citizens in this regard. None or all. They are all first class objects already. 
Adding capabilities to one of them doesn't subtract them from any other. > The object model must remain consistent. This proposal opens a breach in it. > And not the lightest! In any sense in which you can apply the word "consistent" to Python's object model, I fail to see how this makes it less so. > And this is only part of the reasons why I'm still firmly -1 until P3K. > Glad to see that Barry exposed some of the truth about it, after preserving > our throats, i.e. he understood that we understood that he fully understood > the power of namespaces, but eventually decided to propose a fraction of > a significant change reserved for the next major Python release... wink > > >>> wink.fraction = 1e+-1 > >>> wink.fraction.precision = 1e-+1 > >>> wink.compute() > 0.0 I don't see anything here but an argument that allowing attributes on function objects makes them vaguely similar to instance objects. To the extent that I can agree with that, I fail to see any harm in it. - Gordon From fdrake@acm.org Thu Apr 13 18:16:15 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:16:15 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65258.431992.820885@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> <14581.65315.489980.275044@anthem.cnri.reston.va.us> <14581.65258.431992.820885@anthem.cnri.reston.va.us> Message-ID: <14582.223.861189.614634@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > And I think (but am not 100% positive) that once a final release comes > out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead > starts incrementing PY_MICRO_VERSION. If that's not the case, then > it complicates things a bit. patchlevel.h includes a comment that indicates serial should be 0 for final releases. > So is it easier to explain that the empty string means a final release > or that 'final' means a final release? :) I think it's the same; either is a special value. The only significant advantage of 'final' is the monotonicity provided by 'final'. I'm not convinced that it's otherwise any better. It also means to create a formatter version number from this that you need to special-case the last item in sys.version_info. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" <007c01bfa55e$0faf0360$34aab5d4@hagrid><14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <007301bfa56b$d8ec1ee0$34aab5d4@hagrid> > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits) or rather, "I can imagine a remote possibility of more than 5 = pre-releases (counting from 1), but not more than 9 (since PY_RELEASE_SERIAL has to fit in a single decimal digit"? in the very unlikely case that I'm wrong, feel free to break the glass and install the following patch: #define PY_RELEASE_LEVEL_DESPAIR 0xD #define PY_RELEASE_LEVEL_EXTRAMUNDANE 0xE #define PY_RELEASE_LEVEL_FINAL 0xF /* Serial should be 0 here */ From bwarsaw@cnri.reston.va.us Thu Apr 13 18:17:30 2000 From: bwarsaw@cnri.reston.va.us (Barry A. 
Warsaw) Date: Thu, 13 Apr 2000 13:17:30 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> Message-ID: <14582.298.938842.466851@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> So I'm the only person here today who prefers the release Fred> level of a final version to be '' instead of 'final'? Or Fred> did I miss all the messages of enthusiastic support for '' Fred> from my screaming fans? I've blocked those messages at your mta, so you would't be fooled into doing the wrong thing. I'll repost them to you, but only after you change it back to 'final' means final. Then you can be rightfully indignant at all of us losers who wanted it the other way, and caused you all that extra work! :) root-of-all-evil-ly y'rs, -Barry From fdrake@acm.org Thu Apr 13 18:20:24 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:20:24 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.298.938842.466851@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> Message-ID: <14582.472.612445.191833@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I've blocked those messages at your mta, so you would't be fooled into > doing the wrong thing. I'll repost them to you, but only after you I don't mind that, just don't stop the groupies! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Thu Apr 13 18:24:35 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 12:24:35 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> Message-ID: <14582.723.427231.355475@beluga.mojam.com> Gordon> I don't see why they aren't still functions. Putting a rack on Gordon> my bicycle doesn't make it a pickup truck. Though putting a gun in the rack might... ;-) Skip From bwarsaw@cnri.reston.va.us Thu Apr 13 18:25:13 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:25:13 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> Message-ID: <14582.761.365390.946880@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> Doesn't this further damage the human readability of the Fred> value? A little, but it's a fine compromise between the various constraints. Another way you could structure that tuple is to split the PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more readable if you want, and make the latter a real int. Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), e.g. 
the form is: (major, minor, micro, level, serial) You can't use 'gamma' though because then you break comparability. Maybe use 'candidate' instead? Sigh. Fred> I thought that was an important reason to break it Fred> up from sys.hexversion. (Note also that you're not just Fred> saying more than 9 pre-releases, but more than 9 at any one Fred> of alpha, beta, or release candidate stages. 1-9 at each Fred> stage is already 27 pre-release packages.) Well, Guido hisself must have thought that there was a remote possibility of more than 9 releases at a particular level, otherwise he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no other possible explanation for his choices is there?! :) -Barry From fdrake@acm.org Thu Apr 13 18:31:04 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:31:04 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1112.750322.6958@seahag.cnri.reston.va.us> bwarsaw@cnri.reston.va.us writes: > A little, but it's a fine compromise between the various constraints. > Another way you could structure that tuple is to split the > PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more > readable if you want, and make the latter a real int. Thus Python > 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), > e.g. the form is: > > (major, minor, micro, level, serial) I've thought of this as well, and certainly prefer it to the 'a01' solution. > You can't use 'gamma' though because then you break comparability. > Maybe use 'candidate' instead? Sigh. Yeah. > Well, Guido hisself must have thought that there was a remote > possibility of more than 9 releases at a particular level, otherwise > he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no > other possible explanation for his choices is there?! :) Clearly. I'll have to break his heart when I release 1.6a16 this afternoon. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake@acm.org Thu Apr 13 18:32:41 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:32:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> Message-ID: <14582.1209.471995.242974@seahag.cnri.reston.va.us> Skip Montanaro writes: > Though putting a gun in the rack might... ;-) And make sure that rack is big enough for the dogs, we don't want them to feel left out! (Gosh, I'm feeling like I'm back in south-west Virginia already! ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From skip@mojam.com (Skip Montanaro) Thu Apr 13 18:32:37 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 12:32:37 -0500 (CDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1205.389790.293558@beluga.mojam.com> BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, BAW> 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break comparability. Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Skip From bwarsaw@cnri.reston.va.us Thu Apr 13 18:35:05 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:35:05 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> <14582.472.612445.191833@seahag.cnri.reston.va.us> Message-ID: <14582.1353.482124.111121@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I don't mind that, just don't stop the groupies! ;) Hey, take it from me, groupies are a dime a dozen. They ask you all kinds of boring questions like what kind of strings you use (or how fast your disk drives are). It's the "gropies" you want. 'Course, tappin' away at a keyboard that only makes one kind of annoying clicking sound and isn't midi-fied won't get you any gropies. Even if you're an amazing hunk of a bass god, it's tough (so you know I'm at a /severe/ disadvantage :) -Barry From bwarsaw@cnri.reston.va.us Thu Apr 13 18:38:29 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:38:29 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> <14582.1205.389790.293558@beluga.mojam.com> Message-ID: <14582.1557.128677.346938@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, BAW> 0, 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break BAW> comparability. SM> Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Or how 'bout: "zats the last one yer gonna git, ya peons, now leave me ALONE" ? 
-Barry From gmcm@hypernet.com Thu Apr 13 18:39:06 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:39:06 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <1256476619-52065132@hypernet.com> Message-ID: <1256474945-52165985@hypernet.com> Skip wrote: > > Gordon> I don't see why they aren't still functions. Putting a rack on > Gordon> my bicycle doesn't make it a pickup truck. > > Though putting a gun in the rack might... ;-) Nah, I live in downeast Maine. I'd need a trailer hitch and snow- plow mount to qualify. - Gordon From skip@mojam.com (Skip Montanaro) Thu Apr 13 18:51:08 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 13 Apr 2000 12:51:08 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.1209.471995.242974@seahag.cnri.reston.va.us> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> <14582.1209.471995.242974@seahag.cnri.reston.va.us> Message-ID: <14582.2316.638334.342115@beluga.mojam.com> Fred> Skip Montanaro writes: >> Though putting a gun in the rack might... ;-) Fred> And make sure that rack is big enough for the dogs, we don't want Fred> them to feel left out! They fit in the panniers. (They're minature german shorthair pointers...) extending-this-silliness-ly y'rs... Skip From Vladimir.Marangozov@inrialpes.fr Thu Apr 13 19:10:33 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 20:10:33 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <200004131810.UAA05752@python.inrialpes.fr> Gordon McMillan wrote: > > I don't see anything here but an argument that allowing > attributes on function objects makes them vaguely similar to > instance objects. To the extent that I can agree with that, I fail > to see any harm in it. > To the extent it encourages confusion, I think it sucks. >>> def this(): ... sucks = "no" ... >>> this.sucks = "yes" >>> >>> print this.sucks 'yes' Why on earth 'sucks' is not the object defined in the function's namespace? Who made that deliberate decision? Clearly 'this' defines a new namespace, so it'll be also legitimate to get a NameError, or to: >>> print this.sucks 'no' Don't you think? And don't explain to me that this is because there's a code object, different from the function object, which is compiled at the function's definition, then assotiated with the function object, blah, blah, blah... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jeremy@cnri.reston.va.us Thu Apr 13 20:08:12 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 15:08:12 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000701bfa505$31008380$4d2d153f@tim> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> Message-ID: <14582.5791.148277.87450@walden> >>>>> "TP" == Tim Peters writes: TP> [Jeremy Hylton]> >> So the real problem is defining some reasonable semantics for >> comparison of recursive objects. 
TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization). I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object. My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases). A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also use a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's. It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is. 
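A rough Python rendering of the same bookkeeping (the name and the
list-only dispatch are invented here just to keep the sketch short;
the C patch below does the equivalent inside PyObject_Compare, keeping
the set of in-progress pairs in the thread-state dict):

    def cmp_cyclic(v, w, _in_progress=None):
        # Pairs of objects currently being compared, keyed by id().
        # Hitting the same pair again means we followed a back-edge in
        # the object graph, so report "equal" and let some other part
        # of the comparison find a difference, if there is one.
        if _in_progress is None:
            _in_progress = {}
        key = (id(v), id(w))
        if _in_progress.has_key(key):
            return 0
        if type(v) is type(w) is type([]):
            if len(v) != len(w):
                return cmp(len(v), len(w))
            _in_progress[key] = 1
            try:
                for i in range(len(v)):
                    outcome = cmp_cyclic(v[i], w[i], _in_progress)
                    if outcome:
                        return outcome
                return 0
            finally:
                del _in_progress[key]
        return cmp(v, w)

With this, a = []; a.append(a); b = []; b.append(b); cmp_cyclic(a, b)
returns 0 instead of recursing until the stack blows up.
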
Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", --- > "__repr__ returned non-string (type %s)", 276c276 < "__str__ returned non-string (type %.200s)", --- > "__str__ returned non-string (type %s)", 300a301,328 > static PyObject *cmp_state_key = NULL; > > static PyObject* > cmp_state_make_pair(v, w) > PyObject *v, *w; > { > PyObject *pair = PyTuple_New(2); > if (pair == NULL) > return NULL; > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > return pair; > } > > void > cmp_state_clear_pair(dict, key) > PyObject *dict, *key; > { > PyDict_DelItem(dict, key); > Py_DECREF(key); > } > > 305a334,336 > PyObject *tstate_dict, *cmp_dict, *pair; > int result; > 311a343,376 > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, > v->ob_type->tp_name, (long)w, w->ob_type->tp_name); > */ > /* XXX should initialize elsewhere */ > if (cmp_state_key == NULL) { > cmp_state_key = PyString_InternFromString("compare_state"); > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } else { > cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); > if (cmp_dict == NULL) > return NULL; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } > > pair = cmp_state_make_pair(v, w); > if (pair == NULL) { > PyErr_BadInternalCall(); > return -1; > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume they're > equal until shown otherwise > */ > Py_DECREF(pair); > return 0; > } 316a382,384 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 317a386 > cmp_state_clear_pair(cmp_dict, pair); 329a399,401 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 344a417 > cmp_state_clear_pair(cmp_dict, pair); 350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } --- > cmp_state_clear_pair(cmp_dict, pair); > if (PyUnicode_Check(v) || PyUnicode_Check(w)) > return PyUnicode_Compare(v, w); 372c433,434 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { > cmp_state_clear_pair(cmp_dict, pair); 374c436,439 < return (*vtp->tp_compare)(v, w); --- > } > result = (*vtp->tp_compare)(v, w); > cmp_state_clear_pair(cmp_dict, pair); > return result; From gstein@lyra.org Thu Apr 13 20:09:02 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 12:09:02 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: It's great that you made this change! 
I hadn't got through my mail, but was going to recommend it... :-) One comment: On Thu, 13 Apr 2000, Fred Drake wrote: >... > --- 409,433 ---- > v = PyInt_FromLong(PY_VERSION_HEX)); > Py_XDECREF(v); > + /* > + * These release level checks are mutually exclusive and cover > + * the field, so don't get too fancy with the pre-processor! > + */ > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_ALPHA > + v = PyString_FromString("alpha"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_BETA > + v = PyString_FromString("beta"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_GAMMA > + v = PyString_FromString("candidate"); > + #endif > #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_FINAL > ! v = PyString_FromString("final"); > ! #endif > PyDict_SetItemString(sysdict, "version_info", > ! v = Py_BuildValue("iiiNi", PY_MAJOR_VERSION, > PY_MINOR_VERSION, > ! PY_MICRO_VERSION, v, > ! PY_RELEASE_SERIAL)); > Py_XDECREF(v); > PyDict_SetItemString(sysdict, "copyright", I would recommend using the "s" format code in Py_BuildValue. It simplifies the code, and it is quite a bit easier for a human to process. When I first saw the code, I thought "the level string leaks!" Then I saw the "N" code, went and looked it up, and realized what is going on. So... to avoid that, the "s" code would be great. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bitz@bitdance.com Thu Apr 13 20:12:34 2000 From: bitz@bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 15:12:34 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Andrew M. Kuchling wrote: > longer use 32-bit ints to store file position. There's a > HAVE_LARGEFILE_SUPPORT #define that turns on the use of these > alternate system calls; see Python's configure.in for the test used to I just looked in my python config.h on my FreeBSD system, and I see: #define HAVE_LARGEFILE_SUPPORT 1 So it looks like it is on, and it seems to me the problem could be in either Python or FileStorage.py in Zope. This is a Zope 2.1.2 system (but I diffed filestorage.py against the 2.1.6 version and didn't see any relevant changes) running on a FreeBSD 3.1 system. A make test in Python passed all tests, but I don't know if large file support is tested by the tests. --RDM PS: anyone from the python list replying to this please CC me as I am not on that list. From gmcm@hypernet.com Thu Apr 13 20:26:13 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 15:26:13 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <1256468519-52554453@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > I don't see anything here but an argument that allowing > > attributes on function objects makes them vaguely similar to > > instance objects. To the extent that I can agree with that, I fail > > to see any harm in it. > > > > To the extent it encourages confusion, I think it sucks. > > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? Because that one is a local. Python allows the same name in different places. Used wisely, it's a handy feature of namespaces. > Who made that deliberate decision? 
What decision? To put a name "sucks" both in the function's locals and as a function attribute? To print something accessed with object.attribute notation in the obvious manner? Deciding not to cause gratuitous UnboundLocalErrors? This is nowhere near as confusing as, say, putting a module named X in a package named X and then saying "from X import *", (hi, Marc-Andre!). > Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? Only if you've done "this.sucks = 'no'". Or are you saying that if functions have attributes, people will all of a sudden expect that function locals will have initialized and maintained state? We certainly get plenty of newbie confusion about namespaces, assignment and scoping; maybe I've seen one or two where people thought function.local should be legal (do Python-tutors see this?). In those cases, is it the existence of function.__doc__ that causes the confusion? If yes, and this is a serious problem, then you should be arguing for the removal of __doc__. If not, why would allowing adding more attributes exacerbate the problem? > And don't explain to me that this is because there's a code object, > different from the function object, which is compiled at the function's > definition, then assotiated with the function object, blah, blah, blah... No problem. [Actually, the best argument against this I can see is that functional-types already try to use function objects where any sane person knows you should use an instance; and since this doesn't further their agenda, the bastard's will just scream louder ]. - Gordon From fdrake@acm.org Thu Apr 13 21:05:10 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 16:05:10 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: References: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: <14582.10358.843823.467467@seahag.cnri.reston.va.us> Greg Stein writes: > I would recommend using the "s" format code in Py_BuildValue. It > simplifies the code, and it is quite a bit easier for a human to process. > When I first saw the code, I thought "the level string leaks!" Then I saw > the "N" code, went and looked it up, and realized what is going on. Good point; 'N' is relatively obscure in my experience as well. I've made the change (and there's probably less code in the binary as well!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bitz@bitdance.com Thu Apr 13 21:04:56 2000 From: bitz@bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 16:04:56 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: OK, some more info. The code in FileStorage.py looks like this: ------------------- def read_index(file, name, index, vindex, tindex, stop='\377'*8, ltid=z64, start=4, maxoid=z64): read=file.read seek=file.seek seek(0,2) file_size=file.tell() print file_size, start if file_size: if file_size < start: raise FileStorageFormatError, file.name [etc] ------------------- I stuck that print statement in there. The results of the print are: -2147248811L 4 So it looks to my uneducated eye like file.tell() is broken. The actual on-disk size of the file, by the way, is indeed 2147718485, so it looks like somebody's not using the right size data structure somewhere. 
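A quick check of the numbers (hypothetical interactive session, but the arithmetic is exact) supports that reading: treat the real 64-bit size as a signed 32-bit quantity and you get the bogus tell() value back.

>>> size = 2147718485L     # actual on-disk size reported above
>>> size > 2L**31 - 1      # too big for a signed 32-bit int
1
>>> size - 2L**32          # what it looks like after wrapping into 32 bits
-2147248811L

That is exactly what file.tell() printed, which suggests the file offset is being forced through a 32-bit signed integer somewhere along the way rather than a 64-bit off_t.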
So, can anyone tell me what to look for, or am I stuck for the moment?

--RDM

PS: anyone on python-dev replying please CC me as I am only on the zope list.

From paul@prescod.net Thu Apr 13 21:55:43 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 13 Apr 2000 15:55:43 -0500 Subject: [Python-Dev] OT: XML References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> <20000412225638.E9002@thyrsus.com> Message-ID: <38F6344F.25D344B5@prescod.net>

Well, as long as everyone else is going to be off-topic: What definition of "language" are you using? And while you're at it, what definition of "semantics" are you using? As I recall, a string is an ordered list of symbols and a language is an unordered set of strings. I know that Ka-Ping, despite going to a great university, was in Engineering, not computer science, so I'll excuse him for not knowing the Chomskian definition of language, :), but what's your excuse Eric?

Most XML people will happily admit that XML has no "semantics" but I think that's bullshit too. The mapping from the string to the abstract tree data model *is the semantic content* of the XML specification. Yes, it is a brain-dead simple mapping and so the semantic structure provided by the XML specification is minimal...but that's the whole point. It's supposed to be simple. It's supposed to not get in the way of higher level semantics.

It makes as little sense to reject XML out of hand because it is a buzzword but is not innovative as it does for people to embrace it mystically because it is Microsoft's flavor of the week. XML takes simple ideas from the Lisp and document processing communities and popularizes them so that they can achieve economies of scale. It sounds exactly like the relationship between Lisp and Python to me...

By the way, what data model or text encoding is NOT isomorphic to Lisp S-expressions? Isn't Python code isomorphic to Lisp s-expressions?

Paul Prescod

From jeremy@cnri.reston.va.us Thu Apr 13 23:06:39 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 18:06:39 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> Message-ID: <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us>

I did one more round of work on this idea, and I'm satisfied with the results. Most of the performance hit can be eliminated by doing nothing until there are at least N recursive calls to PyObject_Compare, where N is fairly large. (I picked 25000.) Non-circular objects that are not deeply nested only pay for an integer increment, a decrement, and a compare.

Background for patches-only readers: This patch appears to fix PR#7.

Comments and suggestions solicited. I think this is worth checking in.
Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -r2.52 object.h 286a287,289 > /* tstate dict key for PyObject_Compare helper */ > extern PyObject *_PyCompareState_Key; > Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -r2.91 pythonrun.c 151a152,153 > _PyCompareState_Key = PyString_InternFromString("cmp_state"); > Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 300a301,306 > PyObject *_PyCompareState_Key; > > int _PyCompareState_nesting = 0; > int _PyCompareState_flag = 0; > #define NESTING_LIMIT 25000 > 305a312,313 > int result; > 372c380 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { 374c382,440 < return (*vtp->tp_compare)(v, w); --- > } > ++_PyCompareState_nesting; > if (_PyCompareState_nesting > NESTING_LIMIT) > _PyCompareState_flag = 1; > if (_PyCompareState_flag && > (vtp->tp_as_mapping || (vtp->tp_as_sequence && > !PyString_Check(v)))) > { > PyObject *tstate_dict, *cmp_dict, *pair; > > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); > if (cmp_dict == NULL) { > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, > _PyCompareState_Key, > cmp_dict); > } > > pair = PyTuple_New(2); > if (pair == NULL) { > return -1; > } > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume > they're equal until shown otherwise > */ > Py_DECREF(pair); > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return 0; > } > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } > result = (*vtp->tp_compare)(v, w); > PyDict_DelItem(cmp_dict, pair); > Py_DECREF(pair); > } else { > result = (*vtp->tp_compare)(v, w); > } > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return result; From ping@lfw.org Thu Apr 13 23:41:44 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:41:44 -0500 (CDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > >>>>> "TP" == Tim Peters writes: > > TP> [Jeremy Hylton]> > >> So the real problem is defining some reasonable semantics for > >> comparison of recursive objects. There is a "right" way to do this, i believe, and my friend Mark Miller implemented it in E. He tells me his algorithm is inspired by the method for unification of cyclic structures in Prolog III. It's available in the E source code (in the file elib/tables/Equalizer.java). 
See interesting stuff on equality and cyclic data structures at http://www.erights.org/javadoc/org/erights/e/elib/tables/Equalizer.html http://www.erights.org/elang/same-ref.html http://www.erights.org/elang/blocks/defVar.html http://www.eros-os.org/~majordomo/e-lang/0698.html There is also a thread about equality issues in general at: http://www.eros-os.org/~majordomo/e-lang/0000.html It's long, but worth perusing. Here is my rough Python translation of the code in the E Equalizer. Python 1.4 (Mar 25 2000) [C] Copyright 1991-1997 Stichting Mathematisch Centrum, Amsterdam Python Console v1.4 by Ka-Ping Yee >>> def same(left, right, sofar={}): ... hypothesis = (id(left), id(right)) ... if left is right or sofar.has_key(hypothesis): return 1 ... if type(left) is not type(right): return 0 ... if type(left) is type({}): ... left, right = left.items(), right.items() ... if type(left) is type([]): ... sofar[hypothesis] = 1 ... try: ... for i in range(len(left)): ... if not same(left[i], right[i], sofar): return 0 ... return 1 ... finally: ... del sofar[hypothesis] ... return left == right ... ... >>> same([3],[4]) 0 >>> same([3],[3]) 1 >>> a = [1,2,3] >>> b = [1,2,3] >>> c = [1,2,3] >>> same(a,b) 1 >>> a[1] = a >>> same(a,a) 1 >>> same(a,b) 0 >>> b[1] = b >>> same(a,b) 1 >>> b[1] = c >>> b [1, [1, 2, 3], 3] >>> same(a,b) 0 >>> c[1] = b >>> same(a,b) 1 >>> same(b,c) 1 >>> I would like to see Python's comparisons work this way (i.e. "correct" as opposed to "we give up"). -- ?!ng From ping@lfw.org Thu Apr 13 23:49:21 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:49:21 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: Message-ID: As a reference, here is the corresponding cyclic-structure-comparison example from a message about E: ? define tight := [1, tight, "x"] # value: [1, ***CYCLE***, x] ? define loose := [1, [1, loose, "x"], "x"] # value: [1, ***CYCLE***, x] ? tight == loose # value: true ? def map := [tight => "foo"] # value: [[1, ***CYCLE***, x] => foo] ? map[loose] # value: foo Internally, tight and loose have very different representations. However, when both cycles are unwound, they represent the same infinite tree. One could say that tight's representation of this tree is more tightly wound than loose's representation. However, this difference is only in the implementation, not in the semantics. The value of tight and loose is only the infinite tree they represent. If these trees are the same, then tight and loose are ==. Notice that loose prints out according to the tightest winding of the tree it represents, not according to the cycle by which it represents this tree. Only the tightest winding is finite and canonical. -- ?!ng From bwarsaw@cnri.reston.va.us Fri Apr 14 00:14:49 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 19:14:49 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> Message-ID: <14582.21737.387268.332139@anthem.cnri.reston.va.us> JH> Comments and suggestions solicitied. I think this is worth JH> checking in. Please regenerate with unified or context diffs! 
-Barry From jeremy@cnri.reston.va.us Fri Apr 14 00:19:30 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:19:30 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: Message-ID: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important. >>> tight = [1, None, "x"] >>> tight[1] = tight >>> tight [1, [...], 'x'] >>> loose = [1, [1, None, "x"], "x"] >>> loose[1][1] = loose >>> loose [1, [1, [...], 'x'], 'x'] >>> tight [1, [...], 'x'] >>> tight == loose 1 Jeremy From jeremy@cnri.reston.va.us Fri Apr 14 00:30:02 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:30:02 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.21737.387268.332139@anthem.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> <14582.21737.387268.332139@anthem.cnri.reston.va.us> Message-ID: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> Here it is contextified. One small difference from the previous patch is that NESTING_LIMIT is now only 1000. I think this is sufficient to cover commonly occuring nested containers. Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -c -r2.52 object.h *** object.h 2000/03/21 16:14:47 2.52 --- object.h 2000/04/13 21:50:10 *************** *** 284,289 **** --- 284,292 ---- extern DL_IMPORT(int) Py_ReprEnter Py_PROTO((PyObject *)); extern DL_IMPORT(void) Py_ReprLeave Py_PROTO((PyObject *)); + /* tstate dict key for PyObject_Compare helper */ + extern PyObject *_PyCompareState_Key; + /* Flag bits for printing: */ #define Py_PRINT_RAW 1 /* No string quotes etc. */ Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -c -r2.91 pythonrun.c *** pythonrun.c 2000/03/10 23:03:54 2.91 --- pythonrun.c 2000/04/13 21:50:25 *************** *** 149,154 **** --- 149,156 ---- /* Init Unicode implementation; relies on the codec registry */ _PyUnicode_Init(); + _PyCompareState_Key = PyString_InternFromString("cmp_state"); + bimod = _PyBuiltin_Init_1(); if (bimod == NULL) Py_FatalError("Py_Initialize: can't initialize __builtin__"); Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -c -r2.67 object.c *** object.c 2000/04/10 13:42:33 2.67 --- object.c 2000/04/13 21:44:42 *************** *** 298,308 **** --- 298,316 ---- return PyInt_FromLong(c); } + PyObject *_PyCompareState_Key; + + int _PyCompareState_nesting = 0; + int _PyCompareState_flag = 0; + #define NESTING_LIMIT 1000 + int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *vtp, *wtp; + int result; + if (v == NULL || w == NULL) { PyErr_BadInternalCall(); return -1; *************** *** 369,377 **** /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! 
if (vtp->tp_compare == NULL) return (v < w) ? -1 : 1; ! return (*vtp->tp_compare)(v, w); } long --- 377,443 ---- /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! if (vtp->tp_compare == NULL) { return (v < w) ? -1 : 1; ! } ! ++_PyCompareState_nesting; ! if (_PyCompareState_nesting > NESTING_LIMIT) ! _PyCompareState_flag = 1; ! if (_PyCompareState_flag && ! (vtp->tp_as_mapping || (vtp->tp_as_sequence && ! !PyString_Check(v)))) ! { ! PyObject *tstate_dict, *cmp_dict, *pair; ! ! tstate_dict = PyThreadState_GetDict(); ! if (tstate_dict == NULL) { ! PyErr_BadInternalCall(); ! return -1; ! } ! cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); ! if (cmp_dict == NULL) { ! cmp_dict = PyDict_New(); ! if (cmp_dict == NULL) ! return -1; ! PyDict_SetItem(tstate_dict, ! _PyCompareState_Key, ! cmp_dict); ! } ! ! pair = PyTuple_New(2); ! if (pair == NULL) { ! return -1; ! } ! if ((long)v <= (long)w) { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); ! } else { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); ! } ! if (PyDict_GetItem(cmp_dict, pair)) { ! /* already comparing these objects. assume ! they're equal until shown otherwise ! */ ! Py_DECREF(pair); ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return 0; ! } ! if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { ! return -1; ! } ! result = (*vtp->tp_compare)(v, w); ! PyDict_DelItem(cmp_dict, pair); ! Py_DECREF(pair); ! } else { ! result = (*vtp->tp_compare)(v, w); ! } ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return result; } long From ping@lfw.org Fri Apr 14 03:09:49 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 21:09:49 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. Very, very cool. Well done. Say, when did printing get fixed? > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] -- ?!ng From jeremy@cnri.reston.va.us Fri Apr 14 03:14:11 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 22:14:11 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: <14582.32499.38092.53395@bitdiddle.cnri.reston.va.us> >>>>> "KPY" == Ka-Ping Yee writes: KPY> On Thu, 13 Apr 2000, Jeremy Hylton wrote: >> Looks like the proposed changed to PyObject_Compare matches E for >> your example. The printed representation doesn't match, but I'm >> not sure that is as important. KPY> Very, very cool. Well done. Say, when did printing get fixed? Looks like the repr checkin was pre-1.5.1. I glanced at the sameness code in E, and it looks like it is doing exactly the same thing. It keeps a mapping of comparisons seen sofar and returns true for them. It seems that E's types don't define their own methods for sameness, though. The same methods seem to understand the internals of the various E types. Or is it just a few special ones. 
Jeremy From tim_one@email.msn.com Fri Apr 14 03:32:48 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:48 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256468519-52554453@hypernet.com> Message-ID: <000a01bfa5b9$b99a6760$182d153f@tim> [Gordon McMillan] > ... > Or are you saying that if functions have attributes, people will > all of a sudden expect that function locals will have initialized > and maintained state? I expect that they'll expect exactly what happens in JavaScript, which supports function attributes too, and where it's often used as a nicer-than-globals way to get the effect of C-like mutable statics (conceptually) local to the function. BTW, viewing this all in OO terms would make compelling sense only if Guido viewed everything in OO terms -- but he doesn't. To the extent that people must , Python doesn't stop you from adding arbitrary unique attrs to class instances today either. consistent-in-inconsistency-ly y'rs - tim From tim_one@email.msn.com Fri Apr 14 03:32:44 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:44 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: <000901bfa5b9$b7f6c980$182d153f@tim> [Jeremy Hylton] > I'm not familiar with any algorithms for the graph isomorphism > problem, Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children). > but I took a stab at a simple comparison algorithm. The idea > is to detect comparisons that would cross back-edges in the object > graphs. Instead of starting a new comparison, assume they are the > same. If, in fact, the objects are not the same, they must differ in > some other way; some other part of the comparison will fail. Bingo! That's the key trick. From Fredrik Lundh" Message-ID: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Tim Peters wrote: > [Gordon McMillan] > > ... > > Or are you saying that if functions have attributes, people will > > all of a sudden expect that function locals will have initialized > > and maintained state? >=20 > I expect that they'll expect exactly what happens in JavaScript, which > supports function attributes too, and where it's often used as a > nicer-than-globals way to get the effect of C-like mutable statics > (conceptually) local to the function. so it's no longer an experimental feature, it's a "static variables" thing? umm. I had nearly changed my mind to a "okay, if you insist +1", but now it's back to -1 again. maybe in Py3K... From bwarsaw@cnri.reston.va.us Fri Apr 14 06:23:40 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 01:23:40 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <14582.43868.600655.132428@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so it's no longer an experimental feature, it's a "static FL> variables" thing? FL> umm. I had nearly changed my mind to a "okay, if you insist FL> +1", but now it's back to -1 again. maybe in Py3K... C'mon! Most people are still going to just use module globals for function statics because they're less to type (notwithstanding the sometimes-optional global decl). 
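For concreteness, the two idioms being weighed look roughly like this (a hypothetical sketch; the second form assumes the proposed function-attribute patch is in):

_ncalls = 0                      # today's idiom: a module-level "static"
def f():
    global _ncalls
    _ncalls = _ncalls + 1

def g():
    g.ncalls = g.ncalls + 1      # with the patch: the state rides on g itself
g.ncalls = 0                     # initialized from outside; not visible as a local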
You can't worry about all the novel abuses people will think up for this feature -- they're already doing it with all sorts of other things Pythonic, e.g. docstrings, global as pragma, etc. Can I get at least a +0? :) -Barry From tim_one@email.msn.com Fri Apr 14 08:34:46 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 03:34:46 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <000401bfa5e3$e8c5ce60$612d153f@tim> [Tim] >> I expect that they'll expect exactly what happens in JavaScript, which >> supports function attributes too, and where it's often used as a >> nicer-than-globals way to get the effect of C-like mutable statics >> (conceptually) local to the function. [/F] > so it's no longer an experimental feature, it's a "static variables" > thing? Yes, of course people will use it to get the effect of function statics. OK by me. People do the same thing today with class data attributes (i.e., to get the effect of mutable statics w/o polluting the module namespace). They'll use it for all sorts of other stuff too -- it's mechanism, not policy. BTW, I don't think any "experimental feature" has ever been removed -- only features that weren't experimental. So if you want to see it go away ... > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. maybe in Py3K... Greg gave the voting rule as: > -1 "Veto. And is my reasoning." Vladimir has done some reasoning, but the basis of your objection remains a mystery. We should be encouraging our youth to submit patches with their crazy ideas . From gansevle@cs.utwente.nl Fri Apr 14 08:46:08 2000 From: gansevle@cs.utwente.nl (Fred Gansevles) Date: Fri, 14 Apr 2000 09:46:08 +0200 Subject: [Python-Dev] cvs-server out of sync with mailing-list ? Message-ID: <200004140746.JAA05473@localhost.localdomain> I try to keep up-to-date with the cvs-tree at cvs.python.org and receive the python-checkins@python.org mailing-list. Just now I discovered that the cvs-server and the checkins-list are out of sync. For example: according to the checkins-list the latest version of src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest version is 2.59 Am I missing something or is there some kind of a problem ? ____________________________________________________________________________ Fred Gansevles Phone: +31 53 489 4613 >>> Your one-stop-shop for Linux/WinNT/NetWare <<< Org.: Twente University, Fac. of CS, Box 217, 7500 AE Enschede, Netherlands "Bill needs more time to learn Linux" - Steve B. From mal@lemburg.com Fri Apr 14 00:05:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 01:05:12 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM <1256468519-52554453@hypernet.com> Message-ID: <38F652A8.B2F8C822@lemburg.com> Gordon McMillan wrote: > ... > This is nowhere near as confusing as, say, putting a module > named X in a package named X and then saying "from X > import *", (hi, Marc-Andre!). Users shouldn't bother looking into packages... only at the documented interface ;-) The hack is required to allow sibling submodules to import the packages main module (I could have also written import __init__ everywhere but that wouldn't have made things clearer), BTW. 
It turned out to be very convenient during development of all those mx packages. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Apr 14 09:46:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:46:15 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> Message-ID: <38F6DAD7.BBAF72E5@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Fredrik Lundh wrote: > > > > > > M.-A. Lemburg wrote: > > > > The current need for #pragmas is really very simple: to tell > > > > the compiler which encoding to assume for the characters > > > > in u"...strings..." (*not* "...8-bit strings..."). > > > > > > why not? > > > > Because plain old 8-bit strings should work just as before, > > that is, existing scripts only using 8-bit strings should not break. > > but they won't -- if you don't use an encoding directive, and > don't use 8-bit characters in your string literals, everything > works as before. > > (that's why the default is "none" and not "utf-8") > > if you use 8-bit characters in your source code and wish to > add an encoding directive, you need to add the right encoding > directive... Fair enough, but this would render all the auto-coercion code currently in 1.6 useless -- all string to Unicode conversions would have to raise an exception. > > > why keep on pretending that strings and strings are two > > > different things? it's an artificial distinction, and it only > > > causes problems all over the place. > > > > Sure. The point is that we can't just drop the old 8-bit > > strings... not until Py3K at least (and as Fred already > > said, all standard editors will have native Unicode support > > by then). > > I discussed that in my original "all characters are unicode > characters" proposal. in my proposal, the standard string > type will have to roles: a string either contains unicode > characters, or binary bytes. > > -- if it contains unicode characters, python guarantees that > methods like strip, lower (etc), and regular expressions work > as expected. > > -- if it contains binary data, you can still use indexing, slicing, > find, split, etc. but they then work on bytes, not on chars. > > it's still up to the programmer to keep track of what a certain > string object is (a real string, a chunk of binary data, an en- > coded string, a jpeg image, etc). if the programmer wants > to convert between a unicode string and an external encoding > to use a certain unicode encoding, she needs to spell it out. > the codecs are never called "under the hood". > > (note that if you encode a unicode string into some other > encoding, the result is binary buffer. operations like strip, > lower et al does *not* work on encoded strings). Huh ? If the programmer already knows that a certain string uses a certain encoding, then he can just as well convert it to Unicode by hand using the right encoding name. The whole point we are talking about here is that when having the implementation convert a string to Unicode all by itself it needs to know which encoding to use. This is where we have decided long ago that UTF-8 should be used. 
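In 1.6 terms the difference looks roughly like this (a hypothetical snippet, using the iso-8859-1 codec as an example):

s = "fran\347ais"               # 8-bit data that happens to be Latin-1
u = unicode(s, "iso-8859-1")    # explicit: the programmer names the encoding
u = u"Python " + s              # implicit coercion: assumes UTF-8, and 0xE7
                                # followed by "a" is not valid UTF-8, so this
                                # fails with a decoding error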
The pragma discussion is about a totally different issue: pragmas could make it possible for the programmer to tell the *compiler* which encoding to use for literal u"unicode" strings -- nothing more. Since "8-bit" strings currently don't have an encoding attached to them we store them as-is. I don't want to get into designing a completely new character container type here... this can all be done for Py3K, but not now -- it breaks things at too many ends (even though it would solve the issues with strings being used in different contexts). > > > -- we still need an encoding marker for ascii supersets (how about > > > ;-). however, it's up to > > > the tokenizer to detect that one, not the parser. the parser only > > > sees unicode strings. > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > That's a task done by the parser. > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > > if the tokenizer does the actual conversion doesn't really matter; > the point is that once the code has passed through the tokenizer, > it's unicode. The tokenizer would have to know which parts of the input string to convert to Unicode and which not... plus there are different encodings to be applied, e.g. UTF-8, Unicode-Escape, Raw-Unicode-Escape, etc. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Apr 14 09:24:30 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:24:30 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> Message-ID: <38F6D5BE.924F4D62@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Hmm, anything else would introduce a new keyword, I guess. And > > new keywords cause new scripts to fail in old interpreters > > even when they don't use Unicode at all and only include > > per convention. > > Only if the new keyword is used in the script or anything it > imports. This is exactly like using new syntax (u'...') or new > library features (unicode('abc', 'iso-8859-1')). Right, but I would guess that people would then start using these keywords in all files per convention (so as not to trip over bugs due to wrong encodings). Perhaps I'm overcautious here... > I can't think of anything that gets included "by convention" that > breaks anything. I don't recall a proposal that we should casually > add pragmas to our scripts if there's no need to do so. Adding > pragmas to library modules is *not* part of the issue; they'd only be > there if the version of Python they're part of supports the syntax. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paul@prescod.net Fri Apr 14 10:12:08 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 04:12:08 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <38F6E0E8.E336F6C6@prescod.net> Fredrik Lundh wrote: > > so it's no longer an experimental feature, it's a "static variables" > thing? > > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. 
maybe in Py3K... I think that we get 95% of the benefit without any of the "dangers" (though I don't agree with the arguments against) if we allow the attachment of properties only at compile time and disallow mutation of them at runtime. That will allow Spark, EventDOM, multi-lingual docstrings etc., but disallow static variables. I'm not agreeing that using function properties as static variables is a bad thing...I'm just saying that we might be able to agree on a less powerful mechanism and then revisit the more general one in Py3K. Let's not forget that Py3K is going to be a very hard exercise in trying to combine everyone's ideas "all at once". Experience gained now is golden. We should probably be more amenable to "experimental ideas" now -- secure in the knowledge that they can be killed off in Py3K. If we put ideas we are not 100% comfortable with in Py3K we will be stuck with them forever. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond@skippinet.com.au Fri Apr 14 14:01:33 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:01:33 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: > Can I get at least a +0? :) Im quite amazed this is contentious! Definately a +1 from me! Mark. From skip@mojam.com (Skip Montanaro) Fri Apr 14 14:05:44 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 08:05:44 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.6056.362378.834649@beluga.mojam.com> Mark> Im quite amazed this is contentious! Definately a +1 from me! +1 from the skippi in Chicago as well... Skip From mhammond@skippinet.com.au Fri Apr 14 14:11:39 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:11:39 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F6E0E8.E336F6C6@prescod.net> Message-ID: > I think that we get 95% of the benefit without any of the > "dangers" > (though I don't agree with the arguments against) if we allow the > attachment of properties only at compile time and > disallow mutation of > them at runtime. AFAIK, this would be a pretty serious change. The compiler just generates (basically)PyObject_SetAttr() calls. There is no way in the current runtime to differentiate between "compile time" and "runtime" attribute references... If this was done, it would simply be ugly hacks to support what can only be described as unpythonic in the first place! [Unless of course Im missing something...] Mark. From fredrik@pythonware.com Fri Apr 14 14:34:48 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 15:34:48 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Barry Wrote: > > Can I get at least a +0? :) okay, I'll retract. here's today's opinion: +1 on an experimental future, which is not part of the language definition, and not necessarily supported by all implementations. 
(and where supported, not necessarily very efficient).

-1 on static function variables implemented as attributes on function or method objects.

def eff():
    "eff"
    print "eff", eff.__doc__

def bot():
    "bot"
    print "bot", bot.__doc__

eff()
bot()

eff, bot = bot, eff

eff()
bot()

# or did your latest patch solve this little dilemma?
# if so, -1 on your patch ;-)

From fdrake@acm.org Fri Apr 14 14:46:11 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:46:11 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F6D5BE.924F4D62@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> <38F6D5BE.924F4D62@lemburg.com> Message-ID: <14583.8483.628361.523059@seahag.cnri.reston.va.us>

M.-A. Lemburg writes: > Right, but I would guess that people would then start using these > keywords in all files per convention (so as not to trip over > bugs due to wrong encodings).

I don't imagine the new keywords would be used by anyone that wasn't specifically interested in their effect. Code that isn't needed tends not to get written!

-Fred

-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From fdrake@acm.org Fri Apr 14 14:55:36 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:55:36 -0400 (EDT) Subject: [Python-Dev] cvs-server out of sync with mailing-list ? In-Reply-To: <200004140746.JAA05473@localhost.localdomain> References: <200004140746.JAA05473@localhost.localdomain> Message-ID: <14583.9048.857826.186107@seahag.cnri.reston.va.us>

Fred Gansevles writes: > Just now I discovered that the cvs-server and the checkins-list are out of > sync. For example: according to the checkins-list the latest version of > src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest > version is 2.59 > > Am I missing something or is there some kind of a problem ?

There's a problem, but it's highly isolated. We're updating the public CVS using rsync tunnelled through ssh, which worked great until some of us switched to Linux workstations, where OpenSSH behaves a little differently with some private key files. I've not figured out how to work around it yet, but will keep playing with it. I've synced the public CVS from a Solaris box for now, so all the recent changes should be visible. Until I get things fixed, I'll try to remember to sync it before I head home in the evenings.

-Fred

-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From fdrake@acm.org Fri Apr 14 14:57:34 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:57:34 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils unixccompiler.py,1.21,1.22 In-Reply-To: <200004141353.JAA04309@thrak.cnri.reston.va.us> References: <200004141353.JAA04309@thrak.cnri.reston.va.us> Message-ID: <14583.9166.166905.476276@seahag.cnri.reston.va.us>

Greg Ward writes: > ! # Not many Unices required ranlib anymore -- SunOS 4.x is, I > ! # think the only major Unix that does. Maybe we need some

You're saying that SunOS 4.x *is* a major Unix???? Not for a while, now....

-Fred

-- Fred L. Drake, Jr. Corporation for National Research Initiatives

From akuchlin@mems-exchange.org Fri Apr 14 15:15:37 2000 From: akuchlin@mems-exchange.org (Andrew M.
Kuchling) Date: Fri, 14 Apr 2000 10:15:37 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000401bfa5e3$e8c5ce60$612d153f@tim> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> Message-ID: <14583.10249.322298.959083@amarok.cnri.reston.va.us> >Yes, of course people will use it to get the effect of function statics. OK >by me. People do the same thing today with class data attributes (i.e., to Wait, the attributes added to a function are visible inside the function? (I haven't looked that closely at the patch?) That strikes me as a much more significant change to Python's scoping, making it local, function attribute, then global scope. a I thought of the attributes as labels that could be attached to a callable object for the convenience of some external system, but the function would remain blissfully unaware of the external meaning attached to itself. -1 from me if a function's attributes are visible to code inside the function; +0 if they're not. -- A.M. Kuchling http://starship.python.net/crew/amk/ The paradox of money is that when you have lots of it you can manage life quite cheaply. Nothing so economical as being rich. -- Robertson Davies, _The Rebel Angels_ From skip@mojam.com (Skip Montanaro) Fri Apr 14 15:39:27 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 09:39:27 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <14583.11679.107267.727484@beluga.mojam.com> >> Yes, of course people will use it to get the effect of function >> statics. OK by me. People do the same thing today with class data >> attributes (i.e., to AMK> Wait, the attributes added to a function are visible inside the AMK> function? (I haven't looked that closely at the patch?) No, they aren't. There is no change of Python's scoping rules using Barry's function attributes patch. In fact, they are *only* available to the function itself via the function's name in the module globals. That's why Fredrik's "eff, bot = bot, eff" trick worked as it did. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 15:41:39 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 16:41:39 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <200004141441.QAA02162@python.inrialpes.fr> Mark Hammond wrote: > > > Can I get at least a +0? :) > > Im quite amazed this is contentious! Definately a +1 from me! > > Mark. > Amazed or not, it is contentious. I have the responsability to remove my veto once my concerns are adressed. So far, I have the impression that all I get (if I get anything at all -- see above) is "conveniency" from Gordon, which is nothing else but laziness about creating instances. As long as we discuss customization of objects with builtin types, the "inconsistency" stays bound to classes and instances. Add modules if you wish, but they are just namespaces. This proposal expands the customization inconsistency to functions and methods. 
And I am reluctant to see this happening "under the hood", without a global vision of the problem, just because a couple of people have abused unprotected attributes and claim that they can't do what they want because Python doesn't let them to. As to the object model, together with naming and binding, I say: KISS or do it right the first time. add-more-oil-to-the-fire-and-you'll-burn-your-house--ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip@mojam.com (Skip Montanaro) Fri Apr 14 16:04:51 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 10:04:51 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: <200004141441.QAA02162@python.inrialpes.fr> Message-ID: <14583.13203.900930.294033@beluga.mojam.com> Vladimir> So far, I have the impression that all I get (if I get Vladimir> anything at all -- see above) is "conveniency" from Gordon, Vladimir> which is nothing else but laziness about creating instances. No, you get function metadata. Barry's original reason for creating the patch was that the only writable attribute for functions or methods is the doc string. Multiple people are using it now to mean different things, and this leads to problems when those different uses clash. I submit that if I have to wrap methods (not functions) in classes and instantiate them to avoid being "lazy", then my code is going to look pretty horrible after applying this more than once or twice. Both Zope and John Aycock's system (SPARK?) demonstrate the usefulness of being able to attach metadata to functions and methods. All Barry is suggesting is that Python support that capability better. Finally, it's not clear to my feeble brain just how I would go about instantiating a method to get this capability today. Suppose I have class Spam: def eggs(self, a): return a and I want to attach an attribute to Spam.eggs that tells me if it is public/private in the Zope sense. Zope requires you to add a doc string to a method to declare that it's public: class Spam: def eggs(self, a): "doc" return a Fine, except that effectively prevents you from adding doc strings to your "private" methods as Greg Stein pointed out. Barry's proposal would allow the Spam.eggs author to attach an attribute to it: class Spam: def eggs(self, a): "doc" return a eggs.__zope_access__ = "private" I think the solution you're proposing is class Spam: class EggsMethod: def __call__(self, a): "doc" return a __zope_access__ = "private" eggs = EggsMethod() This seems to work, but also seems like a lot of extra baggage (and a performance hit to boot) to arrive at what seems like a very simple concept. 
-- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 16:30:31 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 17:30:31 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.13203.900930.294033@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 10:04:51 AM Message-ID: <200004141530.RAA02277@python.inrialpes.fr> Skip Montanaro wrote: > > Barry's proposal would allow the Spam.eggs author to attach an attribute to > it: > > class Spam: > def eggs(self, a): > "doc" > return a > eggs.__zope_access__ = "private" > > I think the solution you're proposing is > > class Spam: > class EggsMethod: > def __call__(self, a): > "doc" > return a > __zope_access__ = "private" > eggs = EggsMethod() > > This seems to work, but also seems like a lot of extra baggage (and a > performance hit to boot) to arrive at what seems like a very simple concept. > If you prefer embedded definitions, among other things, you could do: __zope_access__ = { 'Spam' : 'public' } class Spam: __zope_access__ = { 'eggs' : 'private', 'eats' : 'public' } def eggs(self, ...): ... def eats(self, ...): ... or have a completely separate class/structure for access control (which is what you would do it in C, btw, for existing objects to which you can't add slots, ex: file descriptors, mem segments, etc). -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw@cnri.reston.va.us Fri Apr 14 16:52:17 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 11:52:17 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <38F6E0E8.E336F6C6@prescod.net> Message-ID: <14583.16049.933693.237302@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> AFAIK, this would be a pretty serious change. The compiler MH> just generates (basically)PyObject_SetAttr() calls. There is MH> no way in the current runtime to differentiate between MH> "compile time" and "runtime" attribute references... If this MH> was done, it would simply be ugly hacks to support what can MH> only be described as unpythonic in the first place! MH> [Unless of course Im missing something...] You're not missing anything Mark! Remember Python's /other/ motto: "we're all consenting adults here". If you don't wanna mutate your function attrs at runtime... just don't! :) -Barry From bwarsaw@cnri.reston.va.us Fri Apr 14 16:59:55 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 14 Apr 2000 11:59:55 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <14583.16507.268456.950881@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> # or did your latest patch solve this little dilemma? No, definitely not. >>>>> "AMK" == Andrew M Kuchling writes: AMK> Wait, the attributes added to a function are visible inside AMK> the function? My patch definitely does not change Python's scoping rules in any way. This was a 1/2 hour hack, for Guido's sake! 
:) -Barry From tim_one@email.msn.com Fri Apr 14 17:04:32 2000 From: tim_one@email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 12:04:32 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <000501bfa62b$1ef8f560$d82d153f@tim> [Tim] > Yes, of course people will use it to get the effect of function > statics. OK by me. People do the same thing today with class data > attributes (i.e., to [Andrew M. Kuchling] > Wait, the attributes added to a function are visible inside the > function? No, same as in JavaScript, you need funcname.attr, just as you need classname.attr in Python today to fake the effect of mutable class statics (in the C++ sense). > [hysteria deleted ] > ... > +0 if they're not. From paul@prescod.net Fri Apr 14 17:21:31 2000 From: paul@prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 11:21:31 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <38F7458B.17F72652@prescod.net> Mark Hammond wrote: > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. I posted a proposal a few days back that does not use the "." SetAttr syntax and is clearly distinguisable (visually and by the compiler) from runtime property assignment. http://www.python.org/pipermail/python-dev/2000-April/004875.html The response was light but positive... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From skip@mojam.com (Skip Montanaro) Fri Apr 14 17:29:34 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 11:29:34 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F7458B.17F72652@prescod.net> References: <38F7458B.17F72652@prescod.net> Message-ID: <14583.18286.67371.157754@beluga.mojam.com> Paul> I posted a proposal a few days back that does not use the "." Paul> SetAttr syntax and is clearly distinguisable (visually and by the Paul> compiler) from runtime property assignment. Paul> http://www.python.org/pipermail/python-dev/2000-April/004875.html Paul> The response was light but positive... Paul, I have a question. Given the following example from your note: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() how is the compiler supposed to associate a particular "decl {...}" with a particular function? Is it just by order in the file? -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From gmcm@hypernet.com Fri Apr 14 17:32:42 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <1256392532-57122808@hypernet.com> Vladimir Marangozov wrote: > Amazed or not, it is contentious. I have the responsability to > remove my veto once my concerns are adressed. 
So far, I have the > impression that all I get (if I get anything at all -- see above) > is "conveniency" from Gordon, which is nothing else but laziness > about creating instances. I have the impression that majority of changes to Python are conveniences. > As long as we discuss customization of objects with builtin types, > the "inconsistency" stays bound to classes and instances. Add modules > if you wish, but they are just namespaces. This proposal expands > the customization inconsistency to functions and methods. And I am > reluctant to see this happening "under the hood", without a global > vision of the problem, just because a couple of people have abused > unprotected attributes and claim that they can't do what they want > because Python doesn't let them to. Can you please explain how "consistency" is violated? - Gordon From gmcm@hypernet.com Fri Apr 14 17:32:42 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <1256392531-57122875@hypernet.com> Fredrik Lundh wrote: > -1 on static function variables implemented as > attributes on function or method objects. > > def eff(): > "eff" > print "eff", eff.__doc__ > > def bot(): > "bot" > print "bot", bot.__doc__ > > eff() > bot() > > eff, bot = bot, eff > > eff() > bot() > > # or did your latest patch solve this little dilemma? > # if so, -1 on your patch ;-) To belabor the obvious (existing Python allows obsfuction), I present: class eff: "eff" def __call__(self): print "eff", eff.__doc__ class bot: "bot" def __call__(self): print "bot", bot.__doc__ e = eff() b = bot() e() b() eff, bot = bot, eff e = eff() b = bot() e() b() There's nothing new here. Why does allowing the ability to obsfucate suddenly warrant a -1? - Gordon From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 18:15:09 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 19:15:09 +0200 (CEST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 13, 2000 07:30:02 PM Message-ID: <200004141715.TAA02492@python.inrialpes.fr> Jeremy Hylton wrote: > > Here it is contextified. One small difference from the previous patch > is that NESTING_LIMIT is now only 1000. I think this is sufficient to > cover commonly occuring nested containers. > > Jeremy > > [patch omitted] Nice. I think you don't need the _PyCompareState_flag. Like in trashcan, _PyCompareState_nesting is enough to enter the sections of the code that depend on _PyCompareState_flag. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Moshe Zadka Fri Apr 14 18:46:12 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 14 Apr 2000 19:46:12 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> Message-ID: On Thu, 13 Apr 2000, Vladimir Marangozov wrote: > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? > Who made that deliberate decision? 
Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? No. >>> def this(turing_machine): ... if stops(turing_machine): ... confusing = "yes" ... else: ... confusing = "no" ... >>> print this.confusing -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm@digicool.com Fri Apr 14 19:19:42 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 14:19:42 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > Can I get at least a +0? :) I want function attributes. (There are all sorts of occasions i need cues to classify functions for executives that map and apply them, and this seems like the perfect way to couple that information with the object. Much nicer than having to mangle the names of the functions, or create some external registry with the classifications.) And i think i'd want them even more if they were visible within the function, so i could do static variables. Why is that a bad thing? So i guess that means i'd give a +1 for the proposal as stands, with the understanding that you'd get *another* +1 for the additional feature - yielding a bigger, BETTER +1. Metadata, static vars, frameworks ... oh my!-) (Oh, and i'd suggest up front that documentation for this feature recommend people not use "__*__" names for their own object attributes, to avoid collisions with eventual use of them by python.) Ken klm@digicool.com From bwarsaw@cnri.reston.va.us Fri Apr 14 19:21:11 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Fri, 14 Apr 2000 14:21:11 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.24983.768952.870567@anthem.cnri.reston.va.us> >>>>> "KM" == Ken Manheimer writes: KM> (Oh, and i'd suggest up front that documentation for this KM> feature recommend people not use "__*__" names for their own KM> object attributes, to avoid collisions with eventual use of KM> them by python.) Agreed. From fdrake@acm.org Fri Apr 14 19:25:46 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 14:25:46 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Ken Manheimer writes: > (Oh, and i'd suggest up front that documentation for this feature > recommend people not use "__*__" names for their own object attributes, to > avoid collisions with eventual use of them by python.) Isn't that a standing recommendation for all names? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" Message-ID: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> > To belabor the obvious (existing Python allows obsfuction), I=20 > present: >=20 > class eff: > "eff" > def __call__(self): > print "eff", eff.__doc__ > =20 > class bot: > "bot" > def __call__(self): > print "bot", bot.__doc__ > =20 > e =3D eff() > b =3D bot() > e() > b() >=20 > eff, bot =3D bot, eff > e =3D eff() > b =3D bot() > e() > b()=20 >=20 > There's nothing new here. 
Why does allowing the ability to > obsfucate suddenly warrant a -1? since when did Python grow full lexical scoping? does anyone that has learned about the LGB rule expect the above to work? in contrast, my example used a name which appears to be defined in the same scope as the other names introduced on the same line of source code -- but isn't. def foo(x): foo.x = x here, "foo" doesn't refer to the same namespace as the argument "x", but to instead whatever happens to be in an entirely different namespace at the time the function is executed. in other words, this feature cannot really be used to store statics -- it only looks that way... From Fredrik Lundh" Message-ID: <001901bfa63f$ca0c08c0$34aab5d4@hagrid> Ken Manheimer wrote: > I want function attributes. (There are all sorts of occasions i need cues > to classify functions for executives that map and apply them, and this > seems like the perfect way to couple that information with the > object. Much nicer than having to mangle the names of the functions, or > create some external registry with the classifications.) how do you expect to find all methods that has a given attribute? > And i think i'd want them even more if they were visible within the > function, so i could do static variables. Why is that a bad thing? because it doesn't work, unless you change python in a backwards incompatible way. that's okay in py3k, it's not okay in 1.6. From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 20:07:15 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 21:07:15 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <200004141907.VAA02670@python.inrialpes.fr> Gordon McMillan wrote: > > [VM] > > As long as we discuss customization of objects with builtin types, > > the "inconsistency" stays bound to classes and instances. Add modules > > if you wish, but they are just namespaces. This proposal expands > > the customization inconsistency to functions and methods. And I am > > reluctant to see this happening "under the hood", without a global > > vision of the problem, just because a couple of people have abused > > unprotected attributes and claim that they can't do what they want > > because Python doesn't let them to. > > Can you please explain how "consistency" is violated? > Yes, I can. To start with and to save me typing, please reread the 1st section of Demo/metaclasses/meta-vladimir.txt about Classes. ------- Now, whenever there are two instances 'a' and 'b' of the class A, the first inconsistency is that we're allowed to assign attributes to these instances dynamically, which are not declared in the class A. Strictly speaking, if I say: >>> a.author = "Guido" and if 'author' is not an attribute of 'a' after the instantiation of A (i.e. after a = A() completes), we should get a NameError. It's an inconsistency because whenever the above assignment succeeds, 'a' is no more an instance of A. It's an instance of some other class, because A prescribes what *all* instances of A have in *common*. So from here, we have to find our way in the object model and live with this 1st inconsistency. Problem: What is the class of the singleton 'a' then? Say, I need this class after the fact to build another society of objects, i.e. "clone" 'a' a hundred of times, because 'a' has dozens of attributes different than 'b'.
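(A concrete run of the rebinding pitfall Fredrik describes above -- the body of foo sees whatever the module-level name foo is bound to at call time. This assumes the proposed function-attribute patch; the names are his.)

    def foo(x):
        foo.x = x            # "foo" is looked up in the module namespace when called

    saved = foo              # keep a handle on the original function

    def foo():               # now rebind the module-level name
        pass

    saved(42)                # runs the original body...
    print hasattr(saved, "x"), hasattr(foo, "x")   # prints: 0 1 -- the attribute
                                                   # landed on the *new* foo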
To make a long story short, it turns out that we can build a Python class A1, having those attributes declared, then instantiate A1 hundreds of times and hopefully, let 'a' find its true identity with: >>> a.__class__ = A1 This is the key of the story. We *can* build, for a given singleton, its Python class, after the fact. And this is the only thing which still makes the Python class model 'relatively consistent'! If it weren't possible to build that class A1, it would have been better to stop talking about classes and a class model in Python. ("associations of typed structures with per-type binding rules" would have probably been a better term). Now to the question: how "consistency" is violated by the proposal? It is violated, because actually we *can't* build and restore the class, after the fact, of a builtin object (a funtion 'f') to which we add user attributes. We can't do it for 2 reasons, which we hope to solve in Py3K: 1) the class of 'f' is implemented in C 2) we still can't inherit from builtin classes (at least in CPython) As a consequence, we can't actually build hundreds of "clones" of 'f' by instantiating a class object. We can build them by adding manually the same attribute, but this is not OO, this is just 'binding to a namespace'. This is the true reason on why this fragile consistency is violated. Please, save me the trouble to expose the details you're missing, to each of you, where those details are omitted for simplicity. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> Message-ID: <005401bfa646$123ef2a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > > but they won't -- if you don't use an encoding directive, and > > don't use 8-bit characters in your string literals, everything > > works as before. > >=20 > > (that's why the default is "none" and not "utf-8") > >=20 > > if you use 8-bit characters in your source code and wish to > > add an encoding directive, you need to add the right encoding > > directive... >=20 > Fair enough, but this would render all the auto-coercion > code currently in 1.6 useless -- all string to Unicode > conversions would have to raise an exception. I though it was rather clear by now that I think the auto- conversion stuff *is* useless... but no, that doesn't mean that all string to unicode conversions need to raise exceptions -- any 8-bit unicode character obviously fits into a 16-bit unicode character, just like any integer fits in a long integer. if you convert the other way, you might get an OverflowError, just like converting from a long integer to an integer may give you an exception if the long integer is too large to be represented as an ordinary integer. after all, i =3D int(long(v)) doesn't always raise an exception... > > > > why keep on pretending that strings and strings are two > > > > different things? it's an artificial distinction, and it only > > > > causes problems all over the place. > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > strings... not until Py3K at least (and as Fred already > > > said, all standard editors will have native Unicode support > > > by then). > >=20 > > I discussed that in my original "all characters are unicode > > characters" proposal. 
in my proposal, the standard string > > type will have to roles: a string either contains unicode > > characters, or binary bytes. > >=20 > > -- if it contains unicode characters, python guarantees that > > methods like strip, lower (etc), and regular expressions work > > as expected. > >=20 > > -- if it contains binary data, you can still use indexing, slicing, > > find, split, etc. but they then work on bytes, not on chars. > >=20 > > it's still up to the programmer to keep track of what a certain > > string object is (a real string, a chunk of binary data, an en- > > coded string, a jpeg image, etc). if the programmer wants > > to convert between a unicode string and an external encoding > > to use a certain unicode encoding, she needs to spell it out. > > the codecs are never called "under the hood". > >=20 > > (note that if you encode a unicode string into some other > > encoding, the result is binary buffer. operations like strip, > > lower et al does *not* work on encoded strings). >=20 > Huh ? If the programmer already knows that a certain > string uses a certain encoding, then he can just as well > convert it to Unicode by hand using the right encoding > name. I thought that was what I said, but the text was garbled. let's try again: if the programmer wants to convert between a unicode string and a buffer containing encoded text, she needs to spell it out. the codecs are never called "under the hood" > The whole point we are talking about here is that when > having the implementation convert a string to Unicode all > by itself it needs to know which encoding to use. This is > where we have decided long ago that UTF-8 should be > used. does "long ago" mean that the decision cannot be questioned? what's going on here? face it, I don't want to guess when and how the interpreter will convert strings for me. after all, this is Python, not Perl. if I want to convert from a "string of characters" to a byte buffer using a certain character encoding, let's make that explicit. Python doesn't convert between other data types for me, so why should strings be a special case? > The pragma discussion is about a totally different > issue: pragmas could make it possible for the programmer > to tell the *compiler* which encoding to use for literal > u"unicode" strings -- nothing more. Since "8-bit" strings > currently don't have an encoding attached to them we store > them as-is. what do I have to do to make you read my proposal? shout? okay, I'll try: THERE SHOULD BE JUST ONE INTERNAL CHARACTER SET IN PYTHON 1.6: UNICODE. for consistency, let this be true for both 8-bit and 16-bit strings (as well as Py3K's 31-bit strings ;-). there are many possible external string encodings, just like there are many possible external integer encodings. but for integers, that's not something that the core implementation cares much about. why are strings different? > I don't want to get into designing a completely new > character container type here... this can all be done for Py3K, > but not now -- it breaks things at too many ends (even though > it would solve the issues with strings being used in different > contexts). you don't need to -- you only need to define how the *existing* string type should be used. in my proposal, it can be used in two ways: -- as a string of unicode characters (restricted to the 0-255 subset, by obvious reasons). given a string 's', len(s) is always the number of characters, s[i] is the i'th character, etc. or=20 -- as a buffer containing binary bytes. 
given a buffer 'b', len(b) is always the number of bytes, b[i] is the i'th byte, etc. this is one flavour less than in the 1.6 alphas -- where strings = sometimes contain UTF-8 (and methods like upper etc doesn't work), sometimes an 8-bit character set (and upper works), and sometimes binary buffers (for which upper doesn't work). (hmm. I've said all this before, haven't I?) > > > > -- we still need an encoding marker for ascii supersets (how = about > > > > ;-). however, = it's up to > > > > the tokenizer to detect that one, not the parser. the parser = only > > > > sees unicode strings. > > > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > > That's a task done by the parser. > >=20 > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > >=20 > > if the tokenizer does the actual conversion doesn't really matter; > > the point is that once the code has passed through the tokenizer, > > it's unicode. >=20 > The tokenizer would have to know which parts of the > input string to convert to Unicode and which not... plus there > are different encodings to be applied, e.g. UTF-8, Unicode-Escape, > Raw-Unicode-Escape, etc. sigh. why do you insist on taking a very simple thing and making it very very complicated? will anyone out there ever use an editor that supports different encodings for different parts of the file? why not just assume that the *ENTIRE SOURCE FILE* uses a single encoding, and let the tokenizer (or more likely, a conversion stage before the tokenizer) convert the whole thing to unicode. let the rest of the compiler work on Py_UNICODE* strings only, and all your design headaches will just disappear. ... frankly, I'm beginning to feel like John Skaller. do I have to write my own interpreter to get this done right? :-( From klm@digicool.com Fri Apr 14 20:18:18 2000 From: klm@digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 15:18:18 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: > since when did Python grow full lexical scoping? > > does anyone that has learned about the LGB rule expect > the above to work? Not sure what LGB stands for. "Local / Global / Built-in"? > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x =3D x > > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Huh. ?? I'm assuming your hypothetical foo.x means the attribute 'x' of the function 'foo' in the global namespace for the function 'foo' - which, conveniently, is the module where foo is defined! 8<--- foo.py --->8 def foo(): # Return the object named 'foo'. return foo 8<--- end foo.py --->8 8<--- bar.py --->8 from foo import * print foo() 8<--- end bar.py --->8 % python bar.py % I must be misapprehending what you're suggesting - i know you know this stuff better than i do - but it seems to me that foo.x would work, were foo to have an x. (And that foo.x would, in my esteem, be a suboptimal way to get at x from within foo, but that's besides the fact.) 
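(Ken's point in runnable form, assuming the proposed patch: a function can reach its own attributes through its global name, with the caveat -- Fredrik's objection -- that rebinding that name silently breaks the trick.)

    def foo():
        # "foo" here is just the module-level name, looked up at call time.
        return foo.x

    foo.x = "public"         # needs the proposed function-attribute patch
    print foo()              # prints: public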
Ken klm@digicool.com From jeremy@cnri.reston.va.us Fri Apr 14 20:18:53 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Fri, 14 Apr 2000 15:18:53 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <200004141715.TAA02492@python.inrialpes.fr> References: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> <200004141715.TAA02492@python.inrialpes.fr> Message-ID: <14583.28445.105079.446201@bitdiddle.cnri.reston.va.us> >>>>> "VM" == Vladimir Marangozov writes: VM> Jeremy Hylton wrote: >> Here it is contextified. One small difference from the previous >> patch is that NESTING_LIMIT is now only 1000. I think this is >> sufficient to cover commonly occuring nested containers. >> >> Jeremy >> >> [patch omitted] VM> Nice. VM> I think you don't need the _PyCompareState_flag. Like in VM> trashcan, _PyCompareState_nesting is enough to enter the VM> sections of the code that depend on _PyCompareState_flag. Right. Thanks for the suggestion, and thanks to Barry & Fred for theirs. I've checked in the changes. Jeremy From Fredrik Lundh" Message-ID: <006201bfa647$92f3c000$34aab5d4@hagrid> Ken Manheimer wrote: > > does anyone that has learned about the LGB rule expect > > the above to work? >=20 > Not sure what LGB stands for. "Local / Global / Built-in"? certain bestselling python books are known to use this acronym... > I'm assuming your hypothetical foo.x means the attribute 'x' of the > function 'foo' in the global namespace for the function 'foo' - which, = > conveniently, is the module where foo is defined! did you run the eff() bot() example? > I must be misapprehending what you're suggesting - i know you know = this > stuff better than i do - but it seems to me that foo.x would work, = were > foo to have an x. sure, it seems to be working. but not for the right reason. > (And that foo.x would, in my esteem, be a suboptimal > way to get at x from within foo, but that's besides the fact.) fwiw, I'd love to see a good syntax for this. might even change my mind... From Fredrik Lundh" Message-ID: <007e01bfa648$3c701de0$34aab5d4@hagrid> TimBot wrote: > Greg gave the voting rule as: >=20 > > -1 "Veto. And is my reasoning." sorry, I must have missed that post, since I've interpreted the whole thing as: if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): implement(feature) (probably because I've changed the eff-bot script to use 'sre' instead of 're'...) can you repost the full set of rules? From gmcm@hypernet.com Fri Apr 14 20:36:53 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 15:36:53 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: <1256381481-212228@hypernet.com> Fredrik Lundh wrote: > > To belabor the obvious (existing Python allows obsfuction), I > > present: > > > > class eff: > > "eff" > > def __call__(self): > > print "eff", eff.__doc__ > > > > class bot: > > "bot" > > def __call__(self): > > print "bot", bot.__doc__ > > > > e = eff() > > b = bot() > > e() > > b() > > > > eff, bot = bot, eff > > e = eff() > > b = bot() > > e() > > b() > > > > There's nothing new here. Why does allowing the ability to > > obsfucate suddenly warrant a -1? > > since when did Python grow full lexical scoping? I know that's not Swedish, but I haven't the foggiest what you're getting at. Where did lexical scoping enter? > does anyone that has learned about the LGB rule expect > the above to work? 
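(A Python-level analogue of the depth guard in the comparison patch Jeremy mentions above. The real change is at the C level; NESTING_LIMIT below simply mirrors the figure quoted in that thread.)

    NESTING_LIMIT = 1000

    def nested_equal(a, b, depth=0):
        # Bail out instead of recursing without bound on deeply nested lists.
        if depth > NESTING_LIMIT:
            raise RuntimeError("comparison nested too deeply")
        if type(a) is type(b) is type([]):
            if len(a) != len(b):
                return 0
            for i in range(len(a)):
                if not nested_equal(a[i], b[i], depth + 1):
                    return 0
            return 1
        return a == b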
You're the one who did "eff, bot = bot, eff". The only intent I can infer is obsfuction. The above works the same as yours, for whatever your definition of "work". > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x = x I guess I'm missing something. -------snip------------ def eff(): "eff" print "eff", eff.__doc__ def bot(): "bot" print "bot", bot.__doc__ eff() bot() eff, bot = bot, eff eff() bot() -----------end----------- I guess we're not talking about the same example. > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Again, I'm mystified. After "eff, bot = bot, eff", I don't see why 'bot() == "eff bot"' is a wrong result. Put it another way: are you reporting a bug in 1.5.2? If it's a bug, why is my example not a bug? If it's not a bug, why would the existence of other attributes besides __doc__ be a problem? - Gordon From akuchlin@mems-exchange.org Fri Apr 14 20:37:01 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Fri, 14 Apr 2000 15:37:01 -0400 (EDT) Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <005401bfa646$123ef2a0$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Fredrik Lundh writes: > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Watching the successive weekly Unicode patchsets, each one fixing some obscure corner case that turned out to be buggy -- '%s' % ustr, concatenating literals, int()/float()/long(), comparisons -- I'm beginning to agree with Fredrik. Automatically making Unicode strings and regular strings interoperate looks like it requires many changes all over the place, and I worry if it's possible to catch them all in time. Maybe we should consider being more conservative, and just having the Unicode built-in type, the unicode() built-in function, and the u"..." notation, and then leaving all responsibility for conversions up to the user. On the other hand, *some* default conversion seems needed, because it seems draconian to make open(u"abcfile") fail with a TypeError. (While I want to see Python 1.6 expedited, I'd also not like to see it saddled with a system that proves to have been a mistake, or one that's a maintenance burden. If forced to choose between delaying and getting it right, the latter wins.) >why not just assume that the *ENTIRE SOURCE FILE* uses a single >encoding, and let the tokenizer (or more likely, a conversion stage >before the tokenizer) convert the whole thing to unicode. To reinforce Fredrik's point here, note that XML only supports encodings at the level of an entire file (or external entity). You can't tell an XML parser that a file is in UTF-8, except for this one element whose contents are in Latin1. -- A.M. Kuchling http://starship.python.net/crew/amk/ Dream casts a human shadow, when it occurs to him to do so. 
-- From SANDMAN: "Season of Mists", episode 0 From Fredrik Lundh" Message-ID: <009401bfa64b$202f6c00$34aab5d4@hagrid> > > > There's nothing new here. Why does allowing the ability to=20 > > > obsfucate suddenly warrant a -1? > >=20 > > since when did Python grow full lexical scoping? >=20 > I know that's not Swedish, but I haven't the foggiest what=20 > you're getting at. Where did lexical scoping enter? > > > does anyone that has learned about the LGB rule expect > > the above to work? >=20 > You're the one who did "eff, bot =3D bot, eff". The only intent I=20 > can infer is obsfuction. The above works the same as yours,=20 > for whatever your definition of "work". okay, I'll try again: in your example, the __call__ function refers to a name that is defined several levels up. in my example, the "foo" function refers to a name that *looks* like it's in the same scope as the "x" argument (etc), but isn't. for the interpreter, the examples are identical. for the reader, they're not. > Put it another way: are you reporting a bug in 1.5.2? If it's a=20 > bug, why is my example not a bug? If it's not a bug, why=20 > would the existence of other attributes besides __doc__ be a=20 > problem? because people isn't likely to use __doc__ to store static variables? From skip@mojam.com (Skip Montanaro) Fri Apr 14 21:03:41 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 15:03:41 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <006201bfa647$92f3c000$34aab5d4@hagrid> References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.31133.851143.570161@beluga.mojam.com> >> (And that foo.x would, in my esteem, be a suboptimal way to get at x >> from within foo, but that's besides the fact.) Fredrik> fwiw, I'd love to see a good syntax for this. might even Fredrik> change my mind... Could we overload "_"'s meaning yet again (assuming it doesn't already have a special meaning within functions)? That way def bar(): print _.x def foo(): print _.x foo.x = "public" bar.x = "private" bar, foo = foo, bar foo() would display private on stdout. *Note* - I would not advocate this use be extended to do a more general lookup of attributes - it should just refer to attributes of the function of which the executing code object is an attribute. (It may not even be possible.) (I've never used _ for anything, so I don't know all its current (ab)uses. This is just a thought that occurred to me...) Skip From gmcm@hypernet.com Fri Apr 14 21:18:56 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 16:18:56 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> References: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <1256378958-363995@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > Can you please explain how "consistency" is violated? > > > > Yes, I can. > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. Ah. I see. Quite simply, you're arguing from First Principles in an area where I have none. I used to, but I found that all systems built from First Principles (Eiffel, Booch's methodology...) yielded 3 headed monsters. It can be entertaining (in the WWF sense). 
Just trick some poor sucker into saying "class method" in the C++ sense and then watch Jim Fulton deck him, the ref and half the front row. Personally, I regard (dynamic instance.attribute) as a handy feature, not as a flaw in the object model. - Gordon From Moshe Zadka Fri Apr 14 21:19:50 2000 From: Moshe Zadka (Moshe Zadka) Date: Fri, 14 Apr 2000 22:19:50 +0200 (IST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000901bfa5b9$b7f6c980$182d153f@tim> Message-ID: On Thu, 13 Apr 2000, Tim Peters wrote: > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). Ordered? What about dictionaries? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw@cnri.reston.va.us Fri Apr 14 21:49:41 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 16:49:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.33893.192967.369037@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> fwiw, I'd love to see a good syntax for this. might even FL> change my mind... def foo(x): self.x = x ? :) -Barry From bwarsaw@cnri.reston.va.us Fri Apr 14 22:03:25 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:03:25 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256381481-212228@hypernet.com> <009401bfa64b$202f6c00$34aab5d4@hagrid> Message-ID: <14583.34717.128345.245459@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> because people isn't likely to use __doc__ to store FL> static variables? Okay, let's really see how much we can abuse __doc__ today. I'm surprised neither Zope nor SPARK are this evil. Why must I add the extra level of obfuscating indirection? Or are we talking about making __doc__ read-only in 1.6, or restricting it to strings only? -Barry -------------------- snip snip -------------------- import sys print sys.version def decorate(func): class D: pass doc = func.__doc__ func.__doc__ = D() func.__doc__.__doc__ = doc def eff(): "eff" print "eff", eff.__doc__.__doc__ decorate(eff) def bot(): "bot" print "bot", bot.__doc__.__doc__ decorate(bot) eff.__doc__.publish = 1 bot.__doc__.publish = 0 eff() bot() eff, bot = bot, eff eff() bot() for f in (eff, bot): print 'Can I publish %s? ... %s' % (f.__name__, f.__doc__.publish and 'yes' or 'no') -------------------- snip snip -------------------- % python /tmp/scary.py 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] eff eff bot bot bot eff eff bot Can I publish bot? ... no Can I publish eff? ... yes From bwarsaw@cnri.reston.va.us Fri Apr 14 22:05:43 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:05:43 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> <14583.31133.851143.570161@beluga.mojam.com> Message-ID: <14583.34855.459510.161223@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> (I've never used _ for anything, so I don't know all its SM> current (ab)uses. This is just a thought that occurred to SM> me...) One place it's used is in localized applications. See Tools/i18n/pygettext.py. 
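(The "_" convention Barry is pointing at, in miniature: pygettext extracts the strings wrapped in _(), which is why overloading "_" inside functions would collide with i18n code. The catalog here is a made-up stand-in.)

    _catalog = {"Hello, %s!": "Bonjour, %s !"}

    def _(message):
        # Stand-in for a message-catalog lookup.
        return _catalog.get(message, message)

    def greet(name):
        print _("Hello, %s!") % name

    greet("Barry")           # prints: Bonjour, Barry !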
-Barry From gstein@lyra.org Fri Apr 14 22:20:27 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:20:27 -0700 (PDT) Subject: [Python-Dev] voting (was: Object customization) In-Reply-To: <007e01bfa648$3c701de0$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > TimBot wrote: > > Greg gave the voting rule as: > > > > > -1 "Veto. And is my reasoning." > > sorry, I must have missed that post, since I've > interpreted the whole thing as: > > if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): > implement(feature) As in all cases, that "and" should be an "or" :-) > (probably because I've changed the eff-bot script > to use 'sre' instead of 're'...) > > can you repost the full set of rules? http://www.python.org/pipermail/python-dev/2000-March/004312.html Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 22:23:50 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:23:50 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14583.33893.192967.369037@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> fwiw, I'd love to see a good syntax for this. might even > FL> change my mind... > > def foo(x): > self.x = x > > ? :) Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The above syntax creates too much of an expectation to look for "self". There would, of course, be problems that self.x doesn't work in a method while _.x could. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake@acm.org Fri Apr 14 22:18:48 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 17:18:48 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: <14583.35640.746399.601030@seahag.cnri.reston.va.us> R. David Murray writes: > So it looks to my uneducated eye like file.tell() is broken. The actual > on-disk size of the file, by the way, is indeed 2147718485, so it looks > like somebody's not using the right size data structure somewhere. > > So, can anyone tell me what to look for, or am I stuck for the moment? Hmm. What is off_t defined to be on your platform? In config.h, is HAVE_FTELLO or HAVE_FTELL64 defined? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 22:21:59 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 23:21:59 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256378958-363995@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 04:18:56 PM Message-ID: <200004142121.XAA03202@python.inrialpes.fr> Gordon McMillan wrote: > > Ah. I see. Quite simply, you're arguing from First Principles Exactly. I think that these principles play an important role in the area of computer programming, because they put the markers in the evolution of our thoughts when we're trying to transcript the real world through formal computer terms. No kidding :-) So we need to put some limits before loosing completely these driving markers. No kidding. > in an area where I have none. too bad for you > I used to, but I found that all systems built from First Principles > (Eiffel, Booch's methodology...) yielded 3 headed monsters. Yes. This is the state Python tends to reach, btw. I'd like to avoid this madness. 
Put simply, if we loose the meaning of the notion of a class of objects, there's no need to have a 'class' keyword, because it would do more harm than good. > Personally, I regard (dynamic instance.attribute) as a handy feature Gordon, I know that it's handy! > not as a flaw in the object model. if we still pretend there is one... -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Fri Apr 14 22:22:08 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 23:22:08 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <38F78C00.7BAE1C12@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > but they won't -- if you don't use an encoding directive, and > > > don't use 8-bit characters in your string literals, everything > > > works as before. > > > > > > (that's why the default is "none" and not "utf-8") > > > > > > if you use 8-bit characters in your source code and wish to > > > add an encoding directive, you need to add the right encoding > > > directive... > > > > Fair enough, but this would render all the auto-coercion > > code currently in 1.6 useless -- all string to Unicode > > conversions would have to raise an exception. > > I though it was rather clear by now that I think the auto- > conversion stuff *is* useless... > > but no, that doesn't mean that all string to unicode conversions > need to raise exceptions -- any 8-bit unicode character obviously > fits into a 16-bit unicode character, just like any integer fits in a > long integer. > > if you convert the other way, you might get an OverflowError, just > like converting from a long integer to an integer may give you an > exception if the long integer is too large to be represented as an > ordinary integer. after all, > > i = int(long(v)) > > doesn't always raise an exception... This is exactly the same as proposing to change the default encoding to Latin-1. I don't have anything against that (being a native Latin-1 user :), but I would assume that other native language writer sure do: e.g. all programmers not using Latin-1 as native encoding (and there are lots of them). > > > > > why keep on pretending that strings and strings are two > > > > > different things? it's an artificial distinction, and it only > > > > > causes problems all over the place. > > > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > > strings... not until Py3K at least (and as Fred already > > > > said, all standard editors will have native Unicode support > > > > by then). > > > > > > I discussed that in my original "all characters are unicode > > > characters" proposal. in my proposal, the standard string > > > type will have to roles: a string either contains unicode > > > characters, or binary bytes. > > > > > > -- if it contains unicode characters, python guarantees that > > > methods like strip, lower (etc), and regular expressions work > > > as expected. > > > > > > -- if it contains binary data, you can still use indexing, slicing, > > > find, split, etc. but they then work on bytes, not on chars. 
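(Why the choice of default encoding matters, shown with the explicit 1.6-era conversion calls only -- no automatic coercion involved. The byte string is a hypothetical example.)

    s = "caf\xe9"                        # "café" as a Latin-1 byte string
    u = unicode(s, "latin-1")            # decodes fine
    print repr(u)                        # u'caf\xe9'
    try:
        unicode(s, "utf-8")              # 0xE9 on its own is not valid UTF-8
    except UnicodeError:
        print "not UTF-8"
    print repr(u.encode("utf-8"))        # explicit round-trip: 'caf\xc3\xa9'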
> > > > > > it's still up to the programmer to keep track of what a certain > > > string object is (a real string, a chunk of binary data, an en- > > > coded string, a jpeg image, etc). if the programmer wants > > > to convert between a unicode string and an external encoding > > > to use a certain unicode encoding, she needs to spell it out. > > > the codecs are never called "under the hood". > > > > > > (note that if you encode a unicode string into some other > > > encoding, the result is binary buffer. operations like strip, > > > lower et al does *not* work on encoded strings). > > > > Huh ? If the programmer already knows that a certain > > string uses a certain encoding, then he can just as well > > convert it to Unicode by hand using the right encoding > > name. > > I thought that was what I said, but the text was garbled. let's > try again: > > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Again and again... The orginal intent of the Unicode integration was trying to make Unicode and 8-bit strings interoperate without too much user intervention. At a cost (the UTF-8 encoding), but then if you do use this encoding (and this is not far fetched since there are input sources which do return UTF-8, e.g. TCL), the Unicode implementation will apply all its knowledge in order to get you satisfied. If you don't like this, you can always apply explicit conversion calls wherever needed. Latin-1 and UTF-8 are not compatible, the conversion is very likely to cause an exception, so the user will indeed be informed about this failure. > > The whole point we are talking about here is that when > > having the implementation convert a string to Unicode all > > by itself it needs to know which encoding to use. This is > > where we have decided long ago that UTF-8 should be > > used. > > does "long ago" mean that the decision cannot be > questioned? what's going on here? > > face it, I don't want to guess when and how the interpreter > will convert strings for me. after all, this is Python, not Perl. > > if I want to convert from a "string of characters" to a byte > buffer using a certain character encoding, let's make that > explicit. Hey, there's nothing which prevents you from doing so explicitly. > Python doesn't convert between other data types for me, so > why should strings be a special case? Sure it does: 1.5 + 2 == 3.5, 2L + 3 == 5L, etc... > > The pragma discussion is about a totally different > > issue: pragmas could make it possible for the programmer > > to tell the *compiler* which encoding to use for literal > > u"unicode" strings -- nothing more. Since "8-bit" strings > > currently don't have an encoding attached to them we store > > them as-is. > > what do I have to do to make you read my proposal? > > shout? > > okay, I'll try: > > THERE SHOULD BE JUST ONE INTERNAL CHARACTER > SET IN PYTHON 1.6: UNICODE. Please don't shout... simply read on... Note that you are again argueing for using Latin-1 as default encoding -- why don't you simply make this fact explicit ? > for consistency, let this be true for both 8-bit and 16-bit > strings (as well as Py3K's 31-bit strings ;-). > > there are many possible external string encodings, just like there > are many possible external integer encodings. but for integers, > that's not something that the core implementation cares much > about. why are strings different? 
> > > I don't want to get into designing a completely new > > character container type here... this can all be done for Py3K, > > but not now -- it breaks things at too many ends (even though > > it would solve the issues with strings being used in different > > contexts). > > you don't need to -- you only need to define how the *existing* > string type should be used. in my proposal, it can be used in two > ways: > > -- as a string of unicode characters (restricted to the > 0-255 subset, by obvious reasons). given a string 's', > len(s) is always the number of characters, s[i] is the > i'th character, etc. > > or > > -- as a buffer containing binary bytes. given a buffer 'b', > len(b) is always the number of bytes, b[i] is the i'th > byte, etc. > > this is one flavour less than in the 1.6 alphas -- where strings sometimes > contain UTF-8 (and methods like upper etc doesn't work), sometimes an > 8-bit character set (and upper works), and sometimes binary buffers (for > which upper doesn't work). Strings always contain data -- there's no encoding attached to them. If the user calls .upper() on a binary string the output will most probably no longer be usable... but that's the programmers fault, not the string type's fault. > (hmm. I've said all this before, haven't I?) You know as well as I do that the existing string type is used for both binary and text data. You cannot simply change this by introducing some new definition of what should be stored in buffers and what in strings... not until we officially redefined these things say in Py3K ;-) > frankly, I'm beginning to feel like John Skaller. do I have to write my > own interpreter to get this done right? :-( No, but you should have started this discussion in late November last year... not now, when everything has already been implemented and people are starting to the use the code that's there with great success. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Fri Apr 14 22:29:48 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 23:29:48 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: <38F78DCC.C630F32@lemburg.com> "Andrew M. Kuchling" wrote: > > >why not just assume that the *ENTIRE SOURCE FILE* uses a single > >encoding, and let the tokenizer (or more likely, a conversion stage > >before the tokenizer) convert the whole thing to unicode. > > To reinforce Fredrik's point here, note that XML only supports > encodings at the level of an entire file (or external entity). You > can't tell an XML parser that a file is in UTF-8, except for this one > element whose contents are in Latin1. Hmm, this would mean that someone who writes: """ #pragma script-encoding utf-8 u = u"\u1234" print u """ would suddenly see "\u1234" as output. If that's ok, fine with me... it would make things easier on the compiler side (even though I'm pretty sure that people won't like this). BTW: I will be offline for the next week... I'm looking forward to where this dicussion will be heading. 
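(Marc-Andre's coercion analogy, spelled out; the last line is where the disagreement actually lives.)

    print 1.5 + 2         # 3.5 -- mixed numeric types coerce automatically
    print 2L + 3          # 5 (a long) -- same story
    print u"caf" + "e"    # cafe -- only works painlessly because "e" is plain
                          # ASCII; hand it a non-ASCII byte and the implicit
                          # conversion is exactly the contested question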
Have fun, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Fri Apr 14 22:43:16 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:43:16 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > Tim Peters wrote: > > [Gordon McMillan] > > > ... > > > Or are you saying that if functions have attributes, people will > > > all of a sudden expect that function locals will have initialized > > > and maintained state? > > > > I expect that they'll expect exactly what happens in JavaScript, which > > supports function attributes too, and where it's often used as a > > nicer-than-globals way to get the effect of C-like mutable statics > > (conceptually) local to the function. > > so it's no longer an experimental feature, it's a "static variables" > thing? Don't be so argumentative. Tim suggested a possible use. Not what it really means or how it really works. I look at it as labelling a function with metadata about that function. I use globals or class attrs for "static" data. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 22:45:51 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:45:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: Message-ID: On Fri, 14 Apr 2000, Mark Hammond wrote: > > I think that we get 95% of the benefit without any of the > > "dangers" > > (though I don't agree with the arguments against) if we allow the > > attachment of properties only at compile time and > > disallow mutation of > > them at runtime. > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. There is no way in > the current runtime to differentiate between "compile time" and > "runtime" attribute references... If this was done, it would simply > be ugly hacks to support what can only be described as unpythonic in > the first place! > > [Unless of course Im missing something...] You aren't at all! Paul hit his head, or he is assuming some additional work to allow the compiler to know more. I agree with you: compilation in Python is just code execution; there is no way Python can disallow runtime changes. (from a later note, it appears he is referring to introducing "decl", which I don't think is on the table for 1.6) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 22:48:27 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:48:27 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... > > or have a completely separate class/structure for access control > (which is what you would do it in C, btw, for existing objects > to which you can't add slots, ex: file descriptors, mem segments, etc). This is uglier than attaching the metadata directly to the target that you are describing! 
If you want to apply metadata to functions, then apply them to the function! Don't shove them off in a separate structure. You're the one talking about cleanliness, yet you suggest something that is very poor from a readability, maintainability, and semantic angle. Ick. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 22:52:22 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:52:22 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Fred L. Drake, Jr. wrote: > Ken Manheimer writes: > > (Oh, and i'd suggest up front that documentation for this feature > > recommend people not use "__*__" names for their own object attributes, to > > avoid collisions with eventual use of them by python.) > > Isn't that a standing recommendation for all names? Yup. Personally, I use "_*" for private variables or other "hidden" type things that shouldn't be part of an object's normal interface. For example, all the stuff that the Python/COM interface uses is prefixed by "_" to denote that it is metadata about the classes rather than part of its interface. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 22:56:37 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:56:37 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > Now, whenever there are two instances 'a' and 'b' of the class A, > the first inconsistency is that we're allowed to assign attributes > to these instances dynamically, which are not declared in the class A. > > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. I'll repeat what Gordon said: the current Python behavior is entirely correct, entirely desirable, and should not (can not) change. Your views on what an object model should be are not Python's views. If the person who writes "a.author =" wants to do that, then let them. Python does not put blocks in people's way, it simply presumes that people are intelligent and won't do Bad Things. There are enumerable times where I've done the following: class _blank() pass data = _blank() data.item = foo data.extra = bar func(data) It is a tremendously easy way to deal with arbitrary data on an attribute basis, rather than (say) dictionary's key-based basis. >... arguments about alternate classes and stuff ... Sorry. That just isn't Python. Not in practice, nor in intent. Applying metadata to the functions is an entirely valid, Pythonic idea. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 23:01:51 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:01:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004142121.XAA03202@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > Ah. I see. Quite simply, you're arguing from First Principles > > Exactly. 
> > I think that these principles play an important role in the area > of computer programming, because they put the markers in the > evolution of our thoughts when we're trying to transcript the > real world through formal computer terms. No kidding :-) > So we need to put some limits before loosing completely these > driving markers. No kidding. In YOUR opinion. In MY opinion, they're bunk. Python provides me with the capabilities that I want: objects when I need them, and procedural flow when that is appropriate. It avoids obstacles and gives me freedom of expression and ways to rapidly develop code. I don't have to worry about proper organization unless and until I need it. Formalisms be damned. I want something that works for ME. Give me code, make it work, and get out of my way. That's what Python is good for. I could care less about "proper programming principles". Pragmatism. That's what I seek. >... > > I used to, but I found that all systems built from First Principles > > (Eiffel, Booch's methodology...) yielded 3 headed monsters. > > Yes. This is the state Python tends to reach, btw. I'd like to avoid > this madness. Does not. There are many cases where huge systems have been built using Python, built well, and are quite successful. And yes, there have also been giant, monster-sized Bad Python Programs out there, too. But that can be done in ANY language. Python doesn't *tend* towards that at all. Certainly, Perl does, but we aren't talking about that (until now :-) > Put simply, if we loose the meaning of the notion of a class of objects, > there's no need to have a 'class' keyword, because it would do more harm > than good. Huh? What the heck do you mean by this? >... > > not as a flaw in the object model. > > if we still pretend there is one... It *DOES* have one. To argue there isn't one is simply insane and argumentative. Python just doesn't have YOUR object model. Live with it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 23:00:19 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:00:19 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:56:37 PM Message-ID: <200004142200.AAA03409@python.inrialpes.fr> Greg Stein wrote: > > Your views on what an object model should be are not Python's views. Ehm, could you explain to me what are Python's views? Sorry, I don't see any worthy argument in your posts that would make me switch from -1 to -0. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip@mojam.com (Skip Montanaro) Fri Apr 14 23:00:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 17:00:30 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: <14583.38142.520596.804466@beluga.mojam.com> Barry said "_" is effectively taken because it means something (at least when used a function?) to pygettext. How about "__" then? def bar(): print __.x def foo(): print __.x foo.x = "public" bar.x = "private" ... It has the added benefit that this usage adheres to the "Python gets to stomp on __-prefixed variables" convention. 
my-underscore-key-works-better-than-yours-ly y'rs, Skip From gstein@lyra.org Fri Apr 14 23:13:25 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:13:25 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? > Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. "We're all adults here." Python says that you can do what you want. It won't get in your way. Badness is not defined. If somebody wants to write "a.author='Guido'" then they can. There are a number of objects that can have arbitrary attributes. Classes, modules, and instances are a few (others?). Function objects are a proposed additional one. In all cases, attaching new attributes is fine and dandy -- no restriction. (well, you can implement __setattr__ on a class instance) Python's object model specifies a number of other behaviors, but nothing really material here. Of course, all these "views" are simply based on Guido's thoughts and the implementation. Implementation, doc, current practice, and Guido's discussions over the past eight years of Python's existence have also contributed to the notion of "The Python Way". Some of that may be very hard to write down, although I've attempted to write a bit of that above. After five years of working with Python, I'd like to think that I've absorbed and understand the Python Way. Can I state it? No. "We're all adults here" is a good one for this discussion. If you think that function attributes are bad for your programs, then don't use them. There are many others who find them tremendously handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 14 23:19:24 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:19:24 -0700 (PDT) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? > Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. Note that all votes are important, but only a signal to Guido about our individual feelings on the matter. Every single person on this list could vote -1, and Guido can still implement the feature (at his peril :-). Conversely, we could all vote +1 and he can refuse to implement it. In this particular case, your -1 vote says that you really dislike this feature. Great. And you've provided a solid explanation why. Even better! Now, people can respond to your vote and attempt to get you to change it. This is goodness because maybe you voted -1 based on a misunderstanding or something unclear in the proposal (I'm talking general now; I don't believe that is the case here). After explanation and enlightenment, you could change the vote. The discussion about *why* you voted -1 is also instructive to Guido. It may raise an issue that he hadn't considered. In addition, people attempting to change your mind are also providing input to Guido. [ maybe too much input is flying around, but the principle is there :-) ] Basically, we can call them vetoes or votes. 
Either way, this is still Guido's choice :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov@inrialpes.fr Fri Apr 14 23:54:24 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:54:24 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:13:25 PM Message-ID: <200004142254.AAA03535@python.inrialpes.fr> Greg Stein wrote: > > Python says that you can do what you want. 'Python' says nothing. Or are you The Voice of Python? If custom object attributes are convenient for you, then I'd suggest to generalize the concept, because I perceived it as a limitation too, but not for functions and methods. I'll repeat myself: >>> wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 Has anybody noticed that 'fraction' is a float I wanted to qualify with a 'precision' attribute? Again: if we're about to go that road, let's do it in one shot. *This* is what would change my vote. I'll leave Guido to cut the butter, or to throw it all out the window. You're right Greg: I hardly can contribute more in this case, even if I wanted to. Okay, +53 :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip@mojam.com (Skip Montanaro) Sat Apr 15 00:01:14 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Fri, 14 Apr 2000 18:01:14 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> References: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: <14583.41786.144784.114440@beluga.mojam.com> Vladimir> I'll repeat myself: >>>> wink Vladimir> >>>> wink.fraction = 1e+-1 >>>> wink.fraction.precision = 1e-+1 >>>> wink.compute() Vladimir> 0.0 Vladimir> Has anybody noticed that 'fraction' is a float I wanted to Vladimir> qualify with a 'precision' attribute? Quick comment before I rush home... There is a significant cost to be had by adding attributes to numbers (ints at least). They can no longer be shared in the int cache. I think the runtime size increase would be pretty huge, as would the extra overhead in creating all those actual (small) IntObjects instead of sharing a single copy. On the other hand, functions are already pretty heavyweight objects and occur much less frequently than numbers in common Python programs. They aren't shared (except for instance methods, which Barry's patch already excludes), so there's no risk of stomping on attributes that are shared by more than one function. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From gstein@lyra.org Sat Apr 15 00:14:01 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 16:14:01 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Python says that you can do what you want. > > 'Python' says nothing. Or are you The Voice of Python? Well, yah. You're just discovering that?! :-) I meant "The Python Way" says that you can do what you want. It doesn't speak often, but if you know how to hear it... it is a revelation :-) > If custom object attributes are convenient for you, then I'd suggest to Custom *function* attributes. Functions are one of the few objects in Python that are "structural" in their intent and use, yet have no way to record data. 
Modules and classes have a way to, but not functions. [ by "structure", I mean something that contributes to the structure, organization, and mechanics of your program. as opposed to data, such as lists, dicts, instances. ] And ditto what Skip said about attaching attributes to ints and other immutables. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond@skippinet.com.au Sat Apr 15 02:45:27 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 15 Apr 2000 11:45:27 +1000 Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: I can see the dilemma, but... > Maybe we should consider being more conservative, and > just having the > Unicode built-in type, the unicode() built-in function, > and the u"..." > notation, and then leaving all responsibility for > conversions up to > the user. Win32 and COM has been doing exactly this for the last couple of years. And it sucked. > On the other hand, *some* default conversion > seems needed, > because it seems draconian to make open(u"abcfile") fail with a > TypeError. For exactly this reason. The end result is that the first thing you ever do with a Unicode object is convert it to a string. > (While I want to see Python 1.6 expedited, I'd also not > like to see it > saddled with a system that proves to have been a mistake, or one > that's a maintenance burden. If forced to choose between > delaying and > getting it right, the latter wins.) Agreed. I thought this implementation stemmed from Guido's desire to do it this way in the 1.x family, and move towards Fredrik's proposal for Py3k. As a geneal comment: Im a little confused and dissapointed here. We are all bickering like children while our parents are away. All we are doing is creating a _huge_ pile of garbage for Guido to ignore when he returns. We are going to be presenting Guido with around 400 messages at my estimate. He can't possibly read them all. So the end result is that all the posturing and flapping going on here is for naught, and he is just going to do whatever he wants anyway - as he always has done, and as has worked so well for Python. Sheesh - we should all consider how we can be the most effective, not the most loud or aggressive! Mark. From Moshe Zadka Sat Apr 15 06:06:00 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 15 Apr 2000 07:06:00 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... This solution is close to what the eff-bot suggested. In this case it is horrible because of "editing effort": the meta-data and code of a function are better off together physically, so you would change it to class Spam: __zope_access__ = {} def eggs(self): pass __zope_access__['eggs'] = 'private' def eats(self): pass __zope_access__['eats'] = 'public' Which is way too verbose. Especially, if the method gets thrown around, you find yourself doing things like meth.im_class.__zope_access__[meth.im_func.func_name] Instead of meth.__zope_access__ And sometimes you write a function: def do_something(self): pass And the infrastructure adds the method to a class of its choice. 
Where would you stick the attribute then? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tim_one@email.msn.com Sat Apr 15 06:51:57 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 01:51:57 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan andPR#7) In-Reply-To: Message-ID: <000701bfa69e$b5d768e0$092d153f@tim> [Tim] > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). [Moshe Zadka] > Ordered? What about dictionaries? An ordering of a dict's kids is forced in the context of comparison (see dict_compare in dictobject.c). From Vladimir.Marangozov@inrialpes.fr Sat Apr 15 07:56:44 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 08:56:44 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: <14583.41786.144784.114440@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 06:01:14 PM Message-ID: <200004150656.IAA03994@python.inrialpes.fr> Skip Montanaro wrote: > > Vladimir> Has anybody noticed that 'fraction' is a float I wanted to > Vladimir> qualify with a 'precision' attribute? > > Quick comment before I rush home... There is a significant cost to be had > by adding attributes to numbers (ints at least). They can no longer be > shared in the int cache. I think the runtime size increase would be pretty > huge, as would the extra overhead in creating all those actual (small) > IntObjects instead of sharing a single copy. I know that. Believe it or not, I have a good image of the cost it would infer, better than yours. because I've thought about this problem (as well as other related problems yet to be 'discovered'), and have spent some time in the past trying to find a couple of solutions to them. However, I eventually decided to stop my time machine and wait for these issues to show up, then take a stance on them. And this is what I did in this case. I'm tired to lack good arguments and see incoming capitalized words. This makes no sense here. Go to c.l.py and repeat "we're all adults here" *there*, please. To close this chapter, I think that if this gets in, Python's user base will get more confused and would have to swallow yet another cheap gimmick. You won't be able to explain it well to them. They won't really understand it, because their brains are still young, inexperienced, looking for logical explanations where all notions coexist peacefully. In the long term, what you're pushing for to get your money quickly, isn't a favor. And that's why I maintain my vote. call-me-again-if-you-need-more-than-53'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal@lemburg.com Sat Apr 15 10:28:15 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 15 Apr 2000 11:28:15 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: Message-ID: <38F8362F.51C2F7EC@lemburg.com> Mark Hammond wrote: > > I thought this implementation stemmed from Guido's desire > to do it this way in the 1.x family, and move towards Fredrik's > proposal for Py3k. Right. Let's do this step by step and get some experience first. With that gained experience we can still polish up the design towards a compromise which best suits all our needs. 
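To make the conversion dance Mark describes concrete, here is the explicit
step you end up taking today (a sketch against the Unicode API in the 1.6
alphas; the choice of "utf-8" below is exactly the contested default):

    u = u"abcfile"
    s = u.encode("utf-8")    # explicit 8-bit conversion before handing the
                             # name to an API that only takes plain strings
    print type(s), repr(s)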
The integration of Unicode into Python is comparable to the addition of floats to an interpreter which previously only understood integers -- things are obviously going to be a little different than before. Our goal should be to make it as painless as possible and at least IMHO this can only be achieved by gaining practical experience in this new field first. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <38F78C00.7BAE1C12@lemburg.com> Message-ID: <002101bfa6c3$73e6c3c0$34aab5d4@hagrid> > This is exactly the same as proposing to change the default > encoding to Latin-1. no, it isn't. here's what I'm proposing: -- the internal character set is unicode, and nothing but unicode. in 1.6, this applies to strings. in 1.7 or later, it applies to source code as well. -- the default source encoding is "unknown" -- the is no other default encoding. all strings use the unicode character set. to give you some background, let's look at section 3.2 of the existing language definition: [Sequences] represent finite ordered sets indexed by natural numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i]. An object of an immutable sequence type cannot change once it is created. The items of a string are characters. There is no separate character type; a character is represented by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and ord() convert between characters and nonnegative integers representing the byte values. Bytes with the values 0-127 usually represent the corre- sponding ASCII values, but the interpretation of values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file.=20 (in other words, given a string s, len(s) is the number of characters in the string. s[i] is the i'th character. len(s[i]) is 1. etc. the existing string type doubles as byte arrays, where given an array b, len(b) is the number of bytes, b[i] is the i'th byte, etc). my proposal boils down to a few small changes to the last three sentences in the definition. basically, change "byte value" to "character code" and "ascii" to "unicode": The built-in functions chr() and ord() convert between characters and nonnegative integers representing the character codes. Character codes usually represent the corresponding unicode values. The 8-bit string data type is also used to represent arrays of bytes, e.g., to hold data read from a file. that's all. the rest follows from this. ... just a few quickies to sort out common misconceptions: > I don't have anything against that (being a native Latin-1 > user :), but I would assume that other native language > writer sure do: e.g. all programmers not using Latin-1 > as native encoding (and there are lots of them). the unicode folks have already made that decision. I find it very strange that we should use *another* model for the first 256 characters, just to "equally annoy everyone". 
(if people have a problem with the first 256 unicode characters having the same internal representation as the ISO 8859-1 set, tell them to complain to the unicode folks). > (and this is not far fetched since there are input sources > which do return UTF-8, e.g. TCL), the Unicode implementation > will apply all its knowledge in order to get you satisfied. there are all sorts of input sources. major platforms like windows and java use 16-bit unicode. and Tcl has an internal unicode string type, since they realized that storing UTF-8 in 8-bit strings was horridly inefficient (they tried to do it right, of course). the internal type looks like this: typedef unsigned short Tcl_UniChar; typedef struct String { int numChars; size_t allocated; size_t uallocated; Tcl_UniChar unicode[2]; } String; (Tcl uses dual-ported objects, where each object can have an UTF-8 string representation in addition to the internal representation. if you change one of them, the other is recalculated on demand) in fact, it's Tkinter that converts the return value to UTF-8, not Tcl. that can be fixed. > > Python doesn't convert between other data types for me, so > > why should strings be a special case? >=20 > Sure it does: 1.5 + 2 =3D=3D 3.5, 2L + 3 =3D=3D 5L, etc... but that's the key point: 2L and 3 are both integers, from the same set of integers. if you convert a long integer to an integer, it still contains an integer from the same set. (maybe someone can fill me in here: what's the formally correct word here? set? domain? category? universe?) also, if you convert every item in a sequence of long integers to ordinary integers, all items are still members of the same integer set. in contrast, the UTF-8 design converts between strings of characters, and arrays of bytes. unless you change the 8-bit string type to know about UTF-8, that means that you change string items from one domain (characters) to another (bytes). > Note that you are again argueing for using Latin-1 as > default encoding -- why don't you simply make this fact > explicit ? nope. I'm standardizing on a character set, not an encoding. character sets are mapping between integers and characters. in this case, we use the unicode character set. encodings are ways to store strings of text as bytes in a byte array. > not now, when everything has already been implemented and > people are starting to the use the code that's there with great > success. the positive reports I've seen all rave about the codec frame- work. that's a great piece of work. without that, it would have been impossible to do what I'm proposing. (so what are you complaining about? it's all your fault -- if you hadn't done such a great job on that part of the code, I wouldn't have noticed the warts ;-) if you look at my proposal from a little distance, you'll realize that it doesn't really change much. all that needs to be done is to change some of the conversion stuff. if we decide to do this, I can do the work for you, free of charge. From Fredrik Lundh" <38F8362F.51C2F7EC@lemburg.com> Message-ID: <005801bfa6c7$b22783a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > Right. Let's do this step by step and get some experience first. > With that gained experience we can still polish up the design > towards a compromise which best suits all our needs. so practical experience from other languages, other designs, and playing with the python alphas doesn't count? 
> The integration of Unicode into Python is comparable to the > addition of floats to an interpreter which previously only > understood integers. use "long integers" instead of "floats", and you'll get closer to the actual case. but where's the problem? python has solved this problem for numbers, and what's more important: the language reference tells us how strings are supposed to work: "The items of a string are characters." (see previous mail) "Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters." this solves most of the issues. to handle the rest, look at the language reference description of integer: [Integers] represent elements from the mathematical set of whole numbers. Borrowing the "elements from a single set" concept, define characters as Characters represent elements from the unicode character set. and let all mixed-string operations use string coercion, just like numbers. can it be much simpler? From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> <38F78DCC.C630F32@lemburg.com> Message-ID: <00c901bfa6cd$62259580$34aab5d4@hagrid> M.-A. Lemburg wrote: > > To reinforce Fredrik's point here, note that XML only supports > > encodings at the level of an entire file (or external entity). You > > can't tell an XML parser that a file is in UTF-8, except for this = one > > element whose contents are in Latin1. >=20 > Hmm, this would mean that someone who writes: >=20 > """ > #pragma script-encoding utf-8 >=20 > u =3D u"\u1234" > print u > """ >=20 > would suddenly see "\u1234" as output. not necessarily. consider this XML snippet: ሴ if I run this through an XML parser and write it out as UTF-8, I get: =E1^=B4 in other words, the parser processes "&#x" after decoding to unicode, not before. I see no reason why Python cannot do the same. From Fredrik Lundh" Message-ID: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Greg Stein wrote: > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > >>>>> "FL" =3D=3D Fredrik Lundh writes: > >=20 > > FL> fwiw, I'd love to see a good syntax for this. might even > > FL> change my mind... > >=20 > > def foo(x): > > self.x =3D x > >=20 > > ? :) >=20 > Hehe... actually, I'd take Skip's "_.x =3D x" over the above = suggestion. The > above syntax creates too much of an expectation to look for "self". = There > would, of course, be problems that self.x doesn't work in a method = while > _.x could. how about the obvious one: adding the name of the function to the local namespace? def foo(x): foo.x =3D x (in theory, this might of course break some code. but is that a real problem?) after all, my concern is that the above appears to work, but mostly by accident: >>> def foo(x): >>> foo.x =3D x >>> foo(10) >>> foo.x 10 >>> # cool. now let's try this on a method >>> class Foo: >>> def bar(self, x): >>> bar.x =3D x >>> foo =3D Foo() >>> foo.bar(10) Traceback (most recent call first): NameError: bar >>> # huh? maybe making it work in both cases would help? ... but on third thought, maybe it's sufficient to just keep the "static variable" aspect out of the docs. I just browsed a number of javascript manuals, and I couldn't find a trace of this feature. so how about this? 
-0.90000000000000002 on documenting this as "this can be used to store static data in a function" +1 on the feature itself. From Vladimir.Marangozov@inrialpes.fr Sat Apr 15 15:33:51 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 16:33:51 +0200 (CEST) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:19:24 PM Message-ID: <200004151433.QAA04376@python.inrialpes.fr> Greg Stein wrote: > > Note that all votes are important, but only a signal to Guido ... > [good stuff deleted] Very true. I think we've made good progress here. > Now, people can respond to your vote and attempt to get you to change it. > ... After explanation and enlightenment, you could change the vote. Or vice-versa :-) Fredrik has been very informative about the evolution of his opinions as the discussion evolved. As was I, but I don't count ;-) It would be nice if we adopt his example and send more signals to Guido, emitted with a fixed (positive or negative) or with a sinusoidal frequency. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw@cnri.reston.va.us Sat Apr 15 17:45:27 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 15 Apr 2000 12:45:27 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> <14583.38142.520596.804466@beluga.mojam.com> Message-ID: <14584.40103.511210.929864@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Barry said "_" is effectively taken because it means something SM> (at least when used a function?) to pygettext. How about "__" SM> then? oops, yes, only when used as a function. so _.x would be safe. From bwarsaw@cnri.reston.va.us Sat Apr 15 17:52:56 2000 From: bwarsaw@cnri.reston.va.us (bwarsaw@cnri.reston.va.us) Date: Sat, 15 Apr 2000 12:52:56 -0400 (EDT) Subject: [Python-Dev] Object customization References: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: <14584.40552.13918.707019@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so how about this? FL> -0.90000000000000002 on documenting this as "this FL> can be used to store static data in a function" FL> +1 on the feature itself. Works for me! I think function attrs would be a lousy place to put statics anyway. -Barry From tim_one@email.msn.com Sat Apr 15 18:43:05 2000 From: tim_one@email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 13:43:05 -0400 Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: <000801bfa702$0e1b3320$8a2d153f@tim> [/F] > -0.90000000000000002 on documenting this as "this > can be used to store static data in a function" -1 on that part from me. I never recommended to do it, I merely predicted that people *will* do it. And, they will. > +1 on the feature itself. I remain +0. [Barry] > Works for me! I think function attrs would be a lousy place to put > statics anyway. Yes, but the alternatives are also lousy: a global, or abusing default args. def f(): f.n = f.n + 1 return 42 f.n = 0 ... print "f called", f.n, "times" vs _f_n = 0 def f(): global _f_n _f_n = _f_n + 1 return 42 ... print "f called", _f_n, "times" vs def f(n=[0]): n[0] = n[0] + 1 return 42 ... print "f called ??? 
times" As soon as s person bumps into the first way, they're likely to adopt it, simply because it's less lousy than the others on first sight. From Moshe Zadka Sat Apr 15 18:44:25 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 15 Apr 2000 19:44:25 +0200 (IST) Subject: [Python-Dev] Object customization In-Reply-To: <000801bfa702$0e1b3320$8a2d153f@tim> Message-ID: On Sat, 15 Apr 2000, Tim Peters wrote: [Barry] > Works for me! I think function attrs would be a lousy place to put > statics anyway. [Tim Peters] > Yes, but the alternatives are also lousy: a global, or abusing default > args. Personally I kind of like the alternative of a class: class _Foo: def __init__(self): self.n = 0 def f(self): self.n = self.n+1 return 42 f = _Foo().f getting-n-out-of-f-is-left-as-an-exercise-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Sun Apr 16 16:52:20 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sun, 16 Apr 2000 17:52:20 +0200 Subject: [Python-Dev] Python source code encoding Message-ID: <38F9E1B4.F664D173@lemburg.com> [Fredrik]: > [MAL]: > > > To reinforce Fredrik's point here, note that XML only supports > > > encodings at the level of an entire file (or external entity). You > > > can't tell an XML parser that a file is in UTF-8, except for this one > > > element whose contents are in Latin1. > > > > Hmm, this would mean that someone who writes: > > > > """ > > #pragma script-encoding utf-8 > > > > u = u"\u1234" > > print u > > """ > > > > would suddenly see "\u1234" as output. > > not necessarily. consider this XML snippet: > > > ሴ > > if I run this through an XML parser and write it > out as UTF-8, I get: > > á^´ > > in other words, the parser processes "&#x" after > decoding to unicode, not before. > > I see no reason why Python cannot do the same. Sure, and this is what I meant when I said that the compiler has to deal with several different encodings. Unicode escape sequences are currently handled by a special codec, the unicode-escape codec which reads all characters with ordinal < 256 as-is (meaning Latin-1, since the first 256 Unicode ordinals map to Latin-1 characters (*)) except a few escape sequences which it processes much like the Python parser does for 8-bit strings and the new \uXXXX escape. Perhaps we should make this processing use two levels... the escape codecs would need some rewriting to process Unicode-> Unicode instead of 8-bit->Unicode as they do now. -- To move along the method Fredrik is proposing I would suggest (for Python 1.7) to introduce a preprocessor step which gets executed even before the tokenizer. The preprocessor step would then translate char* input into Py_UNICODE* (using an encoding hint which would have to appear in the first few lines of input using some special format). The tokenizer could then work on Py_UNICODE* buffer and the parser would then take care of the conversion from Py_UNICODE* back to char* for Python's 8-bit strings. It should shout out loud in case it sees input data outside Unicode range(256) in what is supposed to be a 8-bit string. To make this fully functional we would have to change the 8-bit string to Unicode coercion mechanism, though. It would have to make a Latin-1 assumption instead of the current UTF-8 assumption. In contrast to the current scheme, this assumption would be correct for all constant strings appearing in source code given the above preprocessor logic. 
For strings constructed from file or user input the programmer would have to assure proper encoding or do the Unicode conversion himself. Sidenote: The UTF-8->Latin-1 change would probably also have to be propogated to all other Unicode in/output logic -- perhaps Latin-1 is the better default encoding after all... A programmer could then write a Python script completely in UTF-8, UTF-16 or Shift-JIS and the above logic would convert the input data to Unicode or Latin-1 (which is 8-bit Unicode) as appropriate and it would warn about impossible conversions to Latin-1 in the compile step. The programmer would still have to make sure that file and user input gets converted using the proper encoding, but this can easily be done using the stream wrappers in the standard codecs module. Note that in this discussion we need to be very careful not to mangle encodings used for source code and ones used when reading/writing to files or other streams (including stdin/stdout). BTW, to experiment with all this you can use the codecs.EncodedFile stream wrapper. It allows specifying both data and stream side encodings, e.g. you can redirect a UTF-8 stdin stream to Latin-1 returning file object which can then be used as source of data input. (*) The conversion from Unicode to Latin-1 is similar to converting a 2-byte unsigned short to an unsigned byte with some extra logic to catch data loss. Latin-1 is comparable to 8-bit Unicode... this is where all this talk about Latin-1 originates from :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov@inrialpes.fr Sun Apr 16 21:28:41 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Sun, 16 Apr 2000 22:28:41 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:48:27 PM Message-ID: <200004162028.WAA06045@python.inrialpes.fr> [Skip, on attaching access control rights to function objects] > [VM] > >... > > If you prefer embedded definitions, among other things, you could do: > > > > __zope_access__ = { 'Spam' : 'public' } > > > > class Spam: > > __zope_access__ = { 'eggs' : 'private', > > 'eats' : 'public' } > > def eggs(self, ...): ... > > def eats(self, ...): ... [Greg] > This is uglier than attaching the metadata directly to the target that you > are describing! If you want to apply metadata to functions, then apply > them to the function! Don't shove them off in a separate structure. > > You're the one talking about cleanliness, yet you suggest something that > is very poor from a readability, maintainability, and semantic angle. Ick. [Moshe] > This solution is close to what the eff-bot suggested. In this case it > is horrible because of "editing effort": the meta-data and code of a > function are better off together physically, so you would change it > to ... > [equivalent solution deleted] In this particular use case, we're discussing access control rights which are part of some protection policy. A protection policy is a matrix Objects/Rights. It can be impemented in 3 ways, depending on the system: 1. Attach the Rights to the Objects 2. Attach the Objects to the Rights 3. Have a separate structure which implements the matrix. I agree that in this particular case, it seems handy to attach the rights to the objects. But in other cases, it's more appropriate to attach the objects to the rights. 
However, the 3rd solution is the one to be used when the objects (respectively, the rights) are fixed from the start and cannot be modified, and solution 2 (resp, 3) is not desirable/optimal/plausible... That's what I meant with: [VM] > > or have a completely separate class/structure for access control > > (which is what you would do it in C, btw, for existing objects > > to which you can't add slots, ex: file descriptors, mem segments, etc). Which presents an advantage: the potential to change completely the protection policy of the system in future versions of the software, because the protection implementation is decoupled from the objects' and the rights' implementation. damned-but-persistent-first-principles-again-'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From klm@digicool.com Sun Apr 16 22:45:00 2000 From: klm@digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 17:45:00 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: On Sun, 16 Apr 2000 Vladimir.Marangozov@inrialpes.fr wrote: > > [Skip, on attaching access control rights to function objects] > > > [VM] > > >... > > > If you prefer embedded definitions, among other things, you could do: > > > > > > __zope_access__ = { 'Spam' : 'public' } > > > > > > class Spam: > > > __zope_access__ = { 'eggs' : 'private', > > > 'eats' : 'public' } > > > def eggs(self, ...): ... > > > def eats(self, ...): ... > > [Greg] > > This is uglier than attaching the metadata directly to the target that you > > are describing! If you want to apply metadata to functions, then apply > > them to the function! Don't shove them off in a separate structure. > [...] > In this particular use case, we're discussing access control rights > which are part of some protection policy. > A protection policy is a matrix Objects/Rights. It can be impemented > in 3 ways, depending on the system: > 1. Attach the Rights to the Objects > 2. Attach the Objects to the Rights > 3. Have a separate structure which implements the matrix. > [...] > [VM] > > > or have a completely separate class/structure for access control > > > (which is what you would do it in C, btw, for existing objects > > > to which you can't add slots, ex: file descriptors, mem segments, etc). > > Which presents an advantage: the potential to change completely the > protection policy of the system in future versions of the software, > because the protection implementation is decoupled from the objects' > and the rights' implementation. It may well make sense to have the system *implement* the rights somewhere else. (Distributed system, permissions caches in an object system, etc.) However it seems to me to make exceeding sense to have the initial intrinsic settings specified as part of the object! More generally, it is the ability to associate intrinsic metadata that is the issue, not the designs of systems that employ the metadata. Geez. And, in the case of functions, it seems to me to be outstandingly consistent with python's treatment of objects. I'm mystified about why you would reject that so adamantly! That said, i can entirely understand concerns about whether or how to express references to the metadata from within the function's body. We haven't even seen a satisfactory approach to referring to the function, itself, from within the function. Maybe it's not even desirable to be able to do that - that's an interesting question. 
(I happen to think it's a good idea, just requiring a suitable means of expression.) But being able to associate metadata with functions seems like a good idea, and i've seen no relevant clues in your "first principles" about why it would be bad. Ken klm@digicool.com Return-Path: Delivered-To: python-dev@python.org Received: from merlin.codesourcery.com (merlin.codesourcery.com [206.168.99.1]) by dinsdale.python.org (Postfix) with SMTP id 7312F1CD5A for ; Sat, 15 Apr 2000 12:50:20 -0400 (EDT) Received: (qmail 17758 invoked by uid 513); 15 Apr 2000 16:57:54 -0000 Mailing-List: contact sc-publicity-help@software-carpentry.com; run by ezmlm Precedence: bulk X-No-Archive: yes Delivered-To: mailing list sc-publicity@software-carpentry.com Delivered-To: moderator for sc-publicity@software-carpentry.com Received: (qmail 16214 invoked from network); 15 Apr 2000 16:19:12 -0000 Date: Sat, 15 Apr 2000 12:11:52 -0400 (EDT) From: To: sc-announce@software-carpentry.com, sc-publicity@software-carpentry.com Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: [Python-Dev] Software Carpentry entries now on-line Sender: python-dev-admin@python.org Errors-To: python-dev-admin@python.org X-BeenThere: python-dev@python.org X-Mailman-Version: 2.0beta3 List-Id: Python core developers First-round entries in the Software Carpentry design competition are now available on-line at: http://www.software-carpentry.com/entries/index.html Our thanks to everyone who entered; we look forward to some lively discussion on the "sc-discuss" list. Best regards, Greg Wilson Software Carpentry Project Coordinator From skip@mojam.com (Skip Montanaro) Sun Apr 16 23:06:08 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Sun, 16 Apr 2000 17:06:08 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14586.14672.949500.986951@beluga.mojam.com> Ken> We haven't even seen a satisfactory approach to referring to the Ken> function, itself, from within the function. Maybe it's not even Ken> desirable to be able to do that - that's an interesting question. I hereby propose that within a function the special name __ refer to the function. You could have def fact(n): if n <= 1: return 1 return __(n-1) * n You could also refer to function attributes through __ (presuming Barry's proposed patch gets adopted): def pub(*args): if __.access == "private": do_private_stuff(*args) else: do_public_stuff(*args) ... if validate_permissions(): pub.access = "private" else: pub.access = "public" When in a bound method, __ should refer to the bound method, not the unbound method, which is already accessible via the class name. As far as lexical scopes are concerned, this won't change anything. I think it could be implemented by adding a reference to the function called __ in the local vars of each function. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From klm@digicool.com Sun Apr 16 23:09:18 2000 From: klm@digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:09:18 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: On Sat, 15 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > > >>>>> "FL" == Fredrik Lundh writes: > > > > > > FL> fwiw, I'd love to see a good syntax for this. might even > > > FL> change my mind... > > > > > > def foo(x): > > > self.x = x > > > > > > ? 
:) > > > > Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The > > above syntax creates too much of an expectation to look for "self". There > > would, of course, be problems that self.x doesn't work in a method while > > _.x could. > > how about the obvious one: adding the name of the > function to the local namespace? > > def foo(x): > foo.x = x 'self.x' would collide profoundly with the convention of using 'self' for the instance-argument in bound methods. Here, foo.x assumes that 'foo' is not rebound in the context of the def - the class, module, function, or wherever it's defined. That seems like an unnecessarily too strong an assumption. Both of these things suggest to me that we don't want to use a magic variable name, but rather some kind of builtin function to get the object (lexically) containing the block. It's tempting to name it something like 'this()', but that would be much too easily confused in methods with 'self'. Since we're looking for the lexically containing object, i'd call it something like current_object(). class Something: """Something's cooking, i can feel it.""" def help(self, *args): """Spiritual and operational guidance for something or other. Instructions for using help: ...""" print self.__doc__ print current_object().__doc__ if args: self.do_mode_specific_help(args) I think i'd be pretty happy with the addition of __builtins__.current_object, and the allowance of arbitrary metadata with functions (and other funtion-like objects like methods). Ken klm@digicool.com From klm@digicool.com Sun Apr 16 23:12:29 2000 From: klm@digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:12:29 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: On Sat, 15 Apr 2000 bwarsaw@cnri.reston.va.us wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> so how about this? > > FL> -0.90000000000000002 on documenting this as "this > FL> can be used to store static data in a function" > > FL> +1 on the feature itself. > > Works for me! I think function attrs would be a lousy place to put > statics anyway. Huh? Why? (I don't have a problem with omitting mention of this use - seems like encouraging the use of globals, often a mistake.) Ken klm@digicool.com From klm@digicool.com Sun Apr 16 23:21:59 2000 From: klm@digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:21:59 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> Message-ID: On Sun, 16 Apr 2000, Skip Montanaro wrote: > > Ken> We haven't even seen a satisfactory approach to referring to the > Ken> function, itself, from within the function. Maybe it's not even > Ken> desirable to be able to do that - that's an interesting question. > > I hereby propose that within a function the special name __ refer to the > function. You could have > > def fact(n): > if n <= 1: return 1 > return __(n-1) * n > > You could also refer to function attributes through __ (presuming Barry's > proposed patch gets adopted): At first i thought you were kidding about using '__' because '_' was taken - on lots of terminals that i use, there is no intervening whitespace separating the two '_'s, so it's pretty hard to tell the difference between it and '_'! Now, i wouldn't mind using '_' if it's available, but guido was pretty darned against using it in my initial designs for packages - i wanted to use it to refer to the package containing the current module, like unix '..'. 
I gathered that a serious part of the objection was in using a character to denote non-operation syntax - python just doesn't do that. I also like the idea of using a function instead of a magic variable - most of python's magic variables are in packages, like os.environ. Ken klm@digicool.com From Vladimir.Marangozov@inrialpes.fr Mon Apr 17 03:30:48 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Mon, 17 Apr 2000 04:30:48 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Ken Manheimer" at Apr 16, 2000 05:45:00 PM Message-ID: <200004170230.EAA06492@python.inrialpes.fr> Ken Manheimer wrote: > > However it seems to me to make exceeding sense to have the initial > intrinsic settings specified as part of the object! Indeed. It makes perfect sense to have _intial_, intrinsic attributes. The problem is that currently they can't be specified for builtin objects. Skip asked for existing solutions, so I've made a quick tour on the problem, pointing him to 3). > And, in the case of functions, it seems to me to be outstandingly > consistent with python's treatment of objects. Oustandingly consistent isn't my opinion, but that's fine with both of us. If functions win this cause, the next imminent wish of all (Zope) users will be to attach (protection, or other) attributes to *all* objects: class Spam(...): """ Spam product""" zope_product_version = "2.51" zope_persistency = 0 zope_cache_limit = 64 * 1024 def eggs(self): ... def eats(self): ... How would you qualify the zope_* attributes so that only the zope_product version is accessible? (without __getattr__ tricks, since we're talking about `metadata'). Note that I don't expect an answer :-). The issue is crystal clear already. Be prepared to answer cool questions like this one to your customers. > > I'm mystified about why you would reject that so adamantly! Oops, I'll demystify you instantly, here, by summing up my posts: I'm not rejecting anything adamantly! To the countrary, I've suggested more. Greg said it quite well: Barry's proposal made me sending you signals about different issues you've probably not thought about before, yet we'd better sort them out before adopting his patch. As a member of this list, I feel obliged to share with you my concerns whenever I have them. My concerns in this case are: a) consistency of the class model. Apparently this signal was lost in outerspace, because my interpretation isn't yours. Okay, fine by me. This one will come back in Py3K. I'm already curious to see what will be on the table at that time. :-) b) confusion about the namespaces associated with a function object. You've been more receptive to this one. It's currently being discussed. c) generalize user-attributes for all builtin objects. You'd like to, but it looks expensive. This one is a compromise: it's related with sharing, copy on write builtin objects with modified user-attr, etc. In short, it doesn't seem to be on the table, because this signal hasn't been emitted before, nor it was really decrypted on python-dev. Classifying objects as light and heavy, and attributing them specific functionality only because of their "weight" looks very hairy. That's all for now. Discussing these issues in prime time here is goodness for Python and its users! Adopting the proposal in a hurry, because of the tight schedule for 1.6, isn't. It needs more maturation. Witness the length of the thread. 
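To make the sharing cost under (c) concrete: small integers are cached and
reused by the interpreter, which is exactly why giving every int its own
writable attribute dictionary would be expensive (the caching range is an
implementation detail, not a language guarantee):

    a = 42
    b = 6 * 7
    print a is b    # prints 1 on CPython -- both names share one cached int object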
it's-vacation-time-for-me-so-see-you-all-after-Easter'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein@lyra.org Mon Apr 17 09:14:51 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 01:14:51 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading Message-ID: A couple months ago, I exchanged a few emails with Guido about doing the free-threading work. In particular, for the 1.6 release. At that point (and now), I said that I wouldn't be starting on it until this summer, which means it would miss the 1.6 release. However, there are some items that could go into 1.6 *today* that would make it easier down the road to add free-threading to Python. I said that I'd post those in the hope that somebody might want to look at developing the necessary patches. It fell off my plate, so I'm getting back to that now... Python needs a number of basic things to support free threading. None of these should impact its performance or reliability. For the most part, they just provide a platform for the later addition. 1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. This mechanism will be used to store PyThreadState structure pointers, rather than _PyThreadState_Current. The latter variable must go away. Rationale: two threads will be operating simultaneously. An inherent conflict arises if _PyThreadState_Current is used. The TLS-like mechanism is used by the threads to look up "their" state. There will be a ripple effect on PyThreadState_Swap(); dunno offhand what. It may become empty. 2) Python needs a lightweight, short-duration, internally-used critical section type. The current lock type is used at the Python level and internally. For internal operations, it is rather heavyweight, has unnecessary semantics, and is slower than a plain crit section. Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's mutex type. A spinlock mechanism would be coolness. Rationale: Python needs critical sections to protect data from being trashed by multiple, simultaneous access. These crit sections need to be as fast as possible since they'll execute at all key points where data is manipulated. 3) Python needs an atomic increment/decrement (internal) operation. Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this. Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this. 4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today) Rationale: duh 5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). 
Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed. I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc. Rationale: any globals which are mutable must be made thread-safe. The fewer non-const globals to examine, the fewer to analyze for race conditions and thread-safety requirements. Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul. I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections. Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-) Depending upon Guido's desire, the various schedules, and how well the development goes, Python 1.6.1 could incorporate the free-threading option in the base distribution. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping@lfw.org Mon Apr 17 02:54:41 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:54:41 -0700 (PDT) Subject: [Python-Dev] OT: XML In-Reply-To: <38F6344F.25D344B5@prescod.net> Message-ID: I'll begin with my conclusion, so you can get the high-order bit and skip the rest of the message if you like: XML is useful, but it's not a language. On Thu, 13 Apr 2000, Paul Prescod wrote: > > What definition of "language" are you using? And while you're at it, > what definition of "semantics" are you using? > > As I recall, a string is an ordered list of symbols and a language is an > unordered set of strings. I use the word "language" to mean an expression medium that carries semantics at a usefully high level. The computer-science definition you gave would include, say, the set of all pairs of integers (not a language to most people), but not include classical music [1] (indeed a language to many people). I admit that the boundary of "enough" semantics to qualify as a language is fuzzy, but some things do seem quite clearly to fall on either side of the line for me. For example, saying that XML has semantics is roughly equivalent to saying that ASCII has semantics. Well, sure, i suppose 65 has the "semantics" of the uppercase letter A, but that's not semantics at any level high enough to be useful. That is why most people would probably not normally call ASCII a "language". It has to express something to be a language. Granted, you can pick nits and say that XML has semantics as you did, but to me that essentially amounts to calling the syntax the semantics. > I know that Ka-Ping, despite going to a great university was in > Engineering, not computer science Cute. (I'm glad i was in engineering; at least we got a little design and software engineering background there, and i didn't see much of that in CS, unfortunately.) 
> Most XML people will happily admit that XML has no "semantics" but I > think that's bullshit too. The mapping from the string to the abstract > tree data model *is the semantic content* of the XML specification. Okay, fine. Technically, it has semantics; they're just very minimal semantics (so minimal that i felt quite comfortable in saying that it has none). But that doesn't change my point -- for "it has no semantics and therefore doesn't qualify as a language" just read "it has far too minimal semantics to qualify as a language". > It makes as little sense to reject XML out of hand because it is a > buzzword but is not innovative as it does for people to embrace it > mystically because it is Microsoft's flavor of the week. Before you get the wrong impression, i don't intend to reject XML out of hand, or to suggest that people do. It has its uses, just as ASCII has its uses. As a way of serializing trees, it's quite acceptable. I am, however, reacting to the sudden onslaught of hype that gives people the impression that XML can do anything. It's this sort of attitude that "oh, all of our representation problems will go away if we throw XML at it" that makes me cringe; that's only avoiding the issue. (I'm not saying that you are this clueless, Paul! -- just that some people seem to be.) As long as we recognize XML as exactly what it is, no more and no less -- a generic mechanism for serializing trees, with associated technologies for manipulating those trees -- there's no problem. > By the way, what data model or text encoding is NOT isomorphic to Lisp > S-expressions? Isn't Python code isomorphic to Lisp s-expessions? No! You can run Python code. The code itself, of course, can be interpreted as a stream of bytes, or arranged into a tree of LISP s-expressions. But if s-expressions that were *all* that constituted Python, Python would be pretty useless indeed! The entity we call Python includes real content: the rules for deriving the expected behaviour of a Python program from its parse tree, as variously specified in the reference manual, the library manual, and in our heads. LISP itself is a great deal more than just s-expressions. The language system specifies the behaviour you expect from a given piece of LISP code, and *that* is the part i call semantics. "real" semantics: Python LISP English MIDI minimal or no semantics: ASCII lists alphabet bytes The things in the top row are generally referred to as "languages"; the things in the bottom row are not. Although each thing in the top row is constructed from its corresponding thing in the bottom row, the difference between the two is what i am calling "semantics". If the top row says A and the bottom row says B, you can look at the B-type things that constitute the A and say, "if you see this particular B, it means foo". XML belongs in the bottom row, not the top row. Python: "If you see 'a = 3' in a function, it means you take the integer object 3 and bind it to the name 'a' in the local namespace." XML: "If you see the tag , it means... well... uh, nothing. Sorry. But you do get to decide that 'spam' and 'eggs' and 'boiled' mean whatever you want." That is why i am unhappy with XML being referred to as a "language": it is a misleading label that encourages people to make the mistake of imagining that XML has more semantic power than it really does. Why is this a fatal mistake? 
Because using XML will no more solve your information interchange problems than writing Japanese using the Roman alphabet will suddenly cause English speakers to be able to read Japanese novels. It may *help*, but there's a lot more to it than serialization. Thus: XML is useful, but it's not a language. And, since that reasonably summarizes my views on the issue, i'll say no more on this topic on the python-dev list -- any further blabbing i'll do in private e-mail. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton [1] I anticipate an objection such as "but you can encode a piece of classical music as accurately as you like as a sequence of symbols." But the music itself doesn't fit the Chomskian definition of "language" until you add that symbolic mapping and the rules to arrange those symbols in sequence. At that point the thing you've just added *is* the language: it's the mapping from symbols to the semantics of e.g. "and at time 5.36 seconds the first violinist will play an A-flat at medium volume". From ping@lfw.org Mon Apr 17 02:06:40 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:06:40 -0700 (PDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. > > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] > >>> loose = [1, [1, None, "x"], "x"] > >>> loose[1][1] = loose > >>> loose > [1, [1, [...], 'x'], 'x'] > >>> tight > [1, [...], 'x'] > >>> tight == loose > 1 Actually, i thought about this a little more and realized that the above *is* exactly the correct behaviour. In E, [] makes an immutable list. To make it mutable you then have to "flex" it. A mutable empty list is written "[] flex" (juxtaposition means a method call). In the above, the identities of the inner and outer lists of "loose" are different, and so should be printed separately. They are equal but not identical: >>> loose == loose[1] 1 >>> loose is loose[1] 0 >>> loose is loose[1][1] 1 >>> loose.append(4) >>> loose [1, [1, [...], 'x'], 'x', 4] -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From paul@prescod.net Tue Apr 18 15:58:08 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:58:08 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <38FC7800.D88E14D7@prescod.net> "M.-A. Lemburg" wrote: > > ... > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). The idea > behind this is that programmers should be able to use other > encodings here than the default "unicode-escape" one. I'm totally confused about this. Are we going to allow UCS-2 sequences in the middle of Python programs that are otherwise ASCII? 
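A rough sketch of the distinction being drawn here, using only the \u escapes and the unicode() constructor from the 1.6 alphas; the per-file encoding pragma itself is still hypothetical at this point, and the assumption below is that it would behave roughly like an explicit unicode(..., encoding) conversion performed by the compiler:

    # Today, non-ASCII characters inside a u"..." literal are spelled in
    # the default "unicode-escape" encoding:
    s1 = u"fran\u00e7ais"

    # A hypothetical latin-1 encoding pragma would let the source simply
    # contain the byte 0xE7 there; the compiler would then do the
    # equivalent of:
    s2 = unicode("fran\xe7ais", "latin-1")

    assert s1 == s2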
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself From paul@prescod.net Mon Apr 17 14:37:19 2000 From: paul@prescod.net (Paul Prescod) Date: Mon, 17 Apr 2000 08:37:19 -0500 Subject: [Python-Dev] Unicode and XML References: Message-ID: <38FB138F.1DAF3891@prescod.net> Let's presume that we agreed that XML is not a language because it doesn't have semantics. What does that have to do with the applicability of its Unicode-handling model? Here is a list of a hundred specifications which we can probably agree have "useful semantics" that are all based on XML and thus have the same Unicode model: http://www.xml.org/xmlorg_registry/index.shtml XML's unicode model seems mostly appropriate to me. I can only see one reason it might not apply: which comes first the #! line or the #encoding line? We could say that the #! line can only be used in encodings that are direct supersets of ASCII (e.g. UTF-8 but not UTF-16). That shouldnt' cause any problems with Unix because as far as I know, Unix can only read the first line if it is in an ASCII superset anyhow! Then the second line could describe the precise ASCII superset in use (8859-1, 8859-2, UTF-8, raw ASCII, etc.). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From jeremy@cnri.reston.va.us Mon Apr 17 16:41:26 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:41:26 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14587.12454.7542.709571@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: KLM> It may well make sense to have the system *implement* the KLM> rights somewhere else. (Distributed system, permissions caches KLM> in an object system, etc.) However it seems to me to make KLM> exceeding sense to have the initial intrinsic settings KLM> specified as part of the object! It's not clear to me that the person writing the code is or should be the person specifying the security policy. I believe the CORBA security model separates policy definition into three parts -- security attributes, required rights, and policy domains. The developer would only be responsible for the first part -- the security attributes, which describe methods in a general way so that a security administrators can develop an effective policy for it. I suppose that function attributes would be a sensible way to do this, but it might also be accomplished with a separate wrapper object. I'm still not thrilled with the idea of using regular attribute access to describe static properties on code. To access the properties, yes, to define and set them, probably not. Jeremy From jeremy@cnri.reston.va.us Mon Apr 17 16:49:11 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:49:11 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> References: <200004162028.WAA06045@python.inrialpes.fr> <14586.14672.949500.986951@beluga.mojam.com> Message-ID: <14587.12919.488508.522746@goon.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: Ken> We haven't even seen a satisfactory approach to referring to Ken> the function, itself, from within the function. 
Maybe it's not Ken> even desirable to be able to do that - that's an interesting Ken> question. SM> I hereby propose that within a function the special name __ SM> refer to the function. I think the syntax is fairly obscure. I'm neurtral on the whole idea of having a special way to get at the function object from within the body of the code. Also, the proposal to handle security policies using attributes attached to the function seems wrong. The access control decision depends on the security policy defined for the object *and* the authorization of the caller. You can't decide based solely on some attribute of the function, nor can you assume that every call of a function object will be made with the same authorization (from the same protection domain). Jeremy From gstein@lyra.org Mon Apr 17 21:28:18 2000 From: gstein@lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 13:28:18 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14587.12919.488508.522746@goon.cnri.reston.va.us> Message-ID: On Mon, 17 Apr 2000, Jeremy Hylton wrote: > >>>>> "SM" == Skip Montanaro writes: > > Ken> We haven't even seen a satisfactory approach to referring to > Ken> the function, itself, from within the function. Maybe it's not > Ken> even desirable to be able to do that - that's an interesting > Ken> question. > > SM> I hereby propose that within a function the special name __ > SM> refer to the function. > > I think the syntax is fairly obscure. I'm neurtral on the whole idea > of having a special way to get at the function object from within the > body of the code. I agree. > Also, the proposal to handle security policies using attributes > attached to the function seems wrong. This isn't the only application of function attributes. Can't throw them out because one use seems wrong :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA@ActiveState.com Mon Apr 17 22:37:16 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 14:37:16 -0700 Subject: [Python-Dev] Encoding of code in XML Message-ID: Lots of projects embed scripting & other code in XML, typically as CDATA elements. For example, XBL in Mozilla. As far as I know, no one ever bothers to define how one should _encode_ code in a CDATA segment, and it appears that at least in the Mozilla world the 'encoding' used is 'cut & paste', and it's the XBL author's responsibility to make sure that ]]> is nowhere in the JavaScript code. That seems suboptimal to me, and likely to lead to disasters down the line. The only clean solution I can think of is to define a standard encoding/decoding process for storing program code (which may very well contain occurences of ]]> in CDATA, which effectively hides that triplet from the parser. While I'm dreaming, it would be nice if all of the relevant language communities (JS, Python, Perl, etc.) could agree on what that encoding is. I'd love to hear of a recommendation on the topic by the XML folks, but I haven't been able to find any such document. Any thoughts? --david ascher From ping@lfw.org Mon Apr 17 22:47:40 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 16:47:40 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts Message-ID: One gripe that i hear a lot is that it's really difficult to cut and paste chunks of Python code when you're working with the interpreter because the ">>> " and "... " prompts keep getting in the way. Does anyone else often have or hear of this problem? 
Here is a suggested solution: for interactive mode only, the console maintains a flag "dropdots", initially false. After line = raw_input(">>> "): if line[:4] in [">>> ", "... "]: dropdots = 1 line = line[4:] else: dropdots = 0 interpret(line) After line = raw_input("... "): if dropdots and line[:4] == "... ": line = line[4:] interpret(line) The above solution depends on the fact that ">>> " and "... " are always invalid at the beginning of a bit of Python. So, if sys.ps1 is not ">>> " or sys.ps2 is not "... ", all dropdots behaviour is disabled. I realize it's not going to handle all cases (in particular mixing pasted text with typed-in text), but at least it makes it *possible* to paste code, and it's quite a simple rule. I suppose it all depends on whether or not you guys often experience this particular little irritation. Any thoughts on this? -- ?!ng From skip@mojam.com (Skip Montanaro) Mon Apr 17 22:47:30 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 17 Apr 2000 16:47:30 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: References: Message-ID: <14587.34418.633570.133957@beluga.mojam.com> Ping> One gripe that i hear a lot is that it's really difficult to cut Ping> and paste chunks of Python code when you're working with the Ping> interpreter because the ">>> " and "... " prompts keep getting in Ping> the way. Does anyone else often have or hear of this problem? First time I encountered this and complained about it Guido responded with import sys sys.ps1 = sys.ps2 = "" Crude, but effective... -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From ping@lfw.org Mon Apr 17 23:07:47 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:07:47 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <14587.34418.633570.133957@beluga.mojam.com> Message-ID: On Mon, 17 Apr 2000, Skip Montanaro wrote: > > First time I encountered this and complained about it Guido responded with > > import sys > sys.ps1 = sys.ps2 = "" > > Crude, but effective... Yeah, i tried that, but it's suboptimal (no feedback), not the default behaviour, and certainly non-obvious to the beginner. -- ?!ng From ping@lfw.org Mon Apr 17 23:34:54 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:34:54 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurences of ]]> in CDATA, which effectively hides that triplet > from the parser. Hmm. I think the way everybody does it is to use the language to get around the need for ever saying "]]>". For example, in Python, if that was outside of a string, you could insert some spaces without changing the meaning, or if it was inside a string, you could add two strings together etc. You're right that this seems a bit ugly, but i think it could be even harder to get all the language communities to swallow something like "replace all occurrences of ]]> with some ugly escape string" -- since the above (hackish) method has the advantage that you can just run code directly copied from a piece of CDATA, and now you're asking them all to run the CDATA through some unescaping mechanism beforehand. Although i'm less optimistic about the success of such a standard, i'd certainly be up for it, if we had a good answer to propose. 
Here is one possible answer (to pick "@@" as a string very unlikely to occur much in most scripting languages): @@ --> @@@ ]]> --> @@> def escape(text): cdata = replace(text, "@@", "@@@") cdata = replace(cdata, "]]>", "@@>") return cdata def unescape(cdata): text = replace(cdata, "@@>", "]]>") text = replace(text, "@@@", "@@") return text The string "@@" occurs nowhere in the Python standard library. Another possible solution: <] --> <]> ]]> --> <][ etc. Generating more solutions is left as an exercise to the reader. :) -- ?!ng From DavidA@ActiveState.com Mon Apr 17 23:51:21 2000 From: DavidA@ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 15:51:21 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: > Hmm. I think the way everybody does it is to use the language > to get around the need for ever saying "]]>". For example, in > Python, if that was outside of a string, you could insert some > spaces without changing the meaning, or if it was inside a string, > you could add two strings together etc. > You're right that this seems a bit ugly, but i think it could be > even harder to get all the language communities to swallow > something like "replace all occurrences of ]]> with some ugly > escape string" -- since the above (hackish) method has the > advantage that you can just run code directly copied from a piece > of CDATA, and now you're asking them all to run the CDATA through > some unescaping mechanism beforehand. But it has the bad disadvantages that it's language-specific and modifies code rather than encode it. It has the even worse disadvantage that it requires you to parse the code to encode/decode it, something much more expensive than is really necessary! > Although i'm less optimistic about the success of such a standard, > i'd certainly be up for it, if we had a good answer to propose. I'm thinking that if we had a good answer, we can probably get it into the core libraries for a few good languages, and document it as 'the standard', if we could get key people on board. > Here is one possible answer Right, that's the sort of thing I was looking for. > def escape(text): > cdata = replace(text, "@@", "@@@") > cdata = replace(cdata, "]]>", "@@>") > return cdata > > def unescape(cdata): > text = replace(cdata, "@@>", "]]>") > text = replace(text, "@@@", "@@") > return text (the above fails on @@>, but that's the general idea I had in mind). --david I know!: "]]>" <==> "Microsoft engineers are puerile weenies!" From ping@lfw.org Tue Apr 18 00:01:58 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:01:58 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > (the above fails on @@>, but that's the general idea I had in mind). Oh, that's stupid of me. I used the wrong test harness. 
Okay, well the latter example works (i tested it): <] --> <]> ]]> --> <][ And this also works: @@ --> @@] ]]> --> @@> -- ?!ng From ping@lfw.org Tue Apr 18 00:08:53 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:08:53 -0500 (CDT) Subject: [Python-Dev] Escaping CDATA Message-ID: Here's what i'm playing with, if you want to mess with it too: import string def replace(text, old, new, join=string.join, split=string.split): return join(split(text, old), new) la, ra = "@@", "@@]" lb, rb = "]]>", "@@>" la, ra = "<]", "<]>" lb, rb = "]]>", "<][" def escape(text): cdata = replace(text, la, ra) cdata = replace(cdata, lb, rb) return cdata def unescape(cdata): text = replace(cdata, rb, lb) text = replace(text, ra, la) return text chars = "" for ch in la + ra + lb + rb: if ch not in chars: chars = chars + ch if __name__ == "__main__": class Tester: def __init__(self): self.failed = [] self.count = 0 def test(self, s, find=string.find): cdata = escape(s) text = unescape(cdata) print "%s -e-> %s -u-> %s" % (s, cdata, text) if find(cdata, "]]>") >= 0: print "EXPOSURE!" self.failed.append(s) elif s != text: print "MISMATCH!" self.failed.append(s) self.count = self.count + 1 tester = Tester() test = tester.test for a in chars: for b in chars: for c in chars: for d in chars: for e in chars: for f in chars: for g in chars: for h in chars: test(a+b+c+d+e+f+g+h) print if tester.failed == []: print "All tests succeeded." else: print "Failed %d of %d tests." % (len(tester.failed), tester.count) for t in tester.failed: tester.test(t) From Moshe Zadka Tue Apr 18 07:55:20 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 18 Apr 2000 08:55:20 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Mon, 17 Apr 2000, Ka-Ping Yee wrote: > One gripe that i hear a lot is that it's really difficult > to cut and paste chunks of Python code when you're working > with the interpreter because the ">>> " and "... " prompts > keep getting in the way. Does anyone else often have or > hear of this problem? > > Here is a suggested solution: for interactive mode only, > the console maintains a flag "dropdots", initially false. > > After line = raw_input(">>> "): > if line[:4] in [">>> ", "... "]: > dropdots = 1 > line = line[4:] > else: > dropdots = 0 > interpret(line) Python 1.5.2 (#1, Feb 21 2000, 14:52:33) [GCC 2.95.2 19991024 (release)] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a=[] >>> a[ ... ... ... ] Traceback (innermost last): File "", line 1, in ? TypeError: sequence index must be integer >>> Sorry. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping@lfw.org Tue Apr 18 09:01:54 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 01:01:54 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > >>> a=[] > >>> a[ > ... ... > ... ] > Traceback (innermost last): > File "", line 1, in ? > TypeError: sequence index must be integer > >>> > > Sorry. What was your point? -- ?!ng From Fredrik Lundh" Message-ID: <00e301bfa909$fbb053a0$34aab5d4@hagrid> Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > >=20 > > >>> a=3D[] > > >>> a[ > > ... ...=20 > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>>=20 > >=20 > > Sorry. >=20 > What was your point? a[...] is valid syntax, and not the same thing as a[]. 
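A runnable version of the prompt-stripping idea proposed earlier in this thread, recast here as a stand-alone filter over a pasted block rather than a change to raw_input (the name clean_paste and the choice to strip either prompt style on later lines are illustrative assumptions, not part of the proposal). Per the proposed rule, nothing is stripped unless the very first line carries a prompt, so pasted code that merely happens to start with dots is left alone:

    PS1, PS2 = ">>> ", "... "

    def clean_paste(lines):
        # Only strip prompts when the first pasted line starts with one;
        # otherwise return the block untouched, so "... " that is really
        # part of the code is never mangled.
        if not lines or lines[0][:4] not in [PS1, PS2]:
            return lines
        cleaned = []
        for line in lines:
            if line[:4] in [PS1, PS2]:
                line = line[4:]
            cleaned.append(line)
        return cleaned

For example, clean_paste([">>> for i in range(3):", "...     print i"]) gives ["for i in range(3):", "    print i"], while clean_paste(["a[", "... ", "]"]) comes back unchanged because the first line carries no prompt.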
From Moshe Zadka Tue Apr 18 10:29:04 2000 From: Moshe Zadka (Moshe Zadka) Date: Tue, 18 Apr 2000 11:29:04 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > > > > >>> a=[] > > >>> a[ > > ... ... > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>> > > > > Sorry. > > What was your point? That "... " in the beginning of the line is not a syntax error. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Mon Apr 17 23:01:38 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:01:38 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> Message-ID: <38FB89C2.26817F97@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > > ... > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). The idea > > behind this is that programmers should be able to use other > > encodings here than the default "unicode-escape" one. > > I'm totally confused about this. Are we going to allow UCS-2 sequences > in the middle of Python programs that are otherwise ASCII? The idea is to make life a little easier for programmers who's native script is not easily writable using ASCII, e.g. the whole Asian world. While originally only the encoding used within the quotes of u"..." was targetted (on the i18n sig), there has now been some discussion on this list about whether to move forward in a whole new direction: that of allowing whole Python scripts to be encoded in many different encodings. The compiler will then convert the scripts first to Unicode and then to 8-bit strings as needed. Using this technique which was introduced by Fredrik Lundh we could in fact have Python scripts which are encoded in UTF-16 (two bytes per character) or other more obscure encodings. The Python interpreter would only see Unicode and Latin-1. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Mon Apr 17 23:10:12 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:10:12 +0200 Subject: [Python-Dev] Unicode and XML References: <38FB138F.1DAF3891@prescod.net> Message-ID: <38FB8BC4.96678FD6@lemburg.com> Paul Prescod wrote: > > Let's presume that we agreed that XML is not a language because it > doesn't have semantics. What does that have to do with the applicability > of its Unicode-handling model? > > Here is a list of a hundred specifications which we can probably agree > have "useful semantics" that are all based on XML and thus have the same > Unicode model: > > http://www.xml.org/xmlorg_registry/index.shtml > > XML's unicode model seems mostly appropriate to me. I can only see one > reason it might not apply: which comes first the #! line or the > #encoding line? We could say that the #! line can only be used in > encodings that are direct supersets of ASCII (e.g. UTF-8 but not > UTF-16). That shouldnt' cause any problems with Unix because as far as I > know, Unix can only read the first line if it is in an ASCII superset > anyhow! 
> > Then the second line could describe the precise ASCII superset in use > (8859-1, 8859-2, UTF-8, raw ASCII, etc.). Sounds like a good idea... how would such a line look like ? #!/usr/bin/env python # version: 1.6, encoding: iso-8859-1 ... Meaning: the module script needs Python version >=1.6 and uses iso-8859-1 as source file encoding. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Tue Apr 18 11:35:33 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 06:35:33 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Your message of "Tue, 18 Apr 2000 00:01:38 +0200." <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <200004181035.GAA12526@eric.cnri.reston.va.us> > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts > to be encoded in many different encodings. The compiler will > then convert the scripts first to Unicode and then to 8-bit > strings as needed. > > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. Wouldn't it make more sense to have the Python compiler *always* see UTF-8 and to use a simple preprocessor to deal with encodings? (Disclaimer: there are about 300 unread python-dev messages in my inbox still.) --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <200004181035.GAA12526@eric.cnri.reston.va.us> Message-ID: <001001bfa925$021cf700$34aab5d4@hagrid> Guido van Rossum wrote: > > Using this technique which was introduced by Fredrik Lundh > > we could in fact have Python scripts which are encoded in > > UTF-16 (two bytes per character) or other more obscure > > encodings. The Python interpreter would only see Unicode > > and Latin-1. >=20 > Wouldn't it make more sense to have the Python compiler *always* see > UTF-8 and to use a simple preprocessor to deal with encodings? to some extent, this depends on what the "everybody" in CP4E means -- if you were to do user-testing on non-americans, I suspect "why cannot I use my own name as a variable name" might be as common as "why are SPAM and spam two different variables?". and if you're willing to address both issues in Py3K, it's much easier to use a simple internal representation, and handle en- codings on the way in and out. and PY_UNICODE* strings are easier to process than UTF-8 encoded char* strings... From ping@lfw.org Tue Apr 18 12:59:34 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 04:59:34 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > What was your point? > > That "... " in the beginning of the line is not a syntax error. So? You can put "... 
" at the beginning of a line in a string, too: >>> a = """ ... ... spam spam""" >>> a '\012... spam spam' That isn't a problem with the suggested mechanism, since dropdots only comes into effect when the *first* line entered at a >>> begins with ">>> " or "... ". -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." -- Roople Unia From guido@python.org Tue Apr 18 14:01:47 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 09:01:47 -0400 Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Your message of "Tue, 18 Apr 2000 04:59:34 PDT." References: Message-ID: <200004181301.JAA12697@eric.cnri.reston.va.us> Has anybody noticed that this is NOT a problem in IDLE? It will eventually go away, especially for the vast masses. So I don't think a solution is necessary -- and as was shown, the simple hacks don't really work. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Tue Apr 18 14:38:36 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Apr 2000 09:38:36 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <14588.25948.202273.502469@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts I had thought this was still an issue for interpretation of string contents, and really only meaningful when converting the source representations of Unicode strings to the internal represtenation. I see no need to change the language definition in general. Unless we *really* want to impose those evil trigraph sequences from C! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com><38FC7800.D88E14D7@prescod.net><38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> Message-ID: <004401bfa942$49bcaa20$34aab5d4@hagrid> Fred Drake wrote: > > While originally only the encoding used within the quotes of > > u"..." was targetted (on the i18n sig), there has now been > > some discussion on this list about whether to move forward > > in a whole new direction: that of allowing whole Python scripts >=20 > I had thought this was still an issue for interpretation of string > contents, and really only meaningful when converting the source > representations of Unicode strings to the internal represtenation. why restrict the set of possible source encodings to ASCII compatible 8-bit encodings? (or are there really authoring systems out there that can use different encodings for different parts of the file?) > I see no need to change the language definition in general. Unless > we *really* want to impose those evil trigraph sequences from C! ;) sorry, but I don't see the connection. From fdrake@acm.org Tue Apr 18 15:35:37 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) 
Date: Tue, 18 Apr 2000 10:35:37 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <004401bfa942$49bcaa20$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> <004401bfa942$49bcaa20$34aab5d4@hagrid> Message-ID: <14588.29369.39945.849489@seahag.cnri.reston.va.us> Fredrik Lundh writes: > why restrict the set of possible source encodings to ASCII > compatible 8-bit encodings? I'm not suggesting that. I just don't see any call to change the language definition (such as allowing additional characters in NAME tokens). I don't mind whatsoever if the source is stored in UCS-2, and the tokenizer does need to understand that to create the right value for Unicode strings specified as u'...' literals. > (or are there really authoring systems out there that can use > different encodings for different parts of the file?) Not that I know of, and I doubt I'd want to see the result! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From paul@prescod.net Tue Apr 18 15:42:32 2000 From: paul@prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:42:32 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <38FC7458.EA90F085@prescod.net> My vote is all or nothing. Either the whole file is in UCS-2 (for example) or none of it is. I'm not sure if we really need to allow multiple file encodings in version 1.6 but we do need to allow that ultimately. If we agree to allow the whole file to be in another encoding then we should use the XML trick of having a known start-sequence for encodings other than UTF-8. It doesn't matter much whether it is syntactically a comment or a pragma. I am still in favor of compile time pragmas but they can probably wait for Python 1.7. > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. In what sense is Latin-1 not Unicode? Isn't it just the first 256 characters of Unicode or something like that? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself [In retrospect] the story of a Cold War that was the scene of history's only nuclear arms race will be very different from the story of a Cold War that turned out to be only the first of many interlocking nuclear arms races in many parts of the world. The nuclear, question, in sum, hangs like a giant question mark over our waning century. - The Unfinished Twentieth Century by Jonathan Schell Harper's Magazine, January 2000 From Fredrik Lundh" <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <38FC7458.EA90F085@prescod.net> Message-ID: <00b301bfa946$47ebbde0$34aab5d4@hagrid> Paul Prescod wrote: > My vote is all or nothing. Either the whole file is in UCS-2 (for > example) or none of it is. agreed. > In what sense is Latin-1 not Unicode? Isn't it just the first 256 > characters of Unicode or something like that? yes. ISO Latin-1 is unicode. what MAL really meant was that the interpreter would only deal with 8-bit (traditional) or 16-bit (unicode) strings. (in my string type proposals, the same applies to text strings manipulated by the user. 
if it's not unicode, it's a byte array, and methods expecting text don't work) From jeremy@cnri.reston.va.us Tue Apr 18 16:40:10 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 11:40:10 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce Message-ID: <14588.33242.799629.934118@goon.cnri.reston.va.us> As many of you have probably noticed, the moderators of comp.lang.python.announce do not deal with pending messages in a timely manner. There have been no new posts since Mar 27, and delays of several weeks were common before then. I wanted to ask a smallish group of potential readers of this group what we should do about the problem. I have tried to contact the moderators several times, but haven't heard a peep from them since late February, when the response was: "Sorry. Temporary problem. It's all fixed now." Three possible solutions come to mind: - Get more moderators. It appears that Marcus Fleck is the only active moderator. I have never received a response to private email sent to Vladimir Ulogov. I suggested to Marcus that we get more moderators, but he appeared to reject the idea. Perhaps some peer pressure from other unsatisfied readers would help. - De-couple the moderation of comp.lang.python.announce and of python-annouce@python.org. We could keep the gateway between the lists going, but have different moderators for the mailing list. This would be less convenient for people who prefer to read news, but would at least get announcement out in a timely fashion. - Give up on comp.lang.python.announce. Since moderation has been so spotty, most people have reverted to making all anouncements to comp.lang.python anyway. This option is unfortunate, because it makes it harder for people who don't have time to read comp.lang.python to keep up with announcements. Any other ideas? Suggestions on how to proceed? Jeremy From skip@mojam.com (Skip Montanaro) Tue Apr 18 17:04:29 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Tue, 18 Apr 2000 11:04:29 -0500 (CDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.33242.799629.934118@goon.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.34701.354301.740696@beluga.mojam.com> Jeremy> Any other ideas? Suggestions on how to proceed? How about decouple the python-announce mailing list from the newsgroup (at least partially), manage the mailing list from Mailman (it probably already is), then require moderator approval to post? With a handful of moderators (5-10), the individual effort should be fairly low. You can set up the default reject message to be strongly related to the aims of the list so that most of the time the moderator needs only to click the approve or drop buttons or make a slight edit to the response and click the reject button. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ From ping@lfw.org Tue Apr 18 18:03:35 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 10:03:35 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <200004181301.JAA12697@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: > Has anybody noticed that this is NOT a problem in IDLE? Certainly. This was one of the first problems i solved when writing my console script, too. (Speaking of that, i still can't find auto-completion in IDLE -- is it in there?) But: startup time, startup time. 
I'm not going to wait to start IDLE every time i want to ask Python a quick question. Hey, i just tried it and actually it doesn't work. I mean, yes, sys.ps2 is missing, but that still doesn't mean you can select a whole line and paste it. You have to aim very carefully to start dragging from the fourth column. > So I don't think a solution is necessary -- and as was shown, the > simple hacks don't really work. I don't think this was shown at all. -- ?!ng From guido@python.org Tue Apr 18 17:50:49 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 12:50:49 -0400 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: Your message of "Tue, 18 Apr 2000 11:04:29 CDT." <14588.34701.354301.740696@beluga.mojam.com> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> Message-ID: <200004181650.MAA12894@eric.cnri.reston.va.us> I vote to get more moderators for the newsgroup. If Marcus and Gandalf don't moderate quickly the community can oust them. --Guido van Rossum (home page: http://www.python.org/~guido/) From Fredrik Lundh" Message-ID: <007201bfa956$d1311680$34aab5d4@hagrid> Jeremy Hylton wrote: > As many of you have probably noticed, the moderators of > comp.lang.python.announce do not deal with pending messages in a > timely manner. There have been no new posts since Mar 27, and > delays of several weeks were common before then. and as noted on c.l.py, those posts didn't make it to many servers, since they use "00" instead of "2000". I haven't seen any announcements on any local news- server since last year. > Any other ideas? Suggestions on how to proceed. post to comp.lang.python, and tell people who don't want to read the newsgroup to watch the python.org news page and/or the daily python URL? <0.5 wink> From jeremy@cnri.reston.va.us Tue Apr 18 17:58:54 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 12:58:54 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: <14588.37966.825565.8871@goon.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> I vote to get more moderators for the newsgroup. That seems like the simplest mechanism. We just need volunteers (I am one), and we need to get Marcus to notify the Usenet powers-that-be of the new moderators. GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. A painful process. Vladimir/Gandalf seems to have disappeared completely. (The original message in this thread bounced when I sent it to him.) The only way to add new moderators without Marcus's help is to have a new RFD/CFV process. It would be like creating the newsgroup all over again, except we'd have to convince the moderator of news.announce.newsgroups that the current moderator was unfit first. Jeremy From bwarsaw@cnri.reston.va.us Tue Apr 18 19:24:30 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 18 Apr 2000 14:24:30 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.43102.864339.347892@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> - De-couple the moderation of comp.lang.python.announce and of JH> python-annouce@python.org. 
We could keep the gateway between JH> the lists going, but have different moderators for the mailing JH> list. This would be less convenient for people who prefer to JH> read news, but would at least get announcement out in a timely JH> fashion. We could do this -- and in fact, this was the effective set up until a couple of weeks ago. We'd set it up as a moderated group, so that /every/ message is held for approval. I'd have to investigate, but we probably don't want to hold messages that originate on Usenet. Of course, gating back to Usenet will still be held up for c.l.py.a's moderators. Still, I'd rather not do this. It would be best to get more moderators helping out with the c.l.py.a content. -Barry From guido@python.org Tue Apr 18 19:25:11 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 14:25:11 -0400 Subject: [Python-Dev] baby steps for free-threading In-Reply-To: Your message of "Mon, 17 Apr 2000 01:14:51 PDT." References: Message-ID: <200004181825.OAA13261@eric.cnri.reston.va.us> > A couple months ago, I exchanged a few emails with Guido about doing the > free-threading work. In particular, for the 1.6 release. At that point > (and now), I said that I wouldn't be starting on it until this summer, > which means it would miss the 1.6 release. However, there are some items > that could go into 1.6 *today* that would make it easier down the road to > add free-threading to Python. I said that I'd post those in the hope that > somebody might want to look at developing the necessary patches. It fell > off my plate, so I'm getting back to that now... > > Python needs a number of basic things to support free threading. None of > these should impact its performance or reliability. For the most part, > they just provide a platform for the later addition. I agree with the general design sketched below. > 1) Create a portable abstraction for using the platform's per-thread state > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. There are at least 7 other platform specific thread implementations -- probably an 8th for the Mac. These all need to support this. (One solution would be to have a portable implementation that uses the thread-ID to index an array.) > This mechanism will be used to store PyThreadState structure pointers, > rather than _PyThreadState_Current. The latter variable must go away. > > Rationale: two threads will be operating simultaneously. An inherent > conflict arises if _PyThreadState_Current is used. The TLS-like > mechanism is used by the threads to look up "their" state. > > There will be a ripple effect on PyThreadState_Swap(); dunno offhand > what. It may become empty. Cool. > 2) Python needs a lightweight, short-duration, internally-used critical > section type. The current lock type is used at the Python level and > internally. For internal operations, it is rather heavyweight, has > unnecessary semantics, and is slower than a plain crit section. > > Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's > mutex type. A spinlock mechanism would be coolness. > > Rationale: Python needs critical sections to protect data from being > trashed by multiple, simultaneous access. These crit sections need to > be as fast as possible since they'll execute at all key points where > data is manipulated. Agreed. > 3) Python needs an atomic increment/decrement (internal) operation. 
> > Rationale: these are used in INCREF/DECREF to correctly increment or > decrement the refcount in the face of multiple threads trying to do > this. > > Win32: InterlockedIncrement/Decrement. pthreads would use the > lightweight crit section above (on every INC/DEC!!). Some other > platforms may have specific capabilities to keep this fast. Note that > platforms (outside of their threading libraries) may have functions to > do this. I'm worried here that since INCREF/DECREF are used so much this will slow down significantly, especially on platforms that don't have safe hardware instructions for this. So it should only be enabled when free threading is turned on. > 4) Python's configuration system needs to be updated to include a > --with-free-thread option since this will not be enabled by default. > Related changes to acconfig.h would be needed. Compiling in the above > pieces based on the flag would be nice (although Python could switch to > the crit section in some cases where it uses the heavy lock today) > > Rationale: duh Maybe there should be more fine-grained choices? As you say, some stuff could be used without this flag. But in any case this is trivial to add. > 5) An analysis of Python's globals needs to be performed. Any global that > can safely be made "const" should. If a global is write-once (such as > classobject.c::getattrstr), then these are marginally okay (there is a > race condition, with an acceptable outcome, but a mem leak occurs). > Personally, I would prefer a general mechanism in Python for creating > "constants" which can be tracked by the runtime and freed. They are almost all string constants, right? How about a macro Py_CONSTSTROBJ("value", variable)? > I would also like to see a generalized "object pool" mechanism be built > and used for tuples, ints, floats, frames, etc. Careful though -- generalizing this will slow it down. (Here I find myself almost wishing for C++ templates :-) > Rationale: any globals which are mutable must be made thread-safe. The > fewer non-const globals to examine, the fewer to analyze for race > conditions and thread-safety requirements. > > Note: making some globals "const" has a ripple effect through Python. > This is sometimes known as "const poisoning". Guido has stated an > acceptance to adding "const" throughout the interpreter, but would > prefer a complete (rather than ripple-based, partial) overhaul. Actually, it's okay to do this on an "as-neeed" basis. I'm also in favor of changing all the K&R code to ANSI, and getting rid of Py_PROTO and friends. Cleaner code! > I think that is all for now. Achieving these five steps within the 1.6 > timeframe means that the free-threading patches will be *much* smaller. It > also creates much more visibility and testing for these sections. Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough testing of some of these changes, the extensive nature of some of the changes, and my other obligations during those 6 weeks, I don't see how it can be done for 1.6. I would prefer to do an accellerated 1.7 or 1.6.1 release that incorporates all this. (It could be called 1.6.1 only if it'nearly identical to 1.6 for the Python user and not too different for the extension writer.) > Post 1.6, a patch set to add critical sections to lists and dicts would be > built. In addition, a new analysis would be done to examine the globals > that are available along with possible race conditions in other mutable > types and structures. 
Not all structures will be made thread-safe; for > example, frame objects are used by a single thread at a time (I'm sure > somebody could find a way to have multiple threads use or look at them, > but that person can take a leap, too :-) It is unacceptable to have thread-unsafe structures that can be accessed in a thread-unsafe way using pure Python code only. > Depending upon Guido's desire, the various schedules, and how well the > development goes, Python 1.6.1 could incorporate the free-threading option > in the base distribution. Indeed. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA@ActiveState.com Tue Apr 18 22:03:32 2000 From: DavidA@ActiveState.com (David Ascher) Date: Tue, 18 Apr 2000 14:03:32 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: > I vote to get more moderators for the newsgroup. If Marcus and > Gandalf don't moderate quickly the community can oust them. FWIW, I think they should step down now. They've not held up their end of the bargain, even though several folks have offered to help repeatedly throughout the 'problem period', which includes most of the life of c.l.p.a. As a compromise solution, and only if it's effective, we can add moderators. I'll volunteer, as long as someone gives me hints as to the mechanisms (it's been a while since I was doing usenet for real). --david PS: I think decoupling the mailing list from the newsgroup is a bad precedent and a political trouble zone. From gstein@lyra.org Tue Apr 18 22:16:44 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 14:16:44 -0700 (PDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Jeremy Hylton wrote: > >>>>> "GvR" == Guido van Rossum writes: >... > GvR> If Marcus and Gandalf don't moderate quickly the community can > GvR> oust them. > > A painful process. Vladimir/Gandalf seems to have disappeared > completely. (The original message in this thread bounced when I sent > it to him.) The only way to add new moderators without Marcus's help > is to have a new RFD/CFV process. It would be like creating the > newsgroup all over again, except we'd have to convince the moderator > of news.announce.newsgroups that the current moderator was unfit > first. Nevertheless, adding more moderators is the "proper" answer to the problem. Even if it is difficult to get more moderators into the system, there doesn't seem to be a better alternative. Altering the mailing list gateway will simply serve to create divergent announcement forums. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy@cnri.reston.va.us Tue Apr 18 22:30:01 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 17:30:01 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: References: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: <14588.54233.528093.55997@goon.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. JH> A painful process. Vladimir/Gandalf seems to have disappeared JH> completely. (The original message in this thread bounced when I JH> sent it to him.) The only way to add new moderators without JH> Marcus's help is to have a new RFD/CFV process. 
It would be like JH> creating the newsgroup all over again, except we'd have to JH> convince the moderator of news.announce.newsgroups that the JH> current moderator was unfit first. GS> Nevertheless, adding more moderators is the "proper" answer to GS> the problem. Even if it is difficult to get more moderators into GS> the system, there doesn't seem to be a better alternative. Proper is not necessarily the same as possible. We may fail in an attempt to add a moderator without cooperation from Marcus. GS> Altering the mailing list gateway will simply serve to create GS> divergent announcement forums. If only one of the forums works, this isn't a big problem. Jeremy From gstein@lyra.org Wed Apr 19 01:05:01 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 17:05:01 -0700 (PDT) Subject: [Python-Dev] switch to ANSI C (was: baby steps for free-threading) In-Reply-To: <14588.58817.746201.456992@anthem.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Barry A. Warsaw wrote: > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Actually, it's okay to do this on an "as-neeed" basis. I'm > GvR> also in favor of changing all the K&R code to ANSI, and > GvR> getting rid of Py_PROTO and friends. Cleaner code! > > I agree, and here's yet another plea for moving to 4-space indents in > the C code. For justification, look at the extended call syntax hacks > in ceval.c. They essentially /use/ 4si because they have no choice! > > Let's clean it up in one fell swoop! Obviously not for 1.6. I > volunteer to do all three mutations. Why not for 1.6? These changes are pretty brain-dead ("does it compile?") and can easily be reviewed. If somebody out there happens to have the time to work up ANSI C patches, then why refuse them? Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Apr 19 07:46:56 2000 From: gstein@lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 23:46:56 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading In-Reply-To: <200004181825.OAA13261@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: >... > > 1) Create a portable abstraction for using the platform's per-thread state > > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. > > There are at least 7 other platform specific thread implementations -- > probably an 8th for the Mac. These all need to support this. (One > solution would be to have a portable implementation that uses the > thread-ID to index an array.) Yes. As the platforms "come up to speed", they can replace the fallback, portable implementation. "Users" of the TLS mechanism would allocate indices into the per-thread arrays. Another alternative is to only manage a mapping of thread-ID to ThreadState structures. The TLS code can then get the ThreadState and access the per-thread dict. Of course, the initial impetus is to solve the lookup of the ThreadState rather than a general TLS mechanism :-) Hmm. I'd say that we stick with defining a Python TLS API (in terms of the platform when possible). The fallback code would be the per-thread arrays design. "thread dict" would still exist, but is deprecated. >... > > 3) Python needs an atomic increment/decrement (internal) operation. > > > > Rationale: these are used in INCREF/DECREF to correctly increment or > > decrement the refcount in the face of multiple threads trying to do > > this. > > > > Win32: InterlockedIncrement/Decrement. pthreads would use the > > lightweight crit section above (on every INC/DEC!!). 
Some other > > platforms may have specific capabilities to keep this fast. Note that > > platforms (outside of their threading libraries) may have functions to > > do this. > > I'm worried here that since INCREF/DECREF are used so much this will > slow down significantly, especially on platforms that don't have safe > hardware instructions for this. This definitely slows Python down. If an object is known to be visible to only one thread, then you can avoid the atomic inc/dec. But that leads to madness :-) > So it should only be enabled when free threading is turned on. Absolutely. No question. Note to readers: the different definitions of INCREF/DECREF has an impact on mixing modules in the same way Py_TRACE_REFS does. > > 4) Python's configuration system needs to be updated to include a > > --with-free-thread option since this will not be enabled by default. > > Related changes to acconfig.h would be needed. Compiling in the above > > pieces based on the flag would be nice (although Python could switch to > > the crit section in some cases where it uses the heavy lock today) > > > > Rationale: duh > > Maybe there should be more fine-grained choices? As you say, some > stuff could be used without this flag. But in any case this is > trivial to add. Sure. For example, something like the Python TLS API could be keyed off --with-threads. Replacing _PyThreadState_Current with a TLS-based mechanism should be keyed on free threads. The "critical section" stuff could be keyed on threading -- they would be nice for Python to use internally for its standard threading operation. > > 5) An analysis of Python's globals needs to be performed. Any global that > > can safely be made "const" should. If a global is write-once (such as > > classobject.c::getattrstr), then these are marginally okay (there is a > > race condition, with an acceptable outcome, but a mem leak occurs). > > Personally, I would prefer a general mechanism in Python for creating > > "constants" which can be tracked by the runtime and freed. > > They are almost all string constants, right? Yes, I believe so. (Analysis needed) > How about a macro Py_CONSTSTROBJ("value", variable)? Sure. Note that the variable name can usually be constructed from the value. > > I would also like to see a generalized "object pool" mechanism be built > > and used for tuples, ints, floats, frames, etc. > > Careful though -- generalizing this will slow it down. (Here I find > myself almost wishing for C++ templates :-) :-) This is a desire, but not a requirement. Same with the write-once stuff. A general pool mechanism would reduce code duplication for lock management, and possibly clarify some operation. >... > > Note: making some globals "const" has a ripple effect through Python. > > This is sometimes known as "const poisoning". Guido has stated an > > acceptance to adding "const" throughout the interpreter, but would > > prefer a complete (rather than ripple-based, partial) overhaul. > > Actually, it's okay to do this on an "as-neeed" basis. I'm also in > favor of changing all the K&R code to ANSI, and getting rid of > Py_PROTO and friends. Cleaner code! Yay! :-) > > I think that is all for now. Achieving these five steps within the 1.6 > > timeframe means that the free-threading patches will be *much* smaller. It > > also creates much more visibility and testing for these sections. > > Alas. 
Given the timeframe for 1.6 (6 weeks!), the need for thorough > testing of some of these changes, the extensive nature of some of the [ aside: most of these changes are specified with the intent of reducing the impact on Python. most are additional behavior rather than changing behavior. ] > changes, and my other obligations during those 6 weeks, I don't see > how it can be done for 1.6. I would prefer to do an accellerated 1.7 > or 1.6.1 release that incorporates all this. (It could be called > 1.6.1 only if it'nearly identical to 1.6 for the Python user and not > too different for the extension writer.) Ah. That would be nice. It also provides some focus on what would need to occur for the extension writer: *) Python TLS API *) critical sections *) WITH_FREE_THREAD from the configure process The INCREF/DECREF and const-ness is hidden from the extension writer. Adding integrity locks to list/dict/etc is also hidden. > > Post 1.6, a patch set to add critical sections to lists and dicts would be > > built. In addition, a new analysis would be done to examine the globals > > that are available along with possible race conditions in other mutable > > types and structures. Not all structures will be made thread-safe; for > > example, frame objects are used by a single thread at a time (I'm sure > > somebody could find a way to have multiple threads use or look at them, > > but that person can take a leap, too :-) > > It is unacceptable to have thread-unsafe structures that can be > accessed in a thread-unsafe way using pure Python code only. Hmm. I guess that I can grab a frame object reference via a traceback object. The frame and traceback objects can then be shared between threads. Now the question arises: if the original thread resumes execution and starts modifying these objects (inside the interpreter since both are readonly to Python), then the passed-to thread might see invalid data. I'm not sure whether these objects have multi-field integrity constraints. Conversely: if they don't, then changing a single field will simply create a race condition with the passed-to thread. Oh, and assuming that we remove a value from the structure before DECREF'ing it. By your "pure Python" statement, I'm presuming that you aren't worried about PyTuple_SET_ITEM() and similar. However, do you really want to start locking up the frame and traceback objects? (and code objects and ...) Cheers, -g -- Greg Stein, http://www.lyra.org/ From sjoerd@oratrix.nl Wed Apr 19 10:51:53 2000 From: sjoerd@oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 11:51:53 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Mon, 17 Apr 2000 14:37:16 -0700. References: Message-ID: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> What is wrong with encoding ]]> in the XML way by using an extra CDATA. In other words split up the CDATA section into two in the middle of the ]]> sequence: import string def encode_cdata(str): return ''), ']]]]>')) + \ ']]>' On Mon, Apr 17 2000 "David Ascher" wrote: > Lots of projects embed scripting & other code in XML, typically as CDATA > elements. For example, XBL in Mozilla. As far as I know, no one ever > bothers to define how one should _encode_ code in a CDATA segment, and it > appears that at least in the Mozilla world the 'encoding' used is 'cut & > paste', and it's the XBL author's responsibility to make sure that ]]> is > nowhere in the JavaScript code. > > That seems suboptimal to me, and likely to lead to disasters down the line. 
> > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurences of ]]> in CDATA, which effectively hides that triplet > from the parser. > > While I'm dreaming, it would be nice if all of the relevant language > communities (JS, Python, Perl, etc.) could agree on what that encoding is. > I'd love to hear of a recommendation on the topic by the XML folks, but I > haven't been able to find any such document. > > Any thoughts? > > --david ascher > > -- Sjoerd Mullender From SalzR@CertCo.com Wed Apr 19 15:57:16 2000 From: SalzR@CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 10:57:16 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >This definitely slows Python down. If an object is known to be visible to >only one thread, then you can avoid the atomic inc/dec. But that leads to >madness :-) I would much rather see the language extended to indicate that a particular variable is "shared" across free-threaded interpreters. The hit of taking a mutex on every incref/decref is way bad. From gvwilson@nevex.com Wed Apr 19 16:03:17 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Wed, 19 Apr 2000 11:03:17 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: Message-ID: > Rich Salz wrote: > I would much rather see the language extended to indicate that a > particular variable is "shared" across free-threaded interpreters. The > hit of taking a mutex on every incref/decref is way bad. In my experience, allowing/requiring programmers to specify sharedness is a very rich source of hard-to-find bugs. (Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg From petrilli@amber.org Wed Apr 19 16:09:04 2000 From: petrilli@amber.org (Christopher Petrilli) Date: Wed, 19 Apr 2000 11:09:04 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: ; from SalzR@CertCo.com on Wed, Apr 19, 2000 at 10:57:16AM -0400 References: Message-ID: <20000419110904.C6107@trump.amber.org> Salz, Rich [SalzR@CertCo.com] wrote: > >This definitely slows Python down. If an object is known to be visible to > >only one thread, then you can avoid the atomic inc/dec. But that leads to > >madness :-) > > I would much rather see the language extended to indicate that a particular > variable is "shared" across free-threaded interpreters. The hit of taking > a mutex on every incref/decref is way bad. I wonder if the energy is better spent in a truly highly-optimized implementation on the major platforms rather than trying to conditional this. This may mean writing x86 assembler, and a few others, but then again, once written, it shouldn't need much modification. I wonder if the conditional mutexing might be slower because of the check and lack of focus on bringing the core implementation up to speed. Chris -- | Christopher Petrilli | petrilli@amber.org From ping@lfw.org Wed Apr 19 16:40:08 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 08:40:08 -0700 (PDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: On Wed, 19 Apr 2000, Sjoerd Mullender wrote: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. In other words split up the CDATA section into two in the > middle of the ]]> sequence: Brilliant. 
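The CDATA markers in Sjoerd's encode_cdata snippet got stripped somewhere in the
archiving, so the quoted code is incomplete.  A plausible reconstruction of the
trick being applauded here -- end the current CDATA section after "]]" and start
a new one before ">" -- would look roughly like this (only the idea is Sjoerd's;
the exact formatting is a guess):

    import string

    def encode_cdata(str):
        # Split on every "]]>" and rejoin so that the troublesome
        # sequence is spread across two adjacent CDATA sections.
        return '<![CDATA[' + \
               string.join(string.split(str, ']]>'), ']]]]><![CDATA[>') + \
               ']]>'

    # encode_cdata('x = "]]>"') yields
    #     <![CDATA[x = "]]]]><![CDATA[>"]]>
    # and a parser that concatenates adjacent CDATA sections gets back
    # exactly:  x = "]]>"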
Now that i've seen it, this has to be the right answer. -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." -- Roople Unia From SalzR@CertCo.com Wed Apr 19 17:04:48 2000 From: SalzR@CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 12:04:48 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >In my experience, allowing/requiring programmers to specify sharedness is >a very rich source of hard-to-find bugs. My experience is the opposite, since most objects aren't shared. :) You could probably do something like add an "owning thread" to each object structure, and on refcount throw an exception if not shared and the current thread isn't the owner. Not sure if space is a concern, but since the object is either shared or needs its own mutex, you make them a union: bool shared; union { python_thread_id_type id; python_mutex_type m; }; (Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg _______________________________________________ Thread-SIG maillist - Thread-SIG@python.org http://www.python.org/mailman/listinfo/thread-sig From ping@lfw.org Wed Apr 19 18:07:36 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:07:36 -0700 (PDT) Subject: [Python-Dev] Generic notifier module Message-ID: I think it would be very nice for the Python standard library to provide a messaging mechanism (you may know it as signals/slots, publish/subscribe, listen/notify, etc.). This could be very useful, especially for interactive applications where various components need to keep each other up to date about things. I know of several Tkinter programs where i'd like to use this mechanism. The proposed interface is: To add notification ability, mix in class notifier.Notifier. object.notify(message, callback) - Set up notification for message. object.denotify(message[, callback]) - Turn off notification. object.send(message, **args) - Call all callbacks registered on object for message, in reverse order of registration, passing along message and **args as arguments to each callback. If a callback returns notifier.BREAK, no further callbacks are called. (Alternatively, we could use signals/slots terminology: connect/disconnect/emit. I'm not aware of anything the signals/slots mechanism has that the above lacks.) Two kinds of messages are supported: 1. The 'message' passed to notify/denotify may be a class, and the 'message' passed to send may be a class or instance of a message class. In this case callbacks registered on that class and all its bases are called. 2. The 'message' passed to all three methods may be any other hashable object, in which case it is looked up by its hash, and callbacks registered on a hash-equal object are called. Thoughts and opinions are solicited (especially from those who have worked with messaging-type things before, and know the gotchas!). I haven't run into many tricky problems with these things in general, and i figure that the predictable order of callbacks should reduce complication. (I chose reverse ordering so that you always have the ability to add a callback that overrides existing ones.) A straw-man implementation follows. The callback registry is maintained in the notifier module so you don't have to worry about it messing up the attributes of your objects. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. 
BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial = serial + 1 def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." -- Roople Unia From ping@lfw.org Wed Apr 19 18:25:12 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:25:12 -0700 (PDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. One revision to the above: callbacks should get the sender of the message passed in as well as the message. The tweaked module follows. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: """Mix in this class to provide notifier functionality on your objects. 
On a notifier object, use the 'notify' and 'denotify' methods to register or unregister callbacks on messages, and use the 'send' method to send a message from the object.""" def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(self, message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides garçons, même quand ils sont intelligents." -- Roople Unia From Fredrik Lundh" Message-ID: <001901bfaa22$e202af60$34aab5d4@hagrid> Ka-Ping Yee wrote: > I think it would be very nice for the Python standard library to > provide a messaging mechanism (you may know it as signals/slots, > publish/subscribe, listen/notify, etc.). your notifier looks like a supercharged version of the "Observer" pattern [1]. here's a minimalistic observer mixin from "(the eff- bot guide to) Python Patterns and Idioms" [2]. class Observable: __observers =3D None def addobserver(self, observer): if not self.__observers: self.__observers =3D [] self.__observers.append(observer) def removeobserver(self, observer): self.__observers.remove(observer) def notify(self, event): for o in self.__observers or (): o(event) notes: -- in the GOF pattern, to "notify" is to tell observers that something happened, not to register an observer. -- GOF uses "attach" and "detach" to install and remove observers; the pattern book version uses slightly more descriptive names. -- the user is expected to use bound methods and event instances (or classes) to associate data with the notifier and events. earlier implementations were much more elaborate, but we found that the standard mechanisms was more than sufficient in real life... 1) "Design Patterns", by Gamma et al. 2) http://www.pythonware.com/people/fredrik/patternbook.htm From DavidA@ActiveState.com Wed Apr 19 18:43:26 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 10:43:26 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. 
In other words split up the CDATA section into two in the > middle of the ]]> sequence: > > import string > def encode_cdata(str): > return ' string.join(string.split(str, ']]>'), ']]]]>')) + \ > ']]>' If I understand what you're proposing, you're splitting a single bit of Python code into N XML elements. This requires smarts not on the decode function (where they should be, IMO), but on the XML parsing stage (several leaves of the tree have to be merged). Seems like the wrong direction to push things. Also, I can imagine cases where the app puts several scripts in consecutive CDATA elements (assuming that's legal XML), and where a merge which inserted extra ]]> would be very surprising. Maybe I'm misunderstanding you, though.... --david ascher From Fredrik Lundh" Message-ID: <000701bfaa27$c3546e00$34aab5d4@hagrid> David Ascher wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' >=20 > If I understand what you're proposing, you're splitting a single bit = of > Python code into N XML elements. nope. CDATA sections are used to encode data, they're not elements: XML 1.0, section 2.7: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as mark- up. you can put each data character in its own CDATA section, if you like. if the parser cannot handle that, it's broken. (if you've used xmllib, think handle_data, not start_cdata). From sjoerd@oratrix.nl Wed Apr 19 20:24:31 2000 From: sjoerd@oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 21:24:31 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Wed, 19 Apr 2000 10:43:26 -0700. References: Message-ID: <20000419192432.F2A19301CF9@bireme.oratrix.nl> On Wed, Apr 19 2000 "David Ascher" wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. This requires smarts not on the decode > function (where they should be, IMO), but on the XML parsing stage (several > leaves of the tree have to be merged). Seems like the wrong direction to > push things. Also, I can imagine cases where the app puts several scripts > in consecutive CDATA elements (assuming that's legal XML), and where a merge > which inserted extra ]]> would be very surprising. > > Maybe I'm misunderstanding you, though.... I think you're not misunderstanding me, but maybe you are misunderstanding XML. :-) [Of course, it is also conceivable that I misunderstand XML. :-] First of all, I don't propose to split up the single bit of Python into multiple XML elements. CDATA sections are not XML elements. The XML standard says this: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. 
[http://www.w3.org/TR/REC-xml#sec-cdata-sect] In other words, according to the XML standard wherever you are allowed to put character data (such as in this case Python code), you are allowed to use CDATA sections. Their purpose is to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA sections are not part of the markup, so the XML parser is allowed to coallese the multiple CDATA sections and other character data into one string before it gives it to the application. So, yes, this requires smarts on the XML parsing stage, but I think those smarts need to be there anyway. If an application put several pieces of Python code in one character data section, it is basically on its own. I don't think XML guarantees that those pieces aren't merged into one string by the XML parser before it gets to the application. As I said already, this is my interpretation of XML, and I could be misinterpreting things. -- Sjoerd Mullender From ping@lfw.org Wed Apr 19 21:14:39 2000 From: ping@lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 15:14:39 -0500 (CDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <001901bfaa22$e202af60$34aab5d4@hagrid> Message-ID: On Wed, 19 Apr 2000, Fredrik Lundh wrote: > > your notifier looks like a supercharged version of the "Observer" > pattern [1]. here's a minimalistic observer mixin from "(the eff- > bot guide to) Python Patterns and Idioms" [2]. Oh, yeah, "observer". That was the other name for this mechanism that i forgot. > class Observable: I'm not picky about names... anything is fine. > def notify(self, event): > for o in self.__observers or (): > o(event) *Some* sort of dispatch would be nice, i think, rather than having to check the kind of event you're getting in every callback. Here are the three sources of "more stuff" in Notifier as opposed to Observer: 1. Dispatch. You register callbacks for particular messages rather than on the whole object. 2. Ordering. The callbacks are always called in reverse order of registration, which makes BREAK meaningful. 3. Inheritance. You can use a class hierarchy of messages. I think #1 is fairly essential, and i know i've come across situations where #3 is useful. The need for #2 is only a conjecture on my part. Does anyone care about the order in which callbacks get called? If not (and no one needs to use BREAK), we can throw out #2 and make Notifier simpler: callbacks = {} def send(key, message, **args): if callbacks.has_key(key): for callback in callbacks[key]: callback(key[0], message, **args) if hasattr(key[1], "__bases__"): for base in key[1].__bases__: send((key[0], base), message, **args) class Notifier: def send(self, message, **args): if hasattr(message, "__class__"): send((self, message.__class__), message, **args) else: send((self, message), message, **args) def notify(self, message, callback): key = (self, message) if not callbacks.has_key(key): callbacks[key] = [] callbacks[key].append(callback) def denotify(self, message, callback=None): key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] else: callbacks[key].remove(callback) -- ?!ng From paul@prescod.net Wed Apr 19 21:19:31 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:19:31 -0500 Subject: [Python-Dev] Encoding of code in XML References: Message-ID: <38FE14D3.AC05DAE0@prescod.net> David Ascher wrote: > > ... > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. 
No, a CDATA section is not an element. But the question of whether boundary placements are meaningful is sepearate. This comes back to the "semantics question". Most tools will not differentiate between two adjacent CDATA sections and one. The XML specification does not say whether they should or should not but in practice tools that consume XML and then throw it away typically do NOT care about CDATA section boundaries and tools that edit XML *do* care. This "break it into to two sections" solution is the typical one but it is god-awful ugly, even in XML editors. Many stream-based XML tools (e.g. mostSAX parsers, xmllib) *will* report two separate CDATA sections as two different character events. Application code must be able to handle this situation. It doesn't only occur with CDATA sections. XML parsers could equally break up long text chunks based on 1024-byte block boundaries or line breaks or whatever they feel like. In my opinion these variances in behvior stem from the myth that XML has no semantics but that's another off-topic topic. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. - http://www.nj.com/page1/ledger/e2efc7.html From gstein@lyra.org Wed Apr 19 21:27:11 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:27:11 -0700 (PDT) Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >In my experience, allowing/requiring programmers to specify sharedness is > >a very rich source of hard-to-find bugs. > > My experience is the opposite, since most objects aren't shared. :) > You could probably do something like add an "owning thread" to each object > structure, and on refcount throw an exception if not shared and the current > thread isn't the owner. Not sure if space is a concern, but since the object > is either shared or needs its own mutex, you make them a union: > bool shared; > union { > python_thread_id_type id; > python_mutex_type m; > }; > > > (Not saying I have an answer to > the performance hit of locking on incref/decref, just saying that the > development cost of 'shared' is very high.) Regardless of complexity or lack thereof, any kind of "specified sharedness" cannot be implemented. Consider the case where a programmer forgets to note the sharedness. He passes the object to another thread. At certain points: BAM! The interpreter dumps core. Guido has specifically stated that *nothing* should ever allow that (in terms of pure Python code; bad C extension coding is all right). Sharedness has merit, but it cannot be used :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From SalzR@CertCo.com Wed Apr 19 21:27:10 2000 From: SalzR@CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 16:27:10 -0400 Subject: [Python-Dev] RE: [Thread-SIG] marking shared-ness (was: baby steps for free-th reading) Message-ID: >Consider the case where a programmer forgets to note the sharedness. He >passes the object to another thread. At certain points: BAM! The >interpreter dumps core. No. Using the "owning thread" idea prevents coredumps and allows the interpreter to throw an exception. Perhaps my note wasn't clear enough? 
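Concretely, the check Rich is describing would have to live in the refcount
macros.  A rough sketch follows; every name in it (ob_tinfo, current_thread_id,
the lock calls) is invented for illustration and is not an existing Python API.
And, as Greg points out in his reply, Py_INCREF has no way to report an error,
so "throw an exception" degenerates into something fatal:

    /* Per-object info as in Rich's union: either an owning thread id
       (object unshared) or a per-object mutex (object shared). */
    #define Py_INCREF(op) \
        do { \
            if ((op)->ob_tinfo.shared) { \
                acquire_lock((op)->ob_tinfo.u.m); \
                (op)->ob_refcnt++; \
                release_lock((op)->ob_tinfo.u.m); \
            } \
            else if ((op)->ob_tinfo.u.id == current_thread_id()) \
                (op)->ob_refcnt++; \
            else \
                Py_FatalError("unshared object touched from another thread"); \
        } while (0)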
/r$ From paul@prescod.net Wed Apr 19 21:25:42 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:25:42 -0500 Subject: [Python-Dev] Encoding of code in XML References: <20000419192432.F2A19301CF9@bireme.oratrix.nl> Message-ID: <38FE1646.B29CAA8A@prescod.net> Sjoerd Mullender wrote: > > ... > > CDATA sections are not part of the markup, so the XML parser > is allowed to coallese the multiple CDATA sections and other character > data into one string before it gives it to the application. Allowed but not required. Most SAX parsers will not. Some DOM parsers will and some won't. :( > So, yes, this requires smarts on the XML parsing stage, but I think > those smarts need to be there anyway. I don't follow this part. Typically those "smarts" are not there. At the end of one CDATA section you get an event and at the beginning of the next you get a different event. It's the application's job to glue them together. :( Fixing this is one of the goals of EventDOM. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. - http://www.nj.com/page1/ledger/e2efc7.html From tismer@tismer.com Wed Apr 19 21:38:31 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 22:38:31 +0200 Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) References: Message-ID: <38FE1947.70FC6AEE@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Salz, Rich wrote: > > >In my experience, allowing/requiring programmers to specify sharedness is > > >a very rich source of hard-to-find bugs. > > > > My experience is the opposite, since most objects aren't shared. :) > > You could probably do something like add an "owning thread" to each object > > structure, and on refcount throw an exception if not shared and the current > > thread isn't the owner. Not sure if space is a concern, but since the object > > is either shared or needs its own mutex, you make them a union: > > bool shared; > > union { > > python_thread_id_type id; > > python_mutex_type m; > > }; > > > > > > (Not saying I have an answer to > > the performance hit of locking on incref/decref, just saying that the > > development cost of 'shared' is very high.) > > Regardless of complexity or lack thereof, any kind of "specified > sharedness" cannot be implemented. > > Consider the case where a programmer forgets to note the sharedness. He > passes the object to another thread. At certain points: BAM! The > interpreter dumps core. > > Guido has specifically stated that *nothing* should ever allow that (in > terms of pure Python code; bad C extension coding is all right). > > Sharedness has merit, but it cannot be used :-( Too bad that we don't have incref/decref as methods. The possible mutables which have to be protected could in fact carry a thread handle of their current "owner" (probably the one who creted them), and incref would check whether the owner is still same. If it is not same, then the owner field would be wiped, and that turns the (higher cost) shared refcounting on, and all necessary protection as well. (Maybe some extra care is needed to ensure that this info isn't changed while we are testing it). Without inc/dec-methods, something similar could be done, but every inc/decref will be a bit more expensive since we must figure out wether we have a mutable or not. 
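Spelled out, Christian's "automatic marking" idea is roughly the following
(all names are invented for illustration; ob_owner would be a new object field
and atomic_increment some platform primitive).  Greg's reply below shows why
the matching DECREF is racy: the object can become shared between the owner
test and the plain, non-atomic decrement.

    #define Py_INCREF(op) \
        do { \
            if ((op)->ob_owner == current_thread_id()) \
                (op)->ob_refcnt++;                  /* still unshared */ \
            else { \
                (op)->ob_owner = 0;                 /* promote to shared */ \
                atomic_increment(&(op)->ob_refcnt); \
            } \
        } while (0)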
ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From gstein@lyra.org Wed Apr 19 21:52:12 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:52:12 -0700 (PDT) Subject: [Python-Dev] RE: marking shared-ness In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >Consider the case where a programmer forgets to note the sharedness. He > >passes the object to another thread. At certain points: BAM! The > >interpreter dumps core. > > No. Using the "owning thread" idea prevents coredumps and allows the > interpreter to throw an exception. Perhaps my note wasn't clear > enough? INCREF and DECREF cannot throw exceptions. Are there other points where you could safely detect erroneous sharing of objects? (in a guaranteed fashion) For example: what are all the ways that objects can be transported between threads. Can you erect tests at each of those points? I believe "no" since there are too many ways (func arg or an item in a shared ob). Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Wed Apr 19 22:15:39 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:15:39 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE1947.70FC6AEE@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: >... > Too bad that we don't have incref/decref as methods. This would probably impose more overhead than some of the atomic inc/dec mechanisms. > The possible mutables which have to be protected could Non-mutable objects must be protected, too. An integer can be shared just as easily as a list. > in fact carry a thread handle of their current "owner" > (probably the one who creted them), and incref would > check whether the owner is still same. > If it is not same, then the owner field would be wiped, > and that turns the (higher cost) shared refcounting on, > and all necessary protection as well. > (Maybe some extra care is needed to ensure that this info > isn't changed while we are testing it). Ah. Neat. "Automatic marking of shared-ness" Could work. That initial test for the thread id could be expensive, though. What is the overhead of getting the current thread id? [ ... thinking about the code ... ] Nope. Won't work at all. There is a race condition when an object "becomes shared". DECREF: if ( object is not shared ) /* whoops! it just became shared! */ --(op)->ob_refcnt; else atomic_decrement(op) To prevent the race, you'd need an interlock which is more expensive than an atomic decrement. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer@tismer.com Wed Apr 19 22:25:45 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 23:25:45 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FE2459.E0300B5@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > >... > > Too bad that we don't have incref/decref as methods. > > This would probably impose more overhead than some of the atomic inc/dec > mechanisms. > > > The possible mutables which have to be protected could > > Non-mutable objects must be protected, too. An integer can be shared just > as easily as a list. Uhh, right. Everything is mutable, since me mutate the refcount :-( ... > Ah. Neat. 
"Automatic marking of shared-ness" > > Could work. That initial test for the thread id could be expensive, > though. What is the overhead of getting the current thread id? Zero if we cache it in the thread state. > [ ... thinking about the code ... ] > > Nope. Won't work at all. @#$%§!!-| yes-you-are-right - gnnn! > There is a race condition when an object "becomes shared". > > DECREF: > if ( object is not shared ) > /* whoops! it just became shared! */ > --(op)->ob_refcnt; > else > atomic_decrement(op) > > To prevent the race, you'd need an interlock which is more expensive than > an atomic decrement. Really, sad but true. Are atomic decrements really so cheap, meaning "are they mapped to the atomic dec opcode"? Then this is all ok IMHO. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From gstein@lyra.org Wed Apr 19 22:34:19 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:34:19 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE2459.E0300B5@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Ah. Neat. "Automatic marking of shared-ness" > > > > Could work. That initial test for the thread id could be expensive, > > though. What is the overhead of getting the current thread id? > > Zero if we cache it in the thread state. You don't have the thread state at incref/decref time. And don't say "_PyThreadState_Current" or I'll fly to Germany and personally kick your ass :-) >... > > There is a race condition when an object "becomes shared". > > > > DECREF: > > if ( object is not shared ) > > /* whoops! it just became shared! */ > > --(op)->ob_refcnt; > > else > > atomic_decrement(op) > > > > To prevent the race, you'd need an interlock which is more expensive than > > an atomic decrement. > > Really, sad but true. > > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? On some platforms and architectures, they *might* be. On Win32, we call InterlockedIncrement(). No idea what that does, but I don't think that it is a macro or compiler-detected thingy to insert opcodes. I believe there is a function call involved. pthreads do not define atomic inc/dec, so we must use a critical section + normal inc/dec operators. Linux has a kernel macro for atomic inc/dec, but it is only valid if __SMP__ is defined in your compilation context. etc. Platforms that do have an API (as Donn stated: BeOS has one; Win32 has one), they will be cheaper than an interlock. Therefore, we want to take advantage of an "atomic inc/dec" semantic when possible (and fallback to slower stuff when not). Cheers, -g -- Greg Stein, http://www.lyra.org/ From fleck@triton.informatik.uni-bonn.de Wed Apr 19 22:32:42 2000 From: fleck@triton.informatik.uni-bonn.de (Markus Fleck) Date: Wed, 19 Apr 2000 23:32:42 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: from "Greg Stein" at Apr 18, 2000 02:16:44 PM Message-ID: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Greg Stein: > Nevertheless, adding more moderators is the "proper" answer to the > problem. Even if it is difficult to get more moderators into the system, > there doesn't seem to be a better alternative. I agree with this. 
What would be helpful would be (i) a web interface for multiple-moderator moderation (which I believe Mailman already provides), and (ii) some rather simple changes to the list-to-newsgroup gateway to do some header manipulations before posting each approved message to c.l.py.a. I've been more or less "off the Net" for almost two months now, while getting started at my new job, and I will try to do some (summary-style) retro-moderation of the ca. 50 c.l.py.a submissions that I missed during this time. Automating the submission process and getting additional moderators would make c.l.py.a less dependent on me and avoid such moderation lags in the future. (And yes, of course, I'm sorry for the lag. But now I'm back, and I'm willing to help change the process so that such lags won't happen again in the future. Getting additional moderators would likely help with this.) Yours, Markus. From gstein@lyra.org Wed Apr 19 22:42:34 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:42:34 -0700 (PDT) Subject: [Python-Dev] optimize atomic inc/dec? (was: baby steps for free-threading) In-Reply-To: <20000419110904.C6107@trump.amber.org> Message-ID: On Wed, 19 Apr 2000, Christopher Petrilli wrote: > Salz, Rich [SalzR@CertCo.com] wrote: > > >This definitely slows Python down. If an object is known to be visible to > > >only one thread, then you can avoid the atomic inc/dec. But that leads to > > >madness :-) > > > > I would much rather see the language extended to indicate that a particular > > variable is "shared" across free-threaded interpreters. The hit of taking > > a mutex on every incref/decref is way bad. > > I wonder if the energy is better spent in a truly highly-optimized > implementation on the major platforms rather than trying to > conditional this. This may mean writing x86 assembler, and a few > others, Bill Tutt had a good point -- we can get a bunch of assembler fragments from the Linux kernel for atomic inc/dec. On specific compiler and processor architecture combinations, we could drop to assembly to provide an atomic dec/inc. For example, when we see we're using GCC on an x86 processor (whether FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > but then again, once written, it shouldn't need much > modification. I wonder if the conditional mutexing might be slower > because of the check and lack of focus on bringing the core > implementation up to speed. Won't work anyhow. See previous email. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy@cnri.reston.va.us Wed Apr 19 22:32:17 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Wed, 19 Apr 2000 17:32:17 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.9697.366632.708503@goon.cnri.reston.va.us> Glad to hear from you, Marcus! I'm willing to help with both (a) and (b). I'll talk to Barry about the Mailman issues tomorrow. Jeremy From DavidA@ActiveState.com Wed Apr 19 22:46:49 2000 From: DavidA@ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 14:46:49 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: I can help moderate as well. 
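Back on the atomic inc/dec subthread: the GCC-on-x86 fragments Greg mentions
(borrowed in spirit from the Linux kernel's atomic.h) would look something like
the sketch below.  The names atomic_inc and atomic_dec_and_test are
illustrative only -- Python has no such API today -- and the pthread version
shows the "critical section plus ordinary inc/dec" fallback for platforms
without a native primitive.

    #include <pthread.h>

    /* GCC/x86: lock-prefixed increment. */
    static __inline__ void atomic_inc(volatile int *v)
    {
        __asm__ __volatile__("lock; incl %0" : "=m" (*v) : "m" (*v));
    }

    /* Decrement and report whether the count hit zero; Py_DECREF needs
       that answer to decide whether to deallocate. */
    static __inline__ int atomic_dec_and_test(volatile int *v)
    {
        unsigned char zero;
        __asm__ __volatile__("lock; decl %0; sete %1"
                             : "=m" (*v), "=qm" (zero) : "m" (*v) : "memory");
        return zero;
    }

    /* Fallback: protect the counter with an ordinary mutex. */
    static pthread_mutex_t refcnt_mutex = PTHREAD_MUTEX_INITIALIZER;

    static int atomic_dec_and_test_fallback(volatile int *v)
    {
        int zero;
        pthread_mutex_lock(&refcnt_mutex);
        zero = (--*v == 0);
        pthread_mutex_unlock(&refcnt_mutex);
        return zero;
    }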
--david ascher From billtut@microsoft.com Wed Apr 19 22:28:12 2000 From: billtut@microsoft.com (Bill Tutt) Date: Wed, 19 Apr 2000 14:28:12 -0700 Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF71@RED-MSG-50> > From: Christian Tismer [mailto:tismer@tismer.com] > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? > Then this is all ok IMHO. > On x86en they are mapped to an "atomic assembly fragment" i.e. params to some registers and then stick in a "lock add" instruction or something. (please forgive me if I've botched the details). So in that respect they are cheap, its a hardware level feature. On the otherhand though, given the effect that these instructions have on the CPU (its caches, buses, and so forth) it is by for no means free. My recollection vaguely recalls someone saying that all the platforms NT has supported so far has had at the minimum an InterlockedInc/Dec. InterlockCompareExchange() is where I think not all of the Intel family (386) and some of the other platforms may not have had the appropriate instructions. InterlockCompareExchange() is useful for creating your own spinlocks. The GCC list might be a good place for enquiring about the feasability of InterlockedInc/Dec on various platforms. Bill From bwarsaw@cnri.reston.va.us Thu Apr 20 01:51:55 2000 From: bwarsaw@cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 19 Apr 2000 20:51:55 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.21675.154913.979353@anthem.cnri.reston.va.us> >>>>> "MF" == Markus Fleck writes: MF> I agree with this. What would be helpful would be (i) a web MF> interface for multiple-moderator moderation (which I believe MF> Mailman already provides), and (ii) some rather simple changes MF> to the list-to-newsgroup gateway to do some header MF> manipulations before posting each approved message to MF> c.l.py.a. This is doable in Mailman, but I'm not so sure how much it will help, unless we make a further refinement. I don't know enough about the Usenet moderation process to know if this will work, but let me outline things here. There's two ways a message can get announced, first via email or first via Usenet. Here's what happens in each case: - A message is sent to python-announce@python.org. This is the preferred email address to post to. These messages get forwarded to clpa@python.net, which I believe is just a simple exploder to Markus and Vladimir. Obviously with the Starship current dead, this is broken too. I don't know what happens to these messages once Markus and Vladimir get it, but I assume that Markus adds a magic approval header and forwards the message to Usenet. Perhaps Markus can explain this process in more detail. - A message is sent to python-announce-list@python.org. This is not the official place to send announcements, but this specific alias simply forwards to python-announce@python.org so see above. Note that the other standard Mailman python-announce-list-* aliases are in place, and python-announce-list is a functioning Mailman mailing list. This list gates from Usenet, but not /to/ Usenet because of the forwarding described above. When it sees a message on c.l.py.a, it sucks the messages off the newsgroup and forwards it to all list members. Obviously those messages must have already been approved by the Usenet moderators. - A message is sent directly to c.l.py.a. 
From what I understand, the Usenet software itself forwards to the moderators, who again, do their magic and forwards the message to Usenet. So, given this arrangement, the messages never arrive unapproved at a mailing list. What it sounds like Markus is proposing is that the official Usenet moderator address would be a mailing list. It would be a closed mailing list whose members are approved moderators, with a shared Mailman alias. Any message posted there would be held for approval, and once approved, it would be injected directly into Usenet, with the appropriate magic header. I think I know what I'd need to add to Mailman to support this, though it'll be a little tricky. I need to know exactly how approved messages should be posted to Usenet. Does someone have a URL reference to this procedure, or is it easy enough to explain? -Barry From gstein@lyra.org Thu Apr 20 05:09:13 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 21:09:13 -0700 (PDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? Message-ID: Hey guys, You're the Smart Guys that I know, and it seems this is also the forum where I once heard a long while back that DB can occasionally corrupt its files. True? Was it someone here that mentioned that? (Skip?) Or maybe it was bsddb? (or is that the same as the Berkeley DB, now handled by Sleepycat) A question just came up elsewhere about DB and I seemed to recall somebody mentioning the occasional corruption. Oh, maybe it was related to multiple threads. Any help appreciated! Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Thu Apr 20 07:09:45 2000 From: Moshe Zadka (Moshe Zadka) Date: Thu, 20 Apr 2000 09:09:45 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.denotify(message[, callback]) - Turn off notification. You need to be a bit more careful here. What if callback is foo().function? It's unique, so I could never denotify it. A better way, and more popular (at least in the signal/slot terminology), is to return a cookie on connect, and have disconnect requests by a cookie. > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. > If a callback returns notifier.BREAK, no further callbacks > are called. When I implemented that mechanism, I just used a special exception (StopCommandExecution). I prefer that, since it allows the programmer much more flexibility (which I used) > (Alternatively, we could use signals/slots terminology: > connect/disconnect/emit. I'm not aware of anything the signals/slots > mechanism has that the above lacks.) Me neither. Some offer a variety of connect-methods: connect after, connect-before (this actually has some uses). Have a short look at the Gtk+ signal mechanism -- it has all these. > 1. The 'message' passed to notify/denotify may be a class, and > the 'message' passed to send may be a class or instance of > a message class. In this case callbacks registered on that > class and all its bases are called. This seems a bit unneccessary, but YMMV. In all cases I've needed this, a simple string sufficed (i.e., method 2) Implementation nit: I usually use class _BREAK: pass BREAK=_BREAK() That way it is gurranteed that BREAK is unique. Again, I use this mostly with exceptions. All in all, great idea Ping! -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fleck@triton.informatik.uni-bonn.de Thu Apr 20 08:02:33 2000 From: fleck@triton.informatik.uni-bonn.de (Markus Fleck) Date: Thu, 20 Apr 2000 09:02:33 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14590.21675.154913.979353@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at Apr 19, 2000 08:51:55 PM Message-ID: <200004200702.JAA14939@hera.informatik.uni-bonn.de> Barry A. Warsaw: > What it sounds like Markus is proposing is that the official Usenet > moderator address would be a mailing list. It would be a closed > mailing list whose members are approved moderators, with a shared > Mailman alias. Any message posted there would be held for approval, > and once approved, it would be injected directly into Usenet, with the > appropriate magic header. Exactly. (In fact, each approved message could be both posted to Usenet and forwarded to the subscription-based shadow mailing list at the same time.) > I think I know what I'd need to add to Mailman to support this, though > it'll be a little tricky. I need to know exactly how approved messages > should be posted to Usenet. Does someone have a URL reference to this > procedure, or is it easy enough to explain? Basically, you need two headers: Newsgroups: comp.lang.python.announce Approved: python-announce@python.org The field contents of the "Approved:" header are in fact never checked for validity; it only has to be non-empty for the message to be successfully posted to a moderated newsgroup. (BTW, posting to the "alt.hackers" newsgroup actually relies on posters inserting "Approved: whatever" headers on their own, because "alt.hackers" is a moderated newsgroup without a moderator. You need to "hack" the Usenet moderation mechanism to be able to post there. :-) Because of the simplicity of this mechanism, no cross-posting to another moderated newsgroup should occur when posting an approved message to Usenet; e.g. if someone cross-posts to comp.lang.python, comp.lang.python.announce, comp.os.linux.misc and comp.os.linux.announce, the posting will go to the moderation e-mail address of the first moderated newsgroup in the "Newsgroups:" header supplied by the author's Usenet posting agent. (I.e., in this case, clpa@starship.skyport.net, if the header enumerates newsgroups in the above-mentioned order, "c.l.py,c.l.py.a,c.o.l.a,c.o.l.m".) Ideally, the moderators (or moderation software) of this first moderated newsgroup should split the posting up accordingly: a) remove names of newsgroups that we want to handle ourselves (e.g. 
c.l.py.a, possibly also c.l.py if cross-posted), and re-post the otherwise unchanged message to Usenet with only a changed "Newsgroups:" header (Headers: "Newsgroups: c.o.l.a,c.o.l.m" / no "Approved:" header added) -> this is necessary for the message to ever reach c.o.l.a and c.o.l.m -> the message will get forwarded by the Usenet server software to the moderation address of c.o.l.a, which is the first moderated newsgroup in the remaining list of newsgroups c) approve (or reject) posting to c.l.py.a and/or c.l.py (Headers: "Newsgroups: c.l.py.a" or "Newsgroups: c.l.py.a,c.l.py" or "Newsgroups: c.l.py" / an "Approved: python-announce@python.org" header may always be added, but is only necessary if also posting to c.l.py.a) According to the c.l.py.a posting guidelines, a "Followup-To:" header, will be added, if it doesn't exist yet, pointing to c.l.py for follow-up messages ("Follow-Up: c.l.py"). While a) may always happen automatically, prior to moderation, and needs to be custom-tailored for our c.l.py.a/c.l.py use case, the moderation software for b), i.e. Mailman, should allow moderators to adjust the "Newsgroups:" header while approving a message. It might also be nice to have an "X-Original-Newsgroups:" line in Mailman with a copy of the original "Newsgroups:" line. Regarding headers, usually e-mail will allow and forward almost any non-standard header field (a feature that is used to preserve the "Newsgroups:" header even when forwarding a posting to an e-mail address), but the Usenet server software may not accept all kinds of headers, so that just before posting, only known "standard" header fields should be preserved; any "X-*:" headers, for example, might be candidates for removal prior to posting, because some Usenet servers return strange errors when a message is posted that contains certain special "X-*:" headers. OTOH, AFAIK, the posting agent should generate and add a unique "Message-ID:" header for each Usenet posting itself. But if you have a Usenet forwarding agent already running, much of this should be implemented there already. Okay, now some links to resources and FAQs on that subject: Moderated Newsgroups FAQ http://www.swcp.com/~dmckeon/mod-faq.html USENET Moderators Archive http://www.landfield.com/moderators/ NetNews Moderators Handbook - 5.2.1 Approved: Line http://www.landfield.com/usenet/moderators/handbook/mod05.html#5.2.1 Please e-mail me if you have any further questions. Yours, Markus. From harri.pasanen@trema.com Thu Apr 20 08:42:29 2000 From: harri.pasanen@trema.com (Harri Pasanen) Date: Thu, 20 Apr 2000 09:42:29 +0200 Subject: [Python-Dev] Re: [Thread-SIG] optimize atomic inc/dec? (was: baby steps for free-threading) References: Message-ID: <38FEB4E5.47DCD834@trema.com> Greg Stein wrote, talking about optimizing atomic inc/dec: > > For example, when we see we're using GCC on an x86 processor (whether > FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > The same applies for Sparc. In our C++ software we have currently the atomic increment as inlined assembly for x86, sparc and sparc-v9, using GCC. It is a function though, so there is a function call involved. -Harri From tismer@tismer.com Thu Apr 20 14:23:31 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 15:23:31 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FF04D3.4CE2067E@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Ah. Neat. 
"Automatic marking of shared-ness" > > > > > > Could work. That initial test for the thread id could be expensive, > > > though. What is the overhead of getting the current thread id? > > > > Zero if we cache it in the thread state. > > You don't have the thread state at incref/decref time. > > And don't say "_PyThreadState_Current" or I'll fly to Germany and > personally kick your ass :-) A real temptation to see whether I can really get you to Germany :-)) ... Thanks for all the info. > Linux has a kernel macro for atomic inc/dec, but it is only valid if > __SMP__ is defined in your compilation context. Well, and while it looks cheap, it is for sure expensive since several caches are flushed, and the system is stalled until the modified value is written back into the memory bank. Could it be that we might want to use another thread design at all? I'm thinking of running different interpreters in the same process space, but with all objects really disjoint, invisible between the interpreters. This would perhaps need some internal changes, in order to make all the builtin free-lists disjoint as well. Now each such interpreter would be running in its own thread without any racing condition at all so far. To make this into threading and not just a flavor of multitasking, we now need of course shared objects, but only those objects which we really want to share. This could reduce the cost for free threading to nearly zero, except for the (hopefully) few shared objects. I think, instead of shared globals, it would make more sense to have some explicit shared resource pool, which controls every access via mutexes/semas/whateverweneed. Maybe also that we would prefer to copy objects into it over sharing, in order to minimize collisions. I hope the need for true sharing can be minimized to a few variables. Well, I hope. "freethreads" could even coexist with the current locking threads, we would not even need a special build for them, but to rethink threading. Like "the more free threading is, the more disjoint threads are". are-you-now-convinced-to-come-and-kick-my-ass-ly y'rs - chris :-) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip@mojam.com (Skip Montanaro) Thu Apr 20 14:29:37 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 20 Apr 2000 08:29:37 -0500 (CDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? In-Reply-To: References: Message-ID: <14591.1601.125779.714243@beluga.mojam.com> Greg> You're the Smart Guys that I know, and it seems this is also the Greg> forum where I once heard a long while back that DB can Greg> occasionally corrupt its files. ... Greg> A question just came up elsewhere about DB and I seemed to recall Greg> somebody mentioning the occasional corruption. Oh, maybe it was Greg> related to multiple threads. Yes, Berkeley DB 1.85 (exposed through the bsddb module in Python) has bugs in the hash implementation. They never fixed them (well maybe in 1.86?), but moved on to version 2.x. Of course, they changed the function call interface and the file format, so many people didn't follow. They do provide a 1.85-compatible API but you have to #include db_185.h instead of db.h. As far as I know, if you stick to the btree interface with 1.85 you should be okay. 
Unfortunately, the anydbm and dbhash modules both use the hash interface, so if you're trying to be more or less portable and not modify your Python sources, you've also got buggy db files... Someone did create a libdb 2.x-compatible module that exposes more of the underlying functionality. Check the VoP for it. libdb == Berkeley DB == Sleepycat... Skip From skip@mojam.com (Skip Montanaro) Thu Apr 20 14:40:06 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 20 Apr 2000 08:40:06 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> Message-ID: <14591.2230.630609.500780@beluga.mojam.com> Chris> I think, instead of shared globals, it would make more sense to Chris> have some explicit shared resource pool, which controls every Chris> access via mutexes/semas/whateverweneed. Tuple space, anyone? Check out http://www.snurgle.org/~pybrenda/ It's a Linda implementation for Python. Linda was developed at Yale by David Gelernter. Unfortunately, he's better known to the general public as being one of the Unabomber's targets. You can find out more about Linda at http://www.cs.yale.edu/Linda/linda.html Skip From fredrik@pythonware.com Thu Apr 20 14:55:52 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 20 Apr 2000 15:55:52 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Moshe Zadka wrote: > > object.denotify(message[, callback]) - Turn off notification. > > You need to be a bit more careful here. What if callback is > foo().function? It's unique, so I could never denotify it. if you need a value later, the usual approach is to bind it to a name. works in all other situations, so why not use it here? > A better way, and more popular (at least in the signal/slot terminology), > is to return a cookie on connect, and have disconnect requests by a cookie. in which way is "harder to use in all common cases" better? ... as for the "break" functionality, I'm not sure it really belongs in a basic observer class (in GOF terms, that's a "chain of responsibility"). but if it does, I sure prefer an exception over a magic return value. From tismer@tismer.com Thu Apr 20 15:23:56 2000 From: tismer@tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 16:23:56 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> Message-ID: <38FF12FC.32052356@tismer.com> Skip Montanaro wrote: > > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > > Tuple space, anyone? Check out > > http://www.snurgle.org/~pybrenda/ Very interesting, indeed. > It's a Linda implementation for Python. Linda was developed at Yale by > David Gelernter. Unfortunately, he's better known to the general public as > being one of the Unabomber's targets. You can find out more about Linda at > > http://www.cs.yale.edu/Linda/linda.html Many broken links. The most activity appears to have stopped around 94/95, the project looks kinda dead. But this doesn't mean that we cannot learn from them. Will think more when the starship problem is over... ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr.
26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Moshe Zadka Thu Apr 20 15:24:49 2000 From: Moshe Zadka (Moshe Zadka) Date: Thu, 20 Apr 2000 17:24:49 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Message-ID: On Thu, 20 Apr 2000, Fredrik Lundh wrote: > > A better way, and more popular (at least in the signal/slot terminology), > > is to return a cookie on connect, and have disconnect requests by a cookie. > > in which way is "harder to use in all common cases" > better? I'm not sure I agree this is harder to use in all common cases, but YMMV. Strings are prone to collisions, etc. And usually the code which connects the callback is pretty close (flow-control wise) to the code that would disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, and it might not be a bad idea, now that I think of it. > as for the "break" functionality, I'm not sure it really > belongs in a basic observer class (in GOF terms, that's ^^^ TLA overload! What's GOF? > a "chain of responsibility"). but if it does, I sure prefer > an exception over a magic return value. I don't know if it belongs or not, but I do know that it is sometimes needed, and is very hard and ugly to simulate otherwise. That's one FAQ I don't want to answer -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip@mojam.com (Skip Montanaro) Thu Apr 20 15:38:08 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Thu, 20 Apr 2000 09:38:08 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> <38FF12FC.32052356@tismer.com> Message-ID: <14591.5712.162339.740646@beluga.mojam.com> >> http://www.cs.yale.edu/Linda/linda.html Chris> Many broken links. The most activity appears to have stopped Chris> around 94/95, the project looks kinda dead. But this doesn't mean Chris> that we cannot learn from them. Yes, I think Linda mostly lurks under the covers these days. Their Piranha project, which aims to soak up spare CPU cycles to do parallel computing, uses Linda. I suspect Linda is probably hidden somewhere inside Lifestreams as well. As a correction to my original note, Nicholas Carriero was the other primary lead on Linda. I no longer recall the details, but he may have been on of Gelernter's grad students in the late 80's. Skip From gvwilson@nevex.com Thu Apr 20 15:40:48 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Thu, 20 Apr 2000 10:40:48 -0400 (EDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <14591.2230.630609.500780@beluga.mojam.com> Message-ID: > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > Skip wrote: > Tuple space, anyone? Check out > http://www.snurgle.org/~pybrenda/ > It's a Linda implementation for Python. You can find out more about > Linda at > http://www.cs.yale.edu/Linda/linda.html Linda is also the inspiration for Sun's JavaSpaces, an easier-to-use layer on top of Jini: http://java.sun.com/products/javaspaces/ http://cseng.aw.com/bookpage.taf?ISBN=0-201-30955-6 On the plus side: 1. 
It's much (much) easier to use than mutex, semaphore, or monitor models: students in my parallel programming course could start writing C-Linda programs after (literally) five minutes of instruction. 2. If you're willing/able to do global analysis of access patterns, its simplicity doesn't have a significant performance penalty. 3. (Bonus points) It integrates very well with persistence schemes. On the minus side: 1. Some things that "ought" to be simple (e.g. barrier synchronization) are surprisingly difficult to get right, efficiently, in vanilla Linda-like systems. Some VHLL derivates (based on SETL and Lisp dialects) solved this in interesting ways. 2. It's different enough from hardware-inspired shared-memory + mutex models to inspire the same "Huh, that looks weird" reaction as Scheme's parentheses, or Python's indentation. On the other hand, Bill Joy and company are now backing it... Personal opinion: I've felt for 15 years that something like Linda could be to threads and mutexes what structured loops and conditionals are to the "goto" statement. Were it not for the "Huh" effect, I'd recommend hanging "Danger!" signs over threads and mutexes, and making tuple spaces the "standard" concurrency mechanism in Python. I'd also recommend calling the system "Carol", after Monty Python regular Carol Cleveland. The story is that Linda itself was named after the 70s porn star Linda Lovelace, in response to the DoD naming its language "Ada" after the other Lovelace... Greg p.s. I talk a bit about Linda, and the limitations of the vanilla approach, in http://mitpress.mit.edu/book-home.tcl?isbn=0262231867. From mlh@swl.msd.ray.com Thu Apr 20 16:02:30 2000 From: mlh@swl.msd.ray.com (Milton L. Hankins) Date: Thu, 20 Apr 2000 11:02:30 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: > Skip Montanaro wrote: > > > > Tuple space, anyone? Check out > > > > http://www.snurgle.org/~pybrenda/ > > Very interesting, indeed. *Steps out of the woodwork and bows* PyBrenda doesn't have a thread implementation, but it could be adapted to do so. It might be prudent to eliminate the use of TCP/IP in that case as well. In case anyone is interested, I just created a mailing list for PyBrenda at egroups: http://www.egroups.com/group/pybrenda-users -- Milton L. Hankins \\ ><> Ephesians 5:2 ><> http://www.snurgle.org/~mhankins // These are my opinions, not Raytheon's. \\ W. W. J. D. ? From Fredrik Lundh Message-ID: <018101bfaaec$65e56740$34aab5d4@hagrid> Moshe Zadka wrote: > > in which way is "harder to use in all common cases" > > better? > > I'm not sure I agree this is harder to use in all common cases, but YMMV. > Strings are prone to collisions, etc. not sure what you're talking about here, so I suppose we're talking past each other. what I mean is that: model.addobserver(view.notify) model.removeobserver(view.notify) works just fine without any cookies. having to do: view.cookie = model.addobserver(view.notify) model.removeobserver(view.cookie) is definitely no improvement.
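A minimal sketch of the callback-keyed interface being argued for here (the class and method names are only placeholders, not the proposed notifier module's actual API):

    class Model:
        def __init__(self):
            self.observers = []            # callbacks, in registration order

        def addobserver(self, callback):
            self.observers.append(callback)

        def removeobserver(self, callback):
            # the callback itself is the key; no cookie to keep track of
            self.observers.remove(callback)

        def notify(self, message):
            for callback in self.observers[:]:
                callback(message)

    # usage: the same bound method works for both calls
    #     model.addobserver(view.notify)
    #     ...
    #     model.removeobserver(view.notify)

This leans on the fact that two references to the same bound method compare equal, which is what lets the callback double as its own removal key.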
and if you have an extraordinary case (like a function pointer extracted from an object returned from a factory function), you just have to assign the function pointer to a local variable: self.callback = strangefunction().notify model.addobserver(self.callback) model.removeobserver(self.callback) in this case, you would probably keep a pointer to the object returned by the function anyway: self.viewer = getviewer() model.addobserver(viewer.notify) model.removeobserver(viewer.notify) > And usually the code which connects > the callback is pretty close (flow-control wise) to the code that would > disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. you really hate keeping things as simple as possible, don't you? ;-) what are these 3-4 "disconnects" doing? > > as for the "break" functionality, I'm not sure it really > > belongs in a basic observer class (in GOF terms, that's > ^^^ TLA overload! What's GOF? http://www.hillside.net/patterns/DPBook/GOF.html > > a "chain of responsibility"). but if it does, I sure prefer > > an exception over a magic return value. > > I don't know if it belongs or not, but I do know that it is sometimes > needed, and is very hard and ugly to simulate otherwise. That's one FAQ > I don't want to answer yeah, but the two patterns have different uses. From Moshe Zadka Thu Apr 20 20:31:05 2000 From: Moshe Zadka (Moshe Zadka) Date: Thu, 20 Apr 2000 22:31:05 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <018101bfaaec$65e56740$34aab5d4@hagrid> Message-ID: [Fredrik Lundh] > not sure what you're talking about here, so I suppose > we're talking past each other. Nah, I guess it was a simple case of you being right and me being wrong. (In other words, you've convinced me) [Moshe] > FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. [Fredrik Lundh] > you really hate keeping things as simple as possible, > don't you? ;-) > > what are these 3-4 "disconnects" doing? gtk_signal_disconnect -- disconnect by cookie gtk_signal_disconnect_by_func -- disconnect by function pointer gtk_signal_disconnect_by_data -- disconnect by the void* pointer passed Hey, you asked just-preparing-for-my-lecture-next-friday-ly y'rs, Z. (see www.linux.org.il for more) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Thu Apr 20 21:43:24 2000 From: gstein@lyra.org (Greg Stein) Date: Thu, 20 Apr 2000 13:43:24 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: >... > > Linux has a kernel macro for atomic inc/dec, but it is only valid if > > __SMP__ is defined in your compilation context. > > Well, and while it looks cheap, it is for sure expensive > since several caches are flushed, and the system is stalled > until the modified value is written back into the memory bank. Yes, Bill mentioned that yesterday. Important fact, but there isn't much you can do -- they must be atomic. > Could it be that we might want to use another thread design > at all? I'm thinking of running different interpreters in > the same process space, but with all objects really disjoint, > invisible between the interpreters. This would perhaps need > some internal changes, in order to make all the builtin > free-lists disjoint as well.
> Now each such interpreter would be running in its own thread > without any racing condition at all so far. > To make this into threading and not just a flavor of multitasking, > we now need of course shared objects, but only those objects > which we really want to share. This could reduce the cost for > free threading to nearly zero, except for the (hopefully) few > shared objects. > I think, instead of shared globals, it would make more sense > to have some explicit shared resource pool, which controls > every access via mutexes/semas/whateverweneed. Maybe also that > we would prefer to copy objects into it over sharing, in order > to minimize collisions. I hope the need for true sharing > can be minimized to a few variables. Well, I hope. > "freethreads" could even coexist with the current locking threads, > we would not even need a special build for them, but to rethink > threading. > Like "the more free threading is, the more disjoint threads are". No. Now you're just talking processes with IPC. Yes, they happen to run in threads, but you got none of the advantages of a threaded application. Threading is about sharing an address space. Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA@ActiveState.com Thu Apr 20 21:40:54 2000 From: DavidA@ActiveState.com (David Ascher) Date: Thu, 20 Apr 2000 13:40:54 -0700 Subject: [Python-Dev] String issues -- see the JavaScript world Message-ID: Just an FYI to those discussing Unicode issues. There is currently a big debate over in Mozilla-land looking at how XPIDL (their interface definition language) should deal with the various kinds of string types. Someone who cares may want to follow up on that to see if some of their issues apply to Python as well. News server: news.mozilla.org Newsgroup: netscape.public.mozilla.xpcom Thread: Encoding wars -- more in the Big String Story Cheers, --david From tismer@tismer.com Fri Apr 21 13:38:27 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 14:38:27 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <39004BC3.1DD108D0@tismer.com> Greg Stein wrote: > > On Thu, 20 Apr 2000, Christian Tismer wrote: [me, about free threading with less sharing] > No. Now you're just talking processes with IPC. Yes, they happen to run in > threads, but you got none of the advantages of a threaded application. Are you shure that every thread user shares your opinion? I see many people using threads just in order to have multiple tasks in parallel, with none or quite few shared variables. > Threading is about sharing an address space. This is part of the truth. There are a number of other reasons to use threads, too. Since Python has nothing really private, this implies in fact to protect every single object for free threading, although nobody wants this in the first place to happen. Other languages have much fewer problems here (I mean C, C++, Delphi...), they are able to do the right thing in the right place. Python is not designed for that. Why do you want to enforce the impossible, letting every object pay a high penalty to become completely thread-safe? Sharing an address space should not mean to share everything, but something. If Python does not support this, we should think of a redesign of its threading model, instead of loosing so much of efficiency. You end up in a situation where all your C extensions can run free threaded at high speed, just Python is busy all the time to fight the threading. That is not Python. You know that I like to optimize things. 
For me, optimization mut give an overall gain, not just in one area, where others get worse. If free threading cannot be optimized in a way that gives better overall performance, then it is a wrong optimization to me. Well, this is all speculative until we did some measures. Maybe I'm just complaining about 1-2 percent of performance loss, then I'd agree to move my complaining into /dev/null :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mlh@swl.msd.ray.com Fri Apr 21 17:36:40 2000 From: mlh@swl.msd.ray.com (Milton L. Hankins) Date: Fri, 21 Apr 2000 12:36:40 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: > Are you shure that every thread user shares your opinion? > I see many people using threads just in order to have > multiple tasks in parallel, with none or quite few shared > variables. About the only time I use threads is when 1) I'm doing something asynchronous in an event loop-driven paradigm (such as Tkinter) or 2) I'm trying to emulate fork() under win32 > Since Python has nothing really private, this implies in > fact to protect every single object for free threading, > although nobody wants this in the first place to happen. How does Java solve this problem? (Is this analagous to native vs. green threads?) > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Hmm, how about declaring only certain builtins as free-thread safe? Or is "the impossible" necessary because of the nature of incref/decref? -- Milton L. Hankins :: ><> Ephesians 5:2 ><> Software Engineer, Raytheon Systems Company :: http://amasts.msd.ray.com/~mlh :: RayComNet 7-225-4728 From billtut@microsoft.com Fri Apr 21 17:50:47 2000 From: billtut@microsoft.com (Bill Tutt) Date: Fri, 21 Apr 2000 09:50:47 -0700 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF9F@RED-MSG-50> > From: Milton L. Hankins [mailto:mlh@swl.msd.ray.com] > > On Fri, 21 Apr 2000, Christian Tismer wrote: > > > Are you shure that every thread user shares your opinion? > > I see many people using threads just in order to have > > multiple tasks in parallel, with none or quite few shared > > variables. > > About the only time I use threads is when > 1) I'm doing something asynchronous in an event loop-driven > paradigm > (such as Tkinter) or > 2) I'm trying to emulate fork() under win32 > 3) I'm doing something that would block in an asynchronous FSM. (e.g. Medusa, or an NT I/O completion port driven system) > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to > native vs. green > threads?) > Java allows you to specifically mention whether something should be seralized or not, and no, this doesn't have anything to do with native vs. green threads) > > Python is not designed for that. 
Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread > safe? incref/decref are not type object specific, they're global macros. Making them methods on the type object would be the sensible thing to do, but would definately be non-backward compatible. Bill From seanj@speakeasy.org Fri Apr 21 17:55:29 2000 From: seanj@speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 09:55:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to native vs. green > threads?) > > > Python is not designed for that. Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread safe? Or is > "the impossible" necessary because of the nature of incref/decref? http://www.javacats.com/US/articles/MultiThreading.html I would like sync foo: bloc of code here maybe we could merge in some Occam while were at it. B^) sync would be a most excellent operator in python. From seanj@speakeasy.org Fri Apr 21 18:16:29 2000 From: seanj@speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 10:16:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: http://www.cs.bris.ac.uk/~alan/javapp.html Take a look at the above link. It merges the Occam model with Java and uses 'channel based' interfaces (not sure exactly what this is). But they seem pretty exicted. I vote for using InterlockedInc/Dec as it is available as an assembly instruction on almost everyplatform. Could be then derive all other locking schemantics from this? And our portability problem is solved if it comes in the box with gcc. On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > > > Since Python has nothing really private, this implies in > > > fact to protect every single object for free threading, > > > although nobody wants this in the first place to happen. > > > > How does Java solve this problem? (Is this analagous to native vs. green > > threads?) > > > > > Python is not designed for that. Why do you want to enforce > > > the impossible, letting every object pay a high penalty > > > to become completely thread-safe? > > > > Hmm, how about declaring only certain builtins as free-thread safe? Or is > > "the impossible" necessary because of the nature of incref/decref? > > http://www.javacats.com/US/articles/MultiThreading.html > > I would like > > sync foo: > bloc of code here > > maybe we could merge in some Occam while were at it. B^) > > > sync would be a most excellent operator in python. > > > > > _______________________________________________ > Thread-SIG maillist - Thread-SIG@python.org > http://www.python.org/mailman/listinfo/thread-sig > From gvwilson@nevex.com Fri Apr 21 18:27:49 2000 From: gvwilson@nevex.com (gvwilson@nevex.com) Date: Fri, 21 Apr 2000 13:27:49 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > http://www.cs.bris.ac.uk/~alan/javapp.html > Take a look at the above link. 
It merges the Occam model with Java and uses > 'channel based' interfaces (not sure exactly what this is). Channel-based programming has been called "the revenge of the goto", as in, "Where the hell does this channel go to?" Programmers must manage conversational continuity manually (i.e. keep track of the origins of messages, so that they can be replied to). It also doesn't really help with the sharing problem that started this thread: if you want a shared integer, you have to write a little server thread that knows how to act like a semaphore, and then it read/write requests that are exactly equivalent to P and V operations (and subject to all the same abuses). Oh, and did I mention the joys of trying to draw a semi-accurate diagram of the plumbing in your program after three months of upgrade work? *shudder* Greg From guido@python.org Fri Apr 21 18:29:06 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 13:29:06 -0400 Subject: [Python-Dev] Inspiration Message-ID: <200004211729.NAA16454@eric.cnri.reston.va.us> http://www.perl.com/pub/2000/04/whatsnew.html --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Fri Apr 21 20:52:06 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 12:52:06 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: >... > > No. Now you're just talking processes with IPC. Yes, they happen to run in > > threads, but you got none of the advantages of a threaded application. > > Are you shure that every thread user shares your opinion? Now you're just being argumentative. I won't respond to this. >... > Other languages have much fewer problems here (I mean > C, C++, Delphi...), they are able to do the right thing > in the right place. > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Existing Python semantics plus free-threading places us in this scenario. Many people have asked for free-threading, and the number of inquiries that I receive have grown over time. (nobody asked in 1996 when I first published my patches; I get a query every couple months now) >... > You know that I like to optimize things. For me, optimization > mut give an overall gain, not just in one area, where others > get worse. If free threading cannot be optimized in > a way that gives better overall performance, then > it is a wrong optimization to me. > > Well, this is all speculative until we did some measures. > Maybe I'm just complaining about 1-2 percent of performance > loss, then I'd agree to move my complaining into /dev/null :-) It is more than this. In my last shot at this, pystone ran about half as fast. There are a few things that will be different this time around, but it certainly won't in the "few percent" range. Presuming you can keep your lock contention low, then your overall performances *goes up* once you have a multiprocessor machine. Sure, each processor runs Python (say) 10% slower, but you have *two* of them going. That is 180% compared to a central-lock Python on an MP machine. Lock contention: my last patches had really high contention. It didn't scale across processors well. This round will have more fine-grained locks than the previous version. But it will be interesting to measure the contention. 
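The arithmetic behind that 180% figure, as a small sketch (the 10% per-processor slowdown is the example number above, not a measurement):

    def relative_throughput(processors, per_thread_slowdown):
        # free-threaded build: every processor executes Python, each a bit slower
        return processors * (1.0 - per_thread_slowdown)

    # a central-lock build executes Python on only one processor at a time -> 1.0
    print relative_throughput(1, 0.10)    # 0.9  -- a net loss on a uniprocessor
    print relative_throughput(2, 0.10)    # 1.8  -- the "180%" figure above
    print relative_throughput(4, 0.10)    # 3.6  -- assuming lock contention stays low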
Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Fri Apr 21 20:49:09 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 15:49:09 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Your message of "Fri, 21 Apr 2000 12:52:06 PDT." References: Message-ID: <200004211949.PAA16911@eric.cnri.reston.va.us> > It is more than this. In my last shot at this, pystone ran about half as > fast. There are a few things that will be different this time around, but > it certainly won't in the "few percent" range. Interesting thought: according to patches recently posted to patches@python.org (but not yet vetted), "turning on" threads on Win32 in regular Python also slows down Pystone considerably. Maybe it's not so bad? Maybe those patches contain a hint of what we could do? --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein@lyra.org Fri Apr 21 21:02:23 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:02:23 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches@python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I think that my tests were threaded vs. free-threaded. It has been so long ago, though... :-) Yes, we'll get those patches reviewed and installed. That will at least help the standard threading case. With more discrete locks (e.g. one per object or one per code section), then we will reduce lock contention. Working on improving the lock mechanism itself and the INCREF/DECREF system will help, too. But this initial thread was to seek people to assist with some coding to get stuff into 1.6. The heavy lifting will certainly be after 1.6, but we can get some good stuff in *today*. We'll examine performance later on, then start improving it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein@lyra.org Fri Apr 21 21:21:55 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:21:55 -0700 (PDT) Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness In-Reply-To: Message-ID: On Fri, 21 Apr 2000, Brent Fulgham wrote: >... > The problem is that having to grab the global interpreter lock > every time I want to manipulate Python objects from C seems wasteful. > This is perhaps more of a "interpreter" issue, rather than a > thread issue perhaps, but it does seem that if each thread (and > therefore interpreter state from my perspective) kept internal > track of itself, there would be much less lock contention as one > interpreter drops out of Python into the C code for a moment, then > releases the lock and returns, etc. > > So I think it's possible that free-threading changes might provide > some benefit even on uniprocessor systems. This is true. Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS become null macros. Your C extensions operate within their thread of execution, but have no central lock to worry about releasing before they block on something. And from an embedding standpoint, the same is true. 
You do not need to acquire any locks to start manipulating Python objects. Each object maintains its own integrity. Note: embedding/extending *can* destroy integrity. For example, tuples have no integrity locking -- Python programs cannot change them, so you cannot have two Python threads breaking things. C code can certainly destroy things with something this simple: Py_DECREF(PyTuple_GET_ITEM(tuple, 3)); PyTuple_SET_ITEM(tuple, 3, ob); Exercise for the reader on why the above code is a disaster waiting to happen :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer@tismer.com Fri Apr 21 21:29:06 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 22:29:06 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: <3900BA12.DFE0A6EB@tismer.com> Guido van Rossum wrote: > > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches@python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I had a rough look at the patches but didn't understand enough yet. But I tried the sample scriptlet on python 1.5.2 and Stackless Python - see here: D:\python>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.96765 This machine benchmarks at 5082.2 pystones/second D:\python>python spc/threadstone.py Pystone(1.1) time for 10000 passes = 5.57609 This machine benchmarks at 1793.37 pystones/second This is even worse than Markovitch's observation. Now, let's try with Stackless Python: D:\python>cd spc D:\python\spc>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.843 This machine benchmarks at 5425.94 pystones/second D:\python\spc>python threadstone.py Pystone(1.1) time for 10000 passes = 3.27625 This machine benchmarks at 3052.27 pystones/second Isn't that remarkable? Stackless performs nearly 1.8 as good under threads. Why? I've optimized the ticker code away for all those "fast" opcodes which never can cause another interpreter incarnation. Standard Python does a bit too much here, dealing the same way with extremely fast opcodes like POP_TOP, as with a function call. Responsiveness is still very good. Markovitch's example also tells us this story: Even with his patches, the threading stuff still costs 10 percent. This is the lock that we touch every ten opcodes. In other words: touching a lock costs about as much as an opcode costs on average. ciao - chris threadstone.py: import thread # Start empty thread to initialise thread mechanics (and global lock!) # This thread will finish immediately thus won't make much influence on # test results by itself, only by that fact that it initialises global lock thread.start_new_thread(lambda : 1, ()) import test.pystone test.pystone.main() -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gstein@lyra.org Sat Apr 22 00:19:03 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 16:19:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: <200004212126.RAA18041@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: >... > * Base address for all extension modules updated. PC\dllbase_nt.txt > also updated. Erroneous "libpath" directory removed for all > projects. Rather than specifying the base address in each DSP, the Apache project has used a text file for this stuff. Here is the text file used: --snip-- -- Begin New BaseAddr.ref -- ; os/win32/BaseAddr.ref contains the central repository ; of all module base addresses ; to avoid relocation ; WARNING: Update this file by reviewing the image size ; of the debug-generated dll files; release images ; should fit in the larger debug-sized space. ; module name base-address max-size aprlib 0x6FFA0000 0x00060000 ApacheCore 0x6FF00000 0x000A0000 mod_auth_anon 0x6FEF0000 0x00010000 mod_cern_meta 0x6FEE0000 0x00010000 mod_auth_digest 0x6FED0000 0x00010000 mod_expires 0x6FEC0000 0x00010000 mod_headers 0x6FEB0000 0x00010000 mod_info 0x6FEA0000 0x00010000 mod_rewrite 0x6FE80000 0x00020000 mod_speling 0x6FE70000 0x00010000 mod_status 0x6FE60000 0x00010000 mod_usertrack 0x6FE50000 0x00010000 mod_proxy 0x6FE30000 0x00020000 --snip-- And here is what one of the link lines looks like: # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" This mechanism could be quite helpful for Python. The .ref file replaces the dllbase_nt.txt file, centralizes the management, and directly integrates with the tools. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond@skippinet.com.au Sat Apr 22 01:44:31 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sat, 22 Apr 2000 10:44:31 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1_socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11python16.dsp In-Reply-To: Message-ID: [Greg writes] > Rather than specifying the base address in each DSP, the > Apache project > has used a text file for this stuff. Here is the text file used: Yes - I saw this in the docs for the linker when I was last playing here. I didnt bother with this, as it still seems to me the best longer term approach is to use the "rebind" tool. This would allow the tool to select the addresses (less chance of getting them wrong), but also would allow us to generate "debug info" for the release builds of Python... But I guess that in the meantime, having the linker process this file is an improvement... I will wait until Guido has got to my other build patches and look into this... Mark. From gstein@lyra.org Sat Apr 22 01:56:22 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 17:56:22 -0700 (PDT) Subject: [Python-Dev] base addresses (was: [Python-checkins] CVS: ...) 
In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Mark Hammond wrote: > [Greg writes] > > Rather than specifying the base address in each DSP, the > > Apache project > > has used a text file for this stuff. Here is the text file used: > > Yes - I saw this in the docs for the linker when I was last playing > here. > > I didnt bother with this, as it still seems to me the best longer > term approach is to use the "rebind" tool. This would allow the > tool to select the addresses (less chance of getting them wrong), > but also would allow us to generate "debug info" for the release > builds of Python... Yes, although having specific addresses also means that every Python executable/DLL has the same set of addresses. You can glean information from the addresses without having symbols handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Apr 22 04:53:39 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 22 Apr 2000 06:53:39 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html Yeah, loads of cool stuff we should steal... And loads of stuff that we shouldn't steal, no matter how cool it looks (lvaluable subroutines, anyone?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From Moshe Zadka Sat Apr 22 05:42:47 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 22 Apr 2000 07:42:47 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html OK, here's my summary of the good things we should copy: (In that order:) -- Weak references (as weak dictionaries? would "w{}" to signify a weak dictionary is alright parser-wise?) -- Binary numbers -- way way cool (and doesn't seem to hard -- need to patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything I've missed?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein@lyra.org Sat Apr 22 08:07:09 2000 From: gstein@lyra.org (Greg Stein) Date: Sat, 22 Apr 2000 00:07:09 -0700 (PDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Moshe Zadka wrote: > On Fri, 21 Apr 2000, Guido van Rossum wrote: > > http://www.perl.com/pub/2000/04/whatsnew.html > > OK, here's my summary of the good things we should copy: > > (In that order:) > > -- Weak references (as weak dictionaries? would "w{}" to signify a weak > dictionary is alright parser-wise?) > -- Binary numbers -- way way cool (and doesn't seem to hard -- need to > patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything > I've missed?) Yet another numeric format? eek. If anything, we should be dropping octal, rather than adding binary. You want binary? Just use int("10010", 2). No need for more syntax. I'd go for weak objects (proxies) rather than weak dictionaries. Duplicating the dict type just to deal with weak refs seems a bit much. But I'm not a big brain on this stuff -- I tend to skip all the discussions people have had on this stuff. I just avoid the need for circular refs and weak refs :-) Most of the need for weak refs would disappear with some simple form of GC installed. And it seems we'll have that by 1.7. 
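A quick sketch of the int()-with-base spelling suggested above; the to_binary() helper is just an illustrative stand-in for the missing reverse direction, since there was no built-in for it:

    def to_binary(n):
        # spell a non-negative integer as a string of 0s and 1s
        if n == 0:
            return "0"
        digits = ""
        while n > 0:
            digits = str(n & 1) + digits
            n = n >> 1
        return digits

    print int("10010", 2)    # 18
    print to_binary(18)      # '10010'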
Cheers, -g -- Greg Stein, http://www.lyra.org/ From Moshe Zadka Sat Apr 22 10:46:29 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 22 Apr 2000 12:46:29 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Greg Stein wrote: > Yet another numeric format? eek. If anything, we should be dropping octal, > rather than adding binary. > > You want binary? Just use int("10010", 2). No need for more syntax. Damn, but you're right. > Most of the need for weak refs would disappear with some simple form of GC > installed. And it seems we'll have that by 1.7. Disagree. Think "destructors": with weak references, there's no problems: the referant dies first, and if later, the referer needs the referant to die, well, he'll get a "DeletionError: this object does not exist anymore" in his face, which is alright, because a weak referant should not trust the reference to live. 90%-of-the-cyclic-__del__-full-trash-problem-would-go-away-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer@tismer.com Sat Apr 22 12:53:57 2000 From: tismer@tismer.com (Christian Tismer) Date: Sat, 22 Apr 2000 13:53:57 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <390192D5.57443E99@tismer.com> Greg, Greg Stein wrote: Presuming you can keep your lock contention low, then your overall > performances *goes up* once you have a multiprocessor machine. Sure, each > processor runs Python (say) 10% slower, but you have *two* of them going. > That is 180% compared to a central-lock Python on an MP machine. Why didn't I think of this. MP is a very very good point. Makes now all much sense to me. sorry for being dumb - happy easter - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From akuchlin@mems-exchange.org Sat Apr 22 20:51:47 2000 From: akuchlin@mems-exchange.org (A.M. Kuchling) Date: Sat, 22 Apr 2000 15:51:47 -0400 Subject: [Python-Dev] 1.6 speed Message-ID: <200004221951.PAA09193@mira.erols.com> Python 1.6a2 is around 10% slower than 1.5 on pystone. Any idea why? [amk@mira Python-1.6a2]$ ./python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.59 This machine benchmarks at 2785.52 pystones/second [amk@mira Python-1.6a2]$ python1.5 Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.19 This machine benchmarks at 3134.8 pystones/second --amk From tismer@tismer.com Sun Apr 23 03:21:47 2000 From: tismer@tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 04:21:47 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39025E3B.35639080@tismer.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? 
> > [amk@mira Python-1.6a2]$ ./python Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.59 > This machine benchmarks at 2785.52 pystones/second > > [amk@mira Python-1.6a2]$ python1.5 Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.19 > This machine benchmarks at 3134.8 pystones/second Hee hee :-) D:\python>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.92135 This machine benchmarks at 5204.66 pystones/second D:\python>cd \python16 D:\Python16>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.06234 This machine benchmarks at 4848.86 pystones/second D:\Python16>cd \python\spc D:\python\spc>python Lib/test/pystone.py python: can't open file 'Lib/test/pystone.py' D:\python\spc>python ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.81034 This machine benchmarks at 5523.82 pystones/second More hee hee :-) Python has been at a critical size with its main loop. The recently added extra code exceeds this size. I had the same effect with Stackless Python, and I worked around it already. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mhammond@skippinet.com.au Sun Apr 23 03:21:01 2000 From: mhammond@skippinet.com.au (Mark Hammond) Date: Sun, 23 Apr 2000 12:21:01 +1000 Subject: [Python-Dev] 1.6 speed In-Reply-To: <39025E3B.35639080@tismer.com> Message-ID: > Python has been at a critical size with its main loop. > The recently added extra code exceeds this size. > I had the same effect with Stackless Python, and > I worked around it already. OK - so let us in on your secret! :-) Were your work-arounds specific to the stackless work, or could they be = applied here? Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Mark. From tismer@tismer.com Sun Apr 23 15:43:10 2000 From: tismer@tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 16:43:10 +0200 Subject: [Python-Dev] 1.6 speed References: Message-ID: <39030BFE.1675EE20@tismer.com> Mark Hammond wrote: > > > Python has been at a critical size with its main loop. > > The recently added extra code exceeds this size. > > I had the same effect with Stackless Python, and > > I worked around it already. > > OK - so let us in on your secret! :-) > > Were your work-arounds specific to the stackless work, or could they be applied here? My work-arounds originated from code from last January where I was on a speed trip, but with the (usual) low interest from Guido. Then, with Stackless I saw a minor speed loss and finally came to the conclusion that I would be good to apply my patches to my Python version. That was nothing special so far, and Stackless was still a bit slow. I though this came from the different way to call functions for quite a long time, until I finally found out this February: The central loop of the Python interpreter is at a critical size for caching. Speed depends very much on which code gets near which other code, and how big the whole interpreter loop is. What I did: - Un-inlined several code pieces again, back into functions in order to make the big switch smaller. 
- simplified error handling, especially I ensured that all local error variables have very short lifetime and are optimized away - simplified the big switch, tuned the why_code handling into special opcodes, therefore the whole things gets much simpler. This reduces code size and therefore the probability that we are in the cache, and due to short variable lifetime and a simpler loop structure, the compiler seems to do a better job of code ordering. > Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Yup, and until then I will not apply my patches to Python, this is part of my license: Use it but only *with* Stackless. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer@trixie.triqs.com Sun Apr 23 23:16:38 2000 From: tismer@trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:16:38 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39037646.DEF8A139@trixie.triqs.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? I submitted a comparison with Stackless Python. Now I actually applied the Stackless Python patches to the current CVS version. My version does again show up as faster than standard Python, with the same relative measures, but I too have this effect: Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. Claim: This is not related to ceval.c . Something else must have introduced a significant speed loss. Stackless Python, upon the pre-unicode tag version of CVS: D:\python\spc>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.80724 This machine benchmarks at 5533.29 pystones/second Stackless Python, upon the recent version of CVS: D:\python\spc\Python-cvs\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.94941 This machine benchmarks at 5129.75 pystones/second Less than 10 percent, but bad enough. I guess we have to use MAL's test suite and measure everything alone. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer@trixie.triqs.com Sun Apr 23 23:45:12 2000 From: tismer@trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:45:12 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> Message-ID: <39037CF8.24E1D1BD@trixie.triqs.com> Ack, sorry. Please drop the last message. This one was done with the correct dictionaries. :-() Christian Tismer wrote: > > "A.M. Kuchling" wrote: > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > Any idea why? > > I submitted a comparison with Stackless Python. > Now I actually applied the Stackless Python patches > to the current CVS version. > > My version does again show up as faster than standard Python, > with the same relative measures, but I too have this effect: > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > Claim: > This is not related to ceval.c . 
> Something else must have introduced a significant speed loss. > > Stackless Python, upon the pre-unicode tag version of CVS: > > D:\python\spc>python ../lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.80724 > This machine benchmarks at 5533.29 pystones/second > > Stackless Python, upon the recent version of CVS: > this one corrected: D:\python\spc\Python-slp\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.98433 This machine benchmarks at 5039.49 pystones/second > Less than 10 percent, but bad enough. It is 10 percent, and bad enough. > > I guess we have to use MAL's test suite and measure everything > alone. > > ciao - chris > > -- > Christian Tismer :^) > Applied Biometrics GmbH : Have a break! Take a ride on Python's > Kaunstr. 26 : *Starship* http://starship.python.net > 14163 Berlin : PGP key -> http://wwwkeys.pgp.net > PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF > where do you want to jump today? http://www.stackless.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@python.org Mon Apr 24 14:03:56 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 09:03:56 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 14:49:11 +0200." <390442C7.F30179D9@trixie.triqs.com> References: <390442C7.F30179D9@trixie.triqs.com> Message-ID: <200004241303.JAA19894@eric.cnri.reston.va.us> [Moving this to python-dev because it's a musing > > The main point is to avoid string.*. > > Agreed. Also replacing map by a loop might not even be slower. > What remains as open question: Several modules need access > to string constants, and they therefore still have to import > string. > Is there an elegant solution to this? import string > That's why i asked for some way to access "".__class__ or > whatever, to get into some common namespace with the constants. I dunno. However, I've noticed that in many situations where map() could be used with a string.* function (*if* you care about the speed-up and you don't care about the readability issue), there's no equivalent that uses the new string methods. This stems from the fact that map() wants a function, not a method. Python 3000 solves this partly, assuming types and classes are unified there. Where in 1.5 we wrote map(string.strip, L) in Python 3K we will be able to write map("".__class__.strip, L) However, this is *still* not as powerful as map(lambda s: s.strip(), L) because the former requires that all items in L are in fact strings, while the latter works for anything with a strip() method (in particular Unicode objects and UserString instances). Maybe Python 3000 should recognize map(lambda) and generate more efficient code for it... --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@trixie.triqs.com Mon Apr 24 15:01:26 2000 From: tismer@trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 16:01:26 +0200 Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> Message-ID: <390453B6.745E852B@trixie.triqs.com> > Christian Tismer wrote: > > > > "A.M. Kuchling" wrote: > > > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > > Any idea why? ... > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > > > Claim: > > This is not related to ceval.c . > > Something else must have introduced a significant speed loss. I guess I can explain now what's happening, at least for the Windows platform. Python 1.5.2's .dll was nearly about 512K, something more. I think to remember that 512K is a common size of the secondary cache. Now, linking with the MS linker does not give you any particularly useful order of modules. When I look into the map file, the modules appear sorted by name. This is for sure not providing optimum performance. As I read the docs, explicit ordering of the linkage would only make sense for C++ and wouldn't work out for C, since we could order the exported functions, but not the private ones, giving even more distance between releated code. My solution to see if I might be right was this: I ripped out almost all builtin extension modules and compiled/linked without them. This shrunk the dll size down from 647K to 557K, very close to the 1.5.2 size. Now I get the following figures: Python 1.6, with stackless patches: D:\python\spc\Python-slp\PCbuild>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.95468 This machine benchmarks at 5115.92 pystones/second Python 1.6, from the dist: D:\Python16>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.09214 This machine benchmarks at 4779.8 pystones/second That means my optimizations are in charge again, after the overall code size went below about 512K. I think these 10 percent are quite valuable. These options come to my mind: a) try to do optimum code ordering in the too large .dll . This seems to be hard to achieve. b) Split the dll into two dll's in a way that all the necessary internal stuff sits closely in one of them. c) try to split the library like above, but use a static library layout for one of them, and link the static library into the final dll. This would hopefully keep related things together. I don't know if c) is possible, but it might be tried. Any thoughts? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido@python.org Mon Apr 24 16:11:14 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 11:11:14 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: Your message of "Fri, 21 Apr 2000 16:19:03 PDT." 
References: Message-ID: <200004241511.LAA28854@eric.cnri.reston.va.us> > And here is what one of the link lines looks like: > > # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo > /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug > /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" > > This mechanism could be quite helpful for Python. The .ref file replaces > the dllbase_nt.txt file, centralizes the management, and directly > integrates with the tools. I agree. Just send me patches -- I'm *really* overwhelmed with patch management at the moment, I don't feel like coming up with new code right now... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer@trixie.triqs.com Mon Apr 24 16:19:41 2000 From: tismer@trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 17:19:41 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> Message-ID: <3904660D.6F22F798@trixie.triqs.com> Sorry, it was not really found... Christian Tismer wrote: [thought he had found the speed leak] After re-inserting all the builtin modules, I got nearly the same result after a complete re-build, just marginally slower. There must something else be happening that I cannot understand. Stackless Python upon 1.5.2+ is still nearly 10 percent faster, regardless what I do to Python 1.6. Testing whether Unicode has some effect? I changed PyUnicode_Check to always return 0. This should optimize most related stuff away. Result: No change at all! Which changes were done after the pre-unicode tag, which might really count for performance? I'm quite desperate, any ideas? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one@email.msn.com Tue Apr 25 01:56:18 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 20:56:18 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004241303.JAA19894@eric.cnri.reston.va.us> Message-ID: <000101bfae51$10f467a0$3ea0143f@tim> [Guido] > ... > However, this is *still* not as powerful as > > map(lambda s: s.strip(), L) > > because the former requires that all items in L are in fact strings, > while the latter works for anything with a strip() method (in > particular Unicode objects and UserString instances). > > Maybe Python 3000 should recognize map(lambda) and generate more > efficient code for it... [s.strip() for s in L] That is, list comprehensions solved the speed, generality and clarity problems here before they were discovered . From guido@python.org Tue Apr 25 02:21:42 2000 From: guido@python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 21:21:42 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 20:56:18 EDT." <000101bfae51$10f467a0$3ea0143f@tim> References: <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <200004250121.VAA00320@eric.cnri.reston.va.us> > > Maybe Python 3000 should recognize map(lambda) and generate more > > efficient code for it... 
> > [s.strip() for s in L] > > That is, list comprehensions solved the speed, generality and clarity > problems here before they were discovered . Ah! I knew there had to be a solution without lambda! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Tue Apr 25 04:19:35 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 24 Apr 2000 22:19:35 -0500 (CDT) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <000101bfae51$10f467a0$3ea0143f@tim> References: <200004241303.JAA19894@eric.cnri.reston.va.us> <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <14597.3783.737317.226791@beluga.mojam.com> Tim> [s.strip() for s in L] Tim> That is, list comprehensions solved the speed, generality and Tim> clarity problems here before they were discovered . What is the status of list comprehensions in Python? I remember some work being done several months ago. They definitely don't appear to be in the 1.6a2. Was there some reason to defer them until later? Skip From tim_one@email.msn.com Tue Apr 25 04:26:24 2000 From: tim_one@email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 23:26:24 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <14597.3783.737317.226791@beluga.mojam.com> Message-ID: <000801bfae66$09191840$e72d153f@tim> [Skip Montanaro] > What is the status of list comprehensions in Python? I remember some work > being done several months ago. They definitely don't appear to be in the > 1.6a2. Was there some reason to defer them until later? Greg Ewing posted a patch to c.l.py that implemented a good start on the proposal. But nobody has pushed it. I had hoped to, but ran out of time; not sure Guido even knows about Greg's patch. Perhaps the 1.6 source distribution could contain a new "intriguing experimental patches" directory? Greg's list-comp and Christian's Stackless have enough fans that this would probably be appreciated. Perhaps some other things too, if we all run out of time (thinking mostly of Vladimir's malloc cleanup and NeilS's gc). From guido@python.org Tue Apr 25 05:13:51 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 00:13:51 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 23:26:24 EDT." <000801bfae66$09191840$e72d153f@tim> References: <000801bfae66$09191840$e72d153f@tim> Message-ID: <200004250413.AAA00577@eric.cnri.reston.va.us> > Greg Ewing posted a patch to c.l.py that implemented a good start on the > proposal. But nobody has pushed it. I had hoped to, but ran out of time; > not sure Guido even knows about Greg's patch. I vaguely remember, but not really. We did use his f(*args, **kwargs) patches as a starting point for a 1.6 feature though -- if the list comprehensions are in a similar state, they'd be great to start but definitely need work. > Perhaps the 1.6 source distribution could contain a new "intriguing > experimental patches" directory? Greg's list-comp and Christian's Stackless > have enough fans that this would probably be appreciated. Perhaps some > other things too, if we all run out of time (thinking mostly of Vladimir's > malloc cleanup and NeilS's gc). Perhaps a webpage woule make more sense? There's no point in loading every download with this. And e.g. stackless evolves at a much faster page than core Python. 
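To keep the two idioms in this exchange side by side: the comprehension form quoted above was, at this point, still proposed syntax (Greg Ewing's patch), while the map/lambda form already worked. The snippet below is only an illustrative sketch -- the list L and the variable names are made up for the example -- showing how the proposed form reads next to the map call and the plain loop it abbreviates:

    # Illustrative only: "L" is any list of objects with a strip() method.
    L = ["  foo ", " bar", "baz  "]

    stripped = [s.strip() for s in L]            # proposed comprehension syntax
    also_stripped = map(lambda s: s.strip(), L)  # works for anything with strip()

    # The comprehension is roughly shorthand for this explicit loop:
    result = []
    for s in L:
        result.append(s.strip())
    assert stripped == result
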
I definitely want Vladimir's patches in -- I feel very guilty for not having reviewed his latest proposal yet. I expect that it's right on the mark, but I understand if Vladimir wants to wait with preparing yet another set of patches until I'm happy with the design... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip@mojam.com (Skip Montanaro) Tue Apr 25 05:37:42 2000 From: skip@mojam.com (Skip Montanaro) (Skip Montanaro) Date: Mon, 24 Apr 2000 23:37:42 -0500 (CDT) Subject: [Python-Dev] list comprehensions patch - updated for current CVS version Message-ID: <14597.8470.495090.799119@beluga.mojam.com> --j8B4LC7gVY Content-Type: text/plain; charset=us-ascii Content-Description: message body and .signature Content-Transfer-Encoding: 7bit For those folks that might want to fiddle with list comprehensions I tweaked Greg Ewing's list comprehensions patch to work with the current CVS tree. The attached gzip'd patch contains diffs for Grammar/Grammar Include/graminit.h Lib/test/test_grammar.py Lib/test/output/test_grammar Python/compile.c Python/graminit.c I would have updated the corresponding section of the language reference, but the BNF there didn't match the contents of Grammar/Grammar, so I was a bit unclear what needed doing. If it gets that far perhaps someone else can contribute the necessary verbiage or at least point me in the right direction. -- Skip Montanaro | http://www.mojam.com/ skip@mojam.com | http://www.musi-cal.com/ --j8B4LC7gVY Content-Type: application/octet-stream Content-Description: list comprehensions patch for Python Content-Disposition: attachment; filename="listcomp.patch.gz" Content-Transfer-Encoding: base64 H4sICGofBTkAA2xpc3Rjb21wLnBhdGNoAN1c/1PbSLL/mfwVE6fuGWOBNfpiWbDZKkLIHu8I cEDq3pWT8glbDtoY2SUJEiqb+9uve0YjjaSRLRJM3i2Jbc2opz/dPd09PZLso3Dif9klv0Xe zY0X9dLPZy9//O/Z+cEFmQYzf5f0FtH8d3+cxL3xXRzN50lvcZ9cz8PeJIiTXhyNeyV87e5Z 5CdR4N8F4UcSwUcczENCd0z72SSYTsn2mGwbZDvCHiH8s62tLXG8Yei63tPNnjEghrlrubu6 vsGGb29vl4isnmET3do1gKiPTOQ/xtSxNcclrEnI1Bsn82iXbLa77T/a2/D6d7uT9pI/yGL+ 2Y+AjH3uEi+Z35Ak8sAO0RaM2dpqp7SdrWfP2eld0t5sk2Hix8kM7PGBtDttYNQeFjs/sM6v 0DkJxsmN98mPoPcb6/1XmwhK1viDnOy/PcSPd29fHZ7DwcXl+dHJb10QbObdXE38KaCyIw8Y 3nmRF32MU6BdzgxIU7lTAYGkLF98exWPo2CRInMRd9oMnVmaWw6Otn+W5cgQ30dBAub6b7Si yh8Hrubq3B/RyNjsCyOn/MHKcHR744fAUWt3tqT2EDo+AMQWF5G1CbMvNj8gumh0OEc2bpcb lbRftj+wkxsvyLnvzWb3ZPjJv/88jyb5uWddAv9f/HJ8dHF5cPr27FdoZROxyw+nbNp571Sc n6KHtOG9Tfwvi4ibJAilqZEmNGOKExFMhT4Fghe/9HIhjni6OwrHs9uJ3/sIiSAIg2Tn+skz XlWEmqRn7FCzmPSwh0iio0vkzQ3qug5mNd0gur0L2Y/qG4wJ+opE1zT72X3NdnJvg2ZfF972 AmIgCH3hdcSktNjLPc6kBs5E2p1ND/Sb5X50CpNaFfIp9Npi/o6Dqx5ONXsbfeTpfGdx/+Sz WCdI7QJWnkvsIWUt0OqlvtKCZtNdam5QMatq4gZTa5mabVnS5GJHnrE3YArIjZ9cG5uxP5tq OKOdXbLw4lg+a4qzFF6GIOAZ4MULNoFkPL9ZRP61H6Ih4p2dHTgZ3t7E5CUZwjBDI6ZGLI3Y GLNxErETrf3FYua3NNJ65YXwD48O5uN5eJu0GN1izOkIySgJJyWClgBMQgijh/+LKMAUGO8A RrDYhPUHHC4mQciYfcgpTLJFvrCzX/AsyiqdLZ0h4J9fyK/EkEgCDfgiVZBR5VigYEPS4ZQd T8UwhGqFLWxOP6RKxbegfeBzo0GbkE2KRpv74H6tjsa7wMatN5Clsw6weOutN57MQ282iVsd 6BVW8qJEZqYD5X4A2Tj0o5wf9h6GH6EzZ4l9B9e+H/tXt9FHpM65opjAuCAk1TsaOzDYgSEO QDZTlyXiluLj4tC78TWywI8O6yLMSHBiDqZk3cxewi4pTUq1QKpFRsWUzSgEp8WIM1uMgDzj BqQSJcHJADLy8iXhA4gXToC16IEjpkI5b81vk8VtMWv8vNylEKa+AK/U34XkU0ld5bxl7hoG FuKVtNU4Z1l9zZKWI2jaZlb8QFGImWk8gwQE2QlsP2yzzNDWSJsnBjzK8kIbvWsIvtbXiAvO CO5HbdEnstEQ3TNlkzqr4CWaaVJqpy4sERtFYqNEbBaIzSKxWSK2CsRWkdgqEdsFYrtIbMvE uY4rtFqhxwrJV8i6XLo2z2U4eSIPMVKpn2ci3otprtyXZzrmAlKOQpA0Qs94kOBqBWG0M37y qCwLUF8V6nalLNRtkkuOwZK1svCCcoNaUBLuWtYGZ4JRVKXLw9BRhqFpDDTTpHkg8g4jKx3G 
115EtsYjSNx3XuLvbfS2WGZN2wTTL7nxQigcQbGtHtaNfjhhO4ENzPXjUXLDkvTeBo5N/JsF mc3H3owPHc9vQ6wht3qktyUqfMbm2x68qXMHpZpFpXrHgoi3QNVU6DjxkmBMGPr8ZgTVDiSR GJxidHZ+enm6uQkL8O04Sc3F5NZIOJ/4ZKvT2asyCP3PXOLlHLipkEMXdMn3TKBNV7C8mwcT xjOrkpfzRCvCHyxRXD4t+5TRlMxhTWvI++GsccaWMs/lzriCNR8AUSg0G09dlRmsIMHHZgyY iNn09eT5y1zi7P4AaE+vMPrJVpAGXJX9SLCswhVcrMjv99X8uOE0dVzYjmb1dXlVHcAqKwXz 9q/jESyqyQhTLxRw+l7aDWE6C+eJdwXO/iXhZ7rpgDR+M3JIZLcRVBGsoRSE6tTU4K2fi0J1 w9YotbJF/htGtyJQinMILTlaNscwWRqBojHp8A1GybhjFJvbCui2fGxykyV4+JVVfWf3wuB3 e6wDA90Lx9cQjqnmvC/27vzRlQ+LD/QzW7DGHheTgPDMxSaT6efIn6JwF4eX785Gx6enZxr5 n6vI9z6NOOPOHtMOx1xBMvk0WtzG18UREol8kQWTxB0WrnghRb4qQlJylqJAZeR28Nej49eb oLoJboaM2KCU8A60OLs/CpPRm2h+czwPP27qx51U3SnZvMNy9+Td8bEoxpnKfhTNo7jb3cvA QOE5lNAfEfD4dP/16OD05OJSE+dgxY8TPHfX6eSDhL407QMX/7/Xhwfnh2827wRZZmFhbnTG QA0MdhsdH50cnpyCS2z/GjKXhhK/QJ3Ny5vTczErYj5qJcvTRsGiFMqP07PR/sXF0W8nssSz +XxRsE82Ram7WqnHlsdsb6tV+993bwHn1cXp8bvLQ00yi8pOuY/mzK688aeFl4yZWlV95wvs N7iDCMuQay8mcGrhT0hyDUUVybwxle/qPmE+dgZWeHV8evC3qk9zzkqXrsiliI5vLCXUZ4Bg +jgJoC7evbxZjEHlVUk5/h7mmZVopZ0av2WucPRm9Gb/+OIQnbdIJ0/JJZq7PMmyT5fd0ii6 pQoZnOMf++evVVFT9LFOZrV/+OT3W6jxrv3IJ5+D5BosDVkoiTzih0l0Tz5fB+Nr8hlKx/ln dLiSGetVqvPqb8t9JlV4Act7Q8dZsMCA6cSLnvOpdKWTn0d50V2CPa7y+DqYTeDcDrsMUTeC H/k8K898dkUVs7MfsxKH0QhPRRq25gIvILidJfzym1Q9M/qvWe4OyC/k5OCvm4tOJ+0VlRcm Ce5oaILUnnAy6wZX0LPuGKYMpmfz8p9nh5uhxGzsxb50bT+7fFK3PO9lFCzQ9ypsgululUkx wmt4QDXvgUmKw9kyxRzn/vDLeHRxH4OxDrFTI60gvPNmUBGW5iW5X/itDOUb++Tv/gzEzFQX AY5TAp6TrXt4a0eWU7WaLBm+f3l5DsJ5kHPDSatALzKEX+WSZpmD/ePj0Zt3JweXR6cnCjh1 HJVXgEzl5UFUqMbZDK3Ovnm6BWdO7/MN0yyazwPeJ4O9Hw9y9P602hwOPnDhYnbJcLqZ9oO9 /jJpaaTblWvTYg7LbPTqHfj3COvL3MVrV/zSDF1cnp4fihmuoDRLr4XaoUbUGp+qIjaQ+fXh 8eGlWujtbdleUtosb3WgkxTcgMhuEPI5T0O9U1P9Y6mvG9m99o2Zj9lmE/NT2CFd0IH0iMH2 D+wqbcDWXSJyGNQ5cNx9CQ767DnsI5VLZsD3URvL5hxwc6J0yqBvG23INiXUNmBTYme3XX9Q 0prV/UdFVRvZscDI+QMNG9xOsptg5zdmQUxnOY1qLoulCJKmKRdFwpx9/Op8/+BwF4NZdeMd PIdZ1NCpBm+DzKKrxGKrF1ttMhEMWHZgJ3J+8fdXHZY+CteFGivRFQgr/woI3RJCJfPtVSV6 sK2K1ymz27k/7UJlLkHtlUqjcv/akO5f8wuVebP66A51NhiTwv3r5lcqB66jubqduftX2od5 /qbxY90Ft4HGc2xQCvvfrAG7L0uQQU62sxO23ABmfUb1Tbo640XsFY8se0SHxgcI/q/psxpc GOHky4QxZGHsOmH6csNpLIzKUi7gu9QRllIMN8RwlAOEHWTIOhejHtkUQzPl3GywuWqwNaS5 DYWYxYu2zcSkDxTTlMW0GouptK5haK7RF9at49EvCpC518ZXXZpcZoeUobgu1oyho2aolnig uRaVImcgWeC5CmwgLNDQK9wi/UpHoHpxgKUSCD98/o6DhpTm3uBoGS89NYORd2EkctOi5nKY NtDcKGlOaXO3c+vVamQHe5V4lBYHUJE2lhrOeIjhlC5k9TXXlpKvNEa4Ic27HGGxvGtQpXKr XVRfYq7+qGQtR+TcOvpCznZtXXPzZ56a6mB8lw7dUt9yOUt6DRrrpZwrB/LqwFmeoJycR4aZ LUyD5W7rQGIuyOsuX64cyMZy3h9Ymusay/NdRT73++Uz9MbyKe3p2iCvu8qe/QYpsxiYztDJ rUKhDIJ6XrdW2aWvTACNcFTaAaSDt2rocv0GJR9Fmy7z0UEx9gDA4DeElmpXQaGNUdS6Udwj 0cEq3YyCr+W6sbXVkBdr4Obi3tZcpUiJJVWzVEtt9gHC0msLOBcc1sxTukGl2hEbUmFgFoW3 KHIu3yhvyNlQc1brYOENQGtFFnJHdsVMEoRdEn4ALO0ViaPC0lCzVEpNTYhBalpLpbatspca S70U6Kmc/wAA74Sa7lJFqihmYxS1bn1E7S/PYzZUBcIBSqjgDabwXDi2uAdwhRw0m7M8dVVZ W8tZq7UYINRgxQxhCW89WIsBGmiwYloqrL9HCwMoQShLKsnTWsHujwQHqSuLtazEm0w9fMVD 25HC1YZtRiuGffvMHwXh4jZpaURnj+Sma4HOYpm03sNC8143dfik8NLfU0t/b1r0PbXN94Zu wDkLPnV2jr36+nvDgn7HwrEtoaUNNVgLrzXIeEaGRwWe8Vh4sPa3/DtvptbPyPUDHkaOp3rV 4+k5nov63YbjiT/lYP0MzGQ0AoyqQRq8WsIDDdiuUCP/EkgTt+iiW3SrfupAtKV5kNWihs39 lDf6olE3OMslXZ43q9SFGsN2ZDBRMqMQpVoYWS8BlgoAHNV3lss5KMpputLSWt0+2YNiYYjR YlY2TzK5KZOnFWg9tVXZOoq1U1811JaB9JV7Onsw7EtQWm690sYLLVTtMqpd4gqVkXdZVaqq WLkSbnG96qcpt568sJPFld9YSm5UrGvm1l0+1CxZtzrpReu6hQyv5QqWLeJm1jXyrop13VEV MMvfff3J8rcO+Y06FFPVk+RvJd4a83eG9xPyt3qZt2DTY0DFKpb5hoqtwGKK9S10leSmMmWD ggkfYcmFuhGd8mYBTpIDWhmgWzdnMqCOgA3X3D6LghtvNhupFKT6owNiEOCjGGo8+thFRR+D 
gN3WlgDzIICtsuws8sv6vsLC1GEfbNKsQP//74iq3LFWR1Qlj7U6ohJwjY6oxHtCR1RnSKcP GXKgVzJk7cto7JgD/O4c3r6d34YTpUGhJpYNWrMCMEOa1hLMPFcO8Mt5wVRCG+Ro/Vpr1k3b qrA2Yb9gWkYlrP+brCcF+JNZT+2LA2ZNq7kvrsTLNcPAjm+DxOd62XkRQgsl3SOsZgMManzK qpwfzUffHQ+wuvLAP3K43D9M84fhzCKci99KDeeJBJcnK9N6dDgRAl4UxPOwop/9EEC9CaCR Ao7mi8rkNfJ/IQGqZ7LJMmt90jXTxaaimPPoimGN8GUejZRwg0eHs1O3VMLVlgnfDcfKhOtg migBrdoy4bsBMZuAUybXasAH1QmNAHlKiW4qUA9KKY2g2IaN/UBMOcKtByWUBmCmjgmF/VBN OdysB2WTSnbB8ijHes6wMJvgY7AcCm/cZXfoHxGL64WJJP1tm/KCY/ULaN/zsvQiGiaS9Odx OJqToznVtFXSzViJVERj+w35t3YqflJIKFRlTwmzwTpn6raMWdHRfXQ8llFmwdgXC0Guna3L FqXfO4cymth1qIxpP3YyMXVRnyjhHrtGMfFhvFb2AGY5GuwfL1JKeOxHNsQvF5SrWNv64eBj LyuHY2mF/25PBa2SWAymnFrBRsoZHI39HlA5Z9r9R7WlfLl05l35M/4eD6lh5ddM2Y+XvD27 /Gd+yQ+CR8+2SBAu1KIP2CL9/KJetVNfU1GvhFpfUa+zOS7CrbGoV8Kts6hXAv4ZinqlYusr 6pVw6yvqlXDrLOqVgOss6pWA6ynqlVDrKuoVYGsr6lkxmmMpi3rjkYr6EtYTFPW0mEj+ZEW9 Yp1ba1GvxPuzFPXKoFtfUa+05RqLehXeUxT1VH+Col6p3PqK+hIcy5n47ZBW9mXZ8uJjO02M aXCDrYoEjmcJvKlYfvJnAOxB47lrCmdn6k0rxnQbK9ckzJfskJwmOyTlbRALkgO8Sb+uRW18 5tnOn3lOlwfx/RGXH3bF5Ip+DJrsOyZLIfHRVMsxC49D8nArOKn05ZD05wbJ6Oz+zItiPxql vwYuntux+RqFz9iI55dhHrh9OE8ww7P86WIbthoUf0RVfvauRoaucKu8YT9Mur5ekc6pke4/ 9E4QXNtdAAA= --j8B4LC7gVY-- From Vladimir.Marangozov@inrialpes.fr Tue Apr 25 07:13:32 2000 From: Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov) Date: Tue, 25 Apr 2000 08:13:32 +0200 (CEST) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250413.AAA00577@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 25, 2000 12:13:51 AM Message-ID: <200004250613.IAA10174@python.inrialpes.fr> Hi, I'm back on-line. > [Tim] > > Perhaps the 1.6 source distribution could contain a new "intriguing > > experimental patches" directory? Greg's list-comp and Christian's > > Stackless have enough fans that this would probably be appreciated. > > Perhaps some other things too, if we all run out of time (thinking > > mostly of Vladimir's malloc cleanup and NeilS's gc). I'd be in favor of including gc as an optional (experimental) feature. I'm quite confident that it will evolve into a standard feature, in its current or in an improved state. The overall strategy looks good, but there are some black spots w.r.t its cost, both in speed and space. Neil reported in private mail something like 5-10% mem increase, but I doubt that the picture is so optimistic. My understanding is that these numbers reflect the behavior of the Linux VMM in terms of effectively used pages. In terms of absolute, peak requested virtual memory, things are probably worse than that. We're still unclear on this... For 1.6, the gc option would be a handy tool for detecting cyclic trash. It will answer some expectations, and I believe we're ready to give some good feedback on its functioning, its purpose, its limitations, etc. By the time 1.6 is finalized, I expect that we'll know roughly its cost in terms of mem overhead. Overall, it would be nice to have it in the distrib as an experimental feature -- it would both bootstrap some useful feedback, and would encourage enthousiasts to look more closely at DSA/GC (DSA - dynamic storage allocation). By 1.7 (with Py3K on the horizon), we would have a good understanding on what to do with gc and how to do it. If I go one step further, what I expect is that the garbage collector would be enabled together with a Python-specific memory allocator which will compensate the cost introduced by the collector. 
There will some some stable state again (in terms of speed and size) similar to what we have now, but with a bonus pack of additional memory services. > I definitely want Vladimir's patches in -- I feel very guilty for not > having reviewed his latest proposal yet. I expect that it's right on > the mark, but I understand if Vladimir wants to wait with preparing > yet another set of patches until I'm happy with the design... Yes, I'd prefer to wait and get it right. There's some basis, but it needs careful rethinking again. I'm willing to fit in the 1.6 timeline but I understand very well that it's a matter of time :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one@email.msn.com Tue Apr 25 07:25:36 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:36 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000701bfae7f$1174a540$152d153f@tim> [Greg Stein] > ... > Many people have asked for free-threading, and the number of inquiries > that I receive have grown over time. (nobody asked in 1996 when I first > published my patches; I get a query every couple months now) Huh! That means people ask me about it more often than they ask you . I'll add, though, that you have to dig into the inquiry: almost everyone who asks me is running on a uniprocessor machine, and are really after one of two other things: 1. They expect threaded stuff to run faster if free-threaded. "Why?" is a question I can't answer <0.5 wink>. 2. Dealing with the global lock drives them insane, especially when trying to call back into Python from a "foreign" C thread. #2 may be fixable via less radical means (like a streamlined procedure enabled by some relatively minor core interpreter changes, and clearer docs). I'm still a fan of free-threading! It's just one of those things that may yield a "well, ya, that's what I asked for, but turns out it's not what I *wanted*" outcome as often as not. enthusiastically y'rs - tim From tim_one@email.msn.com Tue Apr 25 07:25:38 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:38 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000801bfae7f$12c456c0$152d153f@tim> [Greg Wilson, on Linda and JavaSpaces] > ... > Personal opinion: I've felt for 15 years that something like Linda could > be to threads and mutexes what structured loops and conditionals are to > the "goto" statement. Were it not for the "Huh" effect, I'd recommend > hanging "Danger!" signs over threads and mutexes, and making tuple spaces > the "standard" concurrency mechanism in Python. There's no question about tuple spaces being easier to learn and to use, but Python slams into a conundrum here akin to the "floating-point versus *anything* sane " one: Python's major real-life use is as a glue language, and threaded apps (ditto IEEE-754 floating-point apps) are overwhelmingly what it needs to glue *to*. So Python has to have a good thread story. Free-threading would be a fine enhancement of it, Tuple spaces (spelled "PyBrenda" or otherwise) would be a fine alternative to it, but Python can't live without threads too. And, yes, everyone who goes down Hoare's CSP road gets lost <0.7 wink>. 
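As a rough sketch of the tuple-space idea mentioned above (and only a sketch -- this is not PyBrenda, Linda, or JavaSpaces; the class name and API are invented for illustration, written in present-day Python), the pattern can be expressed with nothing more than the standard threading module: put() drops a tuple into a shared pool, and take() blocks until a tuple matching a pattern is available, then removes and returns it.

    import threading

    class TupleSpace:
        # Toy tuple space: None in a pattern acts as a wildcard.
        def __init__(self):
            self._tuples = []
            self._cond = threading.Condition()

        def put(self, tup):
            with self._cond:
                self._tuples.append(tup)
                self._cond.notify_all()

        def take(self, pattern):
            def matches(tup):
                return (len(tup) == len(pattern) and
                        all(p is None or p == v for p, v in zip(pattern, tup)))
            with self._cond:
                while True:
                    for tup in self._tuples:
                        if matches(tup):
                            self._tuples.remove(tup)
                            return tup
                    self._cond.wait()

    # Minimal usage: a producer puts ("task", ...) tuples, a consumer takes them.
    space = TupleSpace()
    space.put(("task", 42))
    assert space.take(("task", None)) == ("task", 42)

The point is the one made above: all coordination goes through the shared pool, so user code never touches a mutex or condition variable directly.
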
From tim_one@email.msn.com Tue Apr 25 07:40:26 2000 From: tim_one@email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:40:26 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250613.IAA10174@python.inrialpes.fr> Message-ID: <000901bfae81$240b5580$152d153f@tim> [Vladimir Marangozov, on NeilS's gc patch] > ... > The overall strategy looks good, but there are some black spots > w.r.t its cost, both in speed and space. Neil reported in private > mail something like 5-10% mem increase, but I doubt that the picture > is so optimistic. My understanding is that these numbers reflect > the behavior of the Linux VMM in terms of effectively used pages. In > terms of absolute, peak requested virtual memory, things are probably > worse than that. We're still unclear on this... Luckily, that's what Open Source is all about: if we have to wait for you (or Neil, or Guido, or anyone else) to do a formal study of the issue, the patch will never go in. Put the code out there and let people try it, and 50 motivated users will run the only 50 tests that really matter: i.e., does their real code suffer or not? If so, a few of them may even figure out why. less-thought-more-eyeballs-ly y'rs - tim From mal@lemburg.com Tue Apr 25 10:43:46 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 11:43:46 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code Message-ID: <390568D2.2CC50766@lemburg.com> This is a multi-part message in MIME format. --------------9972D9B8E9394EC8828CF147 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit After the discussion about #pragmas two weeks ago and some interesting ideas in the direction of source code encodings and ways to implement them, I would like to restart the talk about encodings in source code and runtime auto-conversions. Fredrik recently posted patches to the patches list which loosen the currently hard-coded default encoding used throughout the Unicode design and add a layer of abstraction which would make it easily possible to change the default encoding at some later point. While making things more abstract is certainly a wise thing to do, I am not sure whether this particular case fits into the design decisions made a few months ago. Here's a short summary of what was discussed recently: 1. Fredrik posted the idea of changing the default encoding from UTF-8 to Latin-1 (he calls this 8-bit Unicode which points to the motivation behind this: 8-bit strings should behave like 8-bit Unicode). His recent patches work into this direction. 2. Fredrik also posted an interesting idea which enables writing Python source code in any supported encoding by having the Python tokenizer read Py_UNICODE data instead of char data. A preprocessor would take care of converting the input to Py_UNICODE; the parser would assure that 8-bit string data gets converted back to char data (using e.g. UTF-8 or Latin-1 for the encoding) 3. Regarding the addition of pragmas to allow specifying the used source code encoding several possibilities were mentioned: - addition of a keyword "pragma" to define pragma dictionaries - usage of a "global" as basis for this - adding a new keyword "decl" which also allows defining other things such as type information - XML like syntax embedded into Python comments Some comments: Ad 1. UTF-8 is used as basis in many other languages such as TCL or Perl. 
It is not an intuitive way of writing strings and causes problems due to one character spanning 1-6 bytes. Still, the world seems to be moving into this direction, so going the same way can't be all wrong... Note that stream IO can be recoded in a way which allows Python to print and read e.g. Latin-1 (see below). The general idea behind the fixed default encoding design was to give all the power to the user, since she eventually knows best which encoding to use or expect. Ad 2. I like this idea because it enables writing Unicode- aware programs *in* Unicode... the only problem which remains is again the encoding to use for the classic 8-bit strings. Ad 3. For 2. to work, the encoding would have to appear close to the top of the file. The preprocessor would have to be BOM-mark aware to tell whether UTF-16 or some ASCII extension is used by the file. Guido asked me for some code which demonstrates Latin-1 recoding using the existing mechanisms. I've attached a simple script to this mail. It is not much tested yet, so please give it a try. You can also change it to use any other encoding you like. Together with the Japanese codecs provided by Tamito Kajiyama (http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/tmp/japanese-codecs.tar.gz) you should be able to type Shift-JIS at the raw_input() or interactive prompt, have it stored as UTF-8 and then printed back as Shift-JIS, provided you put add a recoder similar to the attached one for Latin-1 to your PYTHONSTARTUP or site.py script. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ --------------9972D9B8E9394EC8828CF147 Content-Type: text/python; charset=us-ascii; name="latin1io.py" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="latin1io.py" """ Redirect sys.std[in|out|err] to have them use Latin-1 as encoding. Marc-Andre Lemburg, 2000-04-25. """#" import codecs,sys,types class Latin1IO(codecs.StreamRecoder): """ Latin-1 Recoder. Translates streams encoded in Latin-1 to UTF-8. The Python interface will return UTF-8 encoded strings and will accept both Unicode and UTF-8 encoded strings as input. """ def __init__(self,stream,errors='strict'): """ Creates a Latin1IO instance. stream must be a file-like object. Error handling is done in the same way as defined for the codecs.StreamWriter/Readers. """ self.stream = stream self.errors = errors # Stream backend should translate Unicode <-> Latin-1 (Reader,Writer) = codecs.lookup('latin-1')[2:4] self.reader = Reader(stream, errors) self.writer = Writer(stream, errors) # Interface frontend should translate UTF-8 <-> Unicode (encode,decode) = codecs.lookup('utf-8')[0:2] self.encode = encode self.decode = decode def write(self,data): if type(data) is not types.UnicodeType: data, bytesdecoded = self.decode(data, self.errors) return self.writer.write(data) def writelines(self,list): if type(data) is not types.UnicodeType: data = ''.join(list) data, bytesdecoded = self.decode(data, self.errors) else: data = u''.join(list) return self.writer.write(data) if __name__ == '__main__': # Redirect all standard IO streams sys.stdin = Latin1IO(sys.stdin) sys.stdout = Latin1IO(sys.stdout) sys.stderr = Latin1IO(sys.stderr) --------------9972D9B8E9394EC8828CF147-- From Fredrik Lundh" Message-ID: <00a401bfaec9$3aaae100$34aab5d4@hagrid> I'll follow up with a longer reply later; just one correction: M.-A. Lemburg wrote: > Ad 1. 
UTF-8 is used as basis in many other languages such=20 > as TCL or Perl. It is not an intuitive way of > writing strings and causes problems due to one character > spanning 1-6 bytes. Still, the world seems to be moving > into this direction, so going the same way can't be all > wrong... the problem here is the current Python implementation doesn't use UTF-8 in the same way as Perl and Tcl. Perl and Tcl only exposes one string type, and that type be- haves exactly like it should: "The Tcl string functions properly handle multi- byte UTF-8 characters as single characters." "By default, Perl now thinks in terms of Unicode characters instead of simple bytes. /.../ All the relevant built-in functions (length, reverse, and so on) now work on a character-by-character basis instead of byte-by-byte, and strings are represented internally in Unicode." or in other words, both languages guarantee that given a string s: - s is a sequence of characters (not bytes) - len(s) is the number of characters in the string - s[i] is the i'th character - len(s[i]) is 1 and as I've pointed out a zillion times, Python 1.6a2 doesn't. this should be solved, and I see (at least) four ways to do that: -- the Tcl 8.1 way: make 8-bit strings UTF-8 aware. operations like len and getitem usually searches from the start of the string. to handle binary data, introduce a special ByteArray type. when mixing ByteArrays and strings, treat each byte in the array as an 8-bit unicode character (conversions from strings to byte arrays are lossy). [imho: lots of code, and seriously affects performance, even when unicode characters are never used. this approach was abandoned in Tcl 8.2] -- the Tcl 8.2 way: use a unified string type, which stores data as UTF-8 and/or 16-bit unicode: struct { char* bytes; /* 8-bit representation (utf-8) */ Tcl_UniChar* unicode; /* 16-bit representation */ } if one of the strings are modified, the other is regenerated on demand. operations like len, slice and getitem always convert to 16-bit first. still need a ByteArray type, similar to the one described above. [imho: faster than before, but still not as good as a pure 8-bit string type. and the need for a separate byte array type would break alot of existing Python code] -- the Perl 5.6 way? (haven't looked at the implementation, but I'm pretty sure someone told me it was done this way). essentially same as Tcl 8.2, but with an extra encoding field (to avoid con- versions if data is just passed through). struct { int encoding; char* bytes; /* 8-bit representation */ Tcl_UniChar* unicode; /* 16-bit representation */ } [imho: see Tcl 8.2] -- my proposal: expose both types, but let them contain characters from the same character set -- at least when used as strings. as before, 8-bit strings can be used to store binary data, so we don't need a separate ByteArray type. in an 8-bit string, there's always one character per byte. [imho: small changes to the existing code base, about as efficient as can be, no attempt to second-guess the user, fully backwards com- patible, fully compliant with the definition of strings in the language reference, patches are available, etc...] From jeremy@cnri.reston.va.us Tue Apr 25 18:20:44 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Tue, 25 Apr 2000 13:20:44 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) In-Reply-To: <3904660D.6F22F798@trixie.triqs.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> Message-ID: <14597.54252.185633.504968@goon.cnri.reston.va.us> The performance difference I see on my Sparc is smaller. The machine is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. The median pystone time taken from 10 measurements are: 1.5.2 4.87 1.6a2 5.035 For comparison, the numbers I see on my Linux box (dual PII 266) are: 1.5.2 3.18 1.6a2 3.53 That's about 10% faster under 1.5.2. I'm not sure how important this change is. Three percent isn't enough for me to worry about, but it's a minority platform. I suppose 10 percent is right on the cusp. If the performance difference is the cost of the many improvements of 1.6, I think it's worth the price. Jeremy From tismer@tismer.com Tue Apr 25 19:12:39 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:12:39 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> Message-ID: <3905E017.1565757C@tismer.com> Jeremy Hylton wrote: > > The performance difference I see on my Sparc is smaller. The machine > is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with > GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. > > The median pystone time taken from 10 measurements are: > 1.5.2 4.87 > 1.6a2 5.035 > > For comparison, the numbers I see on my Linux box (dual PII 266) are: > > 1.5.2 3.18 > 1.6a2 3.53 > > That's about 10% faster under 1.5.2. Which GCC was it on the Linux box, and how much RAM does it have? > I'm not sure how important this change is. Three percent isn't enough > for me to worry about, but it's a minority platform. I suppose 10 > percent is right on the cusp. If the performance difference is the > cost of the many improvements of 1.6, I think it's worth the price. Yes, and I'm happy to pay the price if I can see where I pay. That's the problem, the changes between the pre-unicode tag and the current CVS are not enough to justify that speed loss. There must be something substantial. I also don't grasp why my optimizations are so much more powerful on 1.5.2+ as on 1.6 . Mark Hammond pointed me to the int/long unification. Was this done *after* the unicode patches? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer@tismer.com Tue Apr 25 19:27:20 2000 From: tismer@tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:27:20 +0200 Subject: [Python-Dev] Off-topic Message-ID: <3905E388.2C1911C1@tismer.com> This is a multi-part message in MIME format. --------------F70B0770802B95BF35DC4CE0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hey, don't blame me for posting a joke :-) Please read from the beginning, don't look at the end first. No, this is no offense... 
-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com --------------F70B0770802B95BF35DC4CE0 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline Return-Path: Received: from MX-HD01 (mail.biotech-campus.de [62.154.173.228]) by trixie.triqs.com (8.9.3/8.9.3) with SMTP id CAA09997 for ; Tue, 25 Apr 2000 02:10:03 -0500 Received: from 248 ([192.168.179.31]) by MX-HD01; Tue, 25 Apr 2000 09:07:56 +0200 Message-ID: <001701bfae84$fb169320$1fb3a8c0@248.brahms.lan> From: "A.Bergmann bei BRAHMS" To: "Christian Tismer" Subject: Moin..... Date: Tue, 25 Apr 2000 09:07:49 +0200 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0014_01BFAE95.BAF21290" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 4.72.3110.5 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 X-Mozilla-Status2: 00000000 This is a multi-part message in MIME format. ------=_NextPart_000_0014_01BFAE95.BAF21290 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable The Great Writer There was once a young man who, in his youth, wanted more than anything = else to become a great writer. When asked to define "great" he said, "I want to write stuff that the = whole world will read, stuff that people=20 will react to on a truly emotional level, stuff that will make them = understand the helplessness of the human condition, stuff that will make them scream and cry in pain and anger!" He worked very hard at this dream for years and in the end = succeeded................... He now works for Microsoft, writing error messages. ------=_NextPart_000_0014_01BFAE95.BAF21290 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

The Great Writer

There was once a young man who, in his youth, wanted more than = anything else=20 to become a great writer.

When asked to define "great" he said, "I want to write = stuff=20 that the whole world will read, stuff that people

will react to on a truly emotional level, stuff that will make them=20 understand the helplessness of the human

condition, stuff that will make them scream and cry in pain and=20 anger!"

He worked very hard at this dream for years and in the end=20 succeeded...................

He now works for Microsoft, writing error=20 messages.

------=_NextPart_000_0014_01BFAE95.BAF21290-- --------------F70B0770802B95BF35DC4CE0-- From mal@lemburg.com Tue Apr 25 21:13:39 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 22:13:39 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <3905FC73.7D7D6B1D@lemburg.com> Fredrik Lundh wrote: > > I'll follow up with a longer reply later; just one correction: > > M.-A. Lemburg wrote: > > Ad 1. UTF-8 is used as basis in many other languages such > > as TCL or Perl. It is not an intuitive way of > > writing strings and causes problems due to one character > > spanning 1-6 bytes. Still, the world seems to be moving > > into this direction, so going the same way can't be all > > wrong... > > the problem here is the current Python implementation > doesn't use UTF-8 in the same way as Perl and Tcl. Perl > and Tcl only exposes one string type, and that type be- > haves exactly like it should: > > "The Tcl string functions properly handle multi- > byte UTF-8 characters as single characters." > > "By default, Perl now thinks in terms of Unicode > characters instead of simple bytes. /.../ All the > relevant built-in functions (length, reverse, and > so on) now work on a character-by-character > basis instead of byte-by-byte, and strings are > represented internally in Unicode." > > or in other words, both languages guarantee that given a > string s: > > - s is a sequence of characters (not bytes) > - len(s) is the number of characters in the string > - s[i] is the i'th character > - len(s[i]) is 1 > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. Just a side note: we never discussed turning the native 8-bit strings into any encoding aware type. > this > should be solved, and I see (at least) four ways to do that: > > ... > -- the Perl 5.6 way? (haven't looked at the implementation, but I'm > pretty sure someone told me it was done this way). essentially > same as Tcl 8.2, but with an extra encoding field (to avoid con- > versions if data is just passed through). > > struct { > int encoding; > char* bytes; /* 8-bit representation */ > Tcl_UniChar* unicode; /* 16-bit representation */ > } > > [imho: see Tcl 8.2] > > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. > > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Why not name the beast ?! In your proposal, the old 8-bit strings simply use Latin-1 as native encoding. The current version doesn't make any encoding assumption as long as the 8-bit strings do not get auto-converted. In that case they are interpreted as UTF-8 -- which will (usually) fail for Latin-1 encoded strings using the 8th bit, but hey, at least you get an error message telling you what is going wrong. The key to these problems is using explicit conversions where 8-bit strings meet Unicode objects. Some more ideas along the convenience path: Perhaps changing just the way 8-bit strings are coerced to Unicode would help: strings would then be interpreted as Latin-1. 
str(Unicode) and "t" would still return UTF-8 to assure loss-less conversion. Another way to tackle this would be to first try UTF-8 conversion during auto-conversion and then fallback to Latin-1 in case it fails. Has anyone tried this ? Guido mentioned that TCL does something along these lines... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin@mems-exchange.org Tue Apr 25 21:54:11 2000 From: akuchlin@mems-exchange.org (Andrew M. Kuchling) Date: Tue, 25 Apr 2000 16:54:11 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) In-Reply-To: <3905E017.1565757C@tismer.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> <3905E017.1565757C@tismer.com> Message-ID: <14598.1523.533352.759437@amarok.cnri.reston.va.us> Christian Tismer writes: >Mark Hammond pointed me to the int/long unification. >Was this done *after* the unicode patches? Before. It seems unlikely they're the cause (they just add a 'if (PyLong_Check(key)' branch to the slicing functions in abstract.c. OTOH, if pystone really exercises sequence multiplication, maybe they're related (but 10% worth?). -- A.M. Kuchling http://starship.python.net/crew/amk/ I know flattery when I hear it; but I do not often hear it. -- Robertson Davies, _Fifth Business_ From Fredrik Lundh" Message-ID: <002601bfaf00$74d462c0$34aab5d4@hagrid> > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); better make that > + insint(d, "MSG_DONTWAIT", MSG_DONTWAIT); right? From Fredrik Lundh" <00a401bfaec9$3aaae100$34aab5d4@hagrid> <3905FC73.7D7D6B1D@lemburg.com> Message-ID: <002701bfaf02$734a2e60$34aab5d4@hagrid> M.-A. Lemburg wrote: > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. >=20 > Just a side note: we never discussed turning the native > 8-bit strings into any encoding aware type. hey, you just argued that we should use UTF-8 because Tcl and Perl use it, didn't you? my point is that they don't use it the way Python 1.6a2 uses it, and that their design is correct, while our design is slightly broken. so let's fix it ! > Why not name the beast ?! In your proposal, the old 8-bit > strings simply use Latin-1 as native encoding. in my proposal, there's an important distinction between character sets and character encodings. unicode is a character set. latin 1 is one of many possible encodings of (portions of) that set. maybe it's easier to grok if we get rid of the term "character set"? http://www.hut.fi/u/jkorpela/chars.html suggests the following replacements: character repertoire=20 A set of distinct characters. character code=20 A mapping, often presented in tabular form, which defines one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. character encoding=20 A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. now, in my proposal, the *repertoire* contains all characters described by the unicode standard. the *codes* are defined by the same standard. but strings are sequences of characters, not sequences of octets: strings have *no* encoding. (the encoding used for the internal string storage is an implementation detail). 
(but sure, given the current implementation, the internal storage for an 8-bit string happens use Latin-1. just as the internal storage for a 16-bit string happens to use UCS-2 stored in native byte order. but from the outside, they're just character sequences). > The current version doesn't make any encoding assumption as > long as the 8-bit strings do not get auto-converted. In that case > they are interpreted as UTF-8 -- which will (usually) fail > for Latin-1 encoded strings using the 8th bit, but hey, at least > you get an error message telling you what is going wrong. sure, but I don't think you get the right message, or that you get it at the right time. consider this: if you're going from 8-bit strings to unicode using implicit con- version, the current design can give you: "UnicodeError: UTF-8 decoding error: unexpected code byte" if you go from unicode to 8-bit strings, you'll never get an error. however, the result is not always a string -- if the unicode string happened to contain any characters larger than 127, the result is a binary buffer containing encoded data. you cannot use string methods on it, you cannot use regular expressions on it. indexing and slicing won't work. unlike earlier versions of Python, and unlike unicode-aware versions of Tcl and Perl, the fundamental assumption that a string is a sequence of characters no longer holds. =20 in my proposal, going from 8-bit strings to unicode always works. a character is a character, no matter what string type you're using. however, going from unicode to an 8-bit string may given you an OverflowError, say: "OverflowError: unicode character too large to fit in a byte" the important thing here is that if you don't get an exception, the result is *always* a string. string methods always work. etc. [8. Special cases aren't special enough to break the rules.] > The key to these problems is using explicit conversions where > 8-bit strings meet Unicode objects. yeah, but the flaw in the current design is the implicit conversions, not the explicit ones. [2. Explicit is better than implicit.] (of course, the 8-bit string type also needs an "encode" method under my proposal, but that's just a detail ;-) > Some more ideas along the convenience path: >=20 > Perhaps changing just the way 8-bit strings are coerced > to Unicode would help: strings would then be interpreted > as Latin-1. ok. > str(Unicode) and "t" would still return UTF-8 to assure loss- > less conversion. maybe. or maybe str(Unicode) should return a unicode string? think about it! (after all, I'm pretty sure that ord() and chr() should do the right thing, also for character codes above 127) > Another way to tackle this would be to first try UTF-8 > conversion during auto-conversion and then fallback to > Latin-1 in case it fails. Has anyone tried this ? Guido > mentioned that TCL does something along these lines... haven't found any traces of that in the source code. hmm, you're right -- it looks like it attempts to "fix" invalid UTF-8 data (on a character by character basis), instead of choking on it. scary. [12. In the face of ambiguity, refuse the temptation to guess.] more tomorrow. From guido@python.org Tue Apr 25 23:35:30 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 18:35:30 -0400 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code In-Reply-To: Your message of "Tue, 25 Apr 2000 17:16:25 +0200." 
<00a401bfaec9$3aaae100$34aab5d4@hagrid> References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <200004252235.SAA02554@eric.cnri.reston.va.us> [Fredrik] > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. > > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Sorry, all this proposal does is change the default encoding on conversions from UTF-8 to Latin-1. That's very western-culture-centric. You already have control over the encoding: use unicode(s, "latin-1"). If there are places where you don't have enough control (e.g. file I/O), let's add control there. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Wed Apr 26 00:08:39 2000 From: guido@python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 19:08:39 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) Message-ID: <200004252308.TAA05717@eric.cnri.reston.va.us> The email below is a serious bug report. A quick analysis shows that UserString.count() calls the count() method on a string object, which calls PyArg_ParseTuple() with the format string "O|ii". The 'i' format code truncates integers. It probably should raise an overflow exception instead. But that would still cause the test to fail -- just in a different way (more explicit). Then the string methods should be fixed to use long ints instead -- and then something else would probably break... --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Mon, 24 Apr 2000 19:26:27 -0400 From: mark.favas@per.dem.csiro.au To: python-bugs-list@python.org cc: bugs-py@python.org Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg stringobject (PR#306) Full_Name: Mark Favas Version: 1.6a2 CVS of 25 April OS: DEC Alpha, Tru64 Unix 4.0F Submission from: wa107.dialup.csiro.au (130.116.4.107) There seems to be issues (and perhaps lurking cans of worms) on 64-bit platforms where sizeof(long) != sizeof(int). For example, the CVS version of 1.6a2 of 25 April fails the UserString regression test. The tests fail as follows (verbose set to 1): abcabcabc.count(('abc',)) no 'abcabcabc' 3 <> 2 abcabcabc.count(('abc', 1)) no 'abcabcabc' 2 <> 1 abcdefghiabc.find(('abc', 1)) no 'abcdefghiabc' 9 < > - -1 abcdefghiabc.rfind(('abc',)) no 'abcdefghiabc' 9 <> 0 abcabcabc.rindex(('abc',)) no 'abcabcabc' 6 <> 3 abcabcabc.rindex(('abc', 1)) no 'abcabcabc' 6 <> 3 These tests are failing because the calls from the UserString methods to the underlying string methods are setting the default value of the end-of-string parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), whereas the string methods in stringobject.c are using ints and expecting them to be no larger than INT_MAX (2147483647). Thus the end-of-string parameter becomes -1 in the default case. The size of an int on my platform is 4, and the size of a long is 8, so the "natural size of a Python integer" should be 8, by my understanding. 
The obvious fix is to change stringobject.c to use longs, rather than ints, but the problem might be more widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, stropmodule.c, and ceval.c as well as stringobject.c. Some of these look as though LONG_MAX should have been used (variables compared to INT_MAX are longs, but I am not confident enough to submit patches for them... Mark _______________________________________________ Python-bugs-list maillist - Python-bugs-list@python.org http://www.python.org/mailman/listinfo/python-bugs-list ------- End of Forwarded Message From pf@artcom-gmbh.de Wed Apr 26 08:34:09 2000 From: pf@artcom-gmbh.de (Peter Funk) Date: Wed, 26 Apr 2000 09:34:09 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105 In-Reply-To: <200004252134.RAA02207@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 25, 2000 5:34:56 pm" Message-ID: Guido van Rossum: > Modified Files: > socketmodule.c [...] > *** 2526,2529 **** > --- 2526,2532 ---- > #ifdef MSG_DONTROUTE > insint(d, "MSG_DONTROUTE", MSG_DONTROUTE); > + #endif > + #ifdef MSG_DONTWAIT > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); -------------------------^^? Shouldn't this read "MSG_DONTWAIT"? ----------------------------^! Nitpicking, Peter From fredrik@pythonware.com Wed Apr 26 10:00:03 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 11:00:03 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> Message-ID: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> > Sorry, all this proposal does is change the default encoding on > conversions from UTF-8 to Latin-1. That's very > western-culture-centric. That decision was made by ISO and the Unicode consortium, not me. I don't know why, and I don't really care -- I'm arguing that strings should contain characters, just like the language reference says, and that all characters should be from the same character repertoire and use the same character codes. From the user's perspective, that's the way it's done in Perl, Tcl, XML, Java, and Windows. But alright, I give up. I've wasted way too much time on this, my patches were rejected, and nobody seems to care. Not exactly inspiring. From just@letterror.com Wed Apr 26 13:04:08 2000 From: just@letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 13:04:08 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <8e6bsl$f1a$1@nnrp1.deja.com> References: <1256565470-46720619@hypernet.com> <6M1J4.662$rc9.209708544@newsb.telia.net> <8daop0$8fk$1@slb6.atl.mindspring.net> Message-ID: Fredrik Lundh replied to himself in c.l.py: >> as far as I can tell, it's supposed to be a feature. >> >> if you mix 8-bit strings with unicode strings, python 1.6a2 >> attempts to interpret the 8-bit string as an utf-8 encoded >> unicode string. >> >> but yes, I also think it's a bug. but this far, my attempts >> to get someone else to fix it has failed. might have to do >> it myself... ;-) > >postscript: the powers-that-be has decided that this is not >a bug. if you thought that strings were just sequences of >characters, just as in Perl and Tcl, you're in for one big >surprise in Python 1.6... 
I just read the last few posts of the powers-that-be-list on this subject (Thanks to Christian for pointing out the archives in c.l.py ;-), and I must say I completely agree with Fredrik. The current situation sucks. A string should always be a sequence of characters. A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". An 8-bit string should never be assumed to be utf-8 because of that distinction. (The default encoding for the builtin unicode() function may be another story.) Just From mal@lemburg.com Wed Apr 26 13:03:36 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 26 Apr 2000 14:03:36 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: <200004252308.TAA05717@eric.cnri.reston.va.us> Message-ID: <3906DB18.CB76EEC0@lemburg.com> Guido van Rossum wrote: > > The email below is a serious bug report. A quick analysis shows that > UserString.count() calls the count() method on a string object, which > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > format code truncates integers. It probably should raise an overflow > exception instead. But that would still cause the test to fail -- > just in a different way (more explicit). Then the string methods > should be fixed to use long ints instead -- and then something else > would probably break... All uses in stringobject.c and unicodeobject.c use INT_MAX together with integers, so there's no problem on that side of the fence ;-) Since strings and Unicode objects use integers to describe the length of the object (as well as most if not all other builtin sequence types), the correct default value should thus be something like sys.maxlen which then gets set to INT_MAX. I'd suggest adding sys.maxlen and the modifying UserString.py, re.py and sre_parse.py accordingly. > --Guido van Rossum (home page: http://www.python.org/~guido/) > > ------- Forwarded Message > > Date: Mon, 24 Apr 2000 19:26:27 -0400 > From: mark.favas@per.dem.csiro.au > To: python-bugs-list@python.org > cc: bugs-py@python.org > Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg > stringobject (PR#306) > > Full_Name: Mark Favas > Version: 1.6a2 CVS of 25 April > OS: DEC Alpha, Tru64 Unix 4.0F > Submission from: wa107.dialup.csiro.au (130.116.4.107) > > There seems to be issues (and perhaps lurking cans of worms) on 64-bit > platforms > where sizeof(long) != sizeof(int). > > For example, the CVS version of 1.6a2 of 25 April fails the UserString > regression test. The tests fail as follows (verbose set to 1): > > abcabcabc.count(('abc',)) no > 'abcabcabc' 3 <> > 2 > abcabcabc.count(('abc', 1)) no > 'abcabcabc' 2 <> > 1 > abcdefghiabc.find(('abc', 1)) no > 'abcdefghiabc' 9 < > > > - -1 > abcdefghiabc.rfind(('abc',)) no > 'abcdefghiabc' 9 > <> 0 > abcabcabc.rindex(('abc',)) no > 'abcabcabc' 6 <> > 3 > abcabcabc.rindex(('abc', 1)) no > 'abcabcabc' 6 <> > 3 > > These tests are failing because the calls from the UserString methods to the > underlying string methods are setting the default value of the end-of-string > parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), > whereas the string methods in stringobject.c are using ints and expecting them > to be no larger than INT_MAX (2147483647). > Thus the end-of-string parameter becomes -1 in the default case. The size of an > int on my platform is 4, and the size of a long is 8, so the "natural size of > a Python integer" should be 8, by my understanding. 
The obvious fix is to > change > stringobject.c to use longs, rather than ints, but the problem might be more > widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, > stropmodule.c, and ceval.c as well as stringobject.c. Some of these look as > though LONG_MAX should have been used (variables compared to INT_MAX are longs, > but I am not confident enough to submit patches for them... > > Mark > > _______________________________________________ > Python-bugs-list maillist - Python-bugs-list@python.org > http://www.python.org/mailman/listinfo/python-bugs-list > > ------- End of Forwarded Message > > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein@lyra.org Wed Apr 26 14:00:21 2000 From: gstein@lyra.org (Greg Stein) Date: Wed, 26 Apr 2000 06:00:21 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <000701bfae7f$1174a540$152d153f@tim> Message-ID: On Tue, 25 Apr 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Many people have asked for free-threading, and the number of inquiries > > that I receive have grown over time. (nobody asked in 1996 when I first > > published my patches; I get a query every couple months now) > > Huh! That means people ask me about it more often than they ask you . > > I'll add, though, that you have to dig into the inquiry: almost everyone > who asks me is running on a uniprocessor machine, and are really after one > of two other things: > > 1. They expect threaded stuff to run faster if free-threaded. "Why?" is > a question I can't answer <0.5 wink>. Heh. Yes, I definitely see this one. But there are some clueful people out there, too, so I'm not totally discouraged :-) > 2. Dealing with the global lock drives them insane, especially when trying > to call back into Python from a "foreign" C thread. > > #2 may be fixable via less radical means (like a streamlined procedure > enabled by some relatively minor core interpreter changes, and clearer > docs). No doubt. I was rather upset with Guido's "Swap" API for the thread state. Grr. I sent him a very nice (IMO) API that I used for my patches. The Swap was simply a poor choice on his part. It implies that you are swapping a thread state for another (specifically: the "current" thread state). Of course, that is wholly inappropriate in a free-threading environment. All those calls to _Swap() will be overhead in an FT world. I liked my "PyThreadState *PyThreadState_Ensure()" function. It would create the sucker if it didn't exist, then return *this* thread's state to you. Handy as hell. No monkeying around with "Get. oops. didn't exist. let's create one now." > I'm still a fan of free-threading! It's just one of those things that may > yield a "well, ya, that's what I asked for, but turns out it's not what I > *wanted*" outcome as often as not. hehe. Damn straight. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From just@letterror.com Wed Apr 26 15:13:13 2000 From: just@letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 15:13:13 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: I wrote: >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". 
Another way of putting this is: - utf-8 in an 8-bit string is to a unicode string what a pickle is to an object. - defaulting to utf-8 upon coercing is like implicitly trying to unpickle an 8-bit string when comparing it to an instance. Bad idea. Defaulting to Latin-1 is the only logical choice, no matter how western-culture-centric this may seem. Just From mal@lemburg.com Wed Apr 26 19:01:48 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Wed, 26 Apr 2000 20:01:48 +0200 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) References: Message-ID: <39072F0C.5214E339@lemburg.com> Just van Rossum wrote: > > I wrote: > >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". > > Another way of putting this is: > - utf-8 in an 8-bit string is to a unicode string what a pickle is to an > object. > - defaulting to utf-8 upon coercing is like implicitly trying to unpickle > an 8-bit string when comparing it to an instance. Bad idea. > > Defaulting to Latin-1 is the only logical choice, no matter how > western-culture-centric this may seem. Please note that the support for mixing strings and Unicode objects is really only there to aid porting applications to Unicode. New code should use Unicode directly and apply all needed conversions explicitly using one of the many ways to encode or decode Unicode data. The auto-conversions are only there to help out and provide some convenience. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido@python.org Wed Apr 26 19:51:56 2000 From: guido@python.org (Guido van Rossum) Date: Wed, 26 Apr 2000 14:51:56 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: Your message of "Wed, 26 Apr 2000 14:03:36 +0200." <3906DB18.CB76EEC0@lemburg.com> References: <200004252308.TAA05717@eric.cnri.reston.va.us> <3906DB18.CB76EEC0@lemburg.com> Message-ID: <200004261851.OAA06794@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > > > The email below is a serious bug report. A quick analysis shows that > > UserString.count() calls the count() method on a string object, which > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > format code truncates integers. It probably should raise an overflow > > exception instead. But that would still cause the test to fail -- > > just in a different way (more explicit). Then the string methods > > should be fixed to use long ints instead -- and then something else > > would probably break... > > All uses in stringobject.c and unicodeobject.c use INT_MAX > together with integers, so there's no problem on that side > of the fence ;-) > > Since strings and Unicode objects use integers to describe the > length of the object (as well as most if not all other > builtin sequence types), the correct default value should > thus be something like sys.maxlen which then gets set to > INT_MAX. > > I'd suggest adding sys.maxlen and the modifying UserString.py, > re.py and sre_parse.py accordingly. Hm, I'm not so sure. It would be much better if passing sys.maxint would just WORK... Since that's what people have been doing so far. 
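The arithmetic behind Mark's report is easy to check from Python itself. A small sketch, not taken from the report, of why the default end-of-string value turns into -1 once the 'i' format code truncates a 64-bit sys.maxint to a C int:

    # On a 64-bit platform sys.maxint is 2**63 - 1; the 'i' format code
    # keeps only the low 32 bits, which read back as -1 in a signed C int.
    maxint_64 = 2L**63 - 1
    low_32 = maxint_64 & 0xFFFFFFFFL     # 4294967295, the bits that survive
    as_c_int = low_32 - 2L**32           # reinterpreted as a signed 32-bit int
    print as_c_int                       # -1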
--Guido van Rossum (home page: http://www.python.org/~guido/) From nascheme@enme.ucalgary.ca Wed Apr 26 20:06:51 2000 From: nascheme@enme.ucalgary.ca (Neil Schemenauer) Date: Wed, 26 Apr 2000 13:06:51 -0600 Subject: [Python-Dev] L1 data cache profile for Python 1.5.2 and 1.6 Message-ID: <20000426130651.C23227@acs.ucalgary.ca> Using this tool: http://www.cacheprof.org/ I got this output: http://www.enme.ucalgary.ca/~nascheme/python/cache.out http://www.enme.ucalgary.ca/~nascheme/python/cache-152.out The cache miss rate for eval_code2 is about two times larger in 1.6. The overall miss rate is about the same. Is this significant? I suspect that the instruction cache is more important for eval_code2. Unfortunately cacheprof can only profile the L1 data cache. Perhaps someone will find this data useful or interesting. Neil From tismer@tismer.com Wed Apr 26 22:24:39 2000 From: tismer@tismer.com (Christian Tismer) Date: Wed, 26 Apr 2000 23:24:39 +0200 Subject: [Fwd: [Python-Dev] Where the speed is lost! (was: 1.6 speed)] Message-ID: <39075E97.23DBDD63@tismer.com> This is a multi-part message in MIME format. --------------68440C5CBBDDB83FA1ADCF41 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit I forgot to cc python-dev. This file is closed for me. the sun is shining again, life is so wonderful and now for something completely different - chris --------------68440C5CBBDDB83FA1ADCF41 Content-Type: message/rfc822 Content-Transfer-Encoding: 7bit Content-Disposition: inline X-Mozilla-Status2: 00000000 Message-ID: <39075D58.C549938E@tismer.com> Date: Wed, 26 Apr 2000 23:19:20 +0200 From: Christian Tismer X-Mailer: Mozilla 4.72 [en] (Windows NT 5.0; I) X-Accept-Language: en MIME-Version: 1.0 To: jeremy@cnri.reston.va.us CC: Guido van Rossum , Neil Schemenauer , Mark Hammond , "M.-A. Lemburg" Subject: Re: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Hi Friends, here my end of the long story. Full stop. Now everything fits together. Jeremy Hylton wrote: ... > Unfortunately, this doesn't explain the overall performance change. > This case only shows up on pybench and our nano-benchmark. There is > no instance creation to speak of in pystone. It may be, however, that > some other part of Python is raising and clearing expensive > exceptions. Wrong! This solves nearly everything. Here is Python 1.6's pystone: bash-2.02# python d:/python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.08081 This machine benchmarks at 4805.81 pystones/second bash-2.02# pwd //d/python16 Here is Python 1.5.2 pre-unicode plus Stackless Patches: bash-2.02# python d:/python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.79462 This machine benchmarks at 5572.2 pystones/second bash-2.02# pwd //d/python/spc And here, Python 1.6 plus Patch plus Stackless: bash-2.02# python d:/python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.83961 This machine benchmarks at 5435.94 pystones/second Now stay tuned: I rename tupleobject.c to dupleobject.c and stringobject.c to dstringobject.c, save the project, close, open it and rebuild all. 
*SURPRISE SURPRISE*

bash-2.02# python d:/python/lib/test/pystone.py
Pystone(1.1) time for 10000 passes = 1.77519
This machine benchmarks at 5633.19 pystones/second
bash-2.02# pwd
//d/python/spc/Python-slp/PCbuild

Summary: We had two effects here.
Effect 1: Wasting time with extra errors in instance creation.
Effect 2: Loss of locality due to code size increase.

Solution to 1 is Jeremy's patch. Solution to 2 could be a little
renaming of the one or the other module, in order to get the default
link order to support locality better.

Now everything is clear to me. My first attempts with reordering
could not reveal the loss with the instance stuff.

All together, Python 1.6 is a bit faster than 1.5.2 if we try to get
related code ordered better.

ciao - chris

-- 
Christian Tismer             :^)
Applied Biometrics GmbH      :     Have a break! Take a ride on Python's
Kaunstr. 26                  :    *Starship* http://starship.python.net
14163 Berlin                 :     PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint E182 71C7 1A9D 66E9  9D15 D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

--------------68440C5CBBDDB83FA1ADCF41--

From: fredrik@pythonware.com (Fredrik Lundh)
References: <39072F0C.5214E339@lemburg.com>
Message-ID: <002f01bfafc6$779804a0$34aab5d4@hagrid>

(forwarded from c.l.py, on request)

> New code should use Unicode directly and apply all needed
> conversions explicitly using one of the many ways to
> encode or decode Unicode data. The auto-conversions are
> only there to help out and provide some convenience.

does this mean that the 8-bit string type is deprecated ???

From: fredrik@pythonware.com (Fredrik Lundh)
Subject: [Python-Dev] fun with unicode, part 1

>>> filename = u"gröt"

>>> file = open(filename, "w")
>>> file.close()

>>> import glob
>>> print glob.glob("gr*")
['gr\303\266t']

>>> print glob.glob(u"gr*")
[u'gr\366t']

>>> import os
>>> os.system("dir gr*")
...
 GRÇôT              0  01-02-03 12.34  grÇôt
         1 fil(es)              0 byte
         0 dir     12 345 678 byte free

hmm.

From mhammond@skippinet.com.au  Thu Apr 27 01:08:23 2000
From: mhammond@skippinet.com.au (Mark Hammond)
Date: Thu, 27 Apr 2000 10:08:23 +1000
Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
In-Reply-To: <002f01bfafc6$779804a0$34aab5d4@hagrid>
Message-ID: 

It is necessary for us to also have this scrag-fight in public?
Most of the thread on c.l.py is filled in by people who are also
py-dev members!

[MAL writes]
> Please note that the support for mixing strings and Unicode
> objects is really only there to aid porting applications
> to Unicode.
>
> New code should use Unicode directly and apply all needed
> conversions explicitly using one of the many ways to
> encode or decode Unicode data.

This will _never_ happen.  The Python programmer should never need to
be aware they have a Unicode string versus a standard string - just a
"string"!  The fact there are 2 string types should be considered an
implementation detail, and not a conceptual model for people to work
within.

I think we will be mixing Unicode and strings for ever!  The only way
to avoid it would be a unified type - possibly Py3k.  Until then,
people will still generally use strings as literals in their code, and
should not even be aware they are mixing.  I'm never going to prefix my
ascii-only strings with u"" just to avoid the possibility of mixing!

Listening to the arguments, I've got to say I'm coming down squarely on
the side of Fredrik and Just.  strings must be sequences of characters,
whose length is the number of characters.  A string holding an encoding
should be considered logically a byte array, and conversions should be
explicit.
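A short sketch of the explicit style argued for here, written against the 1.6 API (the byte values are illustrative): keep byte strings and character strings apart, and name the encoding at every conversion.

    raw = "caf\351"                       # bytes known by the programmer to be Latin-1
    text = unicode(raw, 'latin-1')        # bytes -> characters, explicit encoding
    assert len(text) == 4                 # four characters
    utf8 = text.encode('utf-8')           # characters -> bytes in another encoding
    assert len(utf8) == 5                 # same text, five bytes as UTF-8
    assert text.encode('latin-1') == raw  # explicit round trip back to the original bytes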
> The auto-conversions are only there to help out and provide some convenience. Doesn't sound like it is working :-( Mark. From akuchlin@mems-exchange.org Thu Apr 27 02:45:37 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Wed, 26 Apr 2000 21:45:37 -0400 (EDT) Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug In-Reply-To: References: <002f01bfafc6$779804a0$34aab5d4@hagrid> Message-ID: <14599.39873.159386.778558@newcnri.cnri.reston.va.us> Mark Hammond writes: >It is necessary for us to also have this scrag-fight in public? >Most of the thread on c.l.py is filled in by people who are also >py-dev members! Attempting to walk a delicate line here, my reading of the situation is that Fredrik's frustration level is increaing as he points out problems, but nothing much is done about them. Marc-Andre will usually respond, but there's been no indication from Guido about what to do. But GvR might be waiting to hear from more users about their experience with Unicode; so far I don't know if anyone has much experience with the new code. But why not have it in public? The python-dev archives are publicly available anyway, so it's not like this discussion was going on behind closed doors. The problem with discussing this on c.l.py is that not everyone reads c.l.py any more due to volume. --amk From paul@prescod.net Thu Apr 27 02:47:41 2000 From: paul@prescod.net (Paul Prescod) Date: Wed, 26 Apr 2000 20:47:41 -0500 Subject: [Python-Dev] Python Unicode References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> Message-ID: <39079C3D.4000C74C@prescod.net> Fredrik Lundh wrote: > > ... > > But alright, I give up. I've wasted way too much time on this, my > patches were rejected, and nobody seems to care. Not exactly > inspiring. I can understand how frustrating this is. Sometimes something seems just so clean and mathematically obvious that you can't see why others don't see it that way. A character is the "smallest unit of text." Strings are lists of characters. Characters in character sets have numbers. Python users should never know or care whether a string object is an 8-bit string or a Unicode string. There should be no distinction. u"" should be a syntactic shortcut. The primary reason I have not been involved is that I have not had a chance to look at the implementation and figure out if there is an overriding implementation-based reason to ignore the obvious right thing (e.g the right thing will break too much code or be too slow or...). "Unicode objects" should be an implementation detail (if they exist at all). Strings are strings are strings. The Python programmer shouldn't care about whether one string was read from a Unicode file and another from an ASCII file and one typed in with "u" and one without. It's all the same thing! If the programmer wants to do an explicit UTF-8 decode on a string (whether it is Unicode or 8-bit string...no difference) then that decode should proceed by looking at each character, deriving an integer and then treating that integer as an octet according to the UTF-8 specification. Char -> Integer -> Byte -> Char The end result (and hopefully the performance) would be the same but the model is much, much cleaner if there is only one kind of string. We should not ignore the example set by every other language (and yes, I'm including XML here :) ). I'm as desperate (if not as vocal) as Fredrick is here. 
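The Char -> Integer -> Byte -> Char pipeline can be spelled out with the pieces 1.6 already has; a minimal sketch (the character is just an example):

    ch = u'\u00e7'                        # Char: LATIN SMALL LETTER C WITH CEDILLA
    code = ord(ch)                        # Char -> Integer: 231
    octets = ch.encode('utf-8')           # Integer -> Bytes: '\xc3\xa7', two bytes
    back = unicode(octets, 'utf-8')       # Bytes -> Char again
    assert back == ch and len(back) == 1
    print code, repr(octets)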
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From gmcm@hypernet.com Thu Apr 27 03:13:00 2000 From: gmcm@hypernet.com (Gordon McMillan) Date: Wed, 26 Apr 2000 22:13:00 -0400 Subject: [Python-Dev] Python Unicode In-Reply-To: <39079C3D.4000C74C@prescod.net> Message-ID: <1255320912-64004084@hypernet.com> I haven't weighed in on this one, mainly because I don't even need ISO-1, let alone Unicode, (and damned proud of it, too!). But Fredrik's glob example was horrifying. I do know that I am always concious of whether a particular string is a sequence of characters, or a sequence of bytes. Seems to me the Py3K answer is to make those separate types. Until then, I guess I'll just remain completely xenophobic (and damned proud of it, too!). - Gordon From tim_one@email.msn.com Thu Apr 27 03:27:47 2000 From: tim_one@email.msn.com (Tim Peters) Date: Wed, 26 Apr 2000 22:27:47 -0400 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: Message-ID: <000101bfaff0$2d5f5ee0$272d153f@tim> [Just van Rossum] > ... > Defaulting to Latin-1 is the only logical choice, no matter how > western-culture-centric this may seem. Indeed, if someone from an inferior culture wants to chime in, let them find Python-Dev with their own beady little eyes . western-culture-is-better-than-none-&-at-least-*we*-understand-it-ly y'rs - tim From tim_one@email.msn.com Thu Apr 27 05:39:21 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 27 Apr 2000 00:39:21 -0400 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code In-Reply-To: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> Message-ID: <000001bfb002$8f720260$0d2d153f@tim> [/F] > ... > But alright, I give up. I've wasted way too much time on this, my > patches were rejected, and nobody seems to care. Not exactly > inspiring. I lost track of this stuff months ago, and since I use only 7-bit ASCII in my own source code and file names and etc etc, UTF-8 and Latin-1 are identical to me <0.5 wink>. [Guido] > Sorry, all this proposal does is change the default encoding on > conversions from UTF-8 to Latin-1. That's very > western-culture-centric. Well, if you talk with an Asian, they'll probably tell you that Unicode itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for non-Latin-1 Unicode characters). Most everyone likes their own national gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is that it annoys everyone. I do expect that the vase bulk of users would be less surprised if Latin-1 *were* the default encoding. Then the default would be usable as-is for many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). The non-Euros are in for a world of pain no matter what. just-because-some-groups-can't-win-doesn't-mean-everyone-must- lose-ly y'rs - tim From just@letterror.com Thu Apr 27 06:42:43 2000 From: just@letterror.com (Just van Rossum) Date: Thu, 27 Apr 2000 06:42:43 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <000101bfaff0$2d5f5ee0$272d153f@tim> References: Message-ID: At 10:27 PM -0400 26-04-2000, Tim Peters wrote: >Indeed, if someone from an inferior culture wants to chime in, let them find >Python-Dev with their own beady little eyes . 
All irony aside, I think you've nailed one of the problems spot on: - most core Python developers seem to be too busy to read *anything* at all in c.l.py - most people that care about the issues are not on python-dev Just From tim_one@email.msn.com Thu Apr 27 06:08:11 2000 From: tim_one@email.msn.com (Tim Peters) Date: Thu, 27 Apr 2000 01:08:11 -0400 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparingstrings and ints) In-Reply-To: Message-ID: <000101bfb006$95962280$0d2d153f@tim> [Just van Rossum] > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read > *anything* at all in c.l.py > - most people that care about the issues are not on python-dev But they're not on c.l.py either, are they? I still read everything there, although that's gotten so time-consuming I rarely reply anymore. In any case, I've seen almost nothing useful about Unicode issues on c.l.py that wasn't also on Python-Dev; perhaps I missed something. ask-10-more-people-&-you'll-get-20-more-opinions-ly y'rs - tim From alisa@robanal.demon.co.uk Thu Apr 27 11:29:54 2000 From: alisa@robanal.demon.co.uk (Alisa Pasic Robinson) Date: Thu, 27 Apr 2000 10:29:54 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39080ddd.9837445@post.demon.co.uk> >I wrote: >>A utf-8-encoded 8-bit string in Python is *not* a string, but a = "ByteArray". > >Another way of putting this is: >- utf-8 in an 8-bit string is to a unicode string what a pickle is to an >object. >- defaulting to utf-8 upon coercing is like implicitly trying to = unpickle >an 8-bit string when comparing it to an instance. Bad idea. > >Defaulting to Latin-1 is the only logical choice, no matter how >western-culture-centric this may seem. > >Just The Van Rossum Common Sense gene strikes again! You guys owe it to the world to have lots of children. I agree 100%. Let me also add that if you want to do encoding work that goes beyond what the library gives you, you absolutely need a 'byte array' type which makes no assumptions and does nothing magic to its content. I have always thought of 8-bit strings as 'byte arrays' and not 'characer arrays', and doing anything magic to them in literals or standard input is going to cause lots of trouble. I think our proposal is BETTER than Java, Tcl, Visual Basic etc for the following reasons: - you can work with old fashioned strings, which are understood by everyone to be arrays of bytes, and there is no magic conversion going on. The bytes in literal strings in your script file are the bytes that end up in the program. - you can work with Unicode strings if you want - you are in explicit control of conversions between them - both types have similar methods so there isn't much to learn or remember The 'no magic' thing is very important with Japanese, where very=20 often you need to roll your own codecs and look at the raw bytes;=20 any auto-conversion might not go through the filter you want and you've already lost information before you started. Especially If your job is to repair possibly corrupt data. Any company with a few extra custom characters in the user-defined Shift-JIS range is going to suddenly find their Perl scripts are failing or trashing all their data as a result of the UTF-8 decision. I'm also convinced that the majority of Python scripts won't need to work in Unicode. Even working with exotic languages, there is always a native 8-bit encoding. 
I have only used Unicode when=20 (a) working with data that is in several languages (b) doing conversions, which requires a 'central point' (b) wanting to do per-character operations safely on multi-byte data I still haven't sorted out in my head whether the default encoding thing is a big red herring or is important; I already have a safe way to construct Unicode literals in my source files if I want to using unicode('rawdata','myencoding'). =20 But if there has to be one I'd say the following: - strict ASCII is an option - Latin-1 is the more generous option that is right for the most people, and has a 'special status' among 8-bit encodings - UTF-8 is not one byte per character and will confuse people Just my 2p worth, Andy From mal@lemburg.com Thu Apr 27 12:23:23 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 13:23:23 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <000001bfb002$8f720260$0d2d153f@tim> Message-ID: <3908232B.C2668122@lemburg.com> Tim Peters wrote: > > [Guido about going Latin-1] > > Sorry, all this proposal does is change the default encoding on > > conversions from UTF-8 to Latin-1. That's very > > western-culture-centric. > > Well, if you talk with an Asian, they'll probably tell you that Unicode > itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for > non-Latin-1 Unicode characters). Most everyone likes their own national > gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is > that it annoys everyone. > > I do expect that the vase bulk of users would be less surprised if Latin-1 > *were* the default encoding. Then the default would be usable as-is for > many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). > The non-Euros are in for a world of pain no matter what. > > just-because-some-groups-can't-win-doesn't-mean-everyone-must- > lose-ly y'rs - tim People tend to forget that UTF-8 is a loss-less Unicode encoding while Latin-1 reduces Unicode to its lower 8 bits: conversion from non-Latin-1 Unicode to strings would simply not work, conversion from non-Latin-1 strings to Unicode would only be possible via unicode(). Thus mixing Unicode and strings would then run perfectly in all western countries using Latin-1 while the rest of the world would need to convert all their strings to Unicode... giving them an advantage over the western world we couldn't possibly accept ;-) FYI, here's a summary of which conversions take place (going Latin-1 would disable most of the Unicode integration in favour of conversion errors): Python: ------- string + unicode: unicode(string,'utf-8') + unicode string.method(unicode): unicode(string,'utf-8').method(unicode) print unicode: print unicode.encode('utf-8'); with stdout redirection this can be changed to any other encoding str(unicode): unicode.encode('utf-8') repr(unicode): repr(unicode.encode('unicode-escape')) C (PyArg_ParserTuple): ---------------------- "s" + unicode: same as "s" + unicode.encode('utf-8') "s#" + unicode: same as "s#" + unicode.encode('unicode-internal') "t" + unicode: same as "t" + unicode.encode('utf-8') "t#" + unicode: same as "t#" + unicode.encode('utf-8') This effects all C modules and builtins. In case a C module wants to receive a certain predefined encoding, it can use the new "es" and "es#" parser markers. Ways to enter Unicode: ---------------------- u'' + string same as unicode(string,'utf-8') unicode(string,encname) any supported encoding u'...unicode-escape...' 
unicode-escape currently accepts Latin-1 chars as single-char input; using escape sequences any Unicode char can be entered (*) codecs.open(filename,mode,encname) opens an encoded file for reading and writing Unicode directly raw_input() + stdin redirection (see one of my earlier posts for code) returns UTF-8 strings based on the input encoding Hmm, perhaps a codecs.raw_input(encname) which returns Unicode directly wouldn't be a bad idea either ?! (*) This should probably be changed to be source code encoding dependent, so that u"...data..." matches "...data..." in appearance in the Python source code (see below). IO: --- open(file,'w').write(unicode) same as open(file,'w').write(unicode.encode('utf-8')) open(file,'wb').write(unicode) same as open(file,'wb').write(unicode.encode('unicode-internal')) codecs.open(file,'wb',encname).write(unicode) same as open(file,'wb').write(unicode.encode(encname)) codecs.open(file,'rb',encname).read() same as unicode(open(file,'rb').read(),encname) stdin + stdout can be redirected using StreamRecoders to handle any of the supported encodings The Python parser should probably also be extended to read encoded Python source code using some hint at the start of the source file (perhaps only allowing a small subset of the supported encodings, e.g. ASCII, Latin-1, UTF-8 and UTF-16). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal@lemburg.com Thu Apr 27 11:27:18 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 12:27:18 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <39081606.2932F5FD@lemburg.com> Fredrik Lundh wrote: > > >>> filename = u"gröt" > > >>> file = open(filename, "w") > >>> file.close() > > >>> import glob > >>> print glob.glob("gr*") > ['gr\303\266t'] > > >>> print glob.glob(u"gr*") > [u'gr\366t'] > > >>> import os > >>> os.system("dir gr*") > ... > GRÇôT 0 01-02-03 12.34 grÇôt > 1 fil(es) 0 byte > 0 dir 12 345 678 byte free > > hmm. Where is the problem ? If you pass the output of glob() to open() you'll get the same file in both cases... even better, you can now even use Chinese in your filenames without the OS having to support Unicode filenames :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik@pythonware.com Thu Apr 27 12:49:07 2000 From: fredrik@pythonware.com (Fredrik Lundh) Date: Thu, 27 Apr 2000 13:49:07 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> Message-ID: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> > Fredrik Lundh wrote: > >=20 > > >>> filename =3D u"gr=F6t" > >=20 > > >>> file =3D open(filename, "w") > > >>> file.close() > >=20 > > >>> import glob > > >>> print glob.glob("gr*") > > ['gr\303\266t'] > >=20 > > >>> print glob.glob(u"gr*") > > [u'gr\366t'] > >=20 > > >>> import os > > >>> os.system("dir gr*") > > ... > > GR=C7=F4T 0 01-02-03 12.34 gr=C7=F4t > > 1 fil(es) 0 byte > > 0 dir 12 345 678 byte free > >=20 > > hmm. >=20 > Where is the problem ? I'm speechless. 
From akuchlin@mems-exchange.org Thu Apr 27 13:00:18 2000 From: akuchlin@mems-exchange.org (Andrew Kuchling) Date: Thu, 27 Apr 2000 08:00:18 -0400 (EDT) Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> Message-ID: <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Fredrik Lundh writes: >M.A. Lemburg wrote: >> Where is the problem ? >I'm speechless. Ummm... since I'm not sure how open() currently reacts to being passed a Unicode file or if there's something special in open() for Windows, and don't know how you think it should react (an exception? fold to UTF-8? fold to Latin1?), I don't see what the particular problem is either. For the sake of people who haven't followed this debate closely, or who were busy during the earlier lengthy threads and simply deleted most of the messages, please try to be explicit. Ilya Zakharevich on the perl5-porters mailing list often employs the "This code is buggy and if you're too clueless to see how it's broken *I* certainly won't go explaining it to you" strategy, to devastatingly divisive effect, and with little effectiveness in getting the bugs fixed. Let's not go down that road. --amk From guido@python.org Thu Apr 27 16:01:48 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:01:48 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 06:42:43 BST." References: Message-ID: <200004271501.LAA13535@eric.cnri.reston.va.us> I'd like to reset this discussion. I don't think we need to involve c.l.py yet -- I haven't seen anyone with Asian language experience chime in there, and that's where this matters most. I am directing this to the Python i18n-sig mailing list, because that's where the debate belongs, and there interested parties can join the discussion without having to be vetted as "fit for python-dev" first. I apologize for having been less than responsive in the matter; unfortunately there's lots of other stuff on my mind right now that has recently had a tendency to distract me with higher priority crises. I've heard a few people claim that strings should always be considered to contain "characters" and that there should be one character per string element. I've also heard a clamoring that there should only be one string type. You folks have never used Asian encodings. In countries like Japan, China and Korea, encodings are a fact of life, and the most popular encodings are ASCII supersets that use a variable number of bytes per character, just like UTF-8. Each country or language uses different encodings, even though their characters look mostly the same to western eyes. UTF-8 and Unicode is having a hard time getting adopted in these countries because most software that people use deals only with the local encodings. (Sounds familiar?) These encodings are much less "pure" than UTF-8, because they only encode the local characters (and ASCII), and because of various problems with slicing: if you look "in the middle" of an encoded string or file, you may not know how to interpret the bytes you see. There are overlaps (in most of these encodings anyway) between the codes used for single-byte and double-byte encodings, and you may have to look back one or more characters to know what to make of the particular byte you see. 
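The "look in the middle" problem can be demonstrated without any of the Asian codecs, because UTF-8 shares the variable-width property; a small sketch (the data is illustrative):

    # Byte-oriented slicing ignores character boundaries in a
    # variable-width encoding (UTF-8 is used here for the demonstration).
    data = u'gr\u00f6t'.encode('utf-8')   # 4 characters, 5 bytes: 'gr\xc3\xb6t'
    print len(data)                       # 5, not 4
    head = data[:3]                       # the slice ends inside the two-byte character
    try:
        unicode(head, 'utf-8')
    except UnicodeError, err:
        print "truncated character:", err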
To get an idea of the nightmares that non-UTF-8 multibyte encodings give C/C++ programmers, see the Multibyte Character Set (MBCS) Survival Guide (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). See also the home page of the i18n-sig for more background information on encoding (and other i18n) issues (http://www.python.org/sigs/i18n-sig/). UTF-8 attempts to solve some of these problems: the multi-byte encodings are chosen such that you can tell by the high bits of each byte whether it is (1) a single-byte (ASCII) character (top bit off), (2) the start of a multi-byte character (at least two top bits on; how many indicates the total number of bytes comprising the character), or (3) a continuation byte in a multi-byte character (top bit on, next bit off). Many of the problems with non-UTF-8 multibyte encodings are the same as for UTF-8 though: #bytes != #characters, a byte may not be a valid character, regular expression patterns using "." may give the wrong results, and so on. The truth of the matter is: the encoding of string objects is in the mind of the programmer. When I read a GIF file into a string object, the encoding is "binary goop". When I read a line of Japanese text from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to be an assumption built-in to my program, or perhaps information supplied separately (there's no easy way to guess based on the actual data). When I type a string literal using Latin-1 characters, the encoding is Latin-1. When I use octal escapes in a string literal, e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). When I type a 7-bit string literal, the encoding is ASCII. The moral of all this? 8-bit strings are not going away. They are not encoded in UTF-8 henceforth. Like before, and like 8-bit text files, they are encoded in whatever encoding you want. All you get is an extra mechanism to convert them to Unicode, and the Unicode conversion defaults to UTF-8 because it is the only conversion that is reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing Tim's paraphrase), UTF-8 annoys everyone equally. Where does the current approach require work? - We need a way to indicate the encoding of Python source code. (Probably a "magic comment".) - We need a way to indicate the encoding of input and output data files, and we need shortcuts to set the encoding of stdin, stdout and stderr (and maybe all files opened without an explicit encoding). Marc-Andre showed some sample code, but I believe it is still cumbersome. (I have to play with it more to see how it could be improved.) - We need to discuss whether there should be a way to change the default conversion between Unicode and 8-bit strings (currently hardcoded to UTF-8), in order to make life easier for people who want to continue to use their favorite 8-bit encoding (e.g. Latin-1, or shift-JIS) but who also want to make use of the new Unicode datatype. We're still in alpha, so we can still fix things. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul@prescod.net Thu Apr 27 16:01:00 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 10:01:00 -0500 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Message-ID: <3908562C.C2A2E1BC@prescod.net> You're asking the file system to "find you a filename". 
Depending on how you ask, you get two different file names for the same file. They are "==" equal (I think) but are of different length. I agree with /F that it's a little strange. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From guido@python.org Thu Apr 27 16:23:50 2000 From: guido@python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:23:50 -0400 Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: Your message of "Wed, 26 Apr 2000 23:45:40 +0200." <004501bfafc8$c51d1240$34aab5d4@hagrid> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <200004271523.LAA13614@eric.cnri.reston.va.us> > >>> filename = u"gröt" > > >>> file = open(filename, "w") > >>> file.close() > > >>> import glob > >>> print glob.glob("gr*") > ['gr\303\266t'] > > >>> print glob.glob(u"gr*") > [u'gr\366t'] > > >>> import os > >>> os.system("dir gr*") > ... > GRÇôT 0 01-02-03 12.34 grÇôt > 1 fil(es) 0 byte > 0 dir 12 345 678 byte free > > hmm. I presume that Fredrik's gripe is that the filename has been converted to UTF-8, while the encoding used by Windows to display his directory listing is Latin-1. (Not Microsoft's own 8-bit character set???) I'd like to solve this problem, but I have some questions: what *IS* the encoding used for filenames on Windows? This may differ per Windows version; perhaps it can differ drive letter? Or per application or per thread? On Windows NT, filenames are supposed to be Unicode. (I suppose also on Windowns 2000?) How do I open a file with a given Unicode string for its name, in a C program? I suppose there's a Win32 API call for that which has a Unicode variant. On Windows 95/98, the Unicode variants of the Win32 API calls don't exist. So what is the poor Python runtime to do there? Can Japanese people use Japanese characters in filenames on Windows 95/98? Let's assume they can. Since the filesystem isn't Unicode aware, the filenames must be encoded. Which encoding is used? Let's assume they use Microsoft's multibyte encoding. If they put such a file on a floppy and ship it to Linköping, what will Fredrik see as the filename? (I.e., is the encoding fixed by the disk volume, or by the operating system?) Once we have a few answers here, we can solve the problem. Note that sometimes we'll have to refuse a Unicode filename because there's no mapping for some of the characters it contains in the filename encoding used. Question: how does Fredrik create a file with a Euro character (u'\u20ac') in its name? --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn@worldonline.dk Thu Apr 27 17:21:20 2000 From: bckfnn@worldonline.dk (Finn Bock) Date: Thu, 27 Apr 2000 16:21:20 GMT Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <200004271523.LAA13614@eric.cnri.reston.va.us> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <200004271523.LAA13614@eric.cnri.reston.va.us> Message-ID: <3908679a.16700013@smtp.worldonline.dk> On Thu, 27 Apr 2000 11:23:50 -0400, you wrote: >> >>> filename = u"gröt" >> >> >>> file = open(filename, "w") >> >>> file.close() >> >> >>> import glob >> >>> print glob.glob("gr*") >> ['gr\303\266t'] >> >> >>> print glob.glob(u"gr*") >> [u'gr\366t'] >> >> >>> import os >> >>> os.system("dir gr*") >> ... >> GRÇôT 0 01-02-03 12.34 grÇôt >> 1 fil(es) 0 byte >> 0 dir 12 345 678 byte free >> >> hmm. 
> >I presume that Fredrik's gripe is that the filename has been converted >to UTF-8, while the encoding used by Windows to display his directory >listing is Latin-1. (Not Microsoft's own 8-bit character set???) > >I'd like to solve this problem, but I have some questions: what *IS* >the encoding used for filenames on Windows? [This is just for inspiration] JDK "solves" this by running the filename through a CharToByteConverter (a codec) which is setup as the default encoding used for the platform. On my danish w2k this is encoding happens to be called 'Cp1252'. The codec name is chosen based on the users language and region with fall back to Cp1252. The mapping table is: "ar", "Cp1256", "be", "Cp1251", "bg", "Cp1251", "cs", "Cp1250", "el", "Cp1253", "et", "Cp1257", "iw", "Cp1255", "hu", "Cp1250", "ja", "MS932", "ko", "MS949", "lt", "Cp1257", "lv", "Cp1257", "mk", "Cp1251", "pl", "Cp1250", "ro", "Cp1250", "ru", "Cp1251", "sh", "Cp1250", "sk", "Cp1250", "sl", "Cp1250", "sq", "Cp1250", "sr", "Cp1251", "th", "MS874", "tr", "Cp1254", "uk", "Cp1251", "zh", "GBK", "zh_TW", "MS950", >This may differ per >Windows version; perhaps it can differ drive letter? Or per >application or per thread? On Windows NT, filenames are supposed to >be Unicode. (I suppose also on Windowns 2000?) JDK only uses GetThreadLocale() for the starting thread. It does not appears to check for windows versions at all. >How do I open a file >with a given Unicode string for its name, in a C program? I suppose >there's a Win32 API call for that which has a Unicode variant. The JDK does not make use the unicode API is it exists on the platform. >On Windows 95/98, the Unicode variants of the Win32 API calls don't >exist. So what is the poor Python runtime to do there? > >Can Japanese people use Japanese characters in filenames on Windows >95/98? Let's assume they can. Since the filesystem isn't Unicode >aware, the filenames must be encoded. Which encoding is used? Let's >assume they use Microsoft's multibyte encoding. If they put such a >file on a floppy and ship it to Linköping, what will Fredrik see as >the filename? (I.e., is the encoding fixed by the disk volume, or by >the operating system?) > >Once we have a few answers here, we can solve the problem. Note that >sometimes we'll have to refuse a Unicode filename because there's no >mapping for some of the characters it contains in the filename >encoding used. JDK silently replaced the offending character with a '?' which cause an exception when attempting to open the file. The filename, directory name, or volume label syntax is incorrect >Question: how does Fredrik create a file with a Euro >character (u'\u20ac') in its name? import java.io.*; public class x { public static void main(String[] args) throws Exception { String filename = "An eurosign \u20ac"; System.out.println(filename); new FileOutputStream(filename).close(); } } The resulting file contains an euro sign when shown in FileExplorer. The output of the program also contains an euro sign when shown with notepad. But the filename/program output does *not* contain an euro when dir'ed/type'd in my DOS box. regards, finn From gresham@mediavisual.com Thu Apr 27 17:41:04 2000 From: gresham@mediavisual.com (Paul Gresham) Date: Fri, 28 Apr 2000 00:41:04 +0800 Subject: [Python-Dev] Re: [I18n-sig] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <010f01bfb067$64e43260$9a2b440a@miv01> Hi, I'm not sure how much value I can add, as I know little about the charsets etc. 
and a bit more about Python. As a user of these, and running a consultancy firm in Hong Kong, I can at least pass on some points and perhaps help you with testing later on. My first touch on international PCs was fixing a Japanese 8086 back in 1989, it didn't even have colour ! Hong Kong is quite an experience as there are two formats in common use, plus occasionally another gets thrown in. In HK they use the Traditional Chinese, whereas the mainland uses Simplified, as Guido says, there are a number of different types of these. Occasionally we see the Taiwanese charsets used. It seems to me that having each individual string variable encoded might just be too atomic, perhaps creating a cumbersome overhead in the system. For most applications I can settle for the entire app to be using a single charset, however from experience there are exceptions. We are normally working with prior knowledge of the charset being used, rather than having to deal with any charset which may come along (at an application level), and therefore generally work in a context, just as a European programmer would be working in say English or German. As you know, storage/retrieval is not a problem, but manipulation and comparison is. A nice way to handle this would be like operator overloading such that string operations would be perfomed in the context of the current charset, I could then change context as needed, removing the need for metadata surrounding the actual data. This should speed things up as each overloaded library could be optimised given the different quirks, and new ones could be added easily. My code could be easily re-used on different charsets by simply changing context externally to the code, rather than passing in lots of stuff and expecting Python to deal with it. Also I'd like very much to compile/load in only the International charsets that I need. I wouldn't want to see Java type bloat occurring to Python, and adding internationalisation for everything, is huge. I think what I am suggesting is a different approach which obviously places more onus on the programmer rather than Python. Perhaps this is not acceptable, I don't know as I've never developed a programming language. I hope this is a helpful point of view to get you thinking further, otherwise ... please ignore me and I'll keep quiet : ) Regards Paul ----- Original Message ----- From: "Guido van Rossum" To: ; Cc: "Just van Rossum" Sent: Thursday, April 27, 2000 11:01 PM Subject: [I18n-sig] Unicode debate > I'd like to reset this discussion. I don't think we need to involve > c.l.py yet -- I haven't seen anyone with Asian language experience > chime in there, and that's where this matters most. I am directing > this to the Python i18n-sig mailing list, because that's where the > debate belongs, and there interested parties can join the discussion > without having to be vetted as "fit for python-dev" first. > > I apologize for having been less than responsive in the matter; > unfortunately there's lots of other stuff on my mind right now that > has recently had a tendency to distract me with higher priority > crises. > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. 
In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) > > These encodings are much less "pure" than UTF-8, because they only > encode the local characters (and ASCII), and because of various > problems with slicing: if you look "in the middle" of an encoded > string or file, you may not know how to interpret the bytes you see. > There are overlaps (in most of these encodings anyway) between the > codes used for single-byte and double-byte encodings, and you may have > to look back one or more characters to know what to make of the > particular byte you see. To get an idea of the nightmares that > non-UTF-8 multibyte encodings give C/C++ programmers, see the > Multibyte Character Set (MBCS) Survival Guide > (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). > See also the home page of the i18n-sig for more background information > on encoding (and other i18n) issues > (http://www.python.org/sigs/i18n-sig/). > > UTF-8 attempts to solve some of these problems: the multi-byte > encodings are chosen such that you can tell by the high bits of each > byte whether it is (1) a single-byte (ASCII) character (top bit off), > (2) the start of a multi-byte character (at least two top bits on; how > many indicates the total number of bytes comprising the character), or > (3) a continuation byte in a multi-byte character (top bit on, next > bit off). > > Many of the problems with non-UTF-8 multibyte encodings are the same > as for UTF-8 though: #bytes != #characters, a byte may not be a valid > character, regular expression patterns using "." may give the wrong > results, and so on. > > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". When I read a line of Japanese text > from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to > be an assumption built-in to my program, or perhaps information > supplied separately (there's no easy way to guess based on the actual > data). When I type a string literal using Latin-1 characters, the > encoding is Latin-1. When I use octal escapes in a string literal, > e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). > When I type a 7-bit string literal, the encoding is ASCII. > > The moral of all this? 8-bit strings are not going away. They are > not encoded in UTF-8 henceforth. Like before, and like 8-bit text > files, they are encoded in whatever encoding you want. All you get is > an extra mechanism to convert them to Unicode, and the Unicode > conversion defaults to UTF-8 because it is the only conversion that is > reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing > Tim's paraphrase), UTF-8 annoys everyone equally. > > Where does the current approach require work? > > - We need a way to indicate the encoding of Python source code. > (Probably a "magic comment".) 
> > - We need a way to indicate the encoding of input and output data > files, and we need shortcuts to set the encoding of stdin, stdout and > stderr (and maybe all files opened without an explicit encoding). > Marc-Andre showed some sample code, but I believe it is still > cumbersome. (I have to play with it more to see how it could be > improved.) > > - We need to discuss whether there should be a way to change the > default conversion between Unicode and 8-bit strings (currently > hardcoded to UTF-8), in order to make life easier for people who want > to continue to use their favorite 8-bit encoding (e.g. Latin-1, or > shift-JIS) but who also want to make use of the new Unicode datatype. > > We're still in alpha, so we can still fix things. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > From petrilli@amber.org Thu Apr 27 17:48:16 2000 From: petrilli@amber.org (Christopher Petrilli) Date: Thu, 27 Apr 2000 12:48:16 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 27, 2000 at 11:01:48AM -0400 References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <20000427124816.C1723@trump.amber.org> Guido van Rossum [guido@python.org] wrote: > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) Actually a bigger concern that we hear from our customers in Japan is that Unicode has *serious* problems in asian languages. Theey took the "unification" of Chinese and Japanese, rather than both, and therefore can not represent los of phrases quite right. I can have someone write up a better dscription, but I was told by several Japanese people that they wouldn't use Unicode come hell or high water, basically. Basically it's JJIS, Shift-JIS or nothing for most Japanese companies. This was my experience working with Konica a few years ago as well. Chris -- | Christopher Petrilli | petrilli@amber.org From andy@reportlab.python.org Thu Apr 27 17:50:28 2000 From: andy@reportlab.python.org (Andy Robinson) Date: Thu, 27 Apr 2000 16:50:28 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39086e6a.34554266@post.demon.co.uk> >Alisa Pasic Robinson Drat! my wife's been hacking my email headers! Sorry... - Andy Robinson From jeremy@cnri.reston.va.us Thu Apr 27 23:12:15 2000 From: jeremy@cnri.reston.va.us (Jeremy Hylton) Date: Thu, 27 Apr 2000 18:12:15 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) In-Reply-To: <39075D58.C549938E@tismer.com> References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> Message-ID: <14600.47935.704157.565225@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Summary: We had two effects here. Effect 1: Wasting time with CT> extra errors in instance creation. Effect 2: Loss of locality CT> due to code size increase. CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a CT> little renaming of the one or the other module, in order to get CT> the default link order to support locality better. CT> Now everything is clear to me. My first attempts with reordering CT> could not reveal the loss with the instance stuff. CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to CT> get related code ordered better. I reach a different conclusion. The performance difference 1.5.2 and 1.6, measured with pystone and pybench, is so small that effects like the order in which the compiler assembles the code make a difference. I don't think we should make any non-trivial effort to improve performance based on this kind of voodoo. I also question the claim that the two effects here explain the performance difference between 1.5.2 and 1.6. Rather, they explain the performance difference of pystone and pybench running on different versions of the interpreter. Saying that pystone is the same speed is a far cry from saying that python is the same speed! Remember that performance on a benchmark is just that. (It's like the old joke about a person's IQ: It is a very good indicator of how well they did on the IQ test.) I think we could use better benchmarks of two sorts. The pybench microbenchmarks are quite helpful individually, though the overall number isn't particularly meaningful. However, these benchmarks are sometimes a little too big to be useful. For example, the instance creation effect was tracked down by running this code: class Foo: pass for i in range(big_num): Foo() The pybench test "CreateInstance" does all sorts of other stuff. It tests creation with and without an __init__ method. It tests instance deallocation (because all the created objected need to be dealloced, too). It also tests attribute assignment, since many of the __init__ methods make assignments. What would be better (and I'm not sure what priority should be placed on doing it) is a set of nano-benchmarks that try to limit themselves to a single feature or small set of features. Guido suggested having a hierarchy so that there are multiple nano-benchmarks for instance creation, each identifying a particular effect, and a micro-benchmark that is the aggregate of all these nano-benchmarks. We could also use some better large benchmarks. Using pystone is pretty crude, because it doesn't necessarily measure the performance of things we care about. It would be better to have a collection of 5-10 apps that each do something we care about -- munging text files or XML data, creating lots of objects, etc. For example, I used the compiler package (in nondist/src/Compiler) to compile itself. Based on that benchmark, an interpreter built from the current CVS tree is still 9-11% slower than 1.5. 
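One way to read the nano-benchmark suggestion is a tiny harness in which every entry times one isolated operation; a rough sketch (the harness and the operations listed are made up for illustration, not an existing tool):

    import time

    def nano_bench(fn, repeat=100000):
        # time one isolated operation, nothing else
        start = time.clock()
        for _i in xrange(repeat):
            fn()
        return time.clock() - start

    class Foo:
        pass

    benchmarks = [
        ("create instance, no __init__", Foo),
        ("call empty function", lambda: None),
        ("integer add", lambda: 1 + 1),
    ]
    for name, fn in benchmarks:
        print name, nano_bench(fn)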
Jeremy From tismer@tismer.com Fri Apr 28 01:48:34 2000 From: tismer@tismer.com (Christian Tismer) Date: Fri, 28 Apr 2000 02:48:34 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> <14600.47935.704157.565225@goon.cnri.reston.va.us> Message-ID: <3908DFE1.F43A62EB@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Summary: We had two effects here. Effect 1: Wasting time with > CT> extra errors in instance creation. Effect 2: Loss of locality > CT> due to code size increase. > > CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a > CT> little renaming of the one or the other module, in order to get > CT> the default link order to support locality better. > > CT> Now everything is clear to me. My first attempts with reordering > CT> could not reveal the loss with the instance stuff. from here... > CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to > CT> get related code ordered better. ...to here I was not clear. The rest of it is at least 100% correct. > I reach a different conclusion. The performance difference 1.5.2 and > 1.6, measured with pystone and pybench, is so small that effects like > the order in which the compiler assembles the code make a difference. Sorry, it is 10 percent. Please do not shift the topic. I agree that there must be better measurements to be able to do my thoughtless claim ...from here to here..., but the question was raised in the py-dev thread "Python 1.6 speed" by Andrew, who was exactly asking why pystone gets 10 percent slower. I have been hunting that for a week now, and with your help, it is solved. > I don't think we should make any non-trivial effort to improve > performance based on this kind of voodoo. Thanks. I've already built it in - it was trivial, but I'll keep it for my version. > I also question the claim that the two effects here explain the > performance difference between 1.5.2 and 1.6. Rather, they explain > the performance difference of pystone and pybench running on different > versions of the interpreter. Exactly. I didn't want to claim anything else, it was all in the context of the inital thread. ciao - chris Oops, p.s: interesting: ... > For example, I used the compiler package (in nondist/src/Compiler) to > compile itself. Based on that benchmark, an interpreter built from > the current CVS tree is still 9-11% slower than 1.5. Did you adjust the string methods? I don't believe these are still fast. -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul@prescod.net Fri Apr 28 03:20:22 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:20:22 -0500 Subject: [Python-Dev] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <3908F566.8E5747C@prescod.net> Guido van Rossum wrote: > > ... > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. 
I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) I think that maybe an important point is getting lost here. I could be wrong, but it seems that all of this emphasis on encodings is misplaced. The physical and logical makeup of character strings are entirely separate issues. Unicode is a character set. It works in the logical domain. Dozens of different physical encodings can be used for Unicode characters. There are XML users who work with XML (and thus Unicode) every day and never see UTF-8, UTF-16 or any other Unicode-consortium "sponsored" encoding. If you invent an encoding tomorrow, it can still be XML-compatible. There are many encodings older than Unicode that are XML (and Unicode) compatible. I have not heard complaints about the XML way of looking at the world and in fact it was explicitly endorsed by many of the world's leading experts on internationalization. I haven't followed the Java situation as closely but I have also not heard screams about its support for il8n. > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". IMHO, it's a mistake of history that you would even think it makes sense to read a GIF file into a "string" object and we should be trying to erase that mistake, as quickly as possible (which is admittedly not very quickly) not building more and more infrastructure around it. How can we make the transition to a "binary goops are not strings" world easiest? > The moral of all this? 8-bit strings are not going away. If that is a statement of your long term vision, then I think that it is very unfortunate. Treating string literals as if they were isomorphic with byte arrays was probably the right thing in 1991 but it won't be in 2005. It doesn't meet the definition of string used in the Unicode spec., nor in XML, nor in Java, nor at the W3C nor in most other up and coming specifications. From the W3C site: ""While ISO-2022-JP is not sufficient for every ISO10646 document, it is the case that ISO10646 is a sufficient document character set for any entity encoded with ISO-2022-JP."" http://www.w3.org/MarkUp/html-spec/charset-harmful.html -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From paul@prescod.net Fri Apr 28 03:21:44 2000 From: paul@prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:21:44 -0500 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> Message-ID: <3908F5B8.9F8D8A9A@prescod.net> Andy Robinson wrote: > > - you can work with old fashioned strings, which are understood > by everyone to be arrays of bytes, and there is no magic > conversion going on. 
The bytes in literal strings in your script file > are the bytes that end up in the program. Who is "everyone"? Are you saying that CP4E hordes are going to understand that the syntax "abcde" is constructing a *byte array*? It seems like you think that Python users are going to be more sophisticated in their understanding of these issues than Java programmers. In most other things, Python is simpler. > ... > > I'm also convinced that the majority of Python scripts won't need > to work in Unicode. Anything working with XML will need to be Unicode. Anything working with the Win32 API (especially COM) will want to do Unicode. Over time the entire Web infrastructure will move to Unicode. Anything written in JPython pretty much MOST use Unicode (doesn't it?). > Even working with exotic languages, there is always a native > 8-bit encoding. Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use 8-bit encodings of Unicode if you want. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From petrilli@amber.org Fri Apr 28 05:12:29 2000 From: petrilli@amber.org (Christopher Petrilli) Date: Fri, 28 Apr 2000 00:12:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <3908F5B8.9F8D8A9A@prescod.net>; from paul@prescod.net on Thu, Apr 27, 2000 at 09:21:44PM -0500 References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> Message-ID: <20000428001229.A4790@trump.amber.org> Paul Prescod [paul@prescod.net] wrote: > > I'm also convinced that the majority of Python scripts won't need > > to work in Unicode. > > Anything working with XML will need to be Unicode. Anything working with > the Win32 API (especially COM) will want to do Unicode. Over time the > entire Web infrastructure will move to Unicode. Anything written in > JPython pretty much MOST use Unicode (doesn't it?). I disagree with this. Unicode has been a very long time, and it's not been adopted by a lot of people for a LOT of very valid reasons. > > Even working with exotic languages, there is always a native > > 8-bit encoding. > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > 8-bit encodings of Unicode if you want. Um, if you go: JIS -> Unicode -> JIS you don't get the same thing out that you put in (at least this is what I've been told by a lot of Japanese developers), and therefore it's not terribly popular because of the nature of the Japanese (and Chinese) langauge. My experience with Unicode is that a lot of Western people think it's the answer to every problem asked, while most asian language people disagree vehemently. This says the problem isn't solved yet, even if people wish to deny it. Chris -- | Christopher Petrilli | petrilli@amber.org From just@letterror.com Fri Apr 28 09:33:16 2000 From: just@letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 09:33:16 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us> References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: At 11:01 AM -0400 27-04-2000, Guido van Rossum wrote: >Where does the current approach require work? > >- We need a way to indicate the encoding of Python source code. >(Probably a "magic comment".) 
How will other parts of a program know which encoding was used for non-unicode string literals? It seems to me that an encoding attribute for 8-bit strings solves this nicely. The attribute should only be set automatically if the encoding of the source file was specified or when the string has been encoded from a unicode string. The attribute should *only* be used when converting to unicode. (Hm, it could even be used when calling unicode() without the encoding argument.) It should *not* be used when comparing (or adding, etc.) 8-bit strings to each other, since they still may contain binary goop, even in a source file with a specified encoding! >- We need a way to indicate the encoding of input and output data >files, and we need shortcuts to set the encoding of stdin, stdout and >stderr (and maybe all files opened without an explicit encoding). Can you open a file *with* an explicit encoding? Just From mal@lemburg.com Fri Apr 28 10:39:37 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 11:39:37 +0200 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> Message-ID: <39095C59.A5916EEB@lemburg.com> [Note: These discussion should all move to 18n-sig... CCing there] Christopher Petrilli wrote: > > Paul Prescod [paul@prescod.net] wrote: > > > Even working with exotic languages, there is always a native > > > 8-bit encoding. > > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > 8-bit encodings of Unicode if you want. > > Um, if you go: > > JIS -> Unicode -> JIS > > you don't get the same thing out that you put in (at least this is > what I've been told by a lot of Japanese developers), and therefore > it's not terribly popular because of the nature of the Japanese (and > Chinese) langauge. > > My experience with Unicode is that a lot of Western people think it's > the answer to every problem asked, while most asian language people > disagree vehemently. This says the problem isn't solved yet, even if > people wish to deny it. Isn't this a problem of the translation rather than Unicode itself (Andy mentioned several times that you can use the private BMP areas to implement 1-1 round-trips) ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tree@basistech.com Fri Apr 28 11:44:00 2000 From: tree@basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:44:00 -0400 (EDT) Subject: [Python-Dev] [I18n-sig] Re: Unicode debate In-Reply-To: References: Message-ID: <14601.27504.337569.201251@cymru.basistech.com> Just van Rossum writes: > How will other parts of a program know which encoding was used for > non-unicode string literals? This is the exact reason that Unicode should be used for all string literals: from a language design perspective I don't understand the rationale for providing "traditional" and "unicode" string. > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) 
It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! In Dylan there is an explicit split between 'characters' (which are always Unicode) and 'bytes'. What are the compelling reasons to not use UTF-8 as the (source) document encoding? In the past the usual response is, "the tools are't there for authoring UTF-8 documents". This argument becomes more specious as more OS's move towards Unicode. I firmly believe this can be done without Java's bloat. One off-the-cuff solution is this: All character strings are Unicode (utf-8 encoding). Language terminals and operators are restricted to US-ASCII, which are identical to UTF8. The contents of comments are not interpreted in any way. > >- We need a way to indicate the encoding of input and output data > >files, and we need shortcuts to set the encoding of stdin, stdout and > >stderr (and maybe all files opened without an explicit encoding). > > Can you open a file *with* an explicit encoding? If you cannot, you lose. You absolutely must be able to specify the encoding of a file when opening it, so that the runtime can transcode into the native encoding as you read it. This should be otherwise transparent the user. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From tree@basistech.com Fri Apr 28 11:56:50 2000 From: tree@basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:56:50 -0400 (EDT) Subject: [I18n-sig] Re: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <14601.28274.667733.660938@cymru.basistech.com> M.-A. Lemburg writes: > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > > 8-bit encodings of Unicode if you want. This is meaningless: legacy encodings of national character sets such Shift-JIS, Big Five, GB2312, or TIS620 are not "encodings" of Unicode. TIS620 is a single-byte, 8-bit encoding: each character is represented by a single byte. The Japanese and Chinese encodings are multibyte, 8-bit, encodings. ISO-2022 is a multi-byte, 7-bit encoding for multiple character sets. Unicode has several possible encodings: UTF-8, UCS-2, UCS-4, UTF-16... You can view all of these as 8-bit encodings, if you like. Some are multibyte (such as UTF-8, where each character in Unicode is represented in 1 to 3 bytes) while others are fixed length, two or four bytes per character. > > Um, if you go: > > > > JIS -> Unicode -> JIS > > > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. This is simply not true any more. The ability to round trip between Unicode and legacy encodings is dependent on the software: being able to use code points in the PUA for this is acceptable and commonly done. The big advantage is in using Unicode as a pivot when transcoding between different CJK encodings. It is very difficult to map between, say, Shift JIS and GB2312, directly. However, Unicode provides a good go-between. 
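To make the pivot idea concrete, a rough sketch in Python 1.6 terms could look like the code below. It assumes codecs registered under the names 'shift-jis' and 'gb2312' (neither ships with the 1.6 alphas) and gives no fallback policy for characters that have no mapping in the target character set:

    def sjis_to_gb2312(data):
        # Decode the legacy bytes into a Unicode string -- the pivot...
        u = unicode(data, 'shift-jis')
        # ...then re-encode from the pivot into the other legacy encoding.
        # Characters without a GB2312 mapping will raise an error here.
        return u.encode('gb2312')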
It isn't a panacea: transcoding between legacy encodings like GB2312 and Big Five is still difficult: Unicode or not. > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. This is a shame: it is an indication that they don't understand the technology. Unicode is a tool: nothing more. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From gstein@lyra.org Fri Apr 28 13:41:11 2000 From: gstein@lyra.org (Greg Stein) Date: Fri, 28 Apr 2000 05:41:11 -0700 (PDT) Subject: [Python-Dev] c.l.py readership datapoint (was: Python 1.6a2 Unicode bug) In-Reply-To: Message-ID: On Thu, 27 Apr 2000, Just van Rossum wrote: > At 10:27 PM -0400 26-04-2000, Tim Peters wrote: > >Indeed, if someone from an inferior culture wants to chime in, let them find > >Python-Dev with their own beady little eyes . > > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read *anything* at all > in c.l.py Datapoint: I stopped reading c.l.py almost two years ago. For a while, I would pop up a newsreader every month or so and skim what kinds of things were happening. That stopped at least a year or so ago. I get a couple hundred messages a day. Another 100+ from c.l.py would be way too much. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido@python.org Fri Apr 28 14:24:29 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 09:24:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: Your message of "Fri, 28 Apr 2000 11:39:37 +0200." <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <200004281324.JAA15642@eric.cnri.reston.va.us> > [Note: These discussion should all move to 18n-sig... CCing there] > > Christopher Petrilli wrote: > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. > > > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. [Marc-Andre Lenburg] > Isn't this a problem of the translation rather than Unicode > itself (Andy mentioned several times that you can use the private > BMP areas to implement 1-1 round-trips) ? Maybe, but apparently such high-quality translations are rare (note that Andy said "can"). Anyway, a word of caution here. Years ago I attended a number of IETF meetings on internationalization, in a time when Unicode wasn't as accepted as it is now. The one thing I took away from those meetings was that this is a *highly* emotional and controversial issue. As the Python community, I feel we have no need to discuss "why Unicode." Therein lies madness, controversy, and no progress. We know there's a clear demand for Unicode, and we've committed to support it. The question now at hand is "how Unicode." 
Let's please focus on that, e.g. in the other thread ("Unicode debate") in i18n-sig and python-dev. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 28 15:10:27 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:10:27 -0400 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 09:33:16 BST." References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281410.KAA16104@eric.cnri.reston.va.us> [GvR] > >- We need a way to indicate the encoding of Python source code. > >(Probably a "magic comment".) [JvR] > How will other parts of a program know which encoding was used for > non-unicode string literals? > > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! Marc-Andre took this idea a bit further, but I think it's not practical given the current implementation: there are too many places where the C code would have to be changed in order to propagate the string encoding information, and there are too many sources of strings with unknown encodings to make it very useful. Plus, it would slow down 8-bit string ops. I have a better idea: rather than carrying around 8-bit strings with an encoding, use Unicode literals in your source code. If the source encoding is known, these will be converted using the appropriate codec. If you object to having to write u"..." all the time, we could say that "..." is a Unicode literal if it contains any characters with the top bit on (of course the source file encoding would be used just like for u"..."). But I think this should be enabled by a separate pragma -- people who want to write Unicode-unaware code manipulating 8-bit strings in their favorite encoding (e.g. shift-JIS or Latin-1) should not silently get Unicode strings. (I thought about an option to make *all strings* (not just literals) Unicode, but the current implementation would require too much hacking. This is what JPython does, and maybe it should be what Python 3000 does; I don't see it as a realistic option for the 1.x series.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake@acm.org Fri Apr 28 15:27:18 2000 From: fdrake@acm.org (Fred L. Drake, Jr.) Date: Fri, 28 Apr 2000 10:27:18 -0400 (EDT) Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue Message-ID: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Brian Hooper submitted a patch to add U and U# to the format strings for Py_BuildValue(), and there were comments that indicated u and u# would be better. He's submitted a documentation update for this as well the implementation. If there are no objections, I'll incorporate these changes. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido@python.org Fri Apr 28 15:32:28 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:32:28 -0400 Subject: [Python-Dev] Re: [I18n-sig] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 06:44:00 EDT." 
<14601.27504.337569.201251@cymru.basistech.com> References: <14601.27504.337569.201251@cymru.basistech.com> Message-ID: <200004281432.KAA16418@eric.cnri.reston.va.us> > This is the exact reason that Unicode should be used for all string > literals: from a language design perspective I don't understand the > rationale for providing "traditional" and "unicode" string. In Python 3000, you would have a point. In current Python, there simply are too many programs and extensions written in other languages that manipulating 8-bit strings to ignore their existence. We're trying to add Unicode support to Python 1.6 without breaking code that used to run under Python 1.5.x; practicalities just make it impossible to go with Unicode for everything. I think that if Python didn't have so many extension modules (many maintained by 3rd party modules) it would be a lot easier to switch to Unicode for all strings (I think JavaScript has done this). In Python 3000, we'll have to seriously consider having separate character string and byte array objects, along the lines of Java's model. Note that I say "seriously consider." We'll first have to see how well the current solution works *in practice*. There's time before we fix Py3k in stone. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 28 15:33:24 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:33:24 -0400 Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue In-Reply-To: Your message of "Fri, 28 Apr 2000 10:27:18 EDT." <14601.40902.531340.684389@seahag.cnri.reston.va.us> References: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Message-ID: <200004281433.KAA16446@eric.cnri.reston.va.us> > Brian Hooper submitted a patch to add U and U# to the format strings > for Py_BuildValue(), and there were comments that indicated u and u# > would be better. He's submitted a documentation update for this as > well the implementation. > If there are no objections, I'll incorporate these changes. Please go ahead, changing U/U# to u/u#. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido@python.org Fri Apr 28 15:50:05 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:50:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 21:20:22 CDT." <3908F566.8E5747C@prescod.net> References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> Message-ID: <200004281450.KAA16493@eric.cnri.reston.va.us> [Paul Prescod] > I think that maybe an important point is getting lost here. I could be > wrong, but it seems that all of this emphasis on encodings is misplaced. In practical applications that manipulate text, encodings creep up all the time. I remember a talk or message by Andy Robinson about the messiness of producing printed reports in Japanese for a large investment firm. Most off the issues that took his time had to do with encodings, if I recall correctly. (Andy, do you remember what I'm talking about? Do you have a URL?) > > The truth of the matter is: the encoding of string objects is in the > > mind of the programmer. When I read a GIF file into a string object, > > the encoding is "binary goop". 
> > IMHO, it's a mistake of history that you would even think it makes sense > to read a GIF file into a "string" object and we should be trying to > erase that mistake, as quickly as possible (which is admittedly not very > quickly) not building more and more infrastructure around it. How can we > make the transition to a "binary goops are not strings" world easiest? I'm afraid that's a bigger issue than we can solve for Python 1.6. We're committed to by and large backwards compatibility while supporting Unicode -- the backwards compatibility with tons of extension module (many 3rd party) requires that we deal with 8-bit strings in basically the same way as we did before. > > The moral of all this? 8-bit strings are not going away. > > If that is a statement of your long term vision, then I think that it is > very unfortunate. Treating string literals as if they were isomorphic > with byte arrays was probably the right thing in 1991 but it won't be in > 2005. I think you're a tad too optimistic about the evolution speed of software (Windows 2000 *still* has to support DOS programs), but I see your point. As I stated in another message, in Python 3000 we'll have to consider a more Java-esque solution: *character* strings are Unicode, and for bytes we have (mutable!) byte arras. Certainly 8-bit bytes as the smallest storage unit aren't going away. > It doesn't meet the definition of string used in the Unicode spec., nor > in XML, nor in Java, nor at the W3C nor in most other up and coming > specifications. OK, so that's a good indication of where you're coming from. Maybe you should spend a little more time in the trenches and a little less in standards bodies. Standards are good, but sometimes disconnected from reality (remember ISO networking? :-). > From the W3C site: > > ""While ISO-2022-JP is not sufficient for every ISO10646 document, it is > the case that ISO10646 is a sufficient document character set for any > entity encoded with ISO-2022-JP."" And this is exactly why encodings will remain important: entities encoded in ISO-2022-JP have no compelling reason to be recoded permanently into ISO10646, and there are lots of forces that make it convenient to keep it encoded in ISO-2022-JP (like existing tools). > http://www.w3.org/MarkUp/html-spec/charset-harmful.html I know that document well. --Guido van Rossum (home page: http://www.python.org/~guido/) From just@letterror.com Fri Apr 28 18:51:03 2000 From: just@letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 18:51:03 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004281410.KAA16104@eric.cnri.reston.va.us> References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: [GvR, on string.encoding ] >Marc-Andre took this idea a bit further, but I think it's not >practical given the current implementation: there are too many places >where the C code would have to be changed in order to propagate the >string encoding information, I may miss something, but the encoding attr just travels with the string object, no? Like I said in my reply to MAL, I think it's undesirable to do *anything* with the encoding attr if not in combination with a unicode string. >and there are too many sources of strings >with unknown encodings to make it very useful. That's why the default encoding must be settable as well, as Fredrik suggested. >Plus, it would slow down 8-bit string ops. Not if you ignore it most of the time, and just pass it along when concatenating. 
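As a thought experiment only, the kind of propagation being discussed could be mocked up in pure Python with a small wrapper class. Nothing like this exists in 1.6 and the names are invented for illustration; the point is just that the tag rides along and is consulted only at conversion time:

    class EncodedString:
        # An 8-bit string plus an optional encoding tag.  Ordinary string
        # operations ignore the tag; concatenation passes it along; only
        # conversion to Unicode actually uses it.
        def __init__(self, data, encoding=None):
            self.data = data            # the raw bytes
            self.encoding = encoding    # None means "unknown / binary goop"

        def __add__(self, other):
            if isinstance(other, EncodedString):
                otherdata, otherenc = other.data, other.encoding
            else:
                otherdata, otherenc = other, None
            return EncodedString(self.data + otherdata,
                                 self.encoding or otherenc)

        def to_unicode(self):
            # Fall back to a default encoding if no tag is set.
            return unicode(self.data, self.encoding or 'utf-8')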
>I have a better idea: rather than carrying around 8-bit strings with >an encoding, use Unicode literals in your source code. Explain that to newbies... I guess is that they will want simple 8 bit strings in their native encoding. Dunno. >If the source >encoding is known, these will be converted using the appropriate >codec. > >If you object to having to write u"..." all the time, we could say >that "..." is a Unicode literal if it contains any characters with the >top bit on (of course the source file encoding would be used just like >for u"..."). Only if "\377" would still yield an 8-bit string, for binary goop... Just From guido@python.org Fri Apr 28 19:31:19 2000 From: guido@python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 14:31:19 -0400 Subject: [I18n-sig] Re: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 18:51:03 BST." References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281831.OAA17406@eric.cnri.reston.va.us> > [GvR, on string.encoding ] > >Marc-Andre took this idea a bit further, but I think it's not > >practical given the current implementation: there are too many places > >where the C code would have to be changed in order to propagate the > >string encoding information, [JvR] > I may miss something, but the encoding attr just travels with the string > object, no? Like I said in my reply to MAL, I think it's undesirable to do > *anything* with the encoding attr if not in combination with a unicode > string. But just propagating affects every string op -- s+s, s*n, s[i], s[:], s.strip(), s.split(), s.lower(), ... > >and there are too many sources of strings > >with unknown encodings to make it very useful. > > That's why the default encoding must be settable as well, as Fredrik > suggested. I'm open for debate about this. There's just something about a changeable global default encoding that worries me -- like any global property, it requires conventions and defensive programming to make things work in larger programs. For example, a module that deals with Latin-1 strings can't just set the default encoding to Latin-1: it might be imported by a program that needs it to be UTF-8. This model is currently used by the locale in C, where all locale properties are global, and it doesn't work well. For example, Python needs to go through a lot of hoops so that Python numeric literals use "." for the decimal indicator even if the user's locale specifies "," -- we can't change Python to swap the meaning of "." and "," in all contexts. So I think that a changeable default encoding is of limited value. That's different from being able to set the *source file* encoding -- this only affects Unicode string literals. > >Plus, it would slow down 8-bit string ops. > > Not if you ignore it most of the time, and just pass it along when > concatenating. And slicing, and indexing, and... > >I have a better idea: rather than carrying around 8-bit strings with > >an encoding, use Unicode literals in your source code. > > Explain that to newbies... I guess is that they will want simple 8 bit > strings in their native encoding. Dunno. If they are hap-py with their native 8-bit encoding, there's no need for them to ever use Unicode objects in their program, so they should be fine. 8-bit strings aren't ever interpreted or encoded except when mixed with Unicode objects. > >If the source > >encoding is known, these will be converted using the appropriate > >codec. 
> > > >If you object to having to write u"..." all the time, we could say > >that "..." is a Unicode literal if it contains any characters with the > >top bit on (of course the source file encoding would be used just like > >for u"..."). > > Only if "\377" would still yield an 8-bit string, for binary goop... Correct. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal@lemburg.com Fri Apr 28 19:57:18 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 20:57:18 +0200 Subject: [Python-Dev] Changing PYC-Magic Message-ID: <3909DF0E.1D886485@lemburg.com> I have just looked at the Python/import.c file and the hard coded PYC magic number... /* Magic word to reject .pyc files generated by other Python versions */ /* Change for each incompatible change */ /* The value of CR and LF is incorporated so if you ever read or write a .pyc file in text mode the magic number will be wrong; also, the Apple MPW compiler swaps their values, botching string constants */ /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ #define MAGIC (20121 | ((long)'\r'<<16) | ((long)'\n'<<24)) A bit outdated, I'd say. With the addition of Unicode, the PYC files will contain marshalled Unicode objects which are not readable by older versions. I'd suggest bumping the magic number to 50428 ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From trentm@ActiveState.com Fri Apr 28 23:08:57 2000 From: trentm@ActiveState.com (Trent Mick) Date: Fri, 28 Apr 2000 15:08:57 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <200004261851.OAA06794@eric.cnri.reston.va.us> Message-ID: > > Guido van Rossum wrote: > > > > > > The email below is a serious bug report. A quick analysis shows that > > > UserString.count() calls the count() method on a string object, which > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > format code truncates integers. It probably should raise an overflow > > > exception instead. But that would still cause the test to fail -- > > > just in a different way (more explicit). Then the string methods > > > should be fixed to use long ints instead -- and then something else > > > would probably break... > > MAL wrote: > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > together with integers, so there's no problem on that side > > of the fence ;-) > > > > Since strings and Unicode objects use integers to describe the > > length of the object (as well as most if not all other > > builtin sequence types), the correct default value should > > thus be something like sys.maxlen which then gets set to > > INT_MAX. > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > re.py and sre_parse.py accordingly. > Guido wrote: > Hm, I'm not so sure. It would be much better if passing sys.maxint > would just WORK... Since that's what people have been doing so far. > Possible solutions (I give 4 of them): 1. The 'i' format code could raise an overflow exception and the PyArg_ParseTuple() call in string_count() could catch it and truncate to INT_MAX (reasoning that any overflow of the end position of a string can be bound to INT_MAX because that is the limit for any string in Python). Pros: - This "would just WORK" for usage of sys.maxint. 
Cons: - This overflow exception catching should then reasonably be propagated to other similar functions (like string.endswith(), etc). - We have to assume that the exception raised in the PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is for the second integer (i.e. 'last'). This is subtle and ugly. Pro or Con: - Do we want to start raising overflow exceptions for other conversion formats (i.e. 'b' and 'h' and 'l', the latter *can* overflow on Win64 where sizeof(long) < size(void*))? I think this is a good idea in principle but may break code (even if it *does* identify bugs in that code). 2. Just change the definitions of the UserString methods to pass a variable length argument list instead of default value parameters. For example change UserString.count() from: def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) to: def count(self, *args)): return self.data.count(*args) The result is that the default value for 'end' is now set by string_count() rather than by the UserString implementation: >>> from UserString import UserString >>> s= 'abcabcabc' >>> u = UserString('abcabcabc') >>> s.count('abc') 3 >>> u.count('abc') 3 Pros: - Easy change. - Fixes the immediate bug. - This is a safer way to copy the string behaviour in UserString anyway (is it not?). Cons: - Does not fix the general problem of the (common?) usage of sys.maxint to mean INT_MAX rather than the actual LONG_MAX (this matters on 64-bit Unices). - The UserString code is no longer really self-documenting. 3. As MAL suggested: add something like sys.maxlen (set to INT_MAX) with breaks the logical difference with sys.maxint (set to LONG_MAX): - sys.maxint == "the largest value a Python integer can hold" - sys.maxlen == "the largest value for the length of an object in Python (e.g. length of a string, length of an array)" Pros: - More explicit in that it separates two distinct meanings for sys.maxint (which now makes a difference on 64-bit Unices). - The code changes should be fairly straightforward. Cons: - Places in the code that still use sys.maxint where they should use sys.maxlen will unknowingly be overflowing ints and bringing about this bug. - Something else for coders to know about. 4. Add something like sys.maxlen, but set it to SIZET_MAX (c.f. ANSI size_t type). It is probably not a biggie, but Python currently makes the assumption that string never exceed INT_MAX in length. While this assumption is not likely to be proven false it technically could be on 64-bit systems. As well, when you start compiling on Win64 (where sizeof(int) == sizeof(long) < sizeof(size_t)) then you are going to be annoyed by hundreds of warnings about implicit casts from size_t (64-bits) to int (32-bits) for every strlen, str*, fwrite, and sizeof call that you make. Pros: - IMHO logically more correct. - Might clean up some subtle bugs. - Cleans up annoying and disconcerting warnings. - Will probably mean less pain down the road as 64-bit systems (esp. Win64) become more prevalent. Cons: - Lot of coding changes. - As Guido said: "and then something else would probably break". (Though, on currently 32-bits system, there should be no effective change). Only 64-bit systems should be affected and, I would hope, the effect would be a clean up. I apologize for not being succinct. Note that I am volunteering here. Opinions and guidance please. 
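To make the failure mode concrete, the original report boils down to roughly the following on an LP64 platform (sizeof(long) == 8, sizeof(int) == 4); the comments describe expected behaviour, not verified output:

    import sys
    from UserString import UserString

    s = 'abcabcabc'
    print s.count('abc')        # 3

    u = UserString(s)
    # UserString.count() passes end=sys.maxint, which is LONG_MAX here.
    # The 'i' format code silently truncates it to 32 bits, so the
    # effective end position is garbage and the result can differ from 3.
    print u.count('abc')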
Trent From Moshe Zadka Sat Apr 29 03:08:48 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 29 Apr 2000 05:08:48 +0300 (IDT) Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: <200004281450.KAA16493@eric.cnri.reston.va.us> Message-ID: I agree with most of what you say, but... On Fri, 28 Apr 2000, Guido van Rossum wrote: > As I stated in another message, in Python 3000 we'll have > to consider a more Java-esque solution: *character* strings are > Unicode, and for bytes we have (mutable!) byte arras. I would prefer a different distinction: mutable immutable chars string string_buffer bytes bytes bytes_buffer Why not allow me the freedom to index a dictionary with goop? (Here's a sample application: UNIX "file" command) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal@lemburg.com Sat Apr 29 13:50:07 2000 From: mal@lemburg.com (M.-A. Lemburg) Date: Sat, 29 Apr 2000 14:50:07 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: Message-ID: <390ADA7F.2C01C6C3@lemburg.com> Trent Mick wrote: > > > > Guido van Rossum wrote: > > > > > > > > The email below is a serious bug report. A quick analysis shows that > > > > UserString.count() calls the count() method on a string object, which > > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > > format code truncates integers. It probably should raise an overflow > > > > exception instead. But that would still cause the test to fail -- > > > > just in a different way (more explicit). Then the string methods > > > > should be fixed to use long ints instead -- and then something else > > > > would probably break... > > > > MAL wrote: > > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > > together with integers, so there's no problem on that side > > > of the fence ;-) > > > > > > Since strings and Unicode objects use integers to describe the > > > length of the object (as well as most if not all other > > > builtin sequence types), the correct default value should > > > thus be something like sys.maxlen which then gets set to > > > INT_MAX. > > > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > > re.py and sre_parse.py accordingly. > > > Guido wrote: > > Hm, I'm not so sure. It would be much better if passing sys.maxint > > would just WORK... Since that's what people have been doing so far. > > > > Possible solutions (I give 4 of them): > [...] Here is another one... I don't really like it because I think that silent truncations are a bad idea, but to make things "just work it would help: * Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1) to INT_MAX and the same for negative numbers when passing a Python integer to a "i" marked variable. This would map range(INT_MAX+1, LONG_MAX+1) to INT_MAX and thus sys.maxint would turn out as INT_MAX in all those cases where "i" is used as parser marker. Dito for negative values. With this truncation passing sys.maxint as default argument for length parameters would "just work" :-). The more radical alternative would be changing the Python object length fields to long -- I don't think this is practical though (and probably not really needed unless you intend to work with 3GB strings ;). 
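Expressed at the Python level, the proposed rule is just a clamp. A small sketch of the semantics (illustrative only -- the real change would live in the 'i' converter in getargs.c, not in Python code):

    INT_MAX = 2147483647            # what a 32-bit C int can hold

    def as_i(value):
        # The value an 'i' format slot would receive under the proposal:
        # anything above INT_MAX (or below INT_MIN) is clamped, not truncated.
        if value > INT_MAX:
            return INT_MAX
        if value < -INT_MAX - 1:
            return -INT_MAX - 1
        return value

    if __name__ == '__main__':
        import sys
        # True on LP64 as well as on 32-bit builds, so code passing
        # sys.maxint as an "end of string" default would "just work".
        print as_i(sys.maxint) == INT_MAX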
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paul@prescod.net Sat Apr 29 15:18:05 2000 From: paul@prescod.net (Paul Prescod) Date: Sat, 29 Apr 2000 09:18:05 -0500 Subject: [I18n-sig] Re: [Python-Dev] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> <200004281450.KAA16493@eric.cnri.reston.va.us> Message-ID: <390AEF1D.253B93EF@prescod.net> Guido van Rossum wrote: > > [Paul Prescod] > > I think that maybe an important point is getting lost here. I could be > > wrong, but it seems that all of this emphasis on encodings is misplaced. > > In practical applications that manipulate text, encodings creep up all > the time. I'm not saying that encodings are unimportant. I'm saying that that they are *different* than what Fredrik was talking about. He was talking about a coherent logical model for characters and character strings based on the conventions of more modern languages and systems than C and Python. > > How can we > > make the transition to a "binary goops are not strings" world easiest? > > I'm afraid that's a bigger issue than we can solve for Python 1.6. I understand that we can't fix the problem now. I just think that we shouldn't go out of our ways to make it worst. If we make byte-array strings "magically" cast themselves into character-strings, people will expect that behavior forever. > > It doesn't meet the definition of string used in the Unicode spec., nor > > in XML, nor in Java, nor at the W3C nor in most other up and coming > > specifications. > > OK, so that's a good indication of where you're coming from. Maybe > you should spend a little more time in the trenches and a little less > in standards bodies. Standards are good, but sometimes disconnected > from reality (remember ISO networking? :-). As far as I know, XML and Java are used a fair bit in the real world...even somewhat in Asia. In fact, there is a book titled "XML and Java" written by three Japanese men. > And this is exactly why encodings will remain important: entities > encoded in ISO-2022-JP have no compelling reason to be recoded > permanently into ISO10646, and there are lots of forces that make it > convenient to keep it encoded in ISO-2022-JP (like existing tools). You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is a character *set* and not an encoding. ISO-2022-JP says how you should represent characters in terms of bits and bytes. ISO10646 defines a mapping from integers to characters. They are both important, but separate. I think that this automagical re-encoding conflates them. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From Moshe Zadka Sat Apr 29 19:09:40 2000 From: Moshe Zadka (Moshe Zadka) Date: Sat, 29 Apr 2000 21:09:40 +0300 (IDT) Subject: [Python-Dev] At the interactive port Message-ID: Continuing the recent debate about what is appropriate to the interactive prompt printing, and the wide agreement that whatever we decide, users might think otherwise, I've written up a patch to have the user control via a function in __builtin__ the way things are printed at the prompt. This is not patches@python level stuff for two reasons: 1. I'm not sure what to call this function. 
Currently, I call it __print_expr__, but I'm not sure it's a good name 2. I haven't yet supplied a default in __builtin__, so the user *must* override this. This is unacceptable, of course. I'd just like people to tell me if they think this is worth while, and if there is anything I missed. *** ../python/dist/src/Python/ceval.c Fri Mar 31 04:42:47 2000 --- Python/ceval.c Sat Apr 29 03:55:36 2000 *************** *** 1014,1047 **** case PRINT_EXPR: v = POP(); ! /* Print value except if None */ ! /* After printing, also assign to '_' */ ! /* Before, set '_' to None to avoid recursion */ ! if (v != Py_None && ! (err = PyDict_SetItemString( ! f->f_builtins, "_", Py_None)) == 0) { ! err = Py_FlushLine(); ! if (err == 0) { ! x = PySys_GetObject("stdout"); ! if (x == NULL) { ! PyErr_SetString( ! PyExc_RuntimeError, ! "lost sys.stdout"); ! err = -1; ! } ! } ! if (err == 0) ! err = PyFile_WriteObject(v, x, 0); ! if (err == 0) { ! PyFile_SoftSpace(x, 1); ! err = Py_FlushLine(); ! } ! if (err == 0) { ! err = PyDict_SetItemString( ! f->f_builtins, "_", v); ! } } ! Py_DECREF(v); break; case PRINT_ITEM: --- 1014,1035 ---- case PRINT_EXPR: v = POP(); ! x = PyDict_GetItemString(f->f_builtins, ! "__print_expr__"); ! if (x == NULL) { ! PyErr_SetString(PyExc_SystemError, ! "__print_expr__ not found"); ! Py_DECREF(v); ! break; ! } ! t = PyTuple_New(1); ! if (t != NULL) { ! PyTuple_SET_ITEM(t, 0, v); ! w = PyEval_CallObject(x, t); ! Py_XDECREF(w); } ! /*Py_DECREF(x);*/ ! Py_XDECREF(t); break; case PRINT_ITEM: -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From trentm@activestate.com Sat Apr 29 19:12:07 2000 From: trentm@activestate.com (Trent Mick) Date: Sat, 29 Apr 2000 11:12:07 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <390ADA7F.2C01C6C3@lemburg.com> References: <390ADA7F.2C01C6C3@lemburg.com> Message-ID: <20000429111207.A16414@activestate.com> On Sat, Apr 29, 2000 at 02:50:07PM +0200, M.-A. Lemburg wrote: > Here is another one... I don't really like it because I think that > silent truncations are a bad idea, but to make things "just work > it would help: > > * Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1) > to INT_MAX and the same for negative numbers when passing a > Python integer to a "i" marked variable. This would map > range(INT_MAX+1, LONG_MAX+1) to INT_MAX and thus sys.maxint > would turn out as INT_MAX in all those cases where "i" is > used as parser marker. Dito for negative values. > > With this truncation passing sys.maxint as default argument > for length parameters would "just work" :-). > > The more radical alternative would be changing the Python > object length fields to long -- I don't think this is If we *do* make this change however, say "size_t" please, rather than long because on Win64 sizeof(long) < sizeof(size_t) == sizeof(void*). > practical though (and probably not really needed unless > you intend to work with 3GB strings ;). I know that 3GB+ strings are not likely to come along but if the length fields were size_t it would clean up implicit downcasts that you currently get from size_t to int on calls to strlen and the like on 64-bit systems. 
Trent From bjorn at roguewave.com Sat Apr 1 00:02:07 2000 From: bjorn at roguewave.com (Bjorn Pettersen) Date: Fri, 31 Mar 2000 15:02:07 -0700 Subject: [Python-Dev] Re: Python 1.6 alpha 1 released References: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: <38E5205F.DE811F61@roguewave.com> Guido van Rossum wrote: > > I've just released a source tarball and a Windows installer for Python > 1.6 alpha 1 to the Python website: > > http://www.python.org/1.6/ > > Probably the biggest news (if you hadn't heard the rumors) is Unicode > support. More news on the above webpage. > > Note: this is an alpha release. Some of the code is very rough! > Please give it a try with your favorite Python application, but don't > trust it for production use yet. I plan to release several more alpha > and beta releases over the next two months, culminating in an 1.6 > final release around June first. > > We need your help to make the final 1.6 release as robust as possible > -- please test this alpha release!!! > > --Guido van Rossum (home page: http://www.python.org/~guido/) Just read the announcement page, and found that socket.connect() no longer takes two arguments as was previously documented. If this change is staying I'm assuming the examples in the manual that uses a two argument socket.connect() will be changed? A quick look shows that this breaks all the network scripts I have installed (at least the ones that I found, undoubtedly there are many more). Because of this I will put any upgrade plans on hold. -- bjorn From tim_one at email.msn.com Sat Apr 1 02:55:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 31 Mar 2000 19:55:54 -0500 Subject: [Python-Dev] A surprising case of cyclic trash Message-ID: <000d01bf9b75$08bf58e0$1aa2143f@tim> This comes (indirectly) from a user of my doctest.py, who noticed that sometimes tempfiles created by his docstring tests got cleaned up (via __del__), but other times not. Here's a hard-won self-contained program illustrating the true cause: class Critical: count = 0 def __init__(self): Critical.count = Critical.count + 1 self.id = Critical.count print "acquiring Critical", self.id def __del__(self): print "releasing Critical", self.id good = "temp = Critical()\n" bad = "def f(): pass\n" + good basedict = {"Critical": Critical} for test in good, bad, good: print "\nStarting test case:" print test exec compile(test, "", "exec") in basedict.copy() And here's output: D:\Python>python misc\doccyc.py Starting test case: temp = Critical() acquiring Critical 1 releasing Critical 1 Starting test case: def f(): pass temp = Critical() acquiring Critical 2 Starting test case: temp = Critical() acquiring Critical 3 releasing Critical 3 D:\Python> That is, in the "bad" case, which differs from the "good" case merely in defining an unreferenced function, temp.__del__ not only doesn't get executed "when expected", it never gets executed at all. This appears to be due to a cycle between the function object and the anonymous dict passed to exec, causing the entire dict to become immortal, thus making "temp" immortal too. I can fiddle the doctest framework to manually nuke the temp dict it creates for execution context; the same kind of leak likely occurs in any exec'ed string that contains a function defn. For future reference, note that the finalizer in question belongs to an object not itself in a cycle, it's an object reachable only from a dead cycle. 
the-users-don't-stand-a-chance-ly y'rs - tim From tismer at tismer.com Sat Apr 1 16:55:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 01 Apr 2000 16:55:50 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: Message-ID: <38E60DF6.9C4C9443@tismer.com> Moshe Zadka wrote: > > On Fri, 31 Mar 2000, Guido van Rossum wrote: > > > + Christian Tismer > > + Christian Tismer > > Ummmmm....I smell something fishy here. Are there two Christian Tismers? Yes! From time to time I'm re-doing my cloning experiments. This isn't so hard as it seems. The hard thing is to keep them from killing each other. BTW: I'm the second copy from the last experiment (the surviver). > That would explain how Christian has so much time to work on Stackless. > > Well, between the both of them, Guido will have no chance but to put > Stackless in the standard distribution. Guido is stronger, even between three of me :-) ciao - chris-and-the-undead-heresy -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at python.org Sat Apr 1 19:00:00 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Apr 2000 12:00:00 -0500 EST Subject: [Python-Dev] New Features in Python 1.6 Message-ID: <200004011740.MAA04675@eric.cnri.reston.va.us> New Features in Python 1.6 ========================== With the recent release of Python 1.6 alpha 1, a lot of people have been wondering what's new. This short note aims to explain the major changes in Python 1.6. Core Language ------------- 1. Unicode strings Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example: foo = 'hello' # Unicode foo = "hello" # ASCII foo = a'hello' # ASCII foo = u"hello" # Unicode You can still use the "r" character to quote strings in a manner convenient for the regular expression engine, but with subtle changes in semantics: see "New regular expression engine" below for more information. Also, for compatibility with most editors and operating systems, Python source code is still 7-bit ASCII. Thus, for portability it's best to write Unicode strings using one of two new escapes: \u and \N. \u lets you specify a Unicode character as a 16-bit hexadecimal number, and \N lets you specify it by name: message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' \ + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!' message = 'Bienvenue \u00E0 Python fran\u00E7ais!' 2. string methods Python strings have grown methods, just like lists and dictionaries. For instance, to split a string on spaces, you can now say this: tokens = "foo bar baz".split(" ") Or, equivalently, this: tokens = " ".split("foo bar baz") (Python figures out which string is the delimiter and which is the string to split by examining both strings to see which one occurs more frequently inside the other.) Be careful not to mix Unicode and ASCII strings when doing this, though. 
Other examples: foo = "The quick red fox jumped over the lazy brown dog." foo.find("dog") foo.strip() foo.lower() Note that use of any string method on a particular string renders it mutable. This is for consistency with lists, which are mutable and have methods like 'append()' and 'sort()' that modify the list. Thus, "foo.strip()" modifies the string 'foo' in-place. "strip(foo)" retains its old behavior of returning a modified copy of 'foo'. 3. extended call syntax The variable argument list and keyword argument syntax introduced in Python 1.3 has been extended. Previously, it only worked in function/method signatures; calling other functions with the same arguments required the use of 'apply()' def spam(arg1,arg2,*more_args,**keyword_args): # ... apply(foo,(arg1,arg2) + more_args,keyword_args) Now it works for calling functions too. For consistency with C and C++, asterisks in the function signature become ampersands in the function body: foo(arg1,arg2,&more_args,&&keyword_args) 4. assignment to None now works In previous version of Python, values assigned to None were lost. For example, this code: (username,None,None,None,realname,homedir,None) = getpwuid(uid) would only preserve the user name, real name, and home directory fields from a password file entry -- everything else of interest was lost. In Python 1.6, you can meaningfully assign to None. In the above example, None would be replaced by a tuple containing the four values of interest. You can also use the variable argument list syntax here, for example: (username,password,uid,uid,*None) = getpwuid(uid) would set None to a tuple containing the last three elements of the tuple returned by getpwuid. Library ------- 1. Distutils In the past, lots of people have complained about the lack of a standard mechanism for distributing and installing Python modules. This has been fixed by the Distutils, or Distribution Utilities. We took the approach of leveraging past efforts in this area rather than reinventing a number of perfectly good wheels. Thus, the Distutils take advantage of a number of "best-of-breed" tools for distributing, configuring, building, and installing software. The core of the system is a set of m4 macros that augment the standard macros supplied by GNU Autoconf. Where the Autoconf macros generate shell code that becomes a configure script, the Distutils macros generate Python code that creates a Makefile. (This is a similar idea to Perl's MakeMaker system, but of course this Makefile builds Python modules and extensions!) Using the Distutils is easy: you write a script called "setup.in" which contains both Autoconf and Distutils m4 macros; the Autoconf macros are used to create a "configure" script which examines the target system to find out how to build your extensions there, and the Distutils macros create a "setup.py" script, which generates a Makefile that knows how to build your particular collection of modules. You process "setup.in" before distributing your modules, and bundle the resulting "configure" and "setup.py" with your modules. Then, the user just has to run "configure", "setup.py", and "make" to build everything. 
For example, here's a small, simple "setup.in" for a hypothetical module distribution that uses Autoconf to check for a C library "frob" and builds a Python extension called "_frob" and a pure Python module "frob": AC_INIT(frobmodule.c) AC_CHECK_HEADER(frob.h) AC_HAVE_LIBRARY(frob) AC_OUTPUT() DU_INIT(Frob,1.0) DU_EXTENSION(_frob,frobmodule.c,-lfrob) DU_MODULE(frob,frob.py) DU_OUTPUT(setup.py) First, you run this setup.in using the "prepare_dist" script; this creates "configure" and "setup.py": % prepare_dist Next, you configure the package and create a makefile: % ./configure % ./setup.py Finally, to create a source distribution, use the "sdist" target of the generated Makefile: % make sdist This creates Frob-1.0.tar.gz, which you can then share with the world. A user who wishes to install your extension would download Frob-1.0.tar.gz and create local, custom versions of the "configure" and "setup.py" scripts: % gunzip -c Frob-1.0.tar.gz | tar xf - % cd Frob-1.0 % ./configure % ./setup.py Then, she can build and install your modules: % make % make install Hopefully this will foster even more code sharing in the Python community, and prevent unneeded duplication of effort by module developers. Note that the Python installer for Windows now installs GNU m4, the bash shell, and Autoconf, so that Windows users will be able to use the Distutils just like on Unix. 2. Imputils Complementary to the Distutils are the Imputils, or Import Utilities. Python's import mechanism has been reworked to make it easy for Python programmers to put "hooks" into the code that finds and loads modules. The default import mechanism now includes hooks, written in Python, to load modules via HTTP from a known URL. This has allowed us to drop most of the standard library from the distribution. Now, for example, when you import a less-commonly-needed module from the standard library, Python fetches the code for you. For example, if you say import tokenize then Python -- via the Imputils -- will fetch http://modules.python.org/lib/tokenize.py for you and install it on your system for future use. (This is why the Python interpreter is now installed as a setuid binary under Unix -- if you turn off this bit, you will be unable to load modules from the standard library!) If you try to import a module that's not part of the standard library, then the Imputils will find out -- again from modules.python.org -- where it can find this module. It then downloads the entire relevant module distribution, and uses the Distutils to build and install it on your system. It then loads the module you requested. Simplicity itself! 3. New regular expression engine Python 1.6 includes a new regular expression engine, accessed through the "sre" module, to support Unicode strings. Be sure to use the *old* engine for ASCII strings, though: import re, sre # ... re.match(r"(\d+)", "The number is 42.") # ASCII sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}') # Unicode If you're not sure whether a string is ASCII or Unicode, you can always determine this at runtime: from types import * # ... 
if type(s) is StringType: m = re.match(r"...", s) elif type(s) is UnicodeType: m = sre.match(r'...', s) From gvwilson at nevex.com Sat Apr 1 20:01:13 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 1 Apr 2000 13:01:13 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: > On Sat, 1 Apr 2000, Guido van Rossum wrote: > New Features in Python 1.6 > ========================== > [lots 'n' lots] > tokens = "foo bar baz".split(" ") > tokens = " ".split("foo bar baz") Has anyone started working up a style guide that'll recommend when to use these new methods, when to use the string module's calls, etc.? Ditto for the other changes --- where there are now two or more ways of doing something, how do I (or my students) tell which one is preferred? Greg p.s. "There's More Than One Way To Do It" == "No Matter How Much Of This Language You Learn, Other People's Code Will Always Look Strange" From gvwilson at nevex.com Sat Apr 1 20:45:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 1 Apr 2000 13:45:16 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <000101bf9c09$71011bc0$182d153f@tim> Message-ID: > >> On Sat, 1 Apr 2000, Guido van Rossum wrote: > >> New Features in Python 1.6 > >> ========================== > >> [lots 'n' lots] > >> tokens = "foo bar baz".split(" ") > >> tokens = " ".split("foo bar baz") > >> [and Python guesses which to split on by studying the contents] > > > Has anyone started working up a style guide that'll recommend when > > to use these new methods, when to use the string module's calls, > > etc.? Ditto for the other changes --- where there are now two or > > more ways of doing something, how do I (or my students) tell which > > one is preferred? > > Greg, you should pay real close attention to the date on Guido's msg. > It's quite a comment on the state of programming languages in general > that this all reads sooooooo plausibly! Well, you have to remember, I'm the guy who asked for "<" to be a legal Python token :-). Greg From est at hyperreal.org Sun Apr 2 00:00:54 2000 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 1 Apr 2000 14:00:54 -0800 (PST) Subject: [Python-Dev] linuxaudiodev minimal test Message-ID: <20000401220054.13820.qmail@hyperreal.org> The appended script works for me. I think the module should be called something like OSS (since it uses the Open Sound System API) with a -I entry in Setup.in to indicate that this will probably need to be specified to find (e.g., -I/usr/include/linux for Linux, -I/usr/include/machine for FreeBSD...). I'm sure I'll have other suggestions for the module, but they'll have to wait until I finish moving to California. :) Best, Eric #!/usr/bin/python import linuxaudiodev import math, struct, fcntl, FCNTL a = linuxaudiodev.open('w') a.setparameters(44100, 16, 1, linuxaudiodev.AFMT_S16_LE) N = 500 data = apply(struct.pack, ['<%dh' % N] + map(lambda n: 32767 * math.sin((2 * math.pi * n) / N), range(N))) fd = a.fileno() fcntl.fcntl(fd, FCNTL.F_SETFL, ~FCNTL.O_NONBLOCK & fcntl.fcntl(fd, FCNTL.F_GETFL)) for i in xrange(200): a.write(data) From Vladimir.Marangozov at inrialpes.fr Sun Apr 2 01:30:46 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 2 Apr 2000 01:30:46 +0200 (CEST) Subject: [Python-Dev] python -t gets confused? 
Message-ID: <200004012330.BAA10022@python.inrialpes.fr> The tab/space checking code in the tokenizer seems to get confused by the recently checked in test_pyexpat.py With python -t or -tt, there are inconsistency reports at places where there doesn't seem to be one. (tabnanny seems to be confused too, btw :) ./python -tt Lib/test/test_pyexpat.py File "Lib/test/test_pyexpat.py", line 13 print 'Start element:\n\t', name, attrs ^ SyntaxError: inconsistent use of tabs and spaces in indentation Thus, "make test" reports a failure on test_pyexpat due to a syntax error, instead of a missing optional feature (expat not compiled in). I'm not an expert of the tokenizer code, so someone might want to look at it and tell us what's going on. Without -t or -tt, the code runs fine. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Sun Apr 2 01:53:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 2 Apr 2000 09:53:50 +1000 Subject: [Python-Dev] string.ato? and Unicode Message-ID: Is this an over-sight, or by design? >>> string.atoi(u"1") ... TypeError: argument 1: expected string, unicode found It appears easy to support Unicode - there is already an explicit StringType check in these functions, and it simply delegates to int(), which already _does_ work for Unicode A patch would leave the following behaviour: >>> string.atio(u"1") 1 >>> string.atio(u"1", 16) ... TypeError: can't convert non-string with explicit base IMO, this is better than what we have now. I'll put together a patch if one is wanted... Mark. From tim_one at email.msn.com Sun Apr 2 06:14:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 1 Apr 2000 23:14:23 -0500 Subject: [Python-Dev] python -t gets confused? In-Reply-To: <200004012330.BAA10022@python.inrialpes.fr> Message-ID: <000601bf9c59$ef8a3da0$752d153f@tim> [Vladimir Marangozov] > The tab/space checking code in the tokenizer seems to get confused > by the recently checked in test_pyexpat.py > > With python -t or -tt, there are inconsistency reports at places where > there doesn't seem to be one. (tabnanny seems to be confused too, btw :) They're not confused, they're simply reporting that the indentation is screwed up in this file -- which it is. It mixes tabs and spaces in ambiguous ways. > ... > I'm not an expert of the tokenizer code, so someone might want to look > at it and tell us what's going on. Without -t or -tt, the code runs fine. If you set your editor to believe that tab chars are 4 columns (as my Windows editor does), the problem (well, problems -- many lines are flawed) will be obvious. It runs anyway because tab=8 is hardcoded in the Python parser. Quickest fix is for someone at CNRI to just run this thru one of the Unix detabifier programs. From tim_one at email.msn.com Sun Apr 2 08:18:28 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:28 -0500 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <200003311547.KAA15538@eric.cnri.reston.va.us> Message-ID: <000c01bf9c6b$428af560$752d153f@tim> > The Windows installer is always hard to get just right. ... > ... > I'd love to hear that it also installs cleanly on Windows 95. Please > test IDLE from the start menu! All worked without incident for me under Win95. Nice! 
Would still prefer that it install to D:\Python-1.6\ by default, though (instead of burying it under "Program Files" -- if you're not on the Help list, you can't believe how hard it is to explain how to deal with embedded spaces in paths). So far I've seen one system crash in TK83.DLL upon closing an IDLE window, but haven't been able to reproduce. OK, I can, it's easy: Open IDLE. Ctrl+O, then navigate to e.g. Tools\idle\config.txt and open it. Click the "close window" button. Boom -- invalid page fault in TK83.DLL. No time to dig further now. From tim_one at email.msn.com Sun Apr 2 08:18:31 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:31 -0500 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: <000d01bf9c6b$447957e0$752d153f@tim> [Peter Funk] > -1 for C reformatting. The 4 space intendation seesm reasonable for > Python sources, but I disaggree for C code. C is not Python. Code is code. The project I work on professionally is a half million lines of C++, and 4-space indents are rigidly enforced -- works great. It makes just as much sense for C as for Python, and for all the same reasons. The one formal study I've seen on this showed that comprehension levels peaked at indent levels of 3 and 4, dropping off on both sides. However, tabs in C is one of Guido's endearing inconsistencies, and we don't want to lose the only two of those he has (his other is trying to avoid curly braces whenever possible in C, perhaps out of the same perverse sense of pride I used to take in avoiding redundant semicolons in Pascal <;{} wink>. From pf at artcom-gmbh.de Sun Apr 2 10:03:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 10:03:29 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: <000d01bf9c6b$447957e0$752d153f@tim> from Tim Peters at "Apr 2, 2000 1:18:31 am" Message-ID: Hi! > [Peter Funk] > > -1 for C reformatting. The 4 space intendation seesm reasonable for > > Python sources, but I disaggree for C code. C is not Python. Tim Peters: > Code is code. The project I work on professionally is a half million lines > of C++, and 4-space indents are rigidly enforced -- works great. It makes > just as much sense for C as for Python, and for all the same reasons. The > one formal study I've seen on this showed that comprehension levels peaked > at indent levels of 3 and 4, dropping off on both sides. Sigh... Well, if the Python-Interpreter C sources were indented with 4 spaces from the very beginning, I would have kept my mouth shut! But as we can't get the whole world to aggree on how to indent C-Sources, we should at least try to avoid the loss off energy and time, the debate on this topic will cause. So what's my point? IMO reformatting the C-sources wouldn't do us any favor. There will always be people, who like another indentation style more. The GNU software and the Linux kernel have set some standards within the open source community. These projects represent a reasonable fraction of programmers that may be potential contributors to other open source projects. So the only effect a reformatting from 8 to 4 space indents would be to disturb the "8-spacers" and causing endless discussions like this one. Period. 
> However, tabs in C is one of Guido's endearing inconsistencies, and we don't > want to lose the only two of those he has (his other is trying to > avoid curly braces whenever possible in C, perhaps out of the same perverse > sense of pride I used to take in avoiding redundant semicolons in Pascal > <;{} wink>. Aggreed. Best reagrds, Peter From effbot at telia.com Sun Apr 2 10:37:11 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 2 Apr 2000 10:37:11 +0200 Subject: [Python-Dev] SRE: regex.set_syntax Message-ID: <004701bf9c7e$a5045480$34aab5d4@hagrid> one of my side projects for SRE is to create a regex-compatible frontend. since both engines have NFA semantics, this mostly involves writing an alternate parser. however, when I started playing with that, I completely forgot about the regex.set_syntax() function. supporting one extra syntax isn't that much work, but a whole bunch of them? so what should we do? 1. completely get rid of regex (bjorn would love that, don't you think?) 2. remove regex.set_syntax(), and tell people who've used it that they're SOL. 3. add all the necessary flags to the new parser... 4. keep regex around as before, and live with the extra code bloat. comments? From pf at artcom-gmbh.de Sun Apr 2 14:49:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 14:49:26 +0200 (MEST) Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 1, 2000 12: 0: 0 pm" Message-ID: Hi! Guido van Rossum on april 1st: [...] > With the recent release of Python 1.6 alpha 1, a lot of people have > been wondering what's new. This short note aims to explain the major > changes in Python 1.6. [...] > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating -------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > a Unicode string, while the double-quote character defaults to ASCII ----^^^^^^^^^^^^^^ > strings. As I read this my first thoughts were: "Huh? Is that really true? To me this sounds like a april fools joke. But to be careful I checked first before I read on: pf at artcom0:ttyp4 ~/archiv/freeware/python/CVS_01_04_00/dist/src 41> ./python Python 1.6a1 (#2, Apr 1 2000, 19:19:18) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 'a' 'a' >>> '?' '\344' >>> u'?' u'\344' Since www.python.org happens to be down at that moment, I was unable to check, whether my CVS tarball I downloaded from Davids starship account was recent enough and whether this single-quote-defaults-to-unicode has been discussed earlier before I got subscribed to python-dev. Better I should have read on first, before starting to wonder... [...] > tokens = "foo bar baz".split(" ") > Or, equivalently, this: > tokens = " ".split("foo bar baz") > > (Python figures out which string is the delimiter and which is the > string to split by examining both strings to see which one occurs more > frequently inside the other.) Now it becomes clearer that this *must* be an april fools joke! ;-) : >>> tokens = "foo bar baz".split(" ") >>> print tokens ['foo', 'bar', 'baz'] >>> tokens = " ".split("foo bar baz") >>> print tokens [' '] [...] > Note that use of any string method on a particular string renders it > mutable. [...] 
> For consistency with C and C++, > asterisks in the function signature become ampersands in the function > body: [...] > load modules via HTTP from a known URL. [...] > This has allowed us to drop most of the standard library from the > distribution... [...] Pheeew... Oh Well. And pigs can fly. Sigh! ;-) That was a well prepared April fools joke! Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From tismer at tismer.com Sun Apr 2 15:53:12 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 02 Apr 2000 15:53:12 +0200 Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) References: Message-ID: <38E750C8.A559DF19@tismer.com> Peter Funk wrote: > > Hi! > > Guido van Rossum on april 1st: [turns into a Perli for a moment - well done! ] ... > Since www.python.org happens to be down at that moment, I was unable to check, > whether my CVS tarball I downloaded from Davids starship account > was recent enough and whether this single-quote-defaults-to-unicode > has been discussed earlier before I got subscribed to python-dev. Better > I should have read on first, before starting to wonder... You should not give up when python.org is down. As a fallback, I used to use www.cwi.nl which appears to be quite up-to-date. You can find the files and the *true* change list at http://www.cwi.nl/www.python.org/1.6/ Note that today is April 2, so you may believe me at-least-not-less-than-usually - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From fdrake at acm.org Sun Apr 2 22:34:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sun, 2 Apr 2000 16:34:39 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <14567.44767.357265.167396@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1. completely get rid of regex (bjorn would love that, > don't you think?) The regex module has been documented as obsolete for a while now. Just leave the module alone and will disappear in time. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Mon Apr 3 00:11:02 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 00:11:02 +0200 Subject: [Python-Dev] string.ato? and Unicode References: Message-ID: <38E7C576.5D3530E4@lemburg.com> Mark Hammond wrote: > > Is this an over-sight, or by design? > > >>> string.atoi(u"1") > ... > TypeError: argument 1: expected string, unicode found Probably an oversight... and it may well not be the only one: there are many explicit string checks in the code which might need to be fixed for Unicode support. As for string.ato? I'm not sure: these functions are obsoleted by int(), float() and long(). > It appears easy to support Unicode - there is already an explicit > StringType check in these functions, and it simply delegates to > int(), which already _does_ work for Unicode Right. I fixed the above three APIs to support Unicode. > A patch would leave the following behaviour: > >>> string.atio(u"1") > 1 > >>> string.atio(u"1", 16) > ... 
> TypeError: can't convert non-string with explicit base > > IMO, this is better than what we have now. I'll put together a > patch if one is wanted... BTW, the code in string.py for atoi() et al. looks really complicated: """ def atoi(*args): """atoi(s [,base]) -> int Return the integer represented by the string s in the given base, which defaults to 10. The string s must consist of one or more digits, possibly preceded by a sign. If base is 0, it is chosen from the leading characters of s, 0 for octal, 0x or 0X for hexadecimal. If base is 16, a preceding 0x or 0X is accepted. """ try: s = args[0] except IndexError: raise TypeError('function requires at least 1 argument: %d given' % len(args)) # Don't catch type error resulting from too many arguments to int(). The # error message isn't compatible but the error type is, and this function # is complicated enough already. if type(s) == _StringType: return _apply(_int, args) else: raise TypeError('argument 1: expected string, %s found' % type(s).__name__) """ Why not simply... def atoi(s, base=10): return int(s, base) dito for atol() and atof()... ?! This would not only give us better performance, but also Unicode support for free. (I'll fix int() and long() to accept Unicode when using an explicit base too.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 3 11:44:52 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 02:44:52 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <000c01bf9c6b$428af560$752d153f@tim> Message-ID: On Sun, 2 Apr 2000, Tim Peters wrote: > > The Windows installer is always hard to get just right. ... > > ... > > I'd love to hear that it also installs cleanly on Windows 95. Please > > test IDLE from the start menu! > > All worked without incident for me under Win95. Nice! Would still prefer > that it install to D:\Python-1.6\ by default, though (instead of burying it > under "Program Files" -- if you're not on the Help list, you can't believe > how hard it is to explain how to deal with embedded spaces in paths). Ack! No way... Keep my top-level clean! :-) This is Windows. Apps go into Program Files. That is Just The Way It Is. When was the last time you saw /python on a Unix box? Never? Always in .../bin/? Thought so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 11:55:53 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 11:55:53 +0200 Subject: [Python-Dev] Windows installer pre-prelease References: Message-ID: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Greg Stein wrote: > > All worked without incident for me under Win95. Nice! Would still prefer > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > under "Program Files" -- if you're not on the Help list, you can't believe > > how hard it is to explain how to deal with embedded spaces in paths). > > Ack! No way... Keep my top-level clean! :-) > > This is Windows. Apps go into Program Files. That is Just The Way It Is. if you're on a US windows box, sure. but "Program Files" isn't exactly an international standard... we install our python distribution under the \py, and we get lot of positive responses. as far as I remember, nobody has ever reported problems setting up the path... > When was the last time you saw /python on a Unix box? Never? Always in > .../bin/? Thought so. 
if the Unix designers had come up with the bright idea of translating "bin" to "whatever might seem to make sense in this language", I think you'd see many more non-std in- stallations under Unix... especially if they'd made the root directory writable to everyone :-) From gstein at lyra.org Mon Apr 3 12:08:54 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:08:54 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > All worked without incident for me under Win95. Nice! Would still prefer > > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > > under "Program Files" -- if you're not on the Help list, you can't believe > > > how hard it is to explain how to deal with embedded spaces in paths). > > > > Ack! No way... Keep my top-level clean! :-) > > > > This is Windows. Apps go into Program Files. That is Just The Way It Is. > > if you're on a US windows box, sure. but "Program Files" > isn't exactly an international standard... Yes it is... if you use the appropriate Windows APIs (or registry... forget where). Windows specifies a way to get the localized name for Program Files. > we install our python distribution under the \py, > and we get lot of positive responses. as far as I remember, > nobody has ever reported problems setting up the path... *shrug* This doesn't dispute the standard Windows recommendation to install software into Program Files. > > When was the last time you saw /python on a Unix box? Never? Always in > > .../bin/? Thought so. > > if the Unix designers had come up with the bright idea of > translating "bin" to "whatever might seem to make sense > in this language", I think you'd see many more non-std in- > stallations under Unix... especially if they'd made the root > directory writable to everyone :-) heh :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 3 12:18:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:18:30 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: Message-ID: I don't recall the termination of the discussion, but I don't know that consensus was ever reached. Personally, I find this of little value over the similar (not exact) code: def supplement(dict, extra): d = extra.copy() d.update(dict) return d If the dictionary needs to be modified in place, then the loop from your UserDict.supplement would be used. Another view: why keep adding methods to service all possible needs? Cheers, -g On Mon, 3 Apr 2000, Peter Funk wrote: > Dear Python patcher! > > Please consider to apply the patch appended below and commit into the CVS tree. > It applies to: Python 1.6a1 as released on april 1st. > --=-- argument: --=--=--=--=--=--=--=--=--=--=-->8--=- > This patch adds a new method to dictionary and UserDict objects: > '.supplement()' is a "sibling" of '.update()', but it add only > those items that are not already there instead of replacing them. > > This idea has been discussed on python-dev last month. > --=-- obligatory disclaimer: -=--=--=--=--=--=-->8--=- > I confirm that, to the best of my knowledge and belief, this > contribution is free of any claims of third parties under > copyright, patent or other rights or interests ("claims"). 
To > the extent that I have any such claims, I hereby grant to CNRI a > nonexclusive, irrevocable, royalty-free, worldwide license to > reproduce, distribute, perform and/or display publicly, prepare > derivative versions, and otherwise use this contribution as part > of the Python software and its related documentation, or any > derivative versions thereof, at no cost to CNRI or its licensed > users, and to authorize others to do so. > > I acknowledge that CNRI may, at its sole discretion, decide > whether or not to incorporate this contribution in the Python > software and its related documentation. I further grant CNRI > permission to use my name and other identifying information > provided to CNRI by me for use in connection with the Python > software and its related documentation. > --=-- dry signature: =--=--=--=--=--=--=--=--=-->8--=- > Regards, Peter > -- > Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 > office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) > --=-- patch: --=--=--=--=--=--=--=--=--=--=--=-->8--=- > *** ../../cvs_01_04_00_orig/dist/src/Objects/dictobject.c Fri Mar 31 11:45:02 2000 > --- src/Objects/dictobject.c Mon Apr 3 10:30:11 2000 > *************** > *** 734,739 **** > --- 734,781 ---- > } > > static PyObject * > + dict_supplement(mp, args) > + register dictobject *mp; > + PyObject *args; > + { > + register int i; > + dictobject *other; > + dictentry *entry, *oldentry; > + if (!PyArg_Parse(args, "O!", &PyDict_Type, &other)) > + return NULL; > + if (other == mp) > + goto done; /* a.supplement(a); nothing to do */ > + /* Do one big resize at the start, rather than incrementally > + resizing as we insert new items. Expect that there will be > + no (or few) overlapping keys. */ > + if ((mp->ma_fill + other->ma_used)*3 >= mp->ma_size*2) { > + if (dictresize(mp, (mp->ma_used + other->ma_used)*3/2) != 0) > + return NULL; > + } > + for (i = 0; i < other->ma_size; i++) { > + entry = &other->ma_table[i]; > + if (entry->me_value != NULL) { > + oldentry = lookdict(mp, entry->me_key, entry->me_hash); > + if (oldentry->me_value == NULL) { > + /* TODO: optimize: > + 'insertdict' does another call to 'lookdict'. > + But for sake of readability and symmetry with > + 'dict_update' I didn't tried to avoid this. > + At least not now as we go into 1.6 alpha. 
*/ > + Py_INCREF(entry->me_key); > + Py_INCREF(entry->me_value); > + insertdict(mp, entry->me_key, entry->me_hash, > + entry->me_value); > + } > + } > + } > + done: > + Py_INCREF(Py_None); > + return Py_None; > + } > + > + > + static PyObject * > dict_copy(mp, args) > register dictobject *mp; > PyObject *args; > *************** > *** 1045,1050 **** > --- 1087,1093 ---- > {"clear", (PyCFunction)dict_clear}, > {"copy", (PyCFunction)dict_copy}, > {"get", (PyCFunction)dict_get, METH_VARARGS}, > + {"supplement", (PyCFunction)dict_supplement}, > {NULL, NULL} /* sentinel */ > }; > > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_types.py Wed Feb 23 23:23:17 2000 > --- src/Lib/test/test_types.py Mon Apr 3 10:41:53 2000 > *************** > *** 242,247 **** > --- 242,250 ---- > d.update({2:20}) > d.update({1:1, 2:2, 3:3}) > if d != {1:1, 2:2, 3:3}: raise TestFailed, 'dict update' > + d.supplement({1:"not", 2:"neither", 4:4}) > + if d != {1:1, 2:2, 3:3, 4:4}: raise TestFailed, 'dict supplement' > + del d[4] > if d.copy() != {1:1, 2:2, 3:3}: raise TestFailed, 'dict copy' > if {}.copy() != {}: raise TestFailed, 'empty dict copy' > # dict.get() > *** ../../cvs_01_04_00_orig/dist/src/Lib/UserDict.py Wed Feb 2 16:10:14 2000 > --- src/Lib/UserDict.py Mon Apr 3 10:45:17 2000 > *************** > *** 32,36 **** > --- 32,45 ---- > else: > for k, v in dict.items(): > self.data[k] = v > + def supplement(self, dict): > + if isinstance(dict, UserDict): > + self.data.supplement(dict.data) > + elif isinstance(dict, type(self.data)): > + self.data.supplement(dict) > + else: > + for k, v in dict.items(): > + if not self.data.has_key(k): > + self.data[k] = v > def get(self, key, failobj=None): > return self.data.get(key, failobj) > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_userdict.py Fri Mar 26 16:32:02 1999 > --- src/Lib/test/test_userdict.py Mon Apr 3 10:50:29 2000 > *************** > *** 93,101 **** > --- 93,109 ---- > t.update(u2) > assert t == u2 > > + # Test supplement > + > + t = UserDict(d1) > + t.supplement(u2) > + assert t == u2 > + > # Test get > > for i in u2.keys(): > assert u2.get(i) == u2[i] > assert u1.get(i) == d1.get(i) > assert u0.get(i) == d0.get(i) > + > + # TODO: Add a test using dir({}) to test for unimplemented methods > > _______________________________________________ > Patches mailing list > Patches at python.org > http://www.python.org/mailman/listinfo/patches > -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 12:25:05 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 12:25:05 +0200 Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' References: Message-ID: <008b01bf9d57$0555fc20$34aab5d4@hagrid> Greg Stein wrote: > I don't recall the termination of the discussion, but I don't know that > consensus was ever reached. iirc, Ping liked it, but I'm not sure anybody else contributed much to that thread... (and to neutralize Ping, just let me say that I don't like it :-) > Personally, I find this of little value over the similar (not exact) code: > > def supplement(dict, extra): > d = extra.copy() > d.update(dict) > return d has anyone benchmarked this? for some reason, I doubt that the difference between copy/update and supplement is that large... > Another view: why keep adding methods to service all possible needs? exactly. 
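Since the benchmark question came up, here is a rough way to time the two pure-Python spellings against each other (the names, dictionary sizes and iteration counts below are arbitrary, and the absolute numbers will of course vary from build to build):

    import time

    def supplement_copy(d, extra):
        # copy/update spelling: build a new dict with 'extra' as the defaults
        new = extra.copy()
        new.update(d)
        return new

    def supplement_inplace(d, extra):
        # explicit loop, closest to what a dict.supplement() method would do
        for k, v in extra.items():
            if not d.has_key(k):
                d[k] = v
        return d

    # throwaway data; sizes are arbitrary
    base = {}
    extra = {}
    for i in range(1000):
        base[i] = i
        extra[i + 500] = -i

    for name, func in [("copy/update", supplement_copy),
                       ("explicit loop", supplement_inplace)]:
        start = time.clock()
        for i in range(100):
            func(base.copy(), extra)
        print name, round(time.clock() - start, 3)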
From effbot at telia.com Mon Apr 3 12:31:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 12:31:42 +0200 Subject: [Python-Dev] Windows installer pre-prelease References: Message-ID: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Greg Stein wrote: > > we install our python distribution under the \py, > > and we get lot of positive responses. as far as I remember, > > nobody has ever reported problems setting up the path... > > *shrug* This doesn't dispute the standard Windows recommendation to > install software into Program Files. no, but Tim's and my experiences from doing user support show that the standard Windows recommendation doesn't work for command line applications. we don't care about Microsoft, we care about Python's users. to quote a Linus Torvalds, "bad standards _should_ be broken" (after all, Microsoft doesn't put their own command line applications down there -- there's no "\Program Files" [sub]directory in the default PATH, at least not on any of my boxes. maybe they've changed that in Windows 2000?) From gstein at lyra.org Mon Apr 3 12:49:27 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:49:27 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. Valid point. But there are other solutions, too. VC distributes a thing named "VCVARS.BAT" to set up paths and other environ vars. Python could certainly do the same thing (to overcome the embedded-space issue). > to quote a Linus Torvalds, "bad standards _should_ be broken" Depends on the audience of that standard. Programmers: yah. Consumers? They just want the damn thing to work like they expect it to. That expectation is usually "I can find my programs in Program Files." > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Incorrect. Site Server had command-line tools down there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ajung at sz-sb.de Mon Apr 3 13:17:20 2000 From: ajung at sz-sb.de (Andreas Jung) Date: Mon, 3 Apr 2000 13:17:20 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us>; from guido@python.org on Sat, Apr 01, 2000 at 12:00:00PM -0500 References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <20000403131720.A10313@sz-sb.de> On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: > > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating > a Unicode string, while the double-quote character defaults to ASCII > strings. 
If you need to create a Unicode string with double quotes, > just preface it with the letter "u"; likewise, an ASCII string can be > created by prefacing single quotes with the letter "a". For example: > > foo = 'hello' # Unicode > foo = "hello" # ASCII Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion. Cheers, Andreas From pf at artcom-gmbh.de Mon Apr 3 13:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 13:12:25 +0200 (MEST) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: <008b01bf9d57$0555fc20$34aab5d4@hagrid> from Fredrik Lundh at "Apr 3, 2000 12:25: 5 pm" Message-ID: Hi! > Greg Stein wrote: > > I don't recall the termination of the discussion, but I don't know that > > consensus was ever reached. > Fredrik Lundh: > iirc, Ping liked it, but I'm not sure anybody else contributed > much to that thread... That was my impression: It is hard to guess what you guys think from mere silence. ;-) > (and to neutralize Ping, just let me say that I don't like it :-) > > > Personally, I find this of little value over the similar (not exact) code: > > > > def supplement(dict, extra): [...] > > Another view: why keep adding methods to service all possible needs? > > exactly. A agree that we should avoid adding new methods all over the place. But IMO this is an exception: I proposed it for the sake of symmetry with 'update'. From my POV 'supplement' relates to 'update' as '+' relates to '-'. YMMV and I will not be angry, if this idea will be finally rejected. But it would have saved me an hour or two of coding and testing time if you had expressed your opinions a little bit earlier. ;-) But I know: you are all busy. To get an impression of possible uses for supplement, I sketch some code here: class MysticMegaWidget(MyMegaWidget): _config = { horizontal_elasticity = 1000, vertical_elasticity = 10, mentalplex_fg_color = "#FF0000", mentalplex_bg_color = "#0000FF", font = "Times", } def __init__(self, *args, **kw): if kw: self._config = kw self._config.supplement(self.__class__._config) .... Of course this can also be implemented using 'copy' and 'update'. It's only slightly more complicated. But you can also emulate any boolean operation using only NAND. Nevertheless any serious programming language contains at least OR, AND, NOT and possibly XOR. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal at lemburg.com Mon Apr 3 13:48:05 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 13:48:05 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de> Message-ID: <38E884F5.8F2FB271@lemburg.com> Andreas Jung wrote: > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: The above line has all the answers ;-) ... > > Python strings can now be stored as Unicode strings. To make it easier > > to type Unicode strings, the single-quote character defaults to creating > > a Unicode string, while the double-quote character defaults to ASCII > > strings. 
If you need to create a Unicode string with double quotes, > > just preface it with the letter "u"; likewise, an ASCII string can be > > created by prefacing single quotes with the letter "a". For example: > > > > foo = 'hello' # Unicode > > foo = "hello" # ASCII > > Is single-quoting for creating unicode clever ? I think there might be a problem > with old code when the operations on unicode strings are not 100% compatible to > the standard string operations. I don't know if this is a real problem - it's > just a point for discussion. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Mon Apr 3 14:22:17 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 3 Apr 2000 22:22:17 +1000 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <38E884F5.8F2FB271@lemburg.com> Message-ID: > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > Rossum wrote: > > The above line has all the answers ;-) ... That was pretty sneaky tho! Had the added twist of being half-true... Mark. From mal at lemburg.com Mon Apr 3 14:59:21 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 14:59:21 +0200 Subject: [Python-Dev] Unicode and numerics Message-ID: <38E895A9.94504851@lemburg.com> I've just posted a new patch set to the patches list which contains better support for Unicode in the int(), long(), float() and complex() builtins. There are some new APIs now which can be used by extension writer to convert from Unicode to integers, floats and longs. These APIs are fully Unicode aware, meaning that you can also pass them any Unicode characters with decimal mappings, not only the standard ASCII '0'-'9' ones. One thing I noticed, which needs some discussion: There are two separate APIs which convert long string literals to long objects: PyNumber_Long() and PyLong_FromString(). The first applies the same error checking as does the PyInt_FromString() API, while the latter does not apply this check... Question is: shouldn't the check for truncated data ("9.5" -> 9L) be moved into PyLong_FromString() ? BTW, should I also post patches to string.py which use the simplified versions for string.ato?() I posted a few days ago ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Apr 3 15:12:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 15:12:58 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: Message-ID: <38E898DA.B69D7ED6@lemburg.com> Mark Hammond wrote: > > > > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > > Rossum wrote: > > > > The above line has all the answers ;-) ... > > That was pretty sneaky tho! Had the added twist of being > half-true... ... 
and on time like a CRON-job ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Mon Apr 3 16:11:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:11:55 +0200 (CEST) Subject: [Python-Dev] Suggested PyMem & PyObject_NEW includes (fwd) Message-ID: <200004031411.QAA12486@python.inrialpes.fr> Vladimir Marangozov wrote: From Vladimir.Marangozov at inrialpes.fr Mon Apr 3 16:07:43 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:07:43 +0200 (CEST) Subject: Suggested PyMem & PyObject_NEW includes Message-ID: Sorry for the delay -- I simply could't progress on this as I wanted to. Here's the includes I suggest for PyMem and PyObject_New cs. PyMem is okay. Some questions arise with PyObject_NEW. 1) I'm willing to unify the implementation on Windows and Unix (so I'm retaining the Windows variant of _PyObject_New reincarnated by PyObject_FromType -- see the comments in pyobjimpl.h). 2) For the user, there's the principle to use functions if binary compatibility is desired, and macros if she needs to trade compatibility for speed. But there's the issue of allocating the user objects with PyMem, or, allocate the objects with a custom allocator. After scratching my head on how to preserve bin compatibility with old libraries and offer the freedom to the user, I ended with the following (subject to discussion): - Use the functions for bin compat (but have also an exception _PyObject_Del(), with a leading underscore, for the core...) Objects in this case are allocated with PyMem. - Use the macros for allocating the objects with the potentially custom allocator (through malloc, realloc, free -- see below) What do you think? -----------------------------[ mymalloc.h ]--------------------------- ... /* * Core memory allocator * ===================== */ /* To make sure the interpreter is user-malloc friendly, all memory and object APIs are implemented on top of this one. The PyCore_* macros can be changed to make the interpreter use a custom allocator. Note that they are for internal use only. Both the core and extension modules should use the PyMem_* API. */ #define PyCore_MALLOC_FUNC malloc #define PyCore_REALLOC_FUNC realloc #define PyCore_FREE_FUNC free #define PyCore_MALLOC_PROTO Py_PROTO((size_t)) #define PyCore_REALLOC_PROTO Py_PROTO((ANY *, size_t)) #define PyCore_FREE_PROTO Py_PROTO((ANY *)) #define PyCore_MALLOC(n) PyCore_MALLOC_FUNC(n) #define PyCore_REALLOC(p, n) PyCore_REALLOC_FUNC((p), (n)) #define PyCore_FREE(p) PyCore_FREE_FUNC(p) /* The following should never be necessary */ #ifdef NEED_TO_DECLARE_MALLOC_AND_FRIEND extern ANY *PyCore_MALLOC_FUNC PyCore_MALLOC_PROTO; extern ANY *PyCore_REALLOC_FUNC PyCore_REALLOC_PROTO; extern void PyCore_FREE_FUNC PyCore_FREE_PROTO; #endif /* BEWARE: Each interface exports both functions and macros. Extension modules should normally use the functions for ensuring binary compatibility of the user's code across Python versions. Subsequently, if Python switches to its own malloc (different from standard malloc), no recompilation is required for the extensions. The macro versions trade compatibility for speed. They can be used whenever there is a performance problem, but their use implies recompilation of the code for each new Python release. 
The Python core uses the macros because it *is* compiled on every upgrade. This might not be the case with 3rd party extensions in a custom setup (for example, a customer does not always have access to the source of 3rd party deliverables). You have been warned! */ /* * Raw memory interface * ==================== */ /* Functions */ /* Two sets of function wrappers around malloc and friends; useful if you need to be sure that you are using the same memory allocator as Python. Note that the wrappers make sure that allocating 0 bytes returns a non-NULL pointer, even if the underlying malloc doesn't. */ /* These wrappers around malloc call PyErr_NoMemory() on failure */ extern DL_IMPORT(ANY *) Py_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) Py_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) Py_Free Py_PROTO((ANY *)); /* These wrappers around malloc *don't* call anything on failure */ extern DL_IMPORT(ANY *) PyMem_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) PyMem_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) PyMem_Free Py_PROTO((ANY *)); /* Macros */ #define PyMem_MALLOC(n) PyCore_MALLOC(n) #define PyMem_REALLOC(p, n) PyCore_REALLOC((ANY *)(p), (n)) #define PyMem_FREE(p) PyCore_FREE((ANY *)(p)) /* * Type-oriented memory interface * ============================== */ /* Functions */ #define PyMem_New(type, n) \ ( (type *) PyMem_Malloc((n) * sizeof(type)) ) #define PyMem_Resize(p, type, n) \ ( (p) = (type *) PyMem_Realloc((n) * sizeof(type)) ) #define PyMem_Del(p) PyMem_Free(p) /* Macros */ #define PyMem_NEW(type, n) \ ( (type *) PyMem_MALLOC(_PyMem_EXTRA + (n) * sizeof(type)) ) #define PyMem_RESIZE(p, type, n) \ if ((p) == NULL) \ (p) = (type *) PyMem_MALLOC( \ _PyMem_EXTRA + (n) * sizeof(type)); \ else \ (p) = (type *) PyMem_REALLOC((p), \ _PyMem_EXTRA + (n) * sizeof(type)) #define PyMem_DEL(p) PyMem_FREE(p) /* PyMem_XDEL is deprecated. To avoid the call when p is NULL, it's recommended to write the test explicitely in the code. Note that according to ANSI C, free(NULL) has no effect. */ #define PyMem_XDEL(p) if ((p) == NULL) ; else PyMem_DEL(p) ... -----------------------------[ mymalloc.h ]--------------------------- ... /* Functions and macros for modules that implement new object types. You must first include "object.h". PyObject_New(type, typeobj) allocates memory for a new object of the given type; here 'type' must be the C structure type used to represent the object and 'typeobj' the address of the corresponding type object. Reference count and type pointer are filled in; the rest of the bytes of the object are *undefined*! The resulting expression type is 'type *'. The size of the object is actually determined by the tp_basicsize field of the type object. PyObject_NewVar(type, typeobj, n) is similar but allocates a variable-size object with n extra items. The size is computed as tp_basicsize plus n * tp_itemsize. This fills in the ob_size field as well. PyObject_Del(op) releases the memory allocated for an object. Two versions of the object constructors/destructors are provided: 1) PyObject_{New, NewVar, Del} delegate the allocation of the objects to the Python allocator which places them within the bounds of the Python heap. This way, Python keeps control on the user's objects regarding their memory management; for instance, they may be subject to automatic garbage collection, once their reference count drops to zero. Binary compatibility is preserved and there's no need to recompile the extension every time a new Python release comes out. 
2) PyObject_{NEW, NEW_VAR, DEL} use the allocator of the extension module which *may* differ from the one used by the Python library. Typically, in a C++ module one may wish to redefine the default allocation strategy by overloading the operators new and del. In this case, however, the extension does not cooperate with the Python memory manager. The latter has no control on the user's objects as they won't be allocated within the Python heap. Therefore, automatic garbage collection may not be performed, binary compatibility is not guaranteed and recompilation is required on every new Python release. Unless a specific memory management is needed, it's recommended to use 1). */ /* In pre-Python-1.6 times, only the PyObject_{NEW, NEW_VAR} macros were defined in terms of internal functions _PyObject_{New, NewVar}, the implementation of which used to differ for Windows and non-Windows platforms (see object.c -- these functions are left for backwards compatibility with old libraries). Starting from 1.6, an unified interface was introduced for both 1) & 2) */ extern DL_IMPORT(PyObject *) PyObject_FromType Py_PROTO((PyTypeObject *, PyObject *)); extern DL_IMPORT(PyVarObject *) PyObject_VarFromType Py_PROTO((PyTypeObject *, int, PyVarObject *)); extern DL_IMPORT(void) PyObject_Del Py_PROTO((PyObject *)); /* Functions */ #define PyObject_New(type, typeobj) \ ((type *) PyObject_FromType(typeobj, NULL)) #define PyObject_NewVar(type, typeobj, n) \ ((type *) PyObject_VarFromType((typeobj), (n), NULL)) #define PyObject_Del(op) PyObject_Del((PyObject *)(op)) /* XXX This trades binary compatibility for speed. */ #include "mymalloc.h" #define _PyObject_Del(op) PyMem_FREE((PyObject *)(op)) /* Macros */ #define PyObject_NEW(type, typeobj) \ ((type *) PyObject_FromType(typeobj, \ (PyObject *) malloc((typeobj)->tp_basicsize))) #define PyObject_NEW_VAR(type, typeobj, n) \ ((type *) PyObject_VarFromType(typeobj, \ (PyVarObject *) malloc((typeobj)->tp_basicsize + \ n * (typeobj)->tp_itemsize))) #define PyObject_DEL(op) free(op) ---------------------------------------------------------------------- So with this, I'm planning to "give the example" by renaming everywhere in the distrib PyObject_NEW with PyObject_New, but use for the core _PyObject_Del instead of PyObject_Del. I'll use PyObject_Del for the objects defined in extension modules. The point is that I don't want to define PyObject_Del in terms of PyMem_FREE (or define PyObject_New in terms of PyMem_MALLOC) as this would break the principle of binary compatibility when the Python allocator is changed to a custom malloc from one build to another. OTOH, I don't like the underscore... Do you have a better suggestion? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Mon Apr 3 16:50:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 16:50:25 +0200 Subject: [Python-Dev] Re: Unicode and numerics References: <38E895A9.94504851@lemburg.com> Message-ID: <38E8AFB1.9798186E@lemburg.com> "M.-A. Lemburg" wrote: > > BTW, should I also post patches to string.py which use the > simplified versions for string.ato?() I posted a few days ago ? I've just added these to the patch set... they no longer use the same error string, but the error type still is the same when e.g. string.atoi() is called with a non-string. 
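For reference, the simplified spellings suggested earlier in this thread amount to nothing more than delegating to the builtins (a sketch of the idea, minus the docstrings the real string.py would keep):

    def atoi(s, base=10):
        # int() already accepts Unicode, so string.atoi(u"1") works for free
        return int(s, base)

    def atol(s, base=10):
        return long(s, base)

    def atof(s):
        return float(s)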
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 3 18:04:02 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 12:04:02 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: Your message of "Sat, 01 Apr 2000 12:00:00 EST." <200004011740.MAA04675@eric.cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <200004031604.MAA05283@eric.cnri.reston.va.us> Not only was it an April fool's joke, but it wasn't mine! It was forged by an insider. I know by who, but won't tell, because it was so good. It shows that I can trust to delegate way more to the Python community than I think I can! :-) BTW, the biggest give-away that it wasn't mine was the absence of my standard sign-off line: --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at cnri.reston.va.us Mon Apr 3 18:36:24 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 3 Apr 2000 12:36:24 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: References: Message-ID: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> I agree with Greg. Jeremy From bwarsaw at cnri.reston.va.us Mon Apr 3 19:20:19 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 3 Apr 2000 13:20:19 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' References: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> Message-ID: <14568.53971.777162.624760@anthem.cnri.reston.va.us> -0 on dict.supplement(), not the least because I'll always missspell it :) -Barry From pf at artcom-gmbh.de Mon Apr 3 20:01:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 20:01:50 +0200 (MEST) Subject: [Python-Dev] {}.supplement() -- poll results so far Message-ID: Look's like I should better forget my proposal to add a new method '.supplement()' to dictionaries, which should do the opposite of the already available method '.update()'. I summarize in cronological order: Ka-Ping Yee: +1 Fred Drake: +0 Greg Stein: -1 Fredrik Lundh: -1 Jeremy Hylton: -1 Barry Warsaw: -0 Are there other opinions which may change the picture? <0.1 wink> Regards, Peter From gstein at lyra.org Mon Apr 3 20:31:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 11:31:33 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: On Mon, 3 Apr 2000, Peter Funk wrote: > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. > I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> Guido's :-) -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 21:40:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 21:40:00 +0200 Subject: [Python-Dev] unicode: strange exception Message-ID: <020701bf9da4$670d8580$34aab5d4@hagrid> >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (innermost last): File "", line 1, in ? 
TypeError: expected a character buffer object From guido at python.org Mon Apr 3 21:48:25 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 15:48:25 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Your message of "Mon, 03 Apr 2000 11:31:33 PDT." References: Message-ID: <200004031948.PAA05532@eric.cnri.reston.va.us> > On Mon, 3 Apr 2000, Peter Funk wrote: > > Look's like I should better forget my proposal to add a new method > > '.supplement()' to dictionaries, which should do the opposite of > > the already available method '.update()'. > > I summarize in cronological order: > > > > Ka-Ping Yee: +1 > > Fred Drake: +0 > > Greg Stein: -1 > > Fredrik Lundh: -1 > > Jeremy Hylton: -1 > > Barry Warsaw: -0 > > > > Are there other opinions which may change the picture? <0.1 wink> > > Guido's :-) If I have to, it's a -1. I personally wouldn't be able to remember which one was update() and which one was supplement(). --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 3 21:57:26 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 12:57:26 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: <200004031948.PAA05532@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: > > On Mon, 3 Apr 2000, Peter Funk wrote: > > > Look's like I should better forget my proposal to add a new method > > > '.supplement()' to dictionaries, which should do the opposite of > > > the already available method '.update()'. > > > I summarize in cronological order: > > > > > > Ka-Ping Yee: +1 > > > Fred Drake: +0 > > > Greg Stein: -1 > > > Fredrik Lundh: -1 > > > Jeremy Hylton: -1 > > > Barry Warsaw: -0 > > > > > > Are there other opinions which may change the picture? <0.1 wink> > > > > Guido's :-) > > If I have to, it's a -1. You don't have to, but yours *is* the only one that counts. Ours are "merely advisory" ;-) hehe... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gward at cnri.reston.va.us Mon Apr 3 22:56:21 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 3 Apr 2000 16:56:21 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004031604.MAA05283@eric.cnri.reston.va.us>; from guido@python.org on Mon, Apr 03, 2000 at 12:04:02PM -0400 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> Message-ID: <20000403165621.A9955@cnri.reston.va.us> On 03 April 2000, Guido van Rossum said: > Not only was it an April fool's joke, but it wasn't mine! It was > forged by an insider. I know by who, but won't tell, because it was > so good. It shows that I can trust to delegate way more to the Python > community than I think I can! :-) > > BTW, the biggest give-away that it wasn't mine was the absence of my > standard sign-off line: > > --Guido van Rossum (home page: http://www.python.org/~guido/) D'ohhh!!! Hasn't anyone noticed that the largest amount of text in the joke feature list was devoted to the Distutils? I thought *that* would give it away "fer shure". You people are *so* gullible! ;-) And for my next trick... *poof*! Greg From mal at lemburg.com Mon Apr 3 23:45:20 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 23:45:20 +0200 Subject: [Python-Dev] unicode: strange exception References: <020701bf9da4$670d8580$34aab5d4@hagrid> Message-ID: <38E910F0.5EB00566@lemburg.com> Fredrik Lundh wrote: > > >>> "!" in ("a", None) > 0 > >>> u"!" 
in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object Good catch. The same happens when you try to compare Unicode and a different non-string type: >>> '1' == None 0 >>> u'1' == None Traceback (most recent call last): File "", line 1, in ? TypeError: expected a character buffer object The reason is the same in both cases: failing auto-coercion. I will send a patch for this tomorrow. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Tue Apr 4 01:11:13 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 4 Apr 2000 09:11:13 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. Message-ID: The 1.6a1 installer on Windows copies Python16.dll into the Python directory, rather than the system32 directory like 1.5.x. We discussed too long ago on this list not why this was probably not going to work. I guess Guido decided to "suck it and see" - which is fine. But guess what - it doesnt work :-( I couldnt get past the installer! The win32all installer executes some Python code at the end of the install (to generate the .pyc files and install the COM objects). This Python code is executed directly to the installation .EXE, by loading and executing a "shim" DLL I wrote for the purpose. Problem is, try as I might, my shim DLL could not load Python16.dll. The shim DLL _was_ in the same directory as Python16.dll. The only way I could have solved it was to insist the WISE installation .EXE be run from the main Python directory - obviously not an option. And the problem is quite obviously going to exist with COM objects. The problem would appear to go away if the universe switched over the LoadLibraryEx() - but we dont have that control in most cases (eg, COM, WISE etc dictate this to us). So, my solution was to copy Python16.dll to the system directory during win32all installation. This results in duplicate copies of this DLL, so to my mind, it is preferable that Python itself go back to using the System32 directory. The problem this will lead to is that Python 1.6.0 and 1.6.1 will not be able to be installed concurrently. Putting entries on the PATH doesnt solve the underlying problem - you will only be able to have one Python 1.6 directory on your path, else you end up with the same coflicts for the DLL. I dont see any better answer than System32 :-( Thoughts? Mark. From gstein at lyra.org Tue Apr 4 02:32:12 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 17:32:12 -0700 (PDT) Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: On Tue, 4 Apr 2000, Mark Hammond wrote: >... > The problem this will lead to is that Python 1.6.0 and 1.6.1 will > not be able to be installed concurrently. Same thing happened with Python 1.5, so we're no worse off. If we do want this behavior, then we need to add another version digit... > Putting entries on the > PATH doesnt solve the underlying problem - you will only be able to > have one Python 1.6 directory on your path, else you end up with the > same coflicts for the DLL. > > I dont see any better answer than System32 :-( Thoughts? I don't have a better answer, as you and I explained on several occasions. Dunno why Guido decided to skip our recommendations, but hey... it happens :-). IMO, put the DLL back into System32. 
If somebody can *demonstrate* (not hypothesize) a mechanism that works, then it can be switched. The underlying issue is this: Python16.dll in the app directory works for Python as an executable. However, it completely disables any possibility for *embedding* Python. On Windows, embedding is practically required because of the COM stuff (sure... a person could avoid COM but...). Cheers, -g -- Greg Stein, http://www.lyra.org/ From nascheme at enme.ucalgary.ca Tue Apr 4 03:38:41 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: 4 Apr 2000 01:38:41 -0000 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> <20000403165621.A9955@cnri.reston.va.us> Message-ID: <20000404013841.15629.qmail@cranky.arctrix.com> In comp.lang.python, you wrote: >You people are *so* gullible! ;-) Well done. You had me going for a while. You had just enough truth in there. Guido releasing the alpha at that time helped your cause as well. Neil -- Tact is the ability to tell a man he has an open mind when he has a hole in his head. From guido at python.org Tue Apr 4 04:52:52 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 22:52:52 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." References: Message-ID: <200004040252.WAA06637@eric.cnri.reston.va.us> > > The problem this will lead to is that Python 1.6.0 and 1.6.1 will > > not be able to be installed concurrently. > > Same thing happened with Python 1.5, so we're no worse off. If we do want > this behavior, then we need to add another version digit... Actually, I don't plan on releasing a 1.6.1. The next one will be 1.7. Of course, alpha and beta versions for 1.6 won't be able to live along, but I can live with that. > > Putting entries on the > > PATH doesnt solve the underlying problem - you will only be able to > > have one Python 1.6 directory on your path, else you end up with the > > same coflicts for the DLL. > > > > I dont see any better answer than System32 :-( Thoughts? > > I don't have a better answer, as you and I explained on several occasions. > Dunno why Guido decided to skip our recommendations, but hey... it > happens :-). Actually, I just wanted to get the discussion started. It worked. :-) I'm waiting for Tim Peters' response in this thread -- if I recall he was the one who said that python1x.dll should not go into the system directory. Note that I've made it easy to switch: the WISE script defines a separate variable DLLDEST which is currently set to MAINDIR, but which I could easily change to SYS32 to get the semantics you prefer. Hey, we could even give the user a choice here! <0.4 wink> > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > hypothesize) a mechanism that works, then it can be switched. > > The underlying issue is this: Python16.dll in the app directory works for > Python as an executable. However, it completely disables any possibility > for *embedding* Python. On Windows, embedding is practically required > because of the COM stuff (sure... a person could avoid COM but...). Yes, I know this. I'm just not happy with it, and I've definitely heard people complain that it is evil to install directories in the system directory. Seems there are different schools of thought... 
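For illustration, here's a rough sketch (assuming win32all's win32api is available, and using the python16.dll name from above) of how to see which copy of the DLL the standard Windows search order actually resolves for a running process:

import win32api

# LoadLibrary follows the normal search order: the EXE's directory,
# the system directory, the Windows directory, then PATH.
hmod = win32api.LoadLibrary("python16.dll")

# Show the full path of the copy that actually got loaded.
print win32api.GetModuleFileName(hmod)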
Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into the system directory. I will now be distributing with the VC++ 6.0 servicepack 1 versions of these files. Won't this be a problem for installations that already have an older version? (Now that I think of it, this is another reason why I decided that at least the alpha release should install everything in MAINDIR -- to limit the damage. Any informed opinions?) David Ascher: if you're listening, could you forward this to someone at ActiveState who might understand the issues here? They should have the same problems with ActivePerl, right? Or don't they have COM support? (Personally, I think that it wouldn't be so bad if we made it so that if you install just Python, the DLLs go into MAINDIR -- if you install the COM support, it can move/copy them to the system directory. But you may find this inelegant...) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Tue Apr 4 05:11:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 20:11:33 -0700 (PDT) Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: >... > Actually, I just wanted to get the discussion started. It worked. :-) hehe. True :-) > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. What's his physical address again? I have this nice little package to send him... >... > > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > > hypothesize) a mechanism that works, then it can be switched. > > > > The underlying issue is this: Python16.dll in the app directory works for > > Python as an executable. However, it completely disables any possibility > > for *embedding* Python. On Windows, embedding is practically required > > because of the COM stuff (sure... a person could avoid COM but...). > > Yes, I know this. I'm just not happy with it, and I've definitely > heard people complain that it is evil to install directories in the > system directory. Seems there are different schools of thought... It is evil, but it is also unavoidable. The alternative is to munge the PATH variable, but that is a Higher Evil than just dropping DLLs into the system directory. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? Not at all. In fact, Microsoft explicitly recommends including those in the distribution and installing them over the top of *previous* versions. They should never be downgraded (i.e. always check their version stamp!), but they should *always* be upgraded. Microsoft takes phenomenal pains to ensure that OLD applications are compatible with NEW runtimes. It is certainly possible that you could have a new app was built against a new runtime, and breaks when used against an old runtime. But that is why you always upgrade :-) And note that I do mean phenomenal pains. It is one of their ship requirements that you can always drop in a new RT without breaking old apps. So: regardless of where you decide to put python16.dll, you really should be upgrading the RT DLLs. 
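As a rough sketch of that version-stamp check (assuming win32all's win32api is available and that the version-resource keys behave as in current win32all), an installer could read the installed DLL's version before deciding whether to overwrite it:

import os, win32api

def dll_version(path):
    # Read the fixed version resource and split it into four numbers.
    info = win32api.GetFileVersionInfo(path, "\\")
    ms, ls = info["FileVersionMS"], info["FileVersionLS"]
    return (win32api.HIWORD(ms), win32api.LOWORD(ms),
            win32api.HIWORD(ls), win32api.LOWORD(ls))

installed = os.path.join(win32api.GetSystemDirectory(), "msvcrt.dll")
print "installed msvcrt.dll version:", dll_version(installed)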
> David Ascher: if you're listening, could you forward this to someone > at ActiveState who might understand the issues here? They should have > the same problems with ActivePerl, right? Or don't they have COM > support? ActivePerl does COM, but I dunno much more than that. > (Personally, I think that it wouldn't be so bad if we made it so that > if you install just Python, the DLLs go into MAINDIR -- if you install > the COM support, it can move/copy them to the system directory. But > you may find this inelegant...) Eek. Now you're talking about one guy reaching into another installation and munging it around. Especially for a move (boy, would that throw off the uninstall!). If you copied, then it is possible to have *two* copies of the DLL loaded into a process. The primary key is the pathname. I've had two pythoncom DLLs loaded in a process, and boy does that suck! The bugs are quite interesting, to say the least :-) And a total bear to track down until you have seen the double-load several times and can start to recognize the effects. In other words, moving is bad for elegance/uninstall reasons, and copy is bad for (potential) runtime reasons. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Tue Apr 4 06:28:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:54 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: <000201bf9dee$49638760$162d153f@tim> [Mark Hammond] > The 1.6a1 installer on Windows copies Python16.dll into the Python > directory, rather than the system32 directory like 1.5.x. We > discussed too long ago on this list not why this was probably not > going to work. I guess Guido decided to "suck it and see" - which > is fine. > > But guess what - it doesnt work :-( > ... > I dont see any better answer than System32 :-( Thoughts? Same as yours! Guido went off and innovated here -- always a bad sign . OTOH, I've got no use for "Program Files" -- make the cmdline version easy to use too. From tim_one at email.msn.com Tue Apr 4 06:28:59 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:59 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Message-ID: <000401bf9dee$4bf9c2a0$162d153f@tim> [/F] > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. [Greg Stein] > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). And put the .bat file where, exactly? In the Python root, somewhere under "Program Files"? Begs the question. MS doesn't want you to put stuff in System32 either, but it's the only rational place to put the DLL. Likewise the only rational place to put the cmdline EXE is in an easy-to-get-at directory. If C:\Quickenw\ is good enough for the best-selling non-MS Windows app, C:\Python-1.6\ is good enough for Python . Besides, it's a *default*. If you love MS guidelines and are savvy enough to know what the heck they are, you're savvy enough to install it under "Program Files" yourself. The people we're trying to help here have scant idea what they're doing, and dealing with the embedded space drives them nuts at the very start of their experience. Other languages understand this. 
For example, here are pieces of the PATH on my machine: C:\PERL5\BIN D:\JDK1.1.5\BIN C:\WINICON\BIN E:\OCAML\BIN From tim_one at email.msn.com Tue Apr 4 06:28:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:56 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: <000301bf9dee$4acba2e0$162d153f@tim> [Peter Funk] > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. > I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> -1 on dict.supplement(), -0 on an optional arg to dict.update(), dict.update(otherdict, overwrite=1) From guido at python.org Tue Apr 4 07:25:26 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 01:25:26 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 20:11:33 PDT." References: Message-ID: <200004040525.BAA11585@eric.cnri.reston.va.us> > What's his physical address again? I have this nice little package to send > him... Now, now, you don't want to sound like Ted Kazinsky, do you? :-) > It is evil, but it is also unavoidable. The alternative is to munge the > PATH variable, but that is a Higher Evil than just dropping DLLs into the > system directory. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? > > Not at all. In fact, Microsoft explicitly recommends including those in > the distribution and installing them over the top of *previous* versions. > They should never be downgraded (i.e. always check their version stamp!), > but they should *always* be upgraded. > > Microsoft takes phenomenal pains to ensure that OLD applications are > compatible with NEW runtimes. It is certainly possible that you could have > a new app was built against a new runtime, and breaks when used against an > old runtime. But that is why you always upgrade :-) > > And note that I do mean phenomenal pains. It is one of their ship > requirements that you can always drop in a new RT without breaking old > apps. > > So: regardless of where you decide to put python16.dll, you really should > be upgrading the RT DLLs. OK. That means I need two separate variables: where to install the MS DLLs and where to install the Py DLLs. > > David Ascher: if you're listening, could you forward this to someone > > at ActiveState who might understand the issues here? They should have > > the same problems with ActivePerl, right? Or don't they have COM > > support? > > ActivePerl does COM, but I dunno much more than that. I just downloaded and installed it. I've never seen an installer like this -- they definitely put a lot of effort in it. Annoying nit: they tell you to install "MS Windows Installer" first, and of course, being a MS tool, it requires a reboot. :-( Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So there. It also didn't change PATH for me, even though the docs mention that it does -- maybe only on NT? (PATH on Win9x is still a mystery to me. Is it really true that in order to change PATH an installer has to edit autoexec.bat? 
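For reference, a rough sketch of the two behaviours being voted on (illustrative only, not a patch): update() lets the other dictionary win on conflicts, while the proposed supplement() would only fill in keys that are still missing.

def update(d, other):
    # existing dict.update() semantics: 'other' overwrites existing keys
    for k, v in other.items():
        d[k] = v

def supplement(d, other):
    # proposed semantics: keep existing keys, only add the missing ones
    for k, v in other.items():
        if not d.has_key(k):
            d[k] = v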
Or is there a better way? Anything that claims to change PATH for me doesn't seem to do so. Could I have screwed something up?) > > (Personally, I think that it wouldn't be so bad if we made it so that > > if you install just Python, the DLLs go into MAINDIR -- if you install > > the COM support, it can move/copy them to the system directory. But > > you may find this inelegant...) > > Eek. Now you're talking about one guy reaching into another installation > and munging it around. Especially for a move (boy, would that throw off > the uninstall!). If you copied, then it is possible to have *two* copies > of the DLL loaded into a process. The primary key is the pathname. I've > had two pythoncom DLLs loaded in a process, and boy does that suck! The > bugs are quite interesting, to say the least :-) And a total bear to track > down until you have seen the double-load several times and can start to > recognize the effects. > > In other words, moving is bad for elegance/uninstall reasons, and copy is > bad for (potential) runtime reasons. OK, got it. But I'm still hoping that there's something we can do differently. Didn't someone tell me that at least on Windows 2000 installing app-specific files (as opposed to MS-provided files) in the system directory is a no-no? What's the alternative there? Is the same mechanism supported on NT or Win98? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Apr 4 06:28:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:48 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> Message-ID: <000001bf9dee$45f318c0$162d153f@tim> [Greg Ward, fesses up] > Hasn't anyone noticed that the largest amount of text in the joke > feature list was devoted to the Distutils? I thought *that* would > give it away "fer shure". You people are *so* gullible! ;-) Me too! My first suspect was me, but for the life of me, me couldn't remember writing that. You were only second on me list (it had to be one of us, as nobody else could have described legitimate Python features as if they had been implemented in Perl <0.9 wink>). > And for my next trick... *poof*! Nice try. You're not only not invisible, I've posted your credit card info to a hacker list. crushing-guido's-enemies-cuz-he's-too-much-of-a-wuss-ly y'rs - tim From tim_one at email.msn.com Tue Apr 4 07:00:55 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:00:55 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: <000901bf9df2$c224c7a0$162d153f@tim> [Guido] > ... > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. Not that I don't say a lot of dumb-ass things , but I strongly doubt I would have said this one. In my brief career as a Windows app provider, I learned four things, the first three loudly gotten across by seriously unhappy users: 1. Contra MS guidelines, dump the core DLLs in the system directory. 2. Contra MS guidelines, install the app by default in C:\name_of_app\. 3. Contra MS guidelines, put all the config options you can in a text file C:\name_of_app\name_of_app.ini instead of the registry. 4. 
This one was due to my boss: Contra MS guidelines, put a copy of every MS system DLL you rely on under C:\name_of_app\, so you don't get screwed when MS introduces an incompatible DLL upgrade. In the end, the last one is the only one I disagreed with (in recent years I believe MS DLL upgrades have gotten much more likely to fix bugs than to introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on my home machine that use their own copy of msvcrt.dll -- /F, if you're reading, how come the Pythonworks beta does this?). > ... > I've definitely heard people complain that it is evil to install > directories in the system directory. Seems there are different > schools of thought... Well, mucking with the system directories is horrid! Nobody likes doing it. AFAIK, though, there's really no realistic alternative. It's the only place you *know* will be on the PATH, and if an app embedding Python can't rely on PATH, it will have to hardcode the Python DLL path itself. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? (Now that I think > of it, this is another reason why I decided that at least the alpha > release should install everything in MAINDIR -- to limit the damage. > Any informed opinions?) You're using a std installer, and MS has rigid rules for these DLLs that the installer will follow by magic. Small comfort if things break, but this one is (IMO) worth playing along with. From tim_one at email.msn.com Tue Apr 4 07:42:55 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:42:55 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <000201bf9df8$a066b8c0$6d2d153f@tim> [Guido, on changing socket.connect() to require a single arg] > ... > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I think this one already caused too much pain: it appears virtually everyone uses the two-argument form routinely, and the reason for getting rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, Constructing a spurious "address" object (which has no behavior, and exists only to be torn apart inside the implementation) seems a foolish consistency, beyond doubt. So offer to back off on this one, in return for making 1/2 yield 0.5 . From guido at python.org Tue Apr 4 09:03:58 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 03:03:58 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 01:42:55 EDT." <000201bf9df8$a066b8c0$6d2d153f@tim> References: <000201bf9df8$a066b8c0$6d2d153f@tim> Message-ID: <200004040703.DAA11944@eric.cnri.reston.va.us> > I think this one already caused too much pain: it appears virtually > everyone uses the two-argument form routinely, and the reason for getting > rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, > > Constructing a spurious "address" object (which has no behavior, and > exists only to be torn apart inside the implementation) seems a > foolish consistency, beyond doubt. No more foolish than passing a point as an (x, y) tuple instead of separate x and y arguments. 
There are good reasons for passing it as a tuple, such as being able to store and recall it as a single entity. > So offer to back off on this one, in return for making 1/2 yield 0.5 . Unfortunately, I think I will have to. And it will have to be documented. The problem is that I can't document it as connect(host, port) -- there are Unix domain sockets that only take a single string argument (a filename). Also, sendto() takes a (host, port) tuple only. It has other arguments so that's the only form. Maybe I'll have to document it as connect(address) with a backwards compatible syntax connect(a, b) being equivalent to connect((a, b)). At least that sets the record straight without breaking old code. Still torn, --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Apr 4 10:59:02 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 4 Apr 2000 18:59:02 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: > 2. Contra MS guidelines, install the app by default in > C:\name_of_app\. Ive got to agree here. While I also see Greg's point, the savvy user can place it where they want, while the "average user" is better of with a more reasonable default. However, I would tend to go for "\name_of_app" rooted from the Windows drive. It is likely that this will be the default drive when a command prompt is open, so a simple "cd \python1.6" will work. This is also generally the same drive the default "Program Files" is on too. > You're using a std installer, and MS has rigid rules for > these DLLs that the > installer will follow by magic. Small comfort if things > break, but this one > is (IMO) worth playing along with. I checked the installer, and these MSVC dlls are indeed set to install only if the existing version is the "same or older". Annoyingly, it doesnt have an option for only "if older"! They are also set to correctly reference count in the registry. I believe that by installing a single custom DLL into the system directory, plus correctly installing some MS system DLLs into the system directory we are being perfect citizens. [Interestingly, Windows 2000 has a system process that continually monitors the system directory. If it detects that a "protected file" has been changed, it promptly copies the original back over the top! I believe the MSVC*.dlls are in the protected list, so can only be changed with a service pack release anyway. Everything _looks_ like it updates - Windows just copies it back!] Mark. From mal at lemburg.com Tue Apr 4 11:26:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 11:26:53 +0200 Subject: [Python-Dev] Unicode and comparisons Message-ID: <38E9B55D.F2B6409C@lemburg.com> Fredrik bug report made me dive a little deeper into compares and contains tests. Here is a snapshot of what my current version does: >>> '1' == None 0 >>> u'1' == None 0 >>> '1' == 'a???' 0 >>> u'1' == 'a???' Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data >>> '1' in ('a', None, 1) 0 >>> u'1' in ('a', None, 1) 0 >>> '1' in (u'a???', None, 1) 0 >>> u'1' in ('a???', None, 1) Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data The decoding errors occur because 'a???' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode). 
Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too?

--
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

From joachim at medien.tecmath.de Tue Apr 4 11:28:37 2000
From: joachim at medien.tecmath.de (Joachim Koenig-Baltes)
Date: Tue, 4 Apr 2000 11:28:37 +0200 (MEST)
Subject: [Python-Dev] Re: New Features in Python 1.6
In-Reply-To: <20000403131720.A10313@sz-sb.de>
References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de>
Message-ID: <20000404092837.944E889@tmpc200.medien.tecmath.de>

In comp.lang.python, you wrote:
>On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote:
>>
>> Python strings can now be stored as Unicode strings. To make it easier
>> to type Unicode strings, the single-quote character defaults to creating
>> a Unicode string, while the double-quote character defaults to ASCII
>> strings. If you need to create a Unicode string with double quotes,
>> just preface it with the letter "u"; likewise, an ASCII string can be
>> created by prefacing single quotes with the letter "a". For example:
>>
>> foo = 'hello' # Unicode
>> foo = "hello" # ASCII
>
>Is single-quoting for creating unicode clever ? I think there might be a problem
>with old code when the operations on unicode strings are not 100% compatible to
>the standard string operations. I don't know if this is a real problem - it's
>just a point for discussion.
>
>Cheers,
>Andreas
>

Hello Andreas, have you looked at the date of Guido's posting? A really good April Fools' joke, since he mixes the jokes very well with reality. Best regards, also to the others, Joachim

From guido at python.org Tue Apr 4 13:51:42 2000
From: guido at python.org (Guido van Rossum)
Date: Tue, 04 Apr 2000 07:51:42 -0400
Subject: [Python-Dev] Unicode and comparisons
In-Reply-To: Your message of "Tue, 04 Apr 2000 11:26:53 +0200." <38E9B55D.F2B6409C@lemburg.com>
References: <38E9B55D.F2B6409C@lemburg.com>
Message-ID: <200004041151.HAA12035@eric.cnri.reston.va.us>

> Fredrik bug report made me dive a little deeper into compares
> and contains tests.
>
> Here is a snapshot of what my current version does:
>
> >>> '1' == None
> 0
> >>> u'1' == None
> 0
> >>> '1' == 'a???'
> 0
> >>> u'1' == 'a???'
> Traceback (most recent call last):
> File "", line 1, in ?
> UnicodeError: UTF-8 decoding error: invalid data
>
> >>> '1' in ('a', None, 1)
> 0
> >>> u'1' in ('a', None, 1)
> 0
> >>> '1' in (u'a???', None, 1)
> 0
> >>> u'1' in ('a???', None, 1)
> Traceback (most recent call last):
> File "", line 1, in ?
> UnicodeError: UTF-8 decoding error: invalid data
>
> The decoding errors occur because 'a???' is not a valid
> UTF-8 string (Unicode comparisons coerce both arguments
> to Unicode by interpreting normal strings as UTF-8
> encodings of Unicode).
>
> Question: is this behaviour acceptable or should I go
> even further and mask decoding errors during compares
> and contains tests too ?

I think this is right -- I expect it will catch more errors than it will cause.

This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).
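For example (a minimal sketch with a made-up class -- classic instances, 1.5/1.6 era), defining __int__ alone doesn't make the comparison numeric:

class Num:
    def __init__(self, value):
        self.value = value
    def __int__(self):
        return self.value

print int(Num(3)) == 3   # 1 -- int() does call __int__
print Num(3) == 3        # 0 -- the comparison never calls __int__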
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 15:24:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 09:24:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 01:00:55 EDT." <000901bf9df2$c224c7a0$162d153f@tim> References: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: <200004041324.JAA12173@eric.cnri.reston.va.us> > [Guido] > > ... > > I'm waiting for Tim Peters' response in this thread -- if I recall he > > was the one who said that python1x.dll should not go into the system > > directory. [Tim] > Not that I don't say a lot of dumb-ass things , but I strongly doubt I > would have said this one. OK, it must be my overworked tired brain that is playing games with me. It might have been Jim Ahlstrom then, our resident Windows 3.1 supporter. :-) > In my brief career as a Windows app provider, I > learned four things, the first three loudly gotten across by seriously > unhappy users: > > 1. Contra MS guidelines, dump the core DLLs in the system directory. > 2. Contra MS guidelines, install the app by default in C:\name_of_app\. It's already been said that the drive letter could be chosen more carefully. I wonder if the pathname should also be an 8+3 (max) name, so that it can be relyably typed into a DOS window. > 3. Contra MS guidelines, put all the config options you can in a text file > C:\name_of_app\name_of_app.ini > instead of the registry. > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. > > In the end, the last one is the only one I disagreed with (in recent years I > believe MS DLL upgrades have gotten much more likely to fix bugs than to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). Probably because Pythonworks doesn't care about COM or embedding. Anyway, I now agree with you on 1-2 and on not following 4. As for 3, I think that for Mark's COM support to work, the app won't necessarily be able to guess what \name_of_app\ is, so that's where the registry comes in handy. PATH info is really about all that Python puts in the registry, so I think we're okay here. (Also if you read PC\getpathp.c in 1.6, you'll see that it now ignores most of the registry when it finds the installation through a search based on argv[0].) > > ... > > I've definitely heard people complain that it is evil to install > > directories in the system directory. Seems there are different > > schools of thought... > > Well, mucking with the system directories is horrid! Nobody likes doing it. > AFAIK, though, there's really no realistic alternative. It's the only place > you *know* will be on the PATH, and if an app embedding Python can't rely on > PATH, it will have to hardcode the Python DLL path itself. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? (Now that I think > > of it, this is another reason why I decided that at least the alpha > > release should install everything in MAINDIR -- to limit the damage. > > Any informed opinions?) 
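On the .ini side of point 3, a rough sketch: the standard ConfigParser module already reads the format; the file name, section and option below are made up purely for illustration.

import ConfigParser

cp = ConfigParser.ConfigParser()
cp.read(["c:\\python16\\python.ini"])   # hypothetical config file
print cp.get("paths", "pythonpath")     # hypothetical section/option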
> > You're using a std installer, and MS has rigid rules for these DLLs that the > installer will follow by magic. Small comfort if things break, but this one > is (IMO) worth playing along with. One more thing that I just realized. There are a few Python extension modules (_tkinter and the new pyexpat) that rely on external DLLs: _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs xmlparse.dll and xmltok.dll. If I understand correctly how the path rules work, these have to be on PATH too (although the pyd files don't have to be). This worries me -- these aren't official MS DLLs and neither are the our own, so we could easily stomp on some other app's version of the same... (The tcl folks don't change their filename when the 3rd version digit changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) Is there a better solution? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 4 16:20:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 10:20:19 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200004040703.DAA11944@eric.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> Message-ID: <14569.64035.285070.760022@seahag.cnri.reston.va.us> Guido van Rossum writes: > Maybe I'll have to document it as connect(address) with a backwards > compatible syntax connect(a, b) being equivalent to connect((a, b)). > At least that sets the record straight without breaking old code. If you *must* support the two-arg flavor (which I've never actually seen outside this discussion), I'd suggest not documenting it as a backward compatibility, only that it will disappear in 1.7. This can be done fairly easily and cleanly in the library reference. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Tue Apr 4 16:45:36 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 16:45:36 +0200 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: <005101bf9e44$71bade60$34aab5d4@hagrid> Tim Peters wrote: > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. > > In the end, the last one is the only one I disagreed with (in recent years I > believe MS DLL upgrades have gotten much more likely to fix bugs than to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). we've been lazy... in the pre-IE days, some machines came without any msvcrt.dll at all. so since we have to ship it, I guess it was easier to ship it along with all the other components, rather than implementing the "install in system directory only if newer" stuff... (I think it's on the 2.0 todo list ;-) From guido at python.org Tue Apr 4 16:52:30 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:52:30 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 10:20:19 EDT." 
<14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <200004041452.KAA12455@eric.cnri.reston.va.us> > If you *must* support the two-arg flavor (which I've never actually > seen outside this discussion), I'd suggest not documenting it as a > backward compatibility, only that it will disappear in 1.7. This can > be done fairly easily and cleanly in the library reference. Yes, I must. Can you fix up the docs? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 16:52:08 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 16:52:08 +0200 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <006301bf9e45$5b168dc0$34aab5d4@hagrid> Guido van Rossum wrote: > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. "\py" is reserved ;-) From guido at python.org Tue Apr 4 16:56:17 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:56:17 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:52:08 +0200." <006301bf9e45$5b168dc0$34aab5d4@hagrid> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <006301bf9e45$5b168dc0$34aab5d4@hagrid> Message-ID: <200004041456.KAA12509@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > I wonder if the pathname should also be an 8+3 (max) name, so that it > > can be relyably typed into a DOS window. > > "\py" is reserved ;-) OK, it'll be \python16 then. --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 17:04:40 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 17:04:40 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.99,1.100 References: <200004041410.KAA12405@eric.cnri.reston.va.us> Message-ID: <009701bf9e47$1cf13660$34aab5d4@hagrid> > Socket methods: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for backwards > + compatibility.) how about threatening to remove this in 1.7? IOW: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for backwards > + compatibility. This is deprecated, and will be removed in future versions.) From skip at mojam.com Tue Apr 4 16:23:44 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 4 Apr 2000 09:23:44 -0500 (CDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <14569.64240.80221.587062@beluga.mojam.com> Fred> If you *must* support the two-arg flavor (which I've never Fred> actually seen outside this discussion), I'd suggest not Fred> documenting it as a backward compatibility, only that it will Fred> disappear in 1.7. 
Having surprisingly little opportunity to call socket.connect directly in my work (considering the bulk of my programming is for the web), I'll note for the record that the direct calls I've made to socket.connect all have two arguments: host and port. It never occurred to me that there would even be a one-argument version. After all, why look at the docs for help if what you're doing already works? Skip From gvwilson at nevex.com Tue Apr 4 17:34:38 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 11:34:38 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <38E9B55D.F2B6409C@lemburg.com> Message-ID: Random thought (hopefully more sensible than my last one): Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce '?' for math-style division (int?int -> float-when-necessary)? Greg From gmcm at hypernet.com Tue Apr 4 17:39:52 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 11:39:52 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." Message-ID: <1257259699-4963377@hypernet.com> [Guido] > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. Some time ago Tim and I said that the place for a DLL that is intimately tied to an EXE is in the EXE's directory. The search path: 1) the EXE's directory 2) the current directory (useless) 3) the system directory 4) the Windows directory 5) the PATH For a general purpose DLL, that makes the system directory the only sane choice (if modifying PATH was sane, then PATH would be saner, but a SpecTCL will just screw you up). Things that go in the system directory should maintain backwards compatibility. For a DLL, that means all the old entry points are still there, in the same order with new ones at the end. For Python, there's no crying need to conform for now, but if (when?) embedding Python becomes ubiquitous, this (or some other scheme) may need to be considered. - Gordon From guido at python.org Tue Apr 4 17:45:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 11:45:39 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com> References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." <1257259699-4963377@hypernet.com> Message-ID: <200004041545.LAA12635@eric.cnri.reston.va.us> > Some time ago Tim and I said that the place for a DLL that is > intimately tied to an EXE is in the EXE's directory. But the conclusion seems to be that python1x.dll is not closely tied to python.exe -- it may be invoked via COM. > The search path: > 1) the EXE's directory > 2) the current directory (useless) > 3) the system directory > 4) the Windows directory > 5) the PATH > > For a general purpose DLL, that makes the system directory > the only sane choice (if modifying PATH was sane, then > PATH would be saner, but a SpecTCL will just screw you up). > > Things that go in the system directory should maintain > backwards compatibility. For a DLL, that means all the old > entry points are still there, in the same order with new ones at > the end. For Python, there's no crying need to conform for > now, but if (when?) 
embedding Python becomes ubiquitous, > this (or some other scheme) may need to be considered. Where should I put tk83.dll etc.? In the Python\DLLs directory, where _tkinter.pyd also lives? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 17:43:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 11:43:49 -0400 Subject: [Python-Dev] re: division In-Reply-To: Your message of "Tue, 04 Apr 2000 11:34:38 EDT." References: Message-ID: <200004041543.LAA12616@eric.cnri.reston.va.us> > Random thought (hopefully more sensible than my last one): > > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce '?' for math-style > division (int?int -> float-when-necessary)? Careful with your character sets there... The symbol you typed looks like a lowercase o with dieresis to me. :-( Assuming you're proposing something like this: . --- . I'm not so sure that choosing a non-ASCII symbol is going to work. For starters, it's on very few keyboards, and that won't change soon! In the past we've talked about using // for integer division and / for regular (int/int->float) division. This would mean that we have to introduce // now as an alias for /, and encourage people to use it for int division (only); then in 1.7 using / between ints will issue a compatibility warning, and in Py3K int/int will yield a float. It's still going to be painful, though. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Apr 4 17:52:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 17:52:52 +0200 Subject: [Python-Dev] Unicode and comparisons References: <38E9B55D.F2B6409C@lemburg.com> <200004041151.HAA12035@eric.cnri.reston.va.us> Message-ID: <38EA0FD4.DB0D96BF@lemburg.com> Guido van Rossum wrote: > > > Fredrik bug report made me dive a little deeper into compares > > and contains tests. > > > > Here is a snapshot of what my current version does: > > > > >>> '1' == None > > 0 > > >>> u'1' == None > > 0 > > >>> '1' == 'a???' > > 0 > > >>> u'1' == 'a???' > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > >>> '1' in ('a', None, 1) > > 0 > > >>> u'1' in ('a', None, 1) > > 0 > > >>> '1' in (u'a???', None, 1) > > 0 > > >>> u'1' in ('a???', None, 1) > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > The decoding errors occur because 'a???' is not a valid > > UTF-8 string (Unicode comparisons coerce both arguments > > to Unicode by interpreting normal strings as UTF-8 > > encodings of Unicode). > > > > Question: is this behaviour acceptable or should I go > > even further and mask decoding errors during compares > > and contains tests too ? > > I think this is right -- I expect it will catch more errors than it > will cause. Ok, I'll only mask the TypeErrors then. (UnicodeErrors are subclasses of ValueErrors and thus do not get masked.) > This made me go out and see what happens if you compare a numeric > class instance (one that defines __int__) to another int -- it doesn't > even call the __int__ method! This should be fixed in 1.7 when we do > the smart comparisons and rich coercions (or was it the other way > around? :-). Not sure ;-) I think both go hand in hand. 
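To illustrate the masking point above, a rough sketch using the snapshot behaviour described earlier in this thread (the byte string is just an arbitrary non-UTF-8 example):

try:
    u"a" == "\xe4\xf6"   # not valid UTF-8, so coercion fails
except ValueError, why:
    # UnicodeError is a subclass of ValueError, so it is not masked
    # the way the TypeErrors are
    print "caught as ValueError:", why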
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Tue Apr 4 17:53:20 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 17:53:20 +0200 Subject: [Python-Dev] re: division References: Message-ID: <010901bf9e4d$eb097840$34aab5d4@hagrid> gvwilson at nevex.com wrote: > Random thought (hopefully more sensible than my last one): > > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce '?' for math-style > division (int?int -> float-when-necessary)? where's the ? key? (oh, look, my PC keyboard has one. but if I press it, I get a /. hmm...) From martin at loewis.home.cs.tu-berlin.de Tue Apr 4 17:44:17 2000 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 4 Apr 2000 17:44:17 +0200 Subject: [Python-Dev] Re: Unicode and comparisons Message-ID: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> > Question: is this behaviour acceptable or should I go even further > and mask decoding errors during compares and contains tests too ? I always thought it is a core property of cmp that it works between all objects. Because of that, >>> x=[u'1','a???'] >>> x.sort() Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data fails. As always in cmp, I'd expect to get a consistent outcome here (ie. cmp should give a total order on objects). OTOH, I'm not so sure why cmp between plain and unicode strings needs to perform UTF-8 conversion? IOW, why is it desirable that >>> 'a' == u'a' 1 Anyway, I'm not objecting to that outcome - I only think that, to get cmp consistent, it may be necessary to drop this result. If it is not necessary, the better. Regards, Martin From jim at interet.com Tue Apr 4 18:06:27 2000 From: jim at interet.com (James C. Ahlstrom) Date: Tue, 04 Apr 2000 12:06:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <38EA1303.B393D7F8@interet.com> Guido van Rossum wrote: > OK, it must be my overworked tired brain that is playing games with > me. It might have been Jim Ahlstrom then, our resident Windows 3.1 > supporter. :-) I think I've been insulted. What's wrong with Windows 3.1?? :-) > > 1. Contra MS guidelines, dump the core DLLs in the system directory. The Python DLL must really go in the Windows system directory. I don't see any other choice. This is in accordance with Microsoft guidelines AFAIK, or anyway, that's the only way it Just Works. The Python16.dll is a system file if you are using COM, and it supports an embedded scripting language, so it goes into the system dir. QED. > > 3. Contra MS guidelines, put all the config options you can in a text file > > C:\name_of_app\name_of_app.ini > > instead of the registry. This is an excellent practice, and there should be a standard module to deal with .ini files. But, as you say, the registry is sometimes needed. > > 4. This one was due to my boss: Contra MS guidelines, put a copy of > > every MS system DLL you rely on under C:\name_of_app\, so you don't > > get screwed when MS introduces an incompatible DLL upgrade. Yuk. More trouble than it's worth. > > > I've definitely heard people complain that it is evil to install > > > directories in the system directory. Seems there are different > > > schools of thought... 
It is very illegal to install directories as opposed to DLL's. Do you really mean directories? If so, don't do that. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > > the system directory. I will now be distributing with the VC++ 6.0 If you distribute these, you must check version numbers and only replace old versions. Wise and other installers do this easily. Doing otherwise is evil and unacceptable. Checking file dates is not good enough either. > > > servicepack 1 versions of these files. Won't this be a problem for > > > installations that already have an older version? Probably not, thanks to Microsoft's valiant testing efforts. > > > (Now that I think > > > of it, this is another reason why I decided that at least the alpha > > > release should install everything in MAINDIR -- to limit the damage. > > > Any informed opinions?) Distribute these files with a valid Wise install script which checks VERSIONS. > One more thing that I just realized. There are a few Python extension > modules (_tkinter and the new pyexpat) that rely on external DLLs: > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > xmlparse.dll and xmltok.dll. Welcome to the club. > If I understand correctly how the path rules work, these have to be on > PATH too (although the pyd files don't have to be). This worries me > -- these aren't official MS DLLs and neither are the our own, so we > could easily stomp on some other app's version of the same... > (The tcl folks don't change their filename when the 3rd version digit > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > Is there a better solution? This is a daily annoyance and risk in the Windows world. If you require Tk, then you need to completely understand how to produce a valid Tk distribution. Same with PIL (which requires Tk). Often you won't know that some pyd requires some other obscure DLL. To really do this you need something high level. Like rpm's on linux. On Windows, people either write complex install programs with Wise et al, or run third party installers provided with (for example) Tk from simpler install scripts. It is then up to the Tk people to know how to install it, and how to deal with version upgrades. JimA From gmcm at hypernet.com Tue Apr 4 18:10:38 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 12:10:38 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041545.LAA12635@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com> Message-ID: <1257257855-5074057@hypernet.com> [Gordon] > > Some time ago Tim and I said that the place for a DLL that is > > intimately tied to an EXE is in the EXE's directory. [Guido] > But the conclusion seems to be that python1x.dll is not closely tied > to python.exe -- it may be invoked via COM. Right. > Where should I put tk83.dll etc.? In the Python\DLLs directory, where > _tkinter.pyd also lives? Won't work (unless there are some tricks in MSVC 6 I don't know about). Assuming no one is crazy enough to use Tk in a COM server, (or rather, that their insanity need not be catered to), then I'd vote for the directory where python.exe and pythonw.exe live. 
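A rough sketch of what that placement means in practice (DLL names as used above; sys.executable points at python.exe):

import os, sys

exe_dir = os.path.dirname(sys.executable)
for dll in ("tcl83.dll", "tk83.dll"):
    # with this scheme the Tk DLLs sit next to the interpreter executable
    print dll, os.path.exists(os.path.join(exe_dir, dll))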
- Gordon From gvwilson at nevex.com Tue Apr 4 18:20:22 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 12:20:22 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: > Assuming you're proposing something like this: > > . > --- > . > > I'm not so sure that choosing a non-ASCII symbol is going to work. For > starters, it's on very few keyboards, and that won't change soon! I realize that, but neither are many of the accented characters used in non-English names (said the Canadian). If we assume 18-24 months until P3K, will it be safe to assume support for non-7-bit characters, or will we continue to be constrained by what was available on PDP-11's in 1975? (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. Haven't done any field tests, though...) Greg From bwarsaw at cnri.reston.va.us Tue Apr 4 19:56:23 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 4 Apr 2000 13:56:23 -0400 (EDT) Subject: [Python-Dev] re: division References: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: <14570.11463.83210.17189@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If we assume 18-24 months until P3K, will it be safe to gvwilson> assume support for non-7-bit characters, or will we gvwilson> continue to be constrained by what was available on gvwilson> PDP-11's in 1975? Undoubtedly. From gvwilson at nevex.com Tue Apr 4 20:08:36 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:08:36 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case Message-ID: Here's a longer, and hopefully more coherent, argument for using the divided-by sign in P3K: 1. If P3K source is allowed to be Unicode, then all Python programming systems (custom-made or pre-existing) are going to have to be able to handle more than just 1970s-vintage 7-bit ASCII. If that support has to be there, it seems a shame not to make use of it in the language itself where that would be helpful. [1,2] 2. As I understand it, support for (int,int)->float division is being added to help people who think that arithmetic on computers ought to behave like arithmetic did in grade 4. I have no data to support this, but I expect that such people will understand the divided-by sign more readily than a forward slash. [3] 3. I also expect, again without data, that '//' vs. '/' will lead to as high a proportion of errors as '==' vs. '='. These errors may even prove harder to track down, since the result is a slightly wrong answer instead of a state change leading (often) to early loop termination or something equally noticeable. Greg [1] I'm aware that there are encoding issues (the replies to my first post mentioned at least two different ways for "my" divided-by sign to display), but this is an issue that will have to be tackled in general in order to support Unicode anyway. [2] I'd be grateful if everyone posting objections along the lines of, "But what about emacs/vi/some other favored bit of legacy technology?" could also indicate whether they use lynx(1) as their web browser, and/or are sure that 100% of the web pages they have built are accessible to people who don't have bit-mapped graphics. 
I am *not* trying to be inflammatory, I just think that if a technology is taken for granted as part of one tool, then it is legitimate to ask that it be taken for granted in another. [3] Please note that I am not asking for a multiplication sign, a square root sign, or any of APL's mystic runes. From fdrake at acm.org Tue Apr 4 20:27:08 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 14:27:08 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.13308.147675.434718@seahag.cnri.reston.va.us> gvwilson at nevex.com writes: > 1. If P3K source is allowed to be Unicode, then all Python programming > systems (custom-made or pre-existing) are going to have to be able > to handle more than just 1970s-vintage 7-bit ASCII. If that support > has to be there, it seems a shame not to make use of it in the language > itself where that would be helpful. [1,2] I don't recall any requirement that the host be able to deal with Unicode specially (meaning "other than as binary data"). Perhaps I missed that? > 2. As I understand it, support for (int,int)->float division is being > added to help people who think that arithmetic on computers ought to > behave like arithmetic did in grade 4. I have no data to support this, > but I expect that such people will understand the divided-by sign more > readily than a forward slash. [3] I don't think the division sign itself is a problem. Re-training experienced programmers might be; I don't think there's any intention of alienating that audience. > 3. I also expect, again without data, that '//' vs. '/' will lead to as > high a proportion of errors as '==' vs. '='. These errors may even > prove harder to track down, since the result is a slightly wrong answer > instead of a state change leading (often) to early loop termination or > something equally noticeable. I agree. > [3] Please note that I am not asking for a multiplication sign, a square > root sign, or any of APL's mystic runes. As I indicated above, I don't think the specific runes are the problem (outside of programmer alienation). The *biggest* problem (IMO) is that the runes are not on our keyboards. This has nothing to do with the appropriateness of the runes to the semantic meanings bound to them in the language definition, this has to do with convenience for typing without any regard to cultured habits in the current programmer population. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson at nevex.com Tue Apr 4 20:38:29 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:38:29 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: Hi, Fred; thanks for your mail. > gvwilson at nevex.com writes: > > 1. If P3K source is allowed to be Unicode > I don't recall any requirement that the host be able to deal with > Unicode specially (meaning "other than as binary data"). Perhaps I > missed that? I'm sorry, I didn't mean to imply that this decision had been taken --- hence the "if". However, allowing Unicode in source doesn't seem to have slowed down adoption of Java... :-) > I don't think the division sign itself is a problem. Re-training > experienced programmers might be; I don't think there's any intention > of alienating that audience. I think this comes down to spin.
If this is presented as, "We're adding a symbol that isn't on your keyboard in order to help newbies," it'll be flamed. If it's presented as, "Python is the first scripting language to fully embrace internationalization, so get with the twenty-first century!" (or something like that), I could see it getting a much more positive response. I also think that, despite their grumbling, experienced programmers are pretty adaptable. After all, I switch from Emacs Lisp to Python to C++ half-a-dozen times a day... :-) > The *biggest* problem (IMO) is that the runes are not on our > keyboards. Agreed. Perhaps non-native English speakers could pitch in and describe how easy/difficult it is for them to (for example) put properly-accented Spanish comments in code? Thanks, Greg From klm at digicool.com Tue Apr 4 20:48:52 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 4 Apr 2000 14:48:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: On Tue, 4 Apr 2000, Fred L. Drake, Jr. wrote: > gvwilson at nevex.com writes: > > 1. If P3K source is allowed to be Unicode, then all Python programming > > systems (custom-made or pre-existing) are going to have to be able > > to handle more than just 1970s-vintage 7-bit ASCII. If that support > > has to be there, it seems a shame not to make use of it in the language > > itself where that would be helpful. [1,2] > [...] > As I indicated above, I don't think the specific runes are the > problem (outside of programmer alienation). The *biggest* problem > (IMO) is that the runes are not on our keyboards. This has nothing to > do with the appropriateness of the runes to the semantic meanings > bound to them in the language definition, this has to do convenience > for typing without any regard to cultured habits in the current > programmer population. In general, it seems that there are some places where a programming language implementation should not be on the leading edge, and this is one. I think we'd have to be very confident that this new division sign (or whatever) is going to be in ubiquitous use, on everyone's keyboard, etc, before we could even consider making it a necessary part of the standard language. Do you have that confidence? Ken Manheimer klm at digicool.com From gvwilson at nevex.com Tue Apr 4 20:53:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:53:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: HI, Ken; thanks for your mail. > In general, it seems that there are some places where a programming > language implementation should not be on the leading edge, and this is > one. I think we'd have to be very confident that this new division > sign (or whatever) is going to be in ubiquitous use, on everyone's > keyboard, etc, before we could even consider making it a necessary > part of the standard language. Do you have that confidence? I wouldn't expect the division sign to be on keyboards. On the other hand, I would expect that having to type a two-stroke sequence every once in a while would help native English speakers appreciate what people in other countries sometimes have to go through in order to spell their names correctly... 
:-) Greg From pf at artcom-gmbh.de Tue Apr 4 20:48:11 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 20:48:11 +0200 (MEST) Subject: .ini files - was Re: [Python-Dev] DLL in the system dir In-Reply-To: <38EA1303.B393D7F8@interet.com> from "James C. Ahlstrom" at "Apr 4, 2000 12: 6:27 pm" Message-ID: Hi! [...] > > > 3. Contra MS guidelines, put all the config options you can in a text file > > > C:\name_of_app\name_of_app.ini > > > instead of the registry. James C. Ahlstrom: > This is an excellent practice, and there should be a standard module to > deal > with .ini files. [...] One half of it is already there in the standard library: 'ConfigParser'. From my limited knowledge about windows (shrug) this can at least read .ini files. Writing this info again out to a file shouldn't be too hard. Regards, Peter From effbot at telia.com Tue Apr 4 20:57:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 20:57:17 +0200 Subject: [Python-Dev] a slightly more coherent case References: Message-ID: <024401bf9e67$9a1a53e0$34aab5d4@hagrid> gvwilson at nevex.com wrote: > > The *biggest* problem (IMO) is that the runes are not on our > > keyboards. > > Agreed. Perhaps non-native English speakers could pitch in and describe > how easy/difficult it is for them to (for example) put properly-accented > Spanish comments in code? you know, people who do use foreign languages a lot tend to use keyboards designed for their language. I have keys for all swedish characters on my keyboard -- att skriva korrekt svenska p? mitt tangentbord ?r hur enkelt som helst... to type less common latin 1 characters, I ?s??ll? o?l? have to use tw? keys -- one "d??d ke?" for the ?ccent, f?llow?d by th? c?rre- sp?nding ch?r?ct?r. (visst, ? och ? anv?nds ibland i svensk text, och fanns f?rr ofta som separata tangenter -- i alla fall innan pc'n kom och f?rst?rde allting). besides, the use of indentation causes enough problems when doing trivial things like mailing, posting, and typesetting Python code. adding odd characters to the mix won't exactly help... From gmcm at hypernet.com Tue Apr 4 21:46:34 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 15:46:34 -0400 Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <1257244899-5853377@hypernet.com> Greg Wilson wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Certain stuffy (and now deceased) members of my family, despite emigrating to the Americas during the Industrial Revolution, insisted that the proper spelling of McMillan involved elevating the "c". Wonder if there's a unicode character for that, so I can get righteously indignant whenever people fail to use it. Personally, I'm delighted when people don't add extra letters to my name, and even that's pretty silly, since all the variations on M*M*ll*n come down to how some government clerk chose to spell it. - Gordon From guido at python.org Tue Apr 4 21:49:32 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:49:32 -0400 Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: Your message of "Tue, 04 Apr 2000 17:44:17 +0200." 
<200004041544.RAA01023@loewis.home.cs.tu-berlin.de> References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <200004041949.PAA13102@eric.cnri.reston.va.us> > I always thought it is a core property of cmp that it works between > all objects. Not any more. Comparisons can raise exceptions -- this has been so since release 1.5. This is rarely used between standard objects, but not unheard of; and class instances can certainly do anything they want in their __cmp__. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 4 21:51:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 15:51:14 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64240.80221.587062@beluga.mojam.com> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> <14569.64240.80221.587062@beluga.mojam.com> Message-ID: <14570.18354.151349.452329@seahag.cnri.reston.va.us> Skip Montanaro writes: > arguments: host and port. It never occurred to me that there would even be > a one-argument version. After all, why look at the docs for help if what > you're doing already works? And it never occurred to me that there would be two args; I vaguely recall the C API having one argument (a structure). Ah, well. I've patched up the documents to warn those who expect intuitive APIs. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Tue Apr 4 21:57:47 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:57:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Sun, 02 Apr 2000 10:37:11 +0200." <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <200004041957.PAA13168@eric.cnri.reston.va.us> > one of my side projects for SRE is to create a regex-compatible > frontend. since both engines have NFA semantics, this mostly > involves writing an alternate parser. > > however, when I started playing with that, I completely forgot > about the regex.set_syntax() function. supporting one extra > syntax isn't that much work, but a whole bunch of them? > > so what should we do? > > 1. completely get rid of regex (bjorn would love that, > don't you think?) (Who's bjorn?) > 2. remove regex.set_syntax(), and tell people who've > used it that they're SOL. > > 3. add all the necessary flags to the new parser... > > 4. keep regex around as before, and live with the > extra code bloat. > > comments? I'm for 4, then deprecating it, and eventually switching to 1. This saves you effort debugging compatibility with an obsolete module. If it ain't broken, don't "fix" it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 22:10:07 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:10:07 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> Message-ID: <200004042010.QAA13180@eric.cnri.reston.va.us> [me] > > One more thing that I just realized. 
There are a few Python extension > > modules (_tkinter and the new pyexpat) that rely on external DLLs: > > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > > xmlparse.dll and xmltok.dll. [Jim A] > Welcome to the club. I'm not sure what you mean by this? > > If I understand correctly how the path rules work, these have to be on > > PATH too (although the pyd files don't have to be). This worries me > > -- these aren't official MS DLLs and neither are the our own, so we > > could easily stomp on some other app's version of the same... > > (The tcl folks don't change their filename when the 3rd version digit > > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > > > Is there a better solution? > > This is a daily annoyance and risk in the Windows world. If you require > Tk, then you need to completely understand how to produce a valid Tk > distribution. Same with PIL (which requires Tk). Often you won't > know that some pyd requires some other obscure DLL. To really do this > you need something high level. Like rpm's on linux. On Windows, people > either write complex install programs with Wise et al, or run third > party installers provided with (for example) Tk from simpler install > scripts. It is then up to the Tk people to know how to install it, and > how to deal with version upgrades. Calculating the set of required DLLs isn't the problem. I have a tool (Dependency Viewer) that shows me exactly the dependencies (it recurses down any DLLs it finds and shows their dependencies too, using the nice MFC tree widget). The problem is where should I install these extra DLLs. In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer from Scriptics). The feedback over the past year showed that this was a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk installations stomped over it, people chose to install Tcl/Tk on a different volume than Python, etc. In 1.6, I am copying the necessary files from the Tcl/Tk installation into the Python directory. This actually installs fewer files than the full Tcl/Tk installation (but you don't get the Tcl/Tk docs). It gives me complete control over which Tcl/Tk version I use without affecting other Tcl/Tk installations that might exist. This is how professional software installations deal with inclusions. However the COM DLL issue might cause problems: if the Python directory is not in the search path because we're invoked via COM, there are only two places where the Tcl/Tk DLLs can be put so they will be found: in the system directory or somewhere along PATH. Assuming it is still evil to modify PATH, we would end up with Tcl/Tk in the system directory, where it could once again interfere with (or be interfered by) other Tcl/Tk installations! Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk DLLs can live in the Python tree. I'm not so sure -- I can at least *imagine* that someone would use Tcl/Tk to give their COM object a bit of a GUI. Moreover, this argument doesn't work for pyexpat -- COM apps are definitely going to expect to be able to use pyexpat! It's annoying. I have noticed, however, that you can use os.putenv() (or assignment to os.environ[...]) to change the PATH environment variable. The FixTk.py script in Python 1.5.2 used this -- it looked in a few places for signs of a Tcl/Tk installation, and then adjusted PATH to include the proper directory before trying to import _tkinter. Maybe there's a solution here? 
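A minimal sketch of the FixTk.py-style PATH tweak described above, assuming it is only needed on Windows; the candidate directories are invented for illustration and are not the actual FixTk.py search list:

    import os

    # Hypothetical places a Tcl/Tk 8.3 installation might live (illustration only).
    candidates = ["C:\\Program Files\\Tcl\\bin", "C:\\Tcl\\bin"]
    for d in candidates:
        if os.path.exists(os.path.join(d, "tcl83.dll")):
            # Prepend, so this installation is found before anything else on PATH.
            os.environ["PATH"] = d + ";" + os.environ["PATH"]
            break
    import _tkinter   # the dependent Tcl/Tk DLLs can now be located via PATH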
The Python DLL could be the only thing in the system directory, and from the registry it could know where the Python directory was. It could then prepend this directory to PATH. This is not so evil as mucking with PATH at install time, I think, since it is only done when Python16.dll is actually loaded. Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run out of environment space? Wouldn't it break other COM apps? Is the PATH truly separate per process? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 22:11:02 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 22:11:02 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <000201bf9df8$a066b8c0$6d2d153f@tim><200004040703.DAA11944@eric.cnri.reston.va.us><14569.64035.285070.760022@seahag.cnri.reston.va.us><14569.64240.80221.587062@beluga.mojam.com> <14570.18354.151349.452329@seahag.cnri.reston.va.us> Message-ID: <02d501bf9e71$e85f0e60$34aab5d4@hagrid> Fred L. Drake wrote: > Skip Montanaro writes: > > arguments: host and port. It never occurred to me that there would even be > > a one-argument version. After all, why look at the docs for help if what > > you're doing already works? > > And it never occurred to me that there would be two args; I vaguely > recall the C API having one argument (a structure). Ah, well. I've > patched up the documents to warn those who expect intuitive APIs. ;) while you're at it, and when you find the time, could you perhaps grep for "pair" and change places which use "pair" to mean a tuple with two elements to actually say "tuple" or "2-tuple"... after all, numerous people have claimed that stuff like "a pair (host, port)" isn't enough to make them understand that "pair" actually means "tuple". unless pair refers to a return value, of course. and only if the function doesn't use the optional argument syntax, of course. etc. (I suspect they're making it up as they go, but that's another story...) From skip at mojam.com Tue Apr 4 21:15:26 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 4 Apr 2000 14:15:26 -0500 (CDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.16206.210676.756348@beluga.mojam.com> Greg> On the other hand, I would expect that having to type a two-stroke Greg> sequence every once in a while would help native English speakers Greg> appreciate what people in other countries sometimes have to go Greg> through in order to spell their names correctly... I'm sure this is a practical problem, but aren't there country-specific keyboards available to Finnish, Spanish, Russian and non-English-speaking users to avoid precisely these problems? I grumble every time I have to enter some accented characters, but that's just because I do it rarely and use a US ASCII keyboard. I suspect Fran?ois Pinard has a keyboard with a "?" key. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From effbot at telia.com Tue Apr 4 22:23:06 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 22:23:06 +0200 Subject: [Python-Dev] a slightly more coherent case References: <14570.16206.210676.756348@beluga.mojam.com> Message-ID: <02eb01bf9e73$9d4c1560$34aab5d4@hagrid> Skip wrote: > I'm sure this is a practical problem, but aren't there country-specific > keyboards available to Finnish, Spanish, Russian and non-English-speaking > users to avoid precisely these problems? 
fwiw, my windows box supports about 80 different language-related keyboard layouts. that's western european and american keyboard layouts only, of course (mostly latin-1). haven't installed all the others... From gmcm at hypernet.com Tue Apr 4 22:35:23 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 16:35:23 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004042010.QAA13180@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> Message-ID: <1257241967-6030557@hypernet.com> [Guido] > Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk > DLLs can live in the Python tree. I'm not so sure -- I can at least > *imagine* that someone would use Tcl/Tk to give their COM object a bit > of a GUI. Moreover, this argument doesn't work for pyexpat -- COM > apps are definitely going to expect to be able to use pyexpat! Me. Would you have any sympathy for someone who wanted to make a GUI an integral part of a web server? Or would you tell them to get a brain and write a GUI that talks to the web server? Same issue. (Though not, I guess, for pyexpat). > It's annoying. > > I have noticed, however, that you can use os.putenv() (or assignment > to os.environ[...]) to change the PATH environment variable. The > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > for signs of a Tcl/Tk installation, and then adjusted PATH to include > the proper directory before trying to import _tkinter. Maybe there's > a solution here? The Python DLL could be the only thing in the system > directory, and from the registry it could know where the Python > directory was. It could then prepend this directory to PATH. This is > not so evil as mucking with PATH at install time, I think, since it is > only done when Python16.dll is actually loaded. The drawback of relying on PATH is that then some other jerk (eg you, last year ) will stick something of the same name in the system directory and break your installation. > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > out of environment space? Wouldn't it break other COM apps? Is the > PATH truly separate per process? Are there any exceptions to this: - dynamically load a .pyd - .pyd implicitly loads the .dll ? If that's always the case, then you can temporarily cd to the right directory before the dynamic load, and the implicit load should work. As for the others: probably not; can't see how; yes. - Gordon From guido at python.org Tue Apr 4 22:45:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:45:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:35:23 EDT." <1257241967-6030557@hypernet.com> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> <1257241967-6030557@hypernet.com> Message-ID: <200004042045.QAA13343@eric.cnri.reston.va.us> > Me. Would you have any sympathy for someone who wanted > to make a GUI an integral part of a web server? Or would you > tell them to get a brain and write a GUI that talks to the web > server? Same issue. (Though not, I guess, for pyexpat). Not all COM objects are used in web servers. Some are used in GUI contexts (aren't Word and Excel and even IE really mostly COM objects these days?). > > It's annoying. 
> > > > I have noticed, however, that you can use os.putenv() (or assignment > > to os.environ[...]) to change the PATH environment variable. The > > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > > for signs of a Tcl/Tk installation, and then adjusted PATH to include > > the proper directory before trying to import _tkinter. Maybe there's > > a solution here? The Python DLL could be the only thing in the system > > directory, and from the registry it could know where the Python > > directory was. It could then prepend this directory to PATH. This is > > not so evil as mucking with PATH at install time, I think, since it is > > only done when Python16.dll is actually loaded. > > The drawback of relying on PATH is that then some other jerk > (eg you, last year ) will stick something of the same > name in the system directory and break your installation. Yes, that's a problem, especially since it appears that PATH is searched *last*. (I wonder if this could explain the hard-to-reproduce crashes that people report when quitting IDLE?) > > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > > out of environment space? Wouldn't it break other COM apps? Is the > > PATH truly separate per process? > > Are there any exceptions to this: > - dynamically load a .pyd > - .pyd implicitly loads the .dll > ? I think this is always the pattern (except that some DLLs will implicitly load other DLLs, and so on). > If that's always the case, then you can temporarily cd to the > right directory before the dynamic load, and the implicit load > should work. Hm, I would think that the danger of temporarily changing the current directory is at least as big as that of changing PATH. (What about other threads? What if you run into an error and don't get a chance to cd back?) > As for the others: probably not; can't see how; yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From jim at interet.com Tue Apr 4 22:53:20 2000 From: jim at interet.com (James C. Ahlstrom) Date: Tue, 04 Apr 2000 16:53:20 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> Message-ID: <38EA5640.D3FC112F@interet.com> Guido van Rossum wrote: > [Jim A] > > Welcome to the club. > > I'm not sure what you mean by this? It sounded like you were joining the Microsoft afflicted... > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > from Scriptics). The feedback over the past year showed that this was > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > installations stomped over it, people chose to install Tcl/Tk on a > different volume than Python, etc. My first thought was that this was the preferred solution. It is up to Scriptics to provide an installer for Tk and for Tk customers to use it. Any problems with installing Tk are Scriptics' problem. I don't know the reasons it stomped over other installs etc. But either Tk customers are widely using non-standard installs, or the Scriptics installer is broken, or there is no such thing as a standard Tk install. This is fundamentally a Scriptics problem, but I understand it is a Python problem too. There may still be the problem that a standard Tk install might not be accessible to Python. This needs to be worked out with Scriptics. An environment variable could be set, the registry used etc. 
Assuming there is a standard Tk install and a way for external apps to use Tk, then we can still use the (fixed) Scriptics installer. > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > in the system directory, where it could once again interfere with (or > be interfered by) other Tcl/Tk installations! I seems to me that the correct Tk install script would put Tk DLL's in the system dir, and use the registry to find the libraries and other needed files. The exe's could go in a program directory somewhere. This is what I have to come to expect from professional software for DLL's which are expected to be used from multiple apps, as opposed to DLL's which are peculiar to one app. If the Tk installer did this, Tk would Just Work, and it would Just Work with third party apps (Tk clients) like Python too. Sorry, I have to run to a class. To be continued tomorrow.... JimA From guido at python.org Tue Apr 4 22:58:08 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:58:08 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:53:20 EDT." <38EA5640.D3FC112F@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> Message-ID: <200004042058.QAA13437@eric.cnri.reston.va.us> > > [Jim A] > > > Welcome to the club. [me] > > I'm not sure what you mean by this? > > It sounded like you were joining the Microsoft afflicted... Indeed :-( > > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > > from Scriptics). The feedback over the past year showed that this was > > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > > installations stomped over it, people chose to install Tcl/Tk on a > > different volume than Python, etc. > > My first thought was that this was the preferred solution. It is up > to Scriptics to provide an installer for Tk and for Tk customers > to use it. Any problems with installing Tk are Scriptics' problem. > I don't know the reasons it stomped over other installs etc. But > either Tk customers are widely using non-standard installs, or > the Scriptics installer is broken, or there is no such thing > as a standard Tk install. This is fundamentally a Scriptics > problem, but I understand it is a Python problem too. > > There may still be the problem that a standard Tk install might not > be accessible to Python. This needs to be worked out with Scriptics. > An environment variable could be set, the registry used etc. Assuming > there is a standard Tk install and a way for external apps to use Tk, > then we can still use the (fixed) Scriptics installer. The Tk installer has had these problems for a long time. I don't want to have to argue with them, I think it would be a waste of time. > > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > > in the system directory, where it could once again interfere with (or > > be interfered by) other Tcl/Tk installations! > > I seems to me that the correct Tk install script would put Tk > DLL's in the system dir, and use the registry to find the libraries > and other needed files. The exe's could go in a program directory > somewhere. This is what I have to come to expect from professional > software for DLL's which are expected to be used from multiple > apps, as opposed to DLL's which are peculiar to one app. 
If > the Tk installer did this, Tk would Just Work, and it would > Just Work with third party apps (Tk clients) like Python too. OK, you go argue with the Tcl folks. They create a vaguely unix-like structure under c:\Program Files\Tcl: subdirectories lib, bin, include, and then they dump their .exe and their .dll files in the bin directory. They also try to munge PATH to include their bin directory, but that often doesn't work (not on Windows 95/98 anyway). --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Tue Apr 4 23:14:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 23:14:59 +0200 (MEST) Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: <200004041949.PAA13102@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 3:49:32 pm" Message-ID: Hi! Guido van Rossum: > > I always thought it is a core property of cmp that it works between > > all objects. > > Not any more. Comparisons can raise exceptions -- this has been so > since release 1.5. This is rarely used between standard objects, but > not unheard of; and class instances can certainly do anything they > want in their __cmp__. Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a = '1' >>> b = 2 >>> a < b 0 >>> a > b # Newbies are normally baffled here 1 >>> a = '?' >>> b = u'?' >>> a < b Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: unexpected end of data IMO we will have a *very* hard time explaining *this* behaviour to newbies! Unicode objects are similar to normal string objects from the user's POV. It is unintuitive that objects that are far less similar (like for example numbers and strings) compare the way they do now, while the attempt to compare a unicode string with a standard string object containing the same character raises an exception. Mit freundlichen Grüßen (Regards), Peter (BTW: using a 12-year-old US keyboard and a custom xmodmap all the time to write umlauts and lots of other interesting chars: ?? ? ?? ?? ? ? ?? ?? ?! ;-) From mal at lemburg.com Tue Apr 4 18:47:51 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 18:47:51 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <38EA1CB7.BBECA305@lemburg.com> "Martin v. Loewis" wrote: > > > Question: is this behaviour acceptable or should I go even further > > and mask decoding errors during compares and contains tests too ? > > I always thought it is a core property of cmp that it works between > all objects. It does, but not necessarily without exceptions. I could easily mask the decoding errors too and then have cmp() work exactly as for strings, but the outcome may be different to what the user had expected due to the failing conversion. Sorting order may then look quite unsorted... > Because of that, > > >>> x=[u'1','a???'] > >>> x.sort() > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > fails. As always in cmp, I'd expect to get a consistent outcome here > (ie. cmp should give a total order on objects). > > OTOH, I'm not so sure why cmp between plain and unicode strings needs > to perform UTF-8 conversion? IOW, why is it desirable that > > >>> 'a' == u'a' > 1 This is needed to enhance inter-operability between Unicode and normal strings.
Note that they also have the same hash value (provided both use the ASCII code range), making them interchangeable in dictionaries: >>> d={u'a':1} >>> d['a'] = 2 >>> d[u'a'] 2 >>> d['a'] 2 This is per design. > Anyway, I'm not objecting to that outcome - I only think that, to get > cmp consistent, it may be necessary to drop this result. If it is not > necessary, the better. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 4 23:47:16 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 23:47:16 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: Message-ID: <38EA62E4.7E2B0E43@lemburg.com> Peter Funk wrote: > > Hi! > > Guido van Rossum: > > > I always thought it is a core property of cmp that it works between > > > all objects. > > > > Not any more. Comparisons can raise exceptions -- this has been so > > since release 1.5. This is rarely used between standard objects, but > > not unheard of; and class instances can certainly do anything they > > want in their __cmp__. > > Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> a = '1' > >>> b = 2 > >>> a < b > 0 > >>> a > b # Newbies are normally baffled here > 1 > >>> a = '?' > >>> b = u'?' > >>> a < b > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: unexpected end of data > > IMO we will have a *very* hard to time to explain *this* behaviour > to newbiews! > > Unicode objects are similar to normal string objects from the users POV. > It is unintuitive that objects that are far less similar (like for > example numbers and strings) compare the way they do now, while the > attempt to compare an unicode string with a standard string object > containing the same character raises an exception. I don't think newbies will really want to get into the UTF-8 business right from the start... when they do, they probably know about the above problems already. Changing this behaviour to silently swallow the decoding error would cause more problems than do good, IMHO. Newbies sure would find (u'a' not in 'a???') == 1 just as sursprising... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Apr 5 00:51:01 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:51:01 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: > I wonder if the pathname should also be an 8+3 (max) > name, so that it > can be relyably typed into a DOS window. To be honest, I can not see a good reason for this any more. The installation package only works on Win95/98/NT/2000 - all of these support long file names on all their supported file systems. So, any where that this installer will run, the "command prompt" on this system will correctly allow "cd \Python-1.6-and-any-thing-else-I-like-ly" :-) [OTOH, I tend to prefer "Python1.6" purely from an "easier to type" POV] Mark. 
From mhammond at skippinet.com.au Wed Apr 5 00:59:24 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:59:24 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257259699-4963377@hypernet.com> Message-ID: [Gordon writes] > Things that go in the system directory should maintain > backwards compatibility. For a DLL, that means all the old > entry points are still there, in the same order with new ones at > the end. Actually, the order is not important unless people link to you by ordinal (in which case you are likely to specify the ordinal in the .def file anyway). The Win32 loader is smart enough to be able to detect that all ordinals are the same as when it was linked, and use a fast-path. If ordinal name-to-number mappings have changed, the runtime loader takes a slower path that fixes up these differences. So what you suggest is ideal, but not really necessary. > For Python, there's no crying need to conform for > now, but if (when?) embedding Python becomes ubiquitous, > this (or some other scheme) may need to be considered. I believe Python will already do this, almost by accident, due to the conservative changes with each minor Python release. Eg, up until Python 1.6 was branded as 1.6, I was still linking my win32all extensions against the CVS version. When I remembered I would switch back to the 1.5.2 release ones, but when I forgot I never had a problem. People running a release version 1.5.2 could happily use my extensions linked with the latest 1.5.2+ binaries. We-could-even-blame-the-time-machine-at-a-strecth-ly, Mark. From mhammond at skippinet.com.au Wed Apr 5 01:08:58 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 09:08:58 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257257855-5074057@hypernet.com> Message-ID: > > Where should I put tk83.dll etc.? In the Python\DLLs > directory, where > > _tkinter.pyd also lives? > > Won't work (unless there are some tricks in MSVC 6 I don't > know about). Assuming no one is crazy enough to use Tk in a > COM server, (or rather, that their insanity need not be catered > to), then I'd vote for the directory where python.exe and > pythonw.exe live. What we can do is have Python itself use LoadLibraryEx() to load the .pyd files. This _will_ allow any dependant DLLs to be found in the same directory as the .pyd. [And as I mentioned, if the whole world would use LoadLibraryEx(), our problem would go away] LoadLibraryEx() is documented as working on all Win9x and NT from 3.1. From guido at python.org Wed Apr 5 01:14:22 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 19:14:22 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Wed, 05 Apr 2000 09:08:58 +1000." References: Message-ID: <200004042314.TAA15407@eric.cnri.reston.va.us> > What we can do is have Python itself use LoadLibraryEx() to load the > .pyd files. This _will_ allow any dependant DLLs to be found in the > same directory as the .pyd. [And as I mentioned, if the whole world > would use LoadLibraryEx(), our problem would go away] Doh! [Sound of forehead being slapped violently] We already use LoadLibraryEx()! So we can drop all the dependent dlls in the DLLs directory which has the PYD files as well. Case closed. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Apr 5 03:20:44 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:20:44 -0700 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. > > to quote a Linus Torvalds, "bad standards _should_ be broken" > > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Sorry I'm late -- I've been out of town. Just two FYIs: 1) ActivePerl goes into /Perl5.6, and my guess is that it's based on user feedback. 2) I've switched to changing the default installation to C:/Python in all my installs, and am much happier since I made that switchover. --david From DavidA at ActiveState.com Wed Apr 5 03:24:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:24:57 -0700 Subject: FW: [Python-Dev] Windows installer pre-prelease Message-ID: Forgot to cc: python-dev on my reply to Greg -----Original Message----- From: David Ascher [mailto:DavidA at ActiveState.com] Sent: Tuesday, April 04, 2000 6:23 PM To: Greg Stein Subject: RE: [Python-Dev] Windows installer pre-prelease > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). I hate VCVARS -- it doesn't work from my Cygnus shell, it has to be invoked by the user as opposed to automatically started by the installer, etc. > Depends on the audience of that standard. Programmers: yah. Consumers? > They just want the damn thing to work like they expect it to. That > expectation is usually "I can find my programs in Program Files." In my experience, the /Program Files location works fine for tools which have strictly GUI interfaces and which are launched by the Start menu or other GUI mechanisms. Anything which you might need to invoke at the command line lives best in a non-space-containing path, IMO of course. --david From guido at python.org Wed Apr 5 03:26:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 21:26:12 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Your message of "Tue, 04 Apr 2000 18:20:44 PDT." References: Message-ID: <200004050126.VAA15836@eric.cnri.reston.va.us> I've pretty much made my mind up about this one. Mark's mention of LoadLibraryEx() solved the last puzzle. I'm making the changes to the installer and hope to release alpha 2 with these changes later this week. 
- Default install root is \Python1.6 on the same drive as the default Program Files - MSVC*RT.DLL and PYTHON16.DLL go into the system directory; the MSV*RT.DLL files are only replaced if we bring a newer or same version - I'm using Tcl/Tk 8.2.3 instead of 8.3.0; the latter often crashes when closing a window - The Tcl/Tk and expat DLLs go in the DLLs subdirectory of the install root Thanks a lot for your collective memory!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Sun Apr 2 18:58:57 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 2 Apr 2000 09:58:57 -0700 (PDT) Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: Message-ID: On Sun, 2 Apr 2000, Peter Funk wrote: > > As I read this my first thoughts were: > "Huh? Is that really true? To me this sounds like a april fools joke. > But to be careful I checked first before I read on: My favourite part was the distutils section. The great thing about this announcement is that it would have been almost believable if we were talking about any language other than Python! -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell From ping at lfw.org Tue Apr 4 19:25:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 4 Apr 2000 10:25:07 -0700 (PDT) Subject: [Python-Dev] re: division In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > but harder to track down, since you'll have to scrutinize values very > carefully to spot the difference. Haven't done any field tests, > though...) My favourite symbol for integer division is _/ (read it as "floor-divide"). It makes visually apparent what is going on.
-- ?!ng "There's no point in being grown up if you can't be childish sometimes." -- Dr. Who --KAC01325.954869821/skuld.lfw.org-- --KAD01325.954869821/skuld.lfw.org-- From tim_one at email.msn.com Wed Apr 5 06:57:27 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: <000601bf9ebb$723f7ea0$3e2d153f@tim> [Guido] > ... > (PATH on Win9x is still a mystery to me. You're not alone. > Is it really true that in order to change PATH an installer has > to edit autoexec.bat? AFAIK, yes. A specific PATH setting can be associated with a specific exe via the registry, though. > ... > Anything that claims to change PATH for me doesn't seem to do so. Almost always the same here; suspect documentation rot. > Could I have screwed something up? Yes, but I doubt it. > ... > Didn't someone tell me that at least on Windows 2000 installing > app-specific files (as opposed to MS-provided files) in the system > directory is a no-no? MS was threatening to do this in (the then-named) NT5, but I believe they backed down. Don't have (the now-named) W2000 here to check on for sure, though. From tim_one at email.msn.com Wed Apr 5 06:57:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:33 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: <000701bf9ebb$740c4f60$3e2d153f@tim> [Mark Hammond] > ... > However, I would tend to go for "\name_of_app" rooted from the > Windows drive. It is likely that this will be the default drive > when a command prompt is open, so a simple "cd \python1.6" will > work. This is also generally the same drive the default "Program > Files" is on too. Yes, "C:\" doesn't literally mean "C:\" any more than "Program Files" literally means "Program Files" <0.1 wink>. By "C:\" I meant "drive where the thing we conveniently but naively call 'Program Files' lives"; naming the registry key whose value is this thing is more accurate but less helpful; the installer will have some magic predefined name which would be most helpful to give, but without the installer docs here I can't guess what that is. > ... > [Interestingly, Windows 2000 has a system process that continually > monitors the system directory. If it detects that a "protected > file" has been changed, it promptly copies the original back over > the top! I believe the MSVC*.dlls are in the protected list, so can > only be changed with a service pack release anyway. Everything > _looks_ like it updates - Windows just copies it back!] Thanks for making my day . From tim_one at email.msn.com Wed Apr 5 06:57:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:36 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <000801bf9ebb$75c85740$3e2d153f@tim> [Guido] > ... > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. Yes, but for a different reason: Many sites still use older Novell file servers that screw up on non-8.3 names in a variety of unpleasant ways. Just went thru this at Dragon again, where the language modeling group created a new file format with a 4-letter extension; they had to back off to 3 letters because half the company couldn't get at the new files. 
BTW, two years ago it was much worse, and one group started using Python instead of Java partly because .java files didn't work over the network at all! From nascheme at enme.ucalgary.ca Wed Apr 5 07:19:45 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: Tue, 4 Apr 2000 23:19:45 -0600 Subject: [Python-Dev] Re: A surprising case of cyclic trash Message-ID: <20000404231945.A16978@acs.ucalgary.ca> An even simpler example: >>> import sys >>> d = {} >>> print sys.getrefcount(d) 2 >>> exec("def f(): pass\n") in d >>> print sys.getrefcount(d) 3 >>> d.clear() >>> print sys.getrefcount(d) 2 exec adds the function to the dictionary. The function references the dictionary through globals. Neil -- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie From moshez at math.huji.ac.il Wed Apr 5 08:44:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:44:10 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > Agreed. Perhaps non-native English speakers could pitch in and describe > how easy/difficult it is for them to (for example) put properly-accented > Spanish comments in code? As the only (I think?) person here who is a native right-to-left language native speaker let me put my 2cents in. I program in two places: at work, and at home. At work, we have WinNT machines, with *no Hebrew support*. We right everything in English, including internal Word documents. We figured that if 1000 programmers working on NT couldn't produce a stable system, the hacks of half-a-dozen programmers thrown in could only make it worse. At home, I have a Linux machine with no Hebrew support either -- it just didn't seem to be worth the hassle, considering that most of what I write is sent out to the world, so it needs to be in English anyway. My previous machine had some Esperanto support, and I intend to put some on my new machine. True, not many people know Esperanto, but at least its easy enough to learn. It was easy enough to write comments in Esperanto in "vim", but since I was thinking in English anyway while programming (if, while, StringIO etc.), it was more natural to write the comments in English too. The only non-English comments I've seen in sources I had to read were in French, and I won't repeat what I've said about French people then . -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From artcom0!pf at artcom-gmbh.de Wed Apr 5 08:39:56 2000 From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de) Date: Wed, 5 Apr 2000 08:39:56 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004042332.TAA15480@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 7:32:23 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > FixTk.py > Log Message: > Work the Tcl version number in the path we search for. [...] > ! import sys, os, _tkinter > ! ver = str(_tkinter.TCL_VERSION) > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > if os.path.exists(os.path.join(v, "init.tcl")): > os.environ["TCL_LIBRARY"] = v [...] Just a wild idea: Does it make sense to have several incarnations of the shared object file _tkinter.so (or _tkinter.pyd on WinXX)? 
Something like _tkint83.so, _tkint82.so and so on, so that Tkinter.py can do something like the following to find a available Tcl/Tk version: for tkversion in range(83,79,-1): try: _tkinter = __import__("_tkint"+str(tkversion)) break except ImportError: pass else: raise Of course this does only make sense on platforms with shared object loading and if preparing Python binary distributions without including a particular Tcl/Tk package into the Python package. This idea might be interesting for Red Hat, SuSE Linux distribution users to allow partial system upgrades with a binary python-1.6.rpm Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Wed Apr 5 08:46:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:46:23 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Not to mention what we have to do to get Americans to pronounce our name correctly. (I've learned to settle for not calling me Moshi) i18n-sucks-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Apr 5 08:55:16 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:55:16 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <1257244899-5853377@hypernet.com> Message-ID: On Tue, 4 Apr 2000, Gordon McMillan wrote: > despite emigrating to the Americas during the Industrial > Revolution, insisted that the proper spelling of McMillan > involved elevating the "c". Wonder if there's a unicode > character for that, so I can get righteously indignant whenever > people fail to use it. Hmmmm...I think the Python ACKS file should be moved to UTF-8, and write *my* name in Hebrew letters: mem, shin, hey, space, tsadi, aleph, dalet, kuf, hey. now-i-can-get-righteously-indignant-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From billtut at microsoft.com Wed Apr 5 08:18:49 2000 From: billtut at microsoft.com (Bill Tutt) Date: Tue, 4 Apr 2000 23:18:49 -0700 Subject: [Python-Dev] _PyUnicode_New/PyUnicode_Resize Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> should be exported as part of the unicode object API. Otherwise, external C codec developers have to jump through some useless and silly hoops in order to construct a PyUnicode object. Additionally, you mentioned to Andrew that the decoders don't have to return a tuple anymore. Thats currently incorrect with whats currently in CVS: Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer returned in the tuple. Should this be fixed, or must codecs return the integer as Misc\unicode.txt says? Thanks, Bill From mal at lemburg.com Wed Apr 5 11:40:56 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 05 Apr 2000 11:40:56 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> Message-ID: <38EB0A28.8E8F6397@lemburg.com> Bill Tutt wrote: > > should be exported as part of the unicode object API. > > Otherwise, external C codec developers have to jump through some useless and > silly hoops in order to construct a PyUnicode object. Hmm, resize would be useful, agreed. The reason I haven't made these public is that the internal allocation logic could be changed in some future version to more elaborate and faster techniques. Having the _PyUnicode_* API private makes these changes possible without breaking external C code. E.g. say Unicode gets interned someday, then resize will need to watch out not resizing a Unicode object which is already stored in the interning dict. Perhaps a wrapper with additional checks around _PyUnicode_Resize() would be useful. Note that you don't really need _PyUnicode_New(): call PyUnicode_FromUnicode() with NULL argument and then fill in the buffer using PyUnicode_AS_UNICODE()... works just like PyString_FromStringAndSize() with NULL argument. > Additionally, you mentioned to Andrew that the decoders don't have to return > a tuple anymore. > Thats currently incorrect with whats currently in CVS: > Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer > returned in the tuple. > Should this be fixed, or must codecs return the integer as Misc\unicode.txt > says? That was a misunderstanding on my part: I was thinking of the .read()/.write() methods which are now in synch with the other file objects. .read() previously returned a tuple and .write() an integer. .encode() and .decode() must return a tuple. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Apr 5 12:42:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 12:42:37 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <38EB0BD5.66048804@lemburg.com> from "M.-A. Lemburg" at "Apr 5, 2000 11:48: 5 am" Message-ID: Hi! [me]: > > From my POV (using ISO Latin-1 all the time) it would be > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'???' in a > > Python source file so that (u'???' == '???') == 1. This is what I see > > on *my* screen, whether there is a 'u' in Front of the string or not. M.-A. Lemburg: > u"???" is being interpreted as Latin-1. The problem is the > string '???' to the right: during coercion this string is > being interpreted as UTF-8 and this causes the failure. > > You could say: ok, all my strings use Latin-1, but that would > introduce other problems... esp. when you take different > modules with different encoding assumptions and try to > integrate them into an application. Okay. This wouldn't occur here but we have deal with this possibility. > > In dist/src/Misc/unicode.txt you wrote: > > > > > Note that you should provide some hint to the encoding you used to > > > write your programs as pragma line in one the first few comment lines > > > of the source file (e.g. '# source file encoding: latin-1'). [me]: > > The upcoming 1.6 documentation should probably clarify whether > > the interpreter pays attention to "pragma"s or not. > > This is otherwise misleading. 
> > This "pragma" is nothing more than a hint for the source code > reader to switch his viewing encoding. The interpreter doesn't > treat the file differently. In fact, Python source code is > supposed to tbe 7-bit ASCII ! Sigh. In our company we use 'german' as our master language so we have string literals containing iso-8859-1 umlauts all over the place. Okay as long as we don't mix them with Unicode objects, this doesn't hurt anybody. What I would love to see, would be a well defined way to tell the interpreter to use 'latin-1' as default encoding instead of 'UTF-8' when dealing with string literals from our modules. The tokenizer in Python 1.6 already contains smart logic to get the size of TABs right (pasting from tokenizer.c): /* Skip comment, while looking for tab-setting magic */ if (c == '#') { static char *tabforms[] = { "tab-width:", /* Emacs */ ":tabstop=", /* vim, full form */ ":ts=", /* vim, abbreviated form */ "set tabsize=", /* will vi never die? */ /* more templates can be added here to support other editors */ }; .. It wouldn't be to hard to add something there to recognize other "pragma" comments like for example: #content-transfer-encoding: iso-8859-1 But what to do with it? May be adding a default encoding to every string object? Is this bloat? Just an idea. Regards, Peter From mal at lemburg.com Wed Apr 5 13:28:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 13:28:58 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: Message-ID: <38EB237A.5B16575B@lemburg.com> Peter Funk wrote: > > Hi! > > [me]: > > > From my POV (using ISO Latin-1 all the time) it would be > > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'???' in a > > > Python source file so that (u'???' == '???') == 1. This is what I see > > > on *my* screen, whether there is a 'u' in Front of the string or not. > > M.-A. Lemburg: > > u"???" is being interpreted as Latin-1. The problem is the > > string '???' to the right: during coercion this string is > > being interpreted as UTF-8 and this causes the failure. > > > > You could say: ok, all my strings use Latin-1, but that would > > introduce other problems... esp. when you take different > > modules with different encoding assumptions and try to > > integrate them into an application. > > Okay. This wouldn't occur here but we have deal with this possibility. > > > > In dist/src/Misc/unicode.txt you wrote: > > > > > > > Note that you should provide some hint to the encoding you used to > > > > write your programs as pragma line in one the first few comment lines > > > > of the source file (e.g. '# source file encoding: latin-1'). > > [me]: > > > The upcoming 1.6 documentation should probably clarify whether > > > the interpreter pays attention to "pragma"s or not. > > > This is otherwise misleading. > > > > This "pragma" is nothing more than a hint for the source code > > reader to switch his viewing encoding. The interpreter doesn't > > treat the file differently. In fact, Python source code is > > supposed to tbe 7-bit ASCII ! > > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. 
> > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. > > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. As I have already indicated above this would only solve the problem of string literals in Python source code. It would not however solve the problem with strings in general, since these can be built dynamically or from user input. The only way I can see for #pragma to work here is by auto- converting all static strings in the source code to Unicode and that would probably break more code than do good. Even worse, writing 'abc' in such a program would essentially mean the same thing as u'abc'. I'd suggest turning your Latin-1 strings into Unicode... this will hurt at first, but in the long rung, you win. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Wed Apr 5 15:33:29 2000 From: jim at interet.com (James C. Ahlstrom) Date: Wed, 05 Apr 2000 09:33:29 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> <200004042058.QAA13437@eric.cnri.reston.va.us> Message-ID: <38EB40A9.32A60EA2@interet.com> Guido van Rossum wrote: > OK, you go argue with the Tcl folks. They create a vaguely unix-like > structure under c:\Program Files\Tcl: subdirectories lib, bin, > include, and then they dump their .exe and their .dll files in the bin > directory. They also try to munge PATH to include their bin > directory, but that often doesn't work (not on Windows 95/98 anyway). That is even worse than I thought. Obviously they are incompetent in Windows. Mark's suggestion is a great one! JimA From bwarsaw at cnri.reston.va.us Wed Apr 5 15:34:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 5 Apr 2000 09:34:39 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case References: <1257244899-5853377@hypernet.com> Message-ID: <14571.16623.493822.231793@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, MZ> tsadi, aleph, dalet, kuf, hey. Shouldn't that be hey kuf dalet aleph tsadi space hey shin mem? :) lamed-alef-vav-mem-shin-ly y'rs, -Barry From moshez at math.huji.ac.il Wed Apr 5 15:44:15 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 15:44:15 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14571.16623.493822.231793@anthem.cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Barry A. 
Warsaw wrote: > MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, > MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, > MZ> tsadi, aleph, dalet, kuf, hey. > > Shouldn't that be > > hey kuf dalet aleph tsadi space hey shin mem? No, just stick the unicode directional shifting characters around it. now-you-see-why-i18n-is-a-pain-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Wed Apr 5 15:48:24 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 5 Apr 2000 09:48:24 -0400 Subject: [Python-Dev] re: division In-Reply-To: ; from ping@lfw.org on Tue, Apr 04, 2000 at 10:25:07AM -0700 References: Message-ID: <20000405094823.A11890@cnri.reston.va.us> On 04 April 2000, Ka-Ping Yee said: > On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > > but harder to track down, since you'll have to scrutinize values very > > carefully to spot the difference. Haven't done any field tests, > > though...) > > My favourite symbol for integer division is _/ > (read it as "floor-divide"). It makes visually > apparent what is going on. Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 years ago: / is what you learned in grade school (1/2 = 0.5), div is what you learn in first-year undergrad CS (1/2 = 0). Either add a "div" operator or a "div()" builtin to Python and you take care of the spelling issue. (The fixing-old-code issue is another problem entirely.) I think that means I favour keeping operator.div and the __div__() method as-is, and adding operator.fdiv (?) and __fdiv__ for "floating-point" division. In other words: 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 (where I have used artistic license in applying __div__ to actual numbers -- you know what I mean). -1 on adding any non-7-bit-ASCII characters to the character set required to express Python; +0 on allowing any (alphanumeric) Unicode character in identifiers (all for Py3k). Not sure what "alphanumeric" means in Unicode, but I'm sure someone has worried about this. Greg From guido at python.org Wed Apr 5 16:04:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:04:53 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 08:39:56 +0200." References: Message-ID: <200004051404.KAA16039@eric.cnri.reston.va.us> > Guido van Rossum: > > Modified Files: > > FixTk.py > > Log Message: > > Work the Tcl version number in the path we search for. > [...] > > ! import sys, os, _tkinter > > ! ver = str(_tkinter.TCL_VERSION) > > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > > if os.path.exists(os.path.join(v, "init.tcl")): > > os.environ["TCL_LIBRARY"] = v > [...] Note that this is only used on Windows, where Python is distributed with a particular version of Tk. I decided I needed to back down from 8.3 to 8.2 (8.3 sometimes crashes on close) so I decided to make the FixTk module independent of the version. > Just a wild idea: > > Does it make sense to have several incarnations of the shared object file > _tkinter.so (or _tkinter.pyd on WinXX)? 
> > Something like _tkint83.so, _tkint82.so and so on, so that > Tkinter.py can do something like the following to find a > available Tcl/Tk version: > > for tkversion in range(83,79,-1): > try: > _tkinter = __import__("_tkint"+str(tkversion)) > break > except ImportError: > pass > else: > raise > > Of course this does only make sense on platforms with shared object loading > and if preparing Python binary distributions without including a > particular Tcl/Tk package into the Python package. This idea might be > interesting for Red Hat, SuSE Linux distribution users to allow partial > system upgrades with a binary python-1.6.rpm Can you tell me what problem you are trying to solve here? It makes no sense to me, but maybe I'm missing something. Typically Python is built to match the Tcl/Tk version you have installed, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:11:02 -0400 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize In-Reply-To: Your message of "Wed, 05 Apr 2000 11:40:56 +0200." <38EB0A28.8E8F6397@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> Message-ID: <200004051411.KAA16095@eric.cnri.reston.va.us> > E.g. say Unicode gets interned someday, then resize will > need to watch out not resizing a Unicode object which is > already stored in the interning dict. Note that string objects deal with this by requiring that the reference count is 1 when a string is resized. This effectively enforces that resizes are only used when the original creator is still working on the string. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:16:15 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:16:15 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 12:42:37 +0200." References: Message-ID: <200004051416.KAA16112@eric.cnri.reston.va.us> > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. It would be better if this was supported for u"..." literals, so that it was taken care of at the source code level completely. The running program shouldn't have to worry about what encoding its source code was! For 8-bit literals, this would mean that if you had source code using Latin-1, the literals would be translated from Latin-1 to UTF-8 by the code generator. This would mean that len('?') would return 2. I'm not sure this is a great idea -- but then I'm not sure that using Latin-1 in source code is a great idea either. > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. 
> > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. Before we go any further we should design pragmas. The current approach is inefficient and only designed to accommodate editor-specific magical commands. I say it's a Python 1.7 issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Wed Apr 5 16:08:53 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 16:08:53 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Greg Ward wrote: > Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 > years ago: / is what you learned in grade school (1/2 = 0.5) Greg, here's an easy way for you to make money: sue your grade school . I learned that 1/2 is 1/2. Rationals are a much more natural entities then decimals (just think 1/3). FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change. > Not sure what "alphanumeric" > means in Unicode, but I'm sure someone has worried about this. I think Unicode has a clear definition of a letter and a number. How do you feel about letting arbitrary Unicode whitespace into Python? (Other then the indentation of non-empty lines ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Wed Apr 5 16:29:03 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 16:29:03 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? Message-ID: <38EB4DAE.2F538F9F@tismer.com> Hi, while fixing my design flaws after Just's Stackless Mac port, I was dealing with some overflow conditions and tracebacks. When there is a recursion depth overflow condition, we create a lot of new structure for the tracebacks. This usually happens in a situation where memory is quite exhausted. Even worse if we crash because of a memory error: The system will not have enough memory to build the traceback structure, to report the error. Puh :-) When I look into tracebacks, it turns out to be just a chain like the frame chain, but upward down. It holds references to the frames in a 1-to-1 manner, and it keeps copies of f->f_lasti and f->f_lineno. I don't see why this is needed. I'm thinking to replace the tracebacks by a single pointer in the frames for this purpose. It appears further to be possible to do that without any extra memory, since all the frames have extra temporary fields for exception info, and that isn't used in this context. Traceback objects exist each for one and only one frame, and they could be embedded into their frame. Does this make sense? Do I miss something? I'm considering this for Stackless and would like to know if I should prepare it for orthodox Python as well? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From guido at python.org Wed Apr 5 16:32:05 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:32:05 -0400 Subject: [Python-Dev] re: division In-Reply-To: Your message of "Wed, 05 Apr 2000 16:08:53 +0200." References: Message-ID: <200004051432.KAA16210@eric.cnri.reston.va.us> > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. Forget it. ABC did this, and the problem is that where you *think* you are doing something simple like calculating interest rates, you are actually manipulating rational numbers with 1000s of digits in their numerator and denumerator. If you want to change it, consider emulating what kids currently use in school: a decimal floating point calculator with N digits of precision. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:33:18 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:33:18 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Wed, 05 Apr 2000 16:29:03 +0200." <38EB4DAE.2F538F9F@tismer.com> References: <38EB4DAE.2F538F9F@tismer.com> Message-ID: <200004051433.KAA16229@eric.cnri.reston.va.us> > When I look into tracebacks, it turns out to be just a chain > like the frame chain, but upward down. It holds references > to the frames in a 1-to-1 manner, and it keeps copies of > f->f_lasti and f->f_lineno. I don't see why this is needed. > > I'm thinking to replace the tracebacks by a single pointer > in the frames for this purpose. It appears further to be > possible to do that without any extra memory, since all the > frames have extra temporary fields for exception info, and > that isn't used in this context. Traceback objects exist > each for one and only one frame, and they could be embedded > into their frame. > > Does this make sense? Do I miss something? Yes. It is quite possible to have multiple stack traces lingering around that all point to the same stack frames. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Apr 5 17:04:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 17:04:31 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> Message-ID: <38EB55FF.C900CF8A@lemburg.com> Guido van Rossum wrote: > > > Sigh. In our company we use 'german' as our master language so > > we have string literals containing iso-8859-1 umlauts all over the place. > > Okay as long as we don't mix them with Unicode objects, this doesn't > > hurt anybody. > > > > What I would love to see, would be a well defined way to tell the > > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > > when dealing with string literals from our modules. > > It would be better if this was supported for u"..." literals, so that > it was taken care of at the source code level completely. The running > program shouldn't have to worry about what encoding its source code > was! u"..." currently interprets the characters it finds as Latin-1 (this is by design, since the first 256 Unicode ordinals map to the Latin-1 characters). > For 8-bit literals, this would mean that if you had source code using > Latin-1, the literals would be translated from Latin-1 to UTF-8 by the > code generator. This would mean that len('?') would return 2. 
I'm > not sure this is a great idea -- but then I'm not sure that using > Latin-1 in source code is a great idea either. > > > The tokenizer in Python 1.6 already contains smart logic to get the > > size of TABs right (pasting from tokenizer.c): ... > > Before we go any further we should design pragmas. The current > approach is inefficient and only designed to accommodate > editor-specific magical commands. > > I say it's a Python 1.7 issue. Good idea :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Apr 5 17:01:24 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 17:01:24 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <38EB4DAE.2F538F9F@tismer.com> <200004051433.KAA16229@eric.cnri.reston.va.us> Message-ID: <38EB5544.5D428C01@tismer.com> Guido van Rossum wrote: [me, about embedding tracebacks into frames] > > Does this make sense? Do I miss something? > > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. Oh, I see. This is a Standard Python specific thing, which I was about to forget. In my version, this can happen, too, unless you are in a continuation-protected context already. There (and that was what I looked at while debugging), this situation can never happen, since an exception creates continuation-copies of all the frames while it crawls up. Since the traceback causes refcount increase, all the frames protect themselves. Thank you. I see it is a stackless feature. I can implement it if I put protection into the core, not just the co-extension. Frames can carry the tracebacks under the condition that they are protected (copied) if the traceback fields are occupied. Great, since this is a rare condition. Thanks again for the enlightment - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From pf at artcom-gmbh.de Wed Apr 5 17:08:35 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:08:35 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004051404.KAA16039@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 10: 4:53 am" Message-ID: Hi! [me]: [...] > > particular Tcl/Tk package into the Python package. This idea might be > > interesting for Red Hat, SuSE Linux distribution users to allow partial > > system upgrades with a binary python-1.6.rpm > > Can you tell me what problem you are trying to solve here? It makes > no sense to me, but maybe I'm missing something. Typically Python is > built to match the Tcl/Tk version you have installed, right? If you build from source this is true. But the Linux world is now different: The two major Linux distributions (RedHat, SuSE) both use the RPM format to distribute precompiled binary packages. Tcl/Tk usually lives in a separate package. (BTW.: SuSE in their perverse mood has splitted Python 1.5.2 itself into more than half a dozen separate packages, but that's another story). If someone wants to prebuild a Python 1.6 binary RPM for installation on any RPM based Linux system it is unknown, which version of Tcl/Tk is installed on the destination system. 
So either you can build a monster RPM, which includes the Tcl/Tk shared libs or use the RPM Spec file to force the user to install a specific version of Tcl/Tk (for example 8.2.3) or implement something like I suggested above. Of course this places a lot of burden on the RPM builder: he has to install at least all the four major versions of Tcl/Tk (8.0 - 8.3) on his machine and has to build _tkinter four times against each particular shared library and header files... but this would be possible. Currently the situation with SuSE Python 1.5.2 RPMs is even more dangerous, since the SPEC files used by SuSE simply contains the following 'Requires'-definitions: %package -n pyth_tk Requires: python tk tix blt This makes RPM believe that *any* version of Tcl/Tk would fit. Luckily SuSE 6.4 (released last week) still ships with the old Tcl/Tk 8.0.5, so this will not break until SuSE decides to upgrade their Tcl/Tk. But I guess that Red Hat comes with a newer version of Tcl/Tk. Hopefully they have got their SPEC file right (they invented RPM in the first place) RPM can be a really powerful tool protecting people from breaking their system with binary updates --- if used the right way... :-( May be I should go ahead and write a RPM Python.SPEC file? Would that have a chance to get included into src/Misc? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Wed Apr 5 17:25:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 11:25:38 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 17:04:31 +0200." <38EB55FF.C900CF8A@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> Message-ID: <200004051525.LAA16345@eric.cnri.reston.va.us> > u"..." currently interprets the characters it finds as Latin-1 > (this is by design, since the first 256 Unicode ordinals map to > the Latin-1 characters). Nice, except that now we seem to be ambiguous about the source character encoding: it's Latin-1 for Unicode strings and UTF-8 for 8-bit strings...! --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Apr 5 17:54:12 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:54:12 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <200004051525.LAA16345@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 11:25:38 am" Message-ID: Guido van Rossum: > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! This is a little bit difficult to understand and will make the task to write the upcoming 1.6 documentation even more challenging. ;-) But I agree: Changing this should go into 1.7 BTW: Our umlaut strings are sooner or later passed through one central function. All modules usually contain something like this: try: import fintl _ = fintl.gettext execpt ImportError: def _(msg): return msg ... MenuEntry(_("?ffnen"), self.open), MenuEntry(_("Schlie?en"), self.close) .... you get the picture. 
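Spelled out (and with the "execpt" typo corrected), that fallback wrapper is just the sketch below; the extra _u() helper and the choice of 'latin-1' are assumptions, added only to show where a central coercion to Unicode could be hooked in:

    try:
        import fintl
        _ = fintl.gettext
    except ImportError:
        # no message catalog available: fall back to the untranslated string
        def _(msg):
            return msg

    def _u(msg):
        # one central place to coerce catalog strings to Unicode;
        # 'latin-1' matches the encoding of the literals discussed above
        return unicode(_(msg), "latin-1")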
It would be easy to change the implementation of 'fintl.gettext' to coerce the resulting strings into Unicode or do whatever is required. But we currently use GNU gettext to produce the messages files that are translated into english, french and italian. AFAIK GNU gettext handles only 8 bit strings anyway. Our customers in far east currently live with the english version but this has merely financial than technical reasons. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Wed Apr 5 20:01:29 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 14:01:29 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 17:08:35 +0200." References: Message-ID: <200004051801.OAA16736@eric.cnri.reston.va.us> > RPM can be a really powerful tool protecting people from breaking their > system with binary updates --- if used the right way... :-( > > May be I should go ahead and write a RPM Python.SPEC file? > Would that have a chance to get included into src/Misc? I'd say yes! But check with Oliver Andrich first, who's maintaining Python RPMs already. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Apr 5 20:32:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 20:32:26 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> Message-ID: <38EB86BA.5225C381@lemburg.com> Guido van Rossum wrote: > > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! Noo... there is no definition for non-ASCII 8-bit strings in Python source code using the ordinal range 127-255. If you were to define Latin-1 as source code encoding, then we would have to change auto-coercion to make a Latin-1 assumption instead, but... I see the picture: people are getting pretty confused about what is going on. If you write u"xyz" then the ordinals of those characters are taken and stored directly as Unicode characters. If you live in a Latin-1 world, then you happen to be lucky: the Unicode characters match your input. If not, some totally different characters are likely to show if the string were written to a file and displayed using a Unicode aware editor. The same will happen to your normal 8-bit string literals. Nothing unusual so far... if you use Latin-1 strings and write them to a file, you get Latin-1. If you happen to program on DOS, you'll get the DOS ANSI encoding for the German umlauts. Now the key point where all this started was that u'?' in '???' will raise an error due to '???' being *interpreted* as UTF-8 -- this doesn't mean that '???' will be interpreted as UTF-8 elsewhere in your application. The UTF-8 assumption had to be made in order to get the two worlds to interoperate. We could have just as well chosen Latin-1, but then people currently using say a Russian encoding would get upset for the same reason. One way or another somebody is not going to like whatever we choose, I'm afraid... 
the simplest solution is to use Unicode for all strings which contain non-ASCII characters and then call .encode() as necessary. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Apr 5 23:39:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 5 Apr 2000 23:39:49 +0200 Subject: [Python-Dev] Re: unicode: strange exception Message-ID: <000f01bf9f47$7ea37840$34aab5d4@hagrid> >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer now that's an interesting error message. I think the old one was better ;-) From effbot at telia.com Wed Apr 5 23:38:10 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 5 Apr 2000 23:38:10 +0200 Subject: [Python-Dev] Re: unicode: strange exception (part 2) Message-ID: <000e01bf9f47$7e47eac0$34aab5d4@hagrid> I wrote: > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object with the latest version, I get: >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer is this really an improvement? looks like writing code that works with any kind of strings will be harder than I thought... From guido at python.org Wed Apr 5 23:46:47 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 17:46:47 -0400 Subject: [Python-Dev] Re: unicode: strange exception (part 2) In-Reply-To: Your message of "Wed, 05 Apr 2000 23:38:10 +0200." <000e01bf9f47$7e47eac0$34aab5d4@hagrid> References: <000e01bf9f47$7e47eac0$34aab5d4@hagrid> Message-ID: <200004052146.RAA22187@eric.cnri.reston.va.us> > with the latest version, I get: > > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > is this really an improvement? > > looks like writing code that works with any kind of strings > will be harder than I thought... Are you totally up-to-date? I get >>> u"!" in ("a", None) 0 >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Apr 6 00:37:24 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 18:37:24 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 20:32:26 +0200." <38EB86BA.5225C381@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <200004052237.SAA22215@eric.cnri.reston.va.us> [MAL] > > > u"..." currently interprets the characters it finds as Latin-1 > > > (this is by design, since the first 256 Unicode ordinals map to > > > the Latin-1 characters). [GvR] > > Nice, except that now we seem to be ambiguous about the source > > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > > 8-bit strings...! [MAL] > Noo... there is no definition for non-ASCII 8-bit strings in > Python source code using the ordinal range 127-255. If you were > to define Latin-1 as source code encoding, then we would have > to change auto-coercion to make a Latin-1 assumption instead, but... 
> I see the picture: people are getting pretty confused about what > is going on. > > If you write u"xyz" then the ordinals of those characters are > taken and stored directly as Unicode characters. If you live > in a Latin-1 world, then you happen to be lucky: the Unicode > characters match your input. If not, some totally different > characters are likely to show if the string were written > to a file and displayed using a Unicode aware editor. > > The same will happen to your normal 8-bit string literals. > Nothing unusual so far... if you use Latin-1 strings and > write them to a file, you get Latin-1. If you happen to > program on DOS, you'll get the DOS ANSI encoding for the > German umlauts. > > Now the key point where all this started was that > u'?' in '???' will raise an error due to '???' being > *interpreted* as UTF-8 -- this doesn't mean that '???' > will be interpreted as UTF-8 elsewhere in your application. > > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. > > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. I have a different view on this (except that I agree that it's pretty confusing :-). In my definition of a "source character encoding", string literals, whether Unicode or 8-bit strings, are translated from the source encoding to the corresponding run-time values. If I had a C compiler that read its source in EBCDIC but cross-compiled to a machine that used ASCII, I would expect that 'a' in the source would have the integer value 97 (ASCII 'a'), regardless of the EBCDIC value for 'a'. If I type a non-ASCII Latin-1 character in a Unicode literal, it generates the corresponding Unicode character. This means to me that the source character encoding is Latin-1. But when I type the same character in an 8-bit character literal, that literal is interpreted as UTF-8 (e.g. when converting to Unicode using the default conversions). Thus, even though you can do whatever you want with 8-bit literals in your program, the most defensible view is that they are UTF-8 encoded. I would be much happier if all source code was encoded in the same encoding, because otherwise there's no good way to view such code in a general Unicode-aware text viewer! My preference would be to always use UTF-8. This would mean no change for 8-bit literals, but a big change for Unicode literals... And a break with everyone who's currently typing Latin-1 source code and using strings as Latin-1. (Or Latin-7, or whatever.) My next preference would be a pragma to define the source encoding, but that's a 1.7 issue. Maybe the whole thing is... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Apr 6 00:51:51 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 00:51:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> Message-ID: <38EBC387.FAB08D61@lemburg.com> Fredrik Lundh wrote: > > >>> None in "abc" > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > now that's an interesting error message. 
I think the old one > was better ;-) How come you're always faster on this than I am with my patches ;-) The above is already fixed in my local version (together with some other minor stuff I found in the codec error handling) with the next patch set. It will then again produce this output: >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: string member test needs char left operand BTW, my little "don't use tabs use spaces" in C code extravaganza was a complete nightmare... diff just doesn't like it and the Python source code is full of places where tabs and spaces are mixed in many different ways... I'm back to tabs-indent-mode again :-/ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Thu Apr 6 02:19:30 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 10:19:30 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: > I just downloaded and installed it. I've never seen an > installer like > this -- they definitely put a lot of effort in it. hehe - guess who "encouraged" that :-) > Annoying nit: they > tell you to install "MS Windows Installer" first that should be a good clue :-) > and of > course, being > a MS tool, it requires a reboot. :-( Actually, MSI is very cool. Now MSI is installed, most future MSI installs should proceed without reboot. In Win2k it is finally close to perfect. I dont think an installer has ever wanted to reboot my PC since Win2k. > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So > there. It also didn't change PATH for me, even though the docs > mention that it does -- maybe only on NT? In another mail you asked David to look into how Active State handle their DLLs. Well, Trent Mick started the ball rolling... The answer is that Perl extensions never import data from the core DLL. They always import functions. In many cases, they can hide this fact with the pre-processor. In the Python world, this qould be equivilent to never accessing Py_None directly - always via a "PyGetNone()" type function. As mentioned, this could possibly be hidden so that code still uses "Py_None". One advantage they mentioned a number of times is avoiding dependencies on differing Perl versions. By avoiding the import of data, they have far more possibilities, including the use of LoadLibrary(), and a new VC6 linker feature called "delay loading". To my mind, it would be quite difficult to make this work for Python. There are a a large number of data items we import, and adding a function call indirection to each one sounds a pain. [As a semi-related issue: This "delay loading" feature is very cool - basically, the EXE loader will not resolve external DLL references until actually used. This is the same trick mentioned on comp.lang.python, where they saw _huge_ startup increases (although the tool used there was a third-party tool). The thread in question on c.l.py resolved that, for some reason, the initialization of the Windows winsock library was taking many seconds on that particular PC. Guido - are you on VC6 yet? If so, I could look into this linker option, and see how it improves startup performance on Windows. Note - this feature only works if no data is imported - hence, we could use it in Python16.dll, as most of its imports are indeed functions. 
Python extension modules can not use it against Python16 itself as they import data.] Mark. From tim_one at email.msn.com Thu Apr 6 05:10:39 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:39 -0400 Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: <000401bf9f75$afb18520$ab2d153f@tim> [Greg Ward] > ... > In other words: > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > (where I have used artistic license in applying __div__ to actual > numbers -- you know what I mean). +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. From tim_one at email.msn.com Thu Apr 6 05:10:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:35 -0400 Subject: [Python-Dev] re: division In-Reply-To: <200004051432.KAA16210@eric.cnri.reston.va.us> Message-ID: <000301bf9f75$ada190e0$ab2d153f@tim> [Moshe] > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. [Guido] > Forget it. ABC did this, and the problem is that where you *think* > you are doing something simple like calculating interest rates, you > are actually manipulating rational numbers with 1000s of digits in > their numerator and denumerator. Let's not be too hasty about this, cuz I doubt we'll get to change it twice . You (Guido) & I agreed that ABC's rationals didn't work out well way back when, but a) That has not been my experience in other languages -- ABC was unique. b) Presumably ABC did usability studies that concluded rationals were least surprising. c) TeachScheme! seems delighted with their use of rationals (indeed, one of TeachScheme!'s primary authors beat up on me in email for Python not doing this). d) I'd much rather saddle newbies with time & space surprises than correctness surprises. Last week I took some time to stare at the ABC manual again, & suspect I hit on the cause: ABC was *aggressively* rational. That is, ABC had no notation for floating point (ABC "approximate") literals; even 6.02e23 was taken to mean "exact rational". In my experience ABC was unique this way, and uniquely surprising for it: it's hard to be surprised by 2/3 returning a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Give it some thought. > If you want to change it, consider emulating what kids currently use > in school: a decimal floating point calculator with N digits of > precision. This is what REXX does, and is very powerful even for experts (assuming the user can, as in REXX, specify N; but that means writing a whole slew of arbitrary-precision math libraries too -- btw, that is doable! e.g., I worked w/ Dave Gillespie on some of the algorithms for his amazing Emacs calc). It will run at best 10x slower than native fp of comparable precision, though, so experts will hate it in the cases they don't love it <0.5 wink>. 
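To put a number on the blow-up Guido describes, here is a tiny sketch; the 7%-per-period rate and the 360 periods are invented purely to keep the arithmetic obvious:

    # an exact 7% per period for 360 periods, kept as the fraction num/den;
    # 107 and 100 share no factors, so this is already in lowest terms
    num, den = 1L, 1L
    for period in range(360):
        num = num * 107
        den = den * 100
    print len(str(num)), "digits in the numerator"    # 731
    print len(str(den)), "digits in the denominator"  # 721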
one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim From petrilli at amber.org Thu Apr 6 05:16:28 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Wed, 5 Apr 2000 23:16:28 -0400 Subject: [Python-Dev] re: division In-Reply-To: <000401bf9f75$afb18520$ab2d153f@tim>; from tim_one@email.msn.com on Wed, Apr 05, 2000 at 11:10:39PM -0400 References: <20000405094823.A11890@cnri.reston.va.us> <000401bf9f75$afb18520$ab2d153f@tim> Message-ID: <20000405231628.A24968@trump.amber.org> Tim Peters [tim_one at email.msn.com] wrote: > [Greg Ward] > > ... > > In other words: > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > > > (where I have used artistic license in applying __div__ to actual > > numbers -- you know what I mean). > > +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. +1 from me as well. I spent a little time going through all my code, and looking through Zope as well, and I couldn't find any place I used 'div' as a variable, much less any place I depended on this behaviour, so I don't think my code would break in any odd ways. The only thing I can imagine is some printed text formatting issues. Chris -- | Christopher Petrilli | petrilli at amber.org From moshez at math.huji.ac.il Thu Apr 6 08:30:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 6 Apr 2000 08:30:44 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <000301bf9f75$ada190e0$ab2d153f@tim> Message-ID: On Wed, 5 Apr 2000, Tim Peters wrote: > Last week I took some time to stare at the ABC manual again, & suspect I hit > on the cause: ABC was *aggressively* rational. That is, ABC had no > notation for floating point (ABC "approximate") literals; even 6.02e23 was > taken to mean "exact rational". In my experience ABC was unique this way, > and uniquely surprising for it: it's hard to be surprised by 2/3 returning > a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Ouch. There is definitely place for floats in the numeric tower. It's just that those shouldn't be reached accidentally <0.3 wink> > one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim but-in-this-case-two-sizes-do-seem-enough-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Thu Apr 6 10:50:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 10:50:47 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> <200004051411.KAA16095@eric.cnri.reston.va.us> Message-ID: <38EC4FE7.94F862D7@lemburg.com> Guido van Rossum wrote: > > > E.g. say Unicode gets interned someday, then resize will > > need to watch out not resizing a Unicode object which is > > already stored in the interning dict. > > Note that string objects deal with this by requiring that the > reference count is 1 when a string is resized. This effectively > enforces that resizes are only used when the original creator is still > working on the string. Nice trick ;-) The new PyUnicode_Resize() will have the same interface as _PyString_Resize() since this seems to be the most flexible way to implement it without giving away possibilities for future optimizations... 
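The effect behind that refcount-of-1 rule can be watched from Python itself: an in-place resize is only safe while the creator holds the sole reference, which is exactly what sys.getrefcount() reports (a toy illustration, nothing here touches the C API):

    import sys

    s = "x" * 10                # a fresh, uninterned string
    print sys.getrefcount(s)    # 2: `s` plus getrefcount's own argument
    t = s                       # a second owner appears...
    print sys.getrefcount(s)    # 3: resizing s in place would now be unsafe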
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gvwilson at nevex.com Thu Apr 6 13:31:26 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 6 Apr 2000 07:31:26 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <20000405231628.A24968@trump.amber.org> Message-ID: > > [Greg Ward] > > > In other words: > > > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 +1. Should 'mod' be made a synonym for '%' for symmetry's sake? Greg From guido at python.org Thu Apr 6 15:33:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:33:51 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 10:19:30 +1000." References: Message-ID: <200004061333.JAA23880@eric.cnri.reston.va.us> > > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So > > there. It also didn't change PATH for me, even though the docs > > mention that it does -- maybe only on NT? > > In another mail you asked David to look into how Active State handle > their DLLs. Well, Trent Mick started the ball rolling... > > The answer is that Perl extensions never import data from the core > DLL. They always import functions. In many cases, they can hide > this fact with the pre-processor. This doesn't answer my question. My question is how they support COM without having a DLL in the system directory. Or at least I don't understand how not importing data makes a difference. > By avoiding the import of data, they have far more possibilities, > including the use of LoadLibrary(), For what do they use LoadLibrary()? What is it? We use LoadLibraryEx() -- isn't that just as good? > and a new VC6 linker feature called "delay loading". > To my mind, it would be quite difficult to make this work for > Python. There are a a large number of data items we import, and > adding a function call indirection to each one sounds a pain. Agreed. > [As a semi-related issue: This "delay loading" feature is very > cool - basically, the EXE loader will not resolve external DLL > references until actually used. This is the same trick mentioned on > comp.lang.python, where they saw _huge_ startup increases (although > the tool used there was a third-party tool). The thread in question > on c.l.py resolved that, for some reason, the initialization of the > Windows winsock library was taking many seconds on that particular > PC. > > Guido - are you on VC6 yet? Yes -- I promised myself I'd start using VC6 for the 1.6 release cycle, and I did. > If so, I could look into this linker > option, and see how it improves startup performance on Windows. > Note - this feature only works if no data is imported - hence, we > could use it in Python16.dll, as most of its imports are indeed > functions. Python extension modules can not use it against Python16 > itself as they import data.] But what DLLs does python16 use that could conceivably be delay-loaded? Note that I have a feeling that there are a few standard extensions that should become separate PYDs -- e.g. socket (for the above reason) and unicodedata. This would greatly reduce the size of python16.dll. Since this way we manage our own DLL loading anyway, what's the point of delay-loading? 
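A quick way to see which of those are currently linked into the core rather than shipped as separate extension modules (just a check, not a proposal for how to split them out):

    import sys

    # modules compiled into the interpreter (python16.dll on Windows) are
    # listed here; anything absent must be found as a .pyd/.so on the path
    for name in ("socket", "unicodedata"):
        if name in sys.builtin_module_names:
            print name, "is linked into the core"
        else:
            print name, "loads as a separate extension module"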
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Apr 6 15:43:00 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:43:00 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <200004041957.PAA13168@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 04, 2000 03:57:47 PM Message-ID: <200004061343.PAA20218@python.inrialpes.fr> [Guido] > > If it ain't broken, don't "fix" it. > This also explains why socket.connect() generated so much resistance... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Thu Apr 6 15:53:10 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 23:53:10 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061333.JAA23880@eric.cnri.reston.va.us> Message-ID: > > The answer is that Perl extensions never import data > from the core > > DLL. They always import functions. In many cases, > they can hide > > this fact with the pre-processor. > > This doesn't answer my question. My question is how they > support COM > without having a DLL in the system directory. Or at least I don't > understand how not importing data makes a difference. By not using data, they can use either "delay load", or fully dynamic loading. Fully dynamic loading really just involves getting every API function via GetProcAddress() rather than having implicit linking via external references. GetProcAddress() can retrieve data items, but only their address, leaving us still in a position where "Py_None" doesnt work without magic. Delay Loading involves not loading the DLL until the first reference is used. This also lets you define code that locates the DLL to be used. This code is special in a "DllMain" kinda way, but does allow runtime binding to a statically linked DLL. However, it still has the "no data" limitation. > But what DLLs does python16 use that could conceivably be > delay-loaded? > > Note that I have a feeling that there are a few standard > extensions > that should become separate PYDs -- e.g. socket (for the > above reason) > and unicodedata. This would greatly reduce the size of > python16.dll. Agreed - these were my motivation. If these are moving to external modules then I am happy. I may have a quick look for other preloaded DLLs we can avoid - worth a look for the sake of a linker option :-) Mark. From guido at python.org Thu Apr 6 15:52:47 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:52:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Thu, 06 Apr 2000 15:43:00 +0200." <200004061343.PAA20218@python.inrialpes.fr> References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <200004061352.JAA24034@eric.cnri.reston.va.us> [GvR] > > If it ain't broken, don't "fix" it. [VM] > This also explains why socket.connect() generated so much resistance... Yes -- people are naturally conservative. I am too, myself, so I should have known... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Apr 6 15:51:41 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:51:41 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? 
In-Reply-To: <200004051433.KAA16229@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 05, 2000 10:33:18 AM Message-ID: <200004061351.PAA20261@python.inrialpes.fr> [Christian] > > When I look into tracebacks, it turns out to be just a chain > > like the frame chain, but upward down. It holds references > > to the frames in a 1-to-1 manner, and it keeps copies of > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > ... > > Does this make sense? Do I miss something? > [Guido] > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. This reminds me that some time ago I made an experimental patch for removing SET_LINENO. There was the problem of generating callbacks for pdb (which I think I solved somehow but I don't remember the details). I do remember that I had to look at pdb again for some reason. Is there any interest in reviving this idea? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at python.org Thu Apr 6 15:57:27 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:53:10 +1000." References: Message-ID: <200004061357.JAA24071@eric.cnri.reston.va.us> > > > The answer is that Perl extensions never import data from the core > > > DLL. They always import functions. In many cases, they can hide > > > this fact with the pre-processor. > > > > This doesn't answer my question. My question is how they support COM > > without having a DLL in the system directory. Or at least I don't > > understand how not importing data makes a difference. > > By not using data, they can use either "delay load", or fully > dynamic loading. > > Fully dynamic loading really just involves getting every API > function via GetProcAddress() rather than having implicit linking > via external references. GetProcAddress() can retrieve data items, > but only their address, leaving us still in a position where > "Py_None" doesnt work without magic. Actually, Py_None is just a macro that expands to the address of some data -- isn't that exactly what we need? > Delay Loading involves not loading the DLL until the first reference > is used. This also lets you define code that locates the DLL to be > used. This code is special in a "DllMain" kinda way, but does allow > runtime binding to a statically linked DLL. However, it still has > the "no data" limitation. > > > But what DLLs does python16 use that could conceivably be > > delay-loaded? > > > > Note that I have a feeling that there are a few standard > > extensions > > that should become separate PYDs -- e.g. socket (for the > > above reason) > > and unicodedata. This would greatly reduce the size of > > python16.dll. > > Agreed - these were my motivation. If these are moving to external > modules then I am happy. I may have a quick look for other > preloaded DLLs we can avoid - worth a look for the sake of a linker > option :-) OK, I'll look into moving socket and unicodedata out of python16.dll. But, I still don't understand why Perl/COM doesn't need a DLL in the system directory. Or is it just because they change PATH? (I don't know zit about COM, so that may be it. I understand that a COM object is registered (in the registry) as an entry point of a DLL. Couldn't that DLL be specified by absolute pathname??? Then no search path would be necessary.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Apr 6 16:07:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 00:07:38 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: > But, I still don't understand why Perl/COM doesn't need a > DLL in the > system directory. Or is it just because they change PATH? > > (I don't know zit about COM, so that may be it. I > understand that a > COM object is registered (in the registry) as an entry point of a > DLL. Couldn't that DLL be specified by absolute > pathname??? Then no > search path would be necessary.) Yes - but it all gets back to the exact same problem that got us here in the first place: * COM object points to \Python1.6\PythonCOM16.dll * PythonCOM16.dll has link-time reference to Python16.dll * As COM just uses LoadLibrary(), the path of PythonCOM16.dll is not used to resolve its references - only the path of the host .EXE, the system path, etc. End result is Python16.dll is not found, even though it is in the same directory. So, if you have the opportunity to intercept the link-time reference to a DLL (or, obviously, use LoadLibrary()/GetProcAddress() to reference the DLL), you can avoid override the search path. Thus, if PythonCOM16.dll could intercept its references to Python16.dll, it could locate the correct Python16.dll with runtime code. However, as we import data from Python16.dll rather then purely addresses, we can't use any of these interception solutions. If we could hide all data references behind macros, then we could possibly arrange it. Perl _does_ use such techniques, so can arrange for the runtime type resolution. (Its not clear if Perl uses "dynamic loading" via GetProcAddress(), or delayed loading via the new VC6 feature - I believe the former, but the relevant point is that they definately hide data references behind magic...) Mark. From skip at mojam.com Thu Apr 6 15:08:14 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 6 Apr 2000 08:08:14 -0500 (CDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004061351.PAA20261@python.inrialpes.fr> References: <200004051433.KAA16229@eric.cnri.reston.va.us> <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <14572.35902.781258.448592@beluga.mojam.com> Vladimir> This reminds me that some time ago I made an experimental Vladimir> patch for removing SET_LINENO. There was the problem of Vladimir> generating callbacks for pdb (which I think I solved somehow Vladimir> but I don't remember the details). I do remember that I had to Vladimir> look at pdb again for some reason. Is there any interest in Vladimir> reviving this idea? I believe you can get line number information from a code object's co_lnotab attribute, though I don't know the format. I think this should be sufficient to allow SET_LINENO to be eliminated altogether. It's just that there are places in various modules that predate the appearance of co_lnotab. Whoops, wait a minute. I just tried >>> def foo(): pass ... >>> foo.func_code.co_lnotab with both "python" and "python -O". co_lnotab is empty for python -O. I thought it was supposed to always be generated? 
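To make the lnotab subthread easier to follow: the table can be walked by hand, assuming the two-byte (bytecode delta, line delta) layout that Vladimir spells out further down in this thread. The helper below is only a sketch -- the name walk_lnotab is invented, and it ignores the splitting of deltas larger than 255 that the compiler performs for long gaps.

def walk_lnotab(func):
    # co_lnotab is a string of byte pairs: (bytecode offset delta,
    # line number delta), counted from offset 0 and co_firstlineno.
    code = func.func_code
    table = code.co_lnotab
    addr, line = 0, code.co_firstlineno
    pairs = [(addr, line)]
    for i in range(0, len(table), 2):
        addr = addr + ord(table[i])
        line = line + ord(table[i + 1])
        pairs.append((addr, line))
    return pairs

Feeding it the foo() above, or the f, g and h from the bug.py that shows up later in this thread, makes it plain which (address, line) pairs the compiler actually recorded, and why a one-liner can legitimately come back with an empty table.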
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Apr 6 17:09:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:09:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> <38EBC387.FAB08D61@lemburg.com> Message-ID: <38ECA8BF.5C47F700@tismer.com> "M.-A. Lemburg" wrote: > BTW, my little "don't use tabs use spaces" in C code extravaganza > was a complete nightmare... diff just doesn't like it and the > Python source code is full of places where tabs and spaces > are mixed in many different ways... I'm back to tabs-indent-mode > again :-/ Isn't this ignorable with the diff -b switch? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Thu Apr 6 17:12:11 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 6 Apr 2000 11:12:11 -0400 (EDT) Subject: [Python-Dev] Unicode documentation Message-ID: <14572.43339.472062.364098@seahag.cnri.reston.va.us> I've added Marc-Andre's documentation updates for Unicode to the Python CVS repository; I don't think I've done any damage. Marc-Andre, please review and let me know if I've missed anything! Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tismer at tismer.com Thu Apr 6 17:16:16 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:16:16 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <38ECAA40.456F9919@tismer.com> Vladimir Marangozov wrote: > > [Christian] > > > When I look into tracebacks, it turns out to be just a chain > > > like the frame chain, but upward down. It holds references > > > to the frames in a 1-to-1 manner, and it keeps copies of > > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > > ... > > > Does this make sense? Do I miss something? > > > > [Guido] > > Yes. It is quite possible to have multiple stack traces lingering > > around that all point to the same stack frames. > > This reminds me that some time ago I made an experimental patch for > removing SET_LINENO. There was the problem of generating callbacks > for pdb (which I think I solved somehow but I don't remember the > details). I do remember that I had to look at pdb again for some > reason. Is there any interest in reviving this idea? This is a very cheap opcode (at least in my version). What does it buy? Can you drop the f_lineno field from frames, and calculate it for the frame's f_lineno attribute? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From thomas.heller at ion-tof.com Thu Apr 6 17:40:38 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 6 Apr 2000 17:40:38 +0200 Subject: [Python-Dev] DLL in the system directory on Windows Message-ID: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> > However, as we import data from Python16.dll rather then purely > addresses, we can't use any of these interception solutions. What's wrong with: #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) I have only looked at PythonCOM15.dll, and it seems that there are only references to a handfull of exported data items: some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, _PyZero_Struct. Thomas Heller From jim at interet.com Thu Apr 6 17:48:50 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 06 Apr 2000 11:48:50 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: <38ECB1E2.AD1BAF5C@interet.com> Guido van Rossum wrote: > But, I still don't understand why Perl/COM doesn't need a DLL in the > system directory. Or is it just because they change PATH? Here is some generic info which may help, or perhaps you already know it. If you have a DLL head.dll or EXE head.exe which needs another DLL needed.dll, you can link needed.dll with head, and the system will find all data and module names automatically (well, almost). When head is loaded, needed.dll must be available, or head will fail to load. This can be confusing. For example, I once tried to port PIL to my new Python mini-GUI model, and my DLL failed. Only after some confusion did I realize that PIL is linked with Tk libs, and would fail to load if they were not present, even though I was not using them. I think what Mark is saying is that Microsoft now has an option to do delayed DLL loading. The load of needed.dll is delayed until a function in needed.dll is called. This would have meant that PIL would have worked provided I never called a Tk function. I think he is also saying that this feature can only trap function calls, not pointer access to data, so it won't work in the context of data access (maybe it would if a function call came first). Of course, if you access all data through a function call GetMyData(), it all works. As an alternative, head.[exe|dll] would not be linked with needed.dll, and so needed.dll need not be present. To access functions by name in needed.dll, you call LoadLibrary or LoadLibraryEx to open needed.dll, and then call GetProcAddress() to get a pointer to named functions. In the case of data items, the pointer is dereferenced twice, that is, data = **pt. Python uses this strategy to load PYD's, and accesses the sole function initmodule(). Then the rest of the data is available through Python mechanisms which effectively substitute for normal DLL access. The alternative search path available in LoadLibraryEx only affects head.dll, and causes the system to look in the directory of needed.dll instead of the directory of the ultimate executable for finding other needed DLL's. So on Windows, Python needs PYTHONPATH to find PYD's, and if the PYD's need further DLL's those DLL's can be in the directory of the PYD, or on the usual DLL search path provided the "alternate search path" is used. Probably you alread know this, but maybe it will help the Windozly-challenged follow along. 
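From the Python side of the fence, the mechanism described above is what sits underneath imp.load_dynamic(); a tiny, hypothetical sketch (the module name and path are made up) of driving it directly:

import imp

# "spam" and the path are purely illustrative -- point this at a real
# .pyd to try it.  Under the hood this is roughly LoadLibraryEx() on
# the file followed by GetProcAddress("initspam"), as described above.
mod = imp.load_dynamic("spam", r"D:\python\spam.pyd")
print mod.__name__, mod.__file__

If that .pyd in turn needs other DLLs, the usual search rules described above kick in, which is where this whole thread started.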
JimA From tismer at tismer.com Thu Apr 6 23:22:30 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:22:30 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: <38ED0016.E1C4A26C@tismer.com> Hi, asa side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch. Is this changed for 1.6, or might it be my bug? D:\python\spc>python Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 >>> ^Z D:\python>python Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1416 >>> ^Z ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Thu Apr 6 23:31:03 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:31:03 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. Message-ID: <38ED0217.7C44A24F@tismer.com> Yikes! No, it is computatively commutative, just not in terms of computation time. :-)) The following factorial loops differ by a remarkable factor of 1.8, and we can gain this speed by changing long_mult to always put the lower multiplicand into the left. This was reported to me by Lenny Kneler, who thought he had found a Stackless bug, but he was actually testing long math. :-) This buddy... >>> def ifact3(n) : ... p = 1L ... for i in range(1,n+1) : ... p = i*p ... return p performs better by a factor of 1.8 than this one: >>> def ifact1(n) : ... p = 1L ... for i in range(1,n+1) : ... p = p*i ... return p The analysis of this behavior is quite simple if you look at the implementation of long_mult. If the left operand is big and the right is small, there are much more carry operations performed, together with more loop overhead. Swapping the multiplicands would be a 5 line patch. Should I submit it? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From jeremy at cnri.reston.va.us Thu Apr 6 23:29:13 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 6 Apr 2000 17:29:13 -0400 (EDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <38ECAA40.456F9919@tismer.com> References: <200004061351.PAA20261@python.inrialpes.fr> <38ECAA40.456F9919@tismer.com> Message-ID: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> >> Vladimir Marangozov wrote: >> This reminds me that some time ago I made an experimental patch >> for removing SET_LINENO. There was the problem of generating >> callbacks for pdb (which I think I solved somehow but I don't >> remember the details). I do remember that I had to look at pdb >> again for some reason. Is there any interest in reviving this >> idea? I think the details are important. The only thing the SET_LINENO opcode does is to call a trace function if one is installed. 
It's necessary to have some way to invoke the trace function when the line number changes (or it will be relatively difficult to execute code line-by-line in the debugger ). Off the top of my head, the only other way I see to invoke the trace function would be to add code at the head of the mainloop that computed the line number for each instruction (from lnotab) and called the trace function if the current line number is different than the previous time through the loop. That doesn't sound faster or simpler. Jeremy From guido at python.org Thu Apr 6 23:30:21 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:30:21 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Thu, 06 Apr 2000 23:22:30 +0200." <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <200004062130.RAA26273@eric.cnri.reston.va.us> > asa side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? > > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z > > D:\python>python > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1416 > >>> ^Z This is because repr() now uses full precision for floating point numbers. round() does what it can, but 3.1416 just can't be represented exactly, and "%.17g" gives 3.1415999999999999. This is definitely the right thing to do for repr() -- ask Tim. However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Thu Apr 6 22:31:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 6 Apr 2000 15:31:02 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <14572.62470.804145.677372@beluga.mojam.com> Chris> I happened to observe the following rounding bug. It happens in Chris> Stackless Python, which is built against the pre-unicode CVS Chris> branch. Chris> Is this changed for 1.6, or might it be my bug? I doubt it's your problem. I see it too with 1.6a2 (no stackless): % ./python Python 1.6a2 (#2, Apr 6 2000, 15:27:22) [GCC pgcc-2.91.66 19990314 (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 Same behavior whether compiled with -O2 or -g. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Thu Apr 6 23:32:36 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:32:36 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:31:03 +0200." <38ED0217.7C44A24F@tismer.com> References: <38ED0217.7C44A24F@tismer.com> Message-ID: <200004062132.RAA26296@eric.cnri.reston.va.us> > This buddy... > > >>> def ifact3(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... p = i*p > ... return p > > performs better by a factor of 1.8 than this one: > > >>> def ifact1(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... 
p = p*i > ... return p > > The analysis of this behavior is quite simple if you look at the > implementation of long_mult. If the left operand is big and the > right is small, there are much more carry operations performed, > together with more loop overhead. > Swapping the multiplicands would be a 5 line patch. > Should I submit it? Yes, go for it. I would appreciate a bunch of new test cases that exercise the new path through the code, too... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 00:43:16 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 00:43:16 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 06, 2000 05:29:13 PM Message-ID: <200004062243.AAA21491@python.inrialpes.fr> Jeremy Hylton wrote: > > >> Vladimir Marangozov wrote: > >> This reminds me that some time ago I made an experimental patch > >> for removing SET_LINENO. There was the problem of generating > >> callbacks for pdb (which I think I solved somehow but I don't > >> remember the details). I do remember that I had to look at pdb > >> again for some reason. Is there any interest in reviving this > >> idea? > > I think the details are important. The only thing the SET_LINENO > opcode does is to call a trace function if one is installed. It's > necessary to have some way to invoke the trace function when the line > number changes (or it will be relatively difficult to execute code > line-by-line in the debugger ). Looking back at the discussion and the patch I ended up with at that time, I think the callback issue was solved rather elegantly. I'm not positive that it does not have side effects, though... For an overview of the approach and the corresponding patch, go back to: http://www.python.org/pipermail/python-dev/1999-August/002252.html http://sirac.inrialpes.fr/~marangoz/python/lineno/ What happens is that in tracing mode, a copy of the original code stream is created, a new CALL_TRACE opcode is stored in it at the addresses corresponding to each source line number, then the instruction pointer is redirected to execute the modified code string. Whenever a CALL_TRACE opcode is reached, the callback is triggered. On a successful return, the original opcode at the current address is fetched from the original code string, then directly goto the dispatch code. This code string duplication & conditional break-point setting occurs only when a trace function is set; in the "normal case", the interpreter executes a code string without SET_LINENO. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Fri Apr 7 02:47:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 10:47:06 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> Message-ID: > > However, as we import data from Python16.dll rather then purely > > addresses, we can't use any of these interception solutions. > > What's wrong with: > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) My only objection is that this is a PITA. It becomes a maintenance nightmare for Guido as the code gets significantly larger and uglier. 
> I have only looked at PythonCOM15.dll, and it seems that > there are only references to a handfull of exported data items: > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > _PyZero_Struct. Yep - these structs, all the error objects and all the type objects. However, to do this properly, we must do _every_ exported data item, not just ones that satisfy COM (otherwise the next poor soul will have the exact same issue, and require patches to the core before they can work...) Im really not convinced it is worth it to save one, well-named DLL in the system directory. Mark. From guido at python.org Fri Apr 7 03:25:35 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 21:25:35 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Fri, 07 Apr 2000 00:43:16 +0200." <200004062243.AAA21491@python.inrialpes.fr> References: <200004062243.AAA21491@python.inrialpes.fr> Message-ID: <200004070125.VAA26776@eric.cnri.reston.va.us> > What happens is that in tracing mode, a copy of the original code stream > is created, a new CALL_TRACE opcode is stored in it at the addresses > corresponding to each source line number, then the instruction pointer > is redirected to execute the modified code string. Whenever a CALL_TRACE > opcode is reached, the callback is triggered. On a successful return, > the original opcode at the current address is fetched from the original > code string, then directly goto the dispatch code. > > This code string duplication & conditional break-point setting occurs > only when a trace function is set; in the "normal case", the interpreter > executes a code string without SET_LINENO. Ai! This really sounds like a hack. It may be a standard trick in the repertoire of virtual machine implementers, but it is still a hack, and makes my heart cry. I really wonder if it makes enough of a difference to warrant all that code, and the risk that that code isn't quite correct. (Is it thread-safe?) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Apr 7 03:36:30 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 11:36:30 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: Message-ID: [I wrote] > My only objection is that this is a PITA. It becomes a ... > However, to do this properly, we must do _every_ exported ... > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. ie, lots of good reasons _not_ to do this. However, it is worth pointing out that there is one good - possibly compelling - reason to consider this. Not only would we drop the dependency from the system directory, we could also drop the dependency to the Python version. That is, any C extension compiled for 1.6 would be able to automatically and without recompilation work with Python 1.7, so long as we kept all the same public names. It is too late for Python 1.5, but it would be a nice feature if an upgrade to Python 1.7 did not require waiting for every extension author to catch up. OTOH, if Python 1.7 is really the final in the 1.x family, is it worth it for a single version? Just-musing-ly, Mark. From ping at lfw.org Fri Apr 7 03:47:36 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 6 Apr 2000 20:47:36 -0500 (CDT) Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful Message-ID: So, has anyone not seen Doctor Fun today yet? 
http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg :) :) -- ?!ng From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 04:02:22 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:02:22 +0200 (CEST) Subject: [Python-Dev] python -O weirdness Message-ID: <200004070202.EAA22307@python.inrialpes.fr> Strange. Can somebody confirm/refute, explain this behavior? -------------[ bug.py ]------------ def f(): pass def g(): a = 1 b = 2 def h(): pass def show(func): c = func.func_code print "(%d) %s: %d -> %s" % \ (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) show(f) show(g) show(h) ----------------------------------- ~> python bug.py (1) f: 2 -> '\003\001' (4) g: 4 -> '\003\001\011\001' (8) h: 2 -> '\003\000' ~> python -O bug.py (1) f: 2 -> '\000\001' (4) g: 4 -> '\000\001\006\001' (1) f: 2 -> '\000\001' <=== ??? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Apr 7 04:19:02 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:02 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: <200004062132.RAA26296@eric.cnri.reston.va.us> Message-ID: <000701bfa037$a4545960$6c2d153f@tim> > Yes, go for it. I would appreciate a bunch of new test cases that > exercise the new path through the code, too... FYI, a suitable test would be to add a line to function test_division_2 in test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and y are already generated by the framework. From tim_one at email.msn.com Fri Apr 7 04:19:00 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:00 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> Message-ID: <000601bfa037$a2c18460$6c2d153f@tim> [posted & mailed] [Christian Tismer] > as a side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? It's a 1.6 thing, and is not a bug. > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z The best possible IEEE-754 double approximation to 3.1416 is (exactly) 3.141599999999999948130380289512686431407928466796875 so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature. 1.6 boosted the number of decimal digits repr(float) produces so that eval(repr(x)) == x for every finite float on every platform with an IEEE-754-conforming libc. It was actually rare for that equality to hold pre-1.6. repr() cannot produce fewer digits than this without allowing the equality to fail in some cases. The 1.6 str() still produces the *illusion* that the result is 3.1416 (as repr() also did pre-1.6). IMO it would be better if Python stopped using repr() (at least by default) for formatting expressions at the interactive prompt (for much more on this, see DejaNews). the-two-things-you-can-do-about-it-are-nothing-and-love-it-ly y'rs - tim From guido at python.org Fri Apr 7 04:23:11 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 22:23:11 -0400 Subject: [Python-Dev] python -O weirdness In-Reply-To: Your message of "Fri, 07 Apr 2000 04:02:22 +0200." 
<200004070202.EAA22307@python.inrialpes.fr> References: <200004070202.EAA22307@python.inrialpes.fr> Message-ID: <200004070223.WAA26916@eric.cnri.reston.va.us> > Strange. Can somebody confirm/refute, explain this behavior? > > -------------[ bug.py ]------------ > def f(): > pass > > def g(): > a = 1 > b = 2 > > def h(): pass > > def show(func): > c = func.func_code > print "(%d) %s: %d -> %s" % \ > (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) > > show(f) > show(g) > show(h) > ----------------------------------- > > ~> python bug.py > (1) f: 2 -> '\003\001' > (4) g: 4 -> '\003\001\011\001' > (8) h: 2 -> '\003\000' > > ~> python -O bug.py > (1) f: 2 -> '\000\001' > (4) g: 4 -> '\000\001\006\001' > (1) f: 2 -> '\000\001' <=== ??? > > -- Yes. I can confirm and explain it. The functions f and h are sufficiently similar that their code objects actually compare equal. A little-known optimization is that two constants in a const array that compare equal (and have the same type!) are replaced by a single copy. This happens in the module's code object: f's and h's code are the same, so only one copy is kept. The function name is not taken into account for the comparison. Maybe it should? On the other hand, the name is a pretty inessential part of the function, and it's not going to change the semantics of the program... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 04:47:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:47:15 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004070125.VAA26776@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 06, 2000 09:25:35 PM Message-ID: <200004070247.EAA22442@python.inrialpes.fr> Guido van Rossum wrote: > > > What happens is that in tracing mode, a copy of the original code stream > > is created, a new CALL_TRACE opcode is stored in it at the addresses > > corresponding to each source line number, then the instruction pointer > > is redirected to execute the modified code string. Whenever a CALL_TRACE > > opcode is reached, the callback is triggered. On a successful return, > > the original opcode at the current address is fetched from the original > > code string, then directly goto the dispatch code. > > > > This code string duplication & conditional break-point setting occurs > > only when a trace function is set; in the "normal case", the interpreter > > executes a code string without SET_LINENO. > > Ai! This really sounds like a hack. It may be a standard trick in > the repertoire of virtual machine implementers, but it is still a > hack, and makes my heart cry. The implementation sounds tricky, yes. But there's nothing hackish in the principle of setting breakpoints. The modified code string is in fact the stripped code stream (without LINENO), reverted back to a standard code stream with LINENO. However, to simplify things, the LINENO (aka CALL_TRACE) are not inserted between the instructions for every source line. They overwrite the original opcodes in the copy whenever a trace function is set (i.e. we set all conditional breakpoints (LINENO) at once). And since we overwrite for simplicity, at runtime, we read the ovewritten opcodes from the original stream, after the callback returns. All this magic occurs before the main loop, with finalization on exit of eval_code2. 
A tricky implementation of the principle of having a set of conditional breakpoints for every source line (these cond. bp are currently the SET_LINENO opcodes, in a more redundant version). > I really wonder if it makes enough of a difference to warrant all > that code, and the risk that that code isn't quite correct. Well, all this business is internal to ceval.c and doesn't seem to affect the rest of the world. I can see only two benefits (if this idea doesn't hide other mysteries -- so anyone interested may want check it out): 1) Some tiny speedup -- we'll reach -O in a standard setup 2) The .pyc files become smaller. (Lib/*.pyc is reduced by ~80K for 1.5.2) No other benefits (hmmm, maybe the pdb code will be simplified wrt linenos) I originally developped this idea because of the redundant, consecutive SET_LINENO in a code object. > (Is it thread-safe?) I think so. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From thomas.heller at ion-tof.com Fri Apr 7 09:10:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 7 Apr 2000 09:10:41 +0200 Subject: [Python-Dev] Re: DLL in the system directory on Windows References: Message-ID: <03fe01bfa060$626a2010$4500a8c0@thomasnotebook> > > > However, as we import data from Python16.dll rather then purely > > > addresses, we can't use any of these interception solutions. > > > > What's wrong with: > > > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) > > My only objection is that this is a PITA. It becomes a maintenance > nightmare for Guido as the code gets significantly larger and > uglier. Why is it a nightmare for Guido? It can be done by the extension writer: You in the case for PythonCOM.dll. > > > I have only looked at PythonCOM15.dll, and it seems that > > there are only references to a handfull of exported data items: > > > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > > _PyZero_Struct. > > Yep - these structs, all the error objects and all the type objects. > > However, to do this properly, we must do _every_ exported data item, > not just ones that satisfy COM (otherwise the next poor soul will > have the exact same issue, and require patches to the core before > they can work...) IMHO it is not a problem of exporting, but a question how *you* import these. > > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. As long as no one else installs a modified version there (which *should* have a different name, but...) > > Mark. > Thomas Heller From fredrik at pythonware.com Fri Apr 7 10:47:37 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 10:47:37 +0200 Subject: [Python-Dev] SRE: regex.set_syntax References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > [Guido] > > If it ain't broken, don't "fix" it. > > This also explains why socket.connect() generated so much resistance... I'm not sure I see the connection -- the 'regex' module is already declared obsolete... 
so Guido probably meant "if it's not even in there, don't waste time on it" imo, the main reasons for supporting 'regex' are 1) that lots of people are still using it, often for performance reasons 2) while the import error should be easy to spot, actually changing from 'regex' to 're' requires some quite extensive core restructuring, especially com- pared to what it takes to fix a broken 'append' or 'connect' call, and 3) it's fairly easy to do, since the engines use the same semantics, and 'sre' supports pluggable front-ends. but alright, I think the consensus here is "(1) get rid of it completely". in 1.6a2, perhaps? From fredrik at pythonware.com Fri Apr 7 11:13:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:16 +0200 Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful References: Message-ID: <00cd01bfa071$838c6cb0$0500a8c0@secret.pythonware.com> > So, has anyone not seen Doctor Fun today yet? > > http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg > > :) :) the daily python-url features this link ages ago (in internet time, at least): http://hem.passagen.se/eff/url.htm (everyone should read the daily python URL ;-) From fredrik at pythonware.com Fri Apr 7 11:13:23 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:23 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> M.-A. Lemburg wrote: > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. > > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. just a brief head's up: I've been playing with this a bit, and my current view is that the current unicode design is horridly broken when it comes to mixing 8-bit and 16-bit strings. basically, if you pass a uni- code string to a function slicing and dicing 8-bit strings, it will probably not work. and you will probably not under- stand why. I'm working on a proposal that I think will make things simpler and less magic, and far easier to understand. to appear on sunday. From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 11:53:19 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 11:53:19 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> from "Fredrik Lundh" at Apr 07, 2000 10:47:37 AM Message-ID: <200004070953.LAA25788@python.inrialpes.fr> Fredrik Lundh wrote: > > Vladimir Marangozov wrote: > > [Guido] > > > If it ain't broken, don't "fix" it. > > > > This also explains why socket.connect() generated so much resistance... > > I'm not sure I see the connection -- the 'regex' module is > already declared obsolete... Don't look further -- there's no connection with the re/sre code. It was just a thought about the above citation vs. the connect change. 
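To put a concrete face on Fredrik's point 2 above, a rough before/after sketch of the sort of restructuring a regex-to-re move involves; it sticks to the module-level calls, and line and handle_number are invented names:

# regex style: match() returns the length of the match, or -1;
# search() returns the position, or -1
import regex
if regex.match('[0-9]+', line) >= 0:
    handle_number(line)

# re style: both return a match object, or None on failure
import re
m = re.match('[0-9]+', line)
if m:
    handle_number(m.group(0))

The changed return convention is exactly why this can't be done with a mechanical search-and-replace, unlike the broken append or connect calls Fredrik mentions.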
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 7 12:55:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 12:55:30 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> Message-ID: <38EDBEA2.8C843E49@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The UTF-8 assumption had to be made in order to get the two > > worlds to interoperate. We could have just as well chosen > > Latin-1, but then people currently using say a Russian > > encoding would get upset for the same reason. > > > > One way or another somebody is not going to like whatever > > we choose, I'm afraid... the simplest solution is to use > > Unicode for all strings which contain non-ASCII characters > > and then call .encode() as necessary. > > just a brief head's up: > > I've been playing with this a bit, and my current view is that > the current unicode design is horridly broken when it comes > to mixing 8-bit and 16-bit strings. Why "horribly" ? String and Unicode mix pretty well, IMHO. The magic auto-conversion of Unicode to UTF-8 in C APIs using "s" or "s#" does not always do what the user expects, but it's still better than not having Unicode objects work with these APIs at all. > basically, if you pass a uni- > code string to a function slicing and dicing 8-bit strings, it > will probably not work. and you will probably not under- > stand why. > > I'm working on a proposal that I think will make things simpler > and less magic, and far easier to understand. to appear on > sunday. Looking forward to it, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 13:47:07 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 13:47:07 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14572.35902.781258.448592@beluga.mojam.com> from "Skip Montanaro" at Apr 06, 2000 08:08:14 AM Message-ID: <200004071147.NAA26437@python.inrialpes.fr> Skip Montanaro wrote: > > Whoops, wait a minute. I just tried > > >>> def foo(): pass > ... > >>> foo.func_code.co_lnotab > > with both "python" and "python -O". co_lnotab is empty for python -O. I > thought it was supposed to always be generated? It is always generated, but since co_lnotab contains only lineno increments starting from co_firstlineno (i.e. only deltas) and your function is a 1-liner (no lineno increments starting from the first line of the function), the table is empty. Move 'pass' to the next line and the table will contain 1-entry (of 2 bytes: delta_addr, delta_line). Generally speaking, the problem really boils down to the callbacks from C to Python when a tracefunc is set. My approach is not that bad in this regard. A decent processor nowadays has (an IRQ pin) a flag for generating interrupts on every processor instruction (trace flag). In Python, we have the same problem - we need to interrupt the (virtual) processor, implemented in eval_code2() on regular intervals. 
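For readers who have not played with the hook directly, the Python-level face of these interrupts is sys.settrace(), which pdb builds on. A minimal sketch (the function names here are made up; with -O there are no SET_LINENO opcodes, so nothing fires the 'line' events):

import sys

def local_trace(frame, event, arg):
    # called for events inside a traced frame; SET_LINENO is what
    # currently triggers the 'line' events
    if event == 'line':
        print 'line', frame.f_lineno, 'in', frame.f_code.co_name
    return local_trace

def global_trace(frame, event, arg):
    # called on each 'call' event; returning local_trace switches on
    # per-line tracing for that frame
    if event == 'call':
        return local_trace
    return None

def demo():
    a = 1
    b = a + 1
    return b

sys.settrace(global_trace)
demo()
sys.settrace(None)

SET_LINENO today, or the CALL_TRACE breakpoints sketched above, are just two different ways of getting those 'line' callbacks delivered.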
Actually, what we need (for pdb) is to interrupt the processor on every source line, but one could easily imagine a per instruction interrupt (with a callback installed with sys.settracei(). This is exactly what the patch does under the grounds. It interrupts the processor on every new source line (but interrupting it on every instruction would be a trivial extension -- all opcodes in the code stream would be set to CALL_TRACE!) And this is exactly what LINENO does (+ some processor state saving in the frame: f_lasti, f_lineno). Clearly, there are 2 differences with the existing code: a) The interrupting opcodes are installed dynamically, on demand, only when a trace function is set, for the current traced frame. Presently, these opcodes are SET_LINENO; I introduced a new one byte CALL_TRACE opcode which does the same thing (thus preserving backwards compatibility with old .pyc that contain SET_LINENO). b) f_lasti and f_lineno aren't updated when the frame is not traced :-( I wonder whether we really care about them, though. The other implementation details aren't so important. Yet, they look scary, but no more than the co_lnotab business. The problem with my patch is point b). I believe the approach is good, though -- if it weren't, I woudn't have taken the care to talk about it detail. :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 7 13:57:41 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 13:57:41 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings Message-ID: <38EDCD35.DDD5EB4B@lemburg.com> There has been a bug report about the treatment of Unicode objects together with 8-bit format strings. The current implementation converts the Unicode object to UTF-8 and then inserts this value in place of the %s.... I'm inclined to change this to have '...%s...' % u'abc' return u'...abc...' since this is just another case of coercing data to the "bigger" type to avoid information loss. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Apr 7 14:41:19 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 14:41:19 +0200 Subject: [Python-Dev] python -O weirdness References: <200004070202.EAA22307@python.inrialpes.fr> <200004070223.WAA26916@eric.cnri.reston.va.us> Message-ID: <38EDD76F.986D3C39@tismer.com> Guido van Rossum wrote: ... > The function name is not taken into account for the comparison. Maybe > it should? Absolutely, please! > On the other hand, the name is a pretty inessential part > of the function, and it's not going to change the semantics of the > program... If the name of the code object has any meaning, then it must be the name of the function that I meant, not just another function which happens to have the same body, IMHO. or the name should vanish completely. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gward at mems-exchange.org Fri Apr 7 14:49:15 2000 From: gward at mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 08:49:15 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 06, 2000 at 05:30:21PM -0400 References: <38ED0016.E1C4A26C@tismer.com> <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: <20000407084914.A13606@mems-exchange.org> On 06 April 2000, Guido van Rossum said: > This is because repr() now uses full precision for floating point > numbers. round() does what it can, but 3.1416 just can't be > represented exactly, and "%.17g" gives 3.1415999999999999. > > This is definitely the right thing to do for repr() -- ask Tim. > > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... +1 on this: it's easier to change "foo" to "`foo`" than to "str(foo)" or "print foo". It just makes more sense to use str(). Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result! Greg From mikael at isy.liu.se Fri Apr 7 14:57:38 2000 From: mikael at isy.liu.se (Mikael Olofsson) Date: Fri, 07 Apr 2000 14:57:38 +0200 (MET DST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <20000407084914.A13606@mems-exchange.org> Message-ID: On 07-Apr-00 Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Just i case... I hope you haven't missed "print blah.__doc__". /Mikael ----------------------------------------------------------------------- E-Mail: Mikael Olofsson WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael Phone: +46 - (0)13 - 28 1343 Telefax: +46 - (0)13 - 28 1339 Date: 07-Apr-00 Time: 14:56:52 This message was sent by XF-Mail. ----------------------------------------------------------------------- From guido at python.org Fri Apr 7 15:01:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:01:45 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 13:57:41 +0200." <38EDCD35.DDD5EB4B@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> Message-ID: <200004071301.JAA27100@eric.cnri.reston.va.us> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Makes sense. But note that it's going to be difficult to catch all cases: you could have '...%d...%s...%s...' % (3, "abc", u"abc") and '...%(foo)s...' % {'foo': u'abc'} and even '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} (the latter should *not* convert to Unicode). --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Apr 7 15:06:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 15:06:51 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading Message-ID: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Something that just struck me: couldn't we use a couple of bits in the PYTHON_API_VERSION to check various other things that make dynamic modules break? 
WITH_THREAD is the one I just ran in to, but there's a few others such as the object refcounting statistics and platform-dependent things like the debug/nodebug compilation on Windows. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Fri Apr 7 15:13:21 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:13:21 -0400 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Your message of "Fri, 07 Apr 2000 15:06:51 +0200." <20000407130652.4D002370CF2@snelboot.oratrix.nl> References: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Message-ID: <200004071313.JAA27132@eric.cnri.reston.va.us> > Something that just struck me: couldn't we use a couple of bits in the > PYTHON_API_VERSION to check various other things that make dynamic modules > break? WITH_THREAD is the one I just ran in to, but there's a few others such > as the object refcounting statistics and platform-dependent things like the > debug/nodebug compilation on Windows. I'm curious what combination didn't work? The thread APIs are supposed to be designed so that all combinations work -- the APIs are always present, they just don't do anything in the unthreaded version. If an extension is compiled without threads, well, then it won't release the interpreter lock, of course, but otherwise there should be no bad effects. The debug issue on Windows is taken care of by a DLL naming convention: the debug versions are named spam_d.dll (or .pyd). --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at mems-exchange.org Fri Apr 7 15:15:46 2000 From: gward at mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 09:15:46 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: ; from mikael@isy.liu.se on Fri, Apr 07, 2000 at 02:57:38PM +0200 References: <20000407084914.A13606@mems-exchange.org> Message-ID: <20000407091545.B13606@mems-exchange.org> On 07 April 2000, Mikael Olofsson said: > > On 07-Apr-00 Greg Ward wrote: > > Oh, joy! oh happiness! someday soon, I may be able to type > > "blah.__doc__" at the interactive prompt and get a readable result! > > Just i case... I hope you haven't missed "print blah.__doc__". Yeah, I know: my usual mode of operation is this: >>> blah.__doc__ ...repr of docstring... ...sound of me cursing... >>> print blah.__doc__ The real reason for using str() at the interactive prompt is not to save me keystrokes, but because it just seems like the sensible thing to do. People who understand the str/repr difference, and really want the repr version, can slap backquotes around whatever they're printing. Greg From guido at python.org Fri Apr 7 15:18:39 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:18:39 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Fri, 07 Apr 2000 10:47:37 +0200." <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <200004071318.JAA27173@eric.cnri.reston.va.us> > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I don't think so... If people still use regex, why not keep it? It doesn't cost much to maintain... 
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Fri Apr 7 15:43:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 15:43:03 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? References: <20000407084914.A13606@mems-exchange.org> <20000407091545.B13606@mems-exchange.org> Message-ID: <002801bfa097$33228770$0500a8c0@secret.pythonware.com> Greg wrote: > Yeah, I know: my usual mode of operation is this: > > >>> blah.__doc__ > ...repr of docstring... > ...sound of me cursing... > >>> print blah.__doc__ on the other hand, I tend to do this now and then: >>> blah = foo() # returns chunk of binary data >>> blah which, if you use str instead of repr, can reprogram your terminal window in many interesting ways... but I think I'm +1 on this anyway. or at least +0.90000000000000002 From skip at mojam.com Fri Apr 7 15:04:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 7 Apr 2000 08:04:39 -0500 (CDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.56551.939560.375409@beluga.mojam.com> Fredrik> 1) that lots of people are still using it, often for Fredrik> performance reasons Speaking of which, how do sre, re and regex compare to one another these days? -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From jack at oratrix.nl Fri Apr 7 16:19:36 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 16:19:36 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Message by Guido van Rossum , Fri, 07 Apr 2000 09:13:21 -0400 , <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: <20000407141937.3FBDE370CF2@snelboot.oratrix.nl> > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. Oops, the problem was mine: not only was the extension module compiled without threading, but also with the previous version of the I/O library used on the mac. Silly me. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Fri Apr 7 16:21:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:21:59 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.61191.486890.43591@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1) that lots of people are still using it, often for > performance reasons That's why I never converted Grail; the "re" layer around "pcre" was substantially more expensive to use, and the HTML parser was way too slow already. 
(Displaying the result was still the slowest part, but we were desparate for every little scrap!) > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I seem to recall a determination to toss it for Py3K (or Python 2, as it was called at the time). Note that Grail breaks completely as soon as the module can't be imported. I'll propose a compromise: keep it in the set of modules that get built by default, but remove the documentation sections from the manual. This will more strongly encourage migration for actively maintained code. I would be surprised if Grail is the only large application which uses "regex" for performance reasons, and we don't really *want* to break everything. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Apr 7 16:48:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 16:48:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> Message-ID: <38EDF53F.94071785@lemburg.com> Guido van Rossum wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Makes sense. But note that it's going to be difficult to catch all > cases: you could have > > '...%d...%s...%s...' % (3, "abc", u"abc") > > and > > '...%(foo)s...' % {'foo': u'abc'} > > and even > > '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} > > (the latter should *not* convert to Unicode). No problem... :-) Its a simple fix: once %s in an 8-bit string sees a Unicode object it will stop processing the string and restart using the unicode formatting algorithm. This will cost performance, of course. Optimization is easy though: add a small "u" in front of the string ;-) A sample session: >>> '...%(foo)s...' % {'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",'def':123} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",u'def':123} u'...abc...' -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Apr 7 16:53:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:53:43 -0400 (EDT) Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <14573.63095.48171.721921@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. > > This will cost performance, of course. Optimization is easy though: > add a small "u" in front of the string ;-) Seems reasonable to me! -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 19:14:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 19:14:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000601bfa037$a2c18460$6c2d153f@tim> from "Tim Peters" at Apr 06, 2000 10:19:00 PM Message-ID: <200004071714.TAA27347@python.inrialpes.fr> Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 > > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. I'm very respectful when I see a number with so many digits in a row. :-) I'm not sure that this will be of any interest to you, number crunchers, but a research team in computer arithmetics here reported some major results lately: they claim that they "solved" the Table Maker's Dilemma for most common functions in IEEE-754 double precision arithmetic. (and no, don't ask me what this means ;-) For more information, see: http://www.ens-lyon.fr/~jmmuller/Intro-to-TMD.htm -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 20:03:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:03:15 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EDD76F.986D3C39@tismer.com> from "Christian Tismer" at Apr 07, 2000 02:41:19 PM Message-ID: <200004071803.UAA27485@python.inrialpes.fr> Christian Tismer wrote: > > Guido van Rossum wrote: > ... > > The function name is not taken into account for the comparison. Maybe > > it should? > > Absolutely, please! Honestly, no. -O is used for speed, so showing the wrong symbols is okay. It's the same in C. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Fri Apr 7 20:37:54 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:37:54 +0200 Subject: [Python-Dev] python -O weirdness References: <200004071803.UAA27485@python.inrialpes.fr> Message-ID: <38EE2B02.1E6F3CB8@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Guido van Rossum wrote: > > ... > > > The function name is not taken into account for the comparison. Maybe > > > it should? > > > > Absolutely, please! > > Honestly, no. -O is used for speed, so showing the wrong symbols is > okay. It's the same in C. Not ok, IMHO. If the name is not guaranteed to be valid, why should it be there at all? If I write code that relies on inspecting those things, then I'm hosed. I'm the last one who argues against optimization. But I'd use either no name at all, or a tuple with all folded names. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 20:40:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:40:03 +0200 (CEST) Subject: [Python-Dev] the regression test suite Message-ID: <200004071840.UAA27606@python.inrialpes.fr> My kitchen programs show that regrtest.py keeps requesting more and more memory until it finishes all tests. IOW, it doesn't finalize properly each test. It keeps importing modules, without deleting them after each test. I think that before a particular test is run, we need to save the value of sys.modules, then restore it after the test (before running the next one). In a module enabled interpreter, this reduces the memory consumption almost by half... Patch? Think about the number of new tests that will be added in the future. I don't want to tolerate a silently approaching useless disk swapping :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From ping at lfw.org Fri Apr 7 20:47:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 13:47:45 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: Tim Peters wrote: > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 Let's call this number 'A' for the sake of discussion. > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. Clearly there is something very wrong here: Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 3.1416 3.1415999999999999 >>> Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x, but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A. I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases. It's very jarring to type something in, and have the interpreter give you back something that looks very different. It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system. (What do you do then, start explaining the IEEE double representation to your CP4E beginner?) What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here. I haven't investigated how to do this best yet. I'll go off now and see if i can come up with an algorithm that's not quite so stupid as def smartrepr(x): p = 17 while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 return '%%.%df' % p % x -- ?!ng From tismer at tismer.com Fri Apr 7 20:51:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:51:09 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. References: <000701bfa037$a4545960$6c2d153f@tim> Message-ID: <38EE2E1D.6708B43D@tismer.com> Tim Peters wrote: > > > Yes, go for it. I would appreciate a bunch of new test cases that > > exercise the new path through the code, too... 
> > FYI, a suitable test would be to add a line to function test_division_2 in > test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and > y are already generated by the framework. Thanks - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at math.huji.ac.il Fri Apr 7 20:45:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 7 Apr 2000 20:45:41 +0200 (IST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: On Thu, 6 Apr 2000, Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you Trademark probably belong to Tim Peters. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Fri Apr 7 21:18:40 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:18:40 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 20:45:41 +0200." References: Message-ID: <200004071918.PAA27474@eric.cnri.reston.va.us> > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you Except I'm not sure what kind of special-casing should be happening. Put quotes around it without worrying if that makes it a valid string literal is one thought that comes to mind. Another approach might be what Tk's text widget does -- pass through certain control characters (LF, TAB) and all (even non-ASCII) printing characters, but display other control characters as \x.. escapes rather than risk putting the terminal in a weird mode. No quotes though. Hm, I kind of like this: when used as intended, it will just display the text, with newlines and umlauts etc.; but when printing binary gibberish, it will do something friendly. There's also the issue of what to do with lists (or tuples, or dicts) containing strings. If we agree on this: >>> "hello\nworld\n\347" # octal 347 is a cedilla hello world ? >>> Then what should ("hello\nworld", "\347") show? I've got enough serious complaints that I don't want to propose that it use repr(): >>> ("hello\nworld", "\347") ('hello\nworld', '\347') >>> Other possibilities: >>> ("hello\nworld", "\347") ('hello world', '?') >>> or maybe >>> ("hello\nworld", "\347") ('''hello world''', '?') >>> Of course there's also the Unicode issue -- the above all assumes Latin-1 for stdout. Still no closure, I think... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:35:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:35:32 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 13:47:45 CDT." References: Message-ID: <200004071935.PAA27541@eric.cnri.reston.va.us> > Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > > > 3.141599999999999948130380289512686431407928466796875 > > Let's call this number 'A' for the sake of discussion. 
> > > so the output you got is correctly rounded to 17 significant digits. IOW, > > it's a feature. > > Clearly there is something very wrong here: > > Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> 3.1416 > 3.1415999999999999 > >>> > > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, but we surely know that 17 digits are > *not* required when x is A because i *just typed in* 3.1416 and > the best choice of double value was A. Ping has a point! > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It breaks a > fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. (What > do you do then, start explaining the IEEE double representation > to your CP4E beginner?) > > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. > > I haven't investigated how to do this best yet. I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x Have a look at what Java does; it seems to be doing this right: & jpython JPython 1.1 on java1.2 (JIT: sunwjit) Copyright (C) 1997-1999 Corporation for National Research Initiatives >>> import java.lang >>> x = java.lang.Float(3.1416) >>> x.toString() '3.1416' >>> ^D & Could it be as simple as converting x +/- one bit and seeing how many differing digits there were? (Not that +/- one bit is easy to calculate...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:37:26 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:37:26 -0400 Subject: [Python-Dev] the regression test suite In-Reply-To: Your message of "Fri, 07 Apr 2000 20:40:03 +0200." <200004071840.UAA27606@python.inrialpes.fr> References: <200004071840.UAA27606@python.inrialpes.fr> Message-ID: <200004071937.PAA27552@eric.cnri.reston.va.us> > My kitchen programs show that regrtest.py keeps requesting more and > more memory until it finishes all tests. IOW, it doesn't finalize > properly each test. It keeps importing modules, without deleting them > after each test. I think that before a particular test is run, we need to > save the value of sys.modules, then restore it after the test (before > running the next one). In a module enabled interpreter, this reduces > the memory consumption almost by half... > > Patch? > > Think about the number of new tests that will be added in the future. > I don't want to tolerate a silently approaching useless disk swapping :-) I'm not particularly concerned, but it does make some sense. (And is faster than starting a fresh interpreter for each test.) So why don't you give it a try! 
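Something along these lines would be my first try -- just a sketch of the
bookkeeping Vladimir describes, with a made-up helper name and no claim
that it drops into regrtest.py as-is:

    import sys

    def run_isolated(run_one_test, test_name):
        # remember which modules were loaded before the test started
        saved = sys.modules.copy()
        try:
            run_one_test(test_name)
        finally:
            # forget everything the test imported, so its module globals
            # (and whatever they keep alive) can be reclaimed before the
            # next test runs
            for name in sys.modules.keys():
                if not saved.has_key(name):
                    del sys.modules[name]

Whether simply deleting the new entries is enough, or whether some tests
also need the old entries restored to their previous values, is something
the patch would have to sort out.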
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:49:52 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:49:52 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 16:48:31 +0200." <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <200004071949.PAA27635@eric.cnri.reston.va.us> > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. But the earlier items might already have incurred side effects (e.g. when rendering user code)... Unless you save all the strings you got for reuse, which seems a pain as well. --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Fri Apr 7 22:00:09 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 15:00:09 -0500 (CDT) Subject: [Python-Dev] str() for interpreter output Message-ID: Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... You do NOT want this. I'm against this change -- quite strongly, in fact. Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Have repr() use triple-quotes when strings contain newlines if you like, but do *not* hide the fact that the thing being displayed is a string. Imagine the confusion this would cause! (in a hypothetical Python-with-str()...) >>> a = 1 + 1 >>> b = '2' >>> c = [1, 2, 3] >>> d = '[1, 2, 3]' ...much later... >>> a 2 >>> b 2 >>> a + 5 7 >>> b + 5 Traceback (innermost last): File "", line 1, in ? TypeError: illegal argument type for built-in operation Huh?!? >>> c [1, 2, 3] >>> d [1, 2, 3] >>> c.append(4) >>> c [1, 2, 3, 4] >>> d.append(4) Traceback (innermost last): File "", line 1, in ? AttributeError: attribute-less object Huh?!?! >>> c[1] 2 >>> d[1] 1 What?! This is guaranteed to confuse! Things that look the same should be the same. Things that are different should look different. Getting the representation of objects from the interpreter provides a very important visual cue: you can usually tell just by looking at the first character what kind of animal you've got. A digit means it's a number; a quote means a string; "[" means a list; "(" means a tuple; "{" means a dictionary; "<" means an instance or a special kind of object. Switching to str() instead of repr() completely breaks this property so you have no idea what you are getting. Intuitions go out the window. Granted, repr() cannot always produce an exact reconstruction of an object. repr() is not a serialization mechanism! We have 'pickle' for that. But the nice thing about repr() is that, in general, you can *tell* whether the representation is accurate enough to re-type: once you see a "<...>" sort of thing, you know that there is extra magic that you can't type in. "<...>" was an excellent choice because it is very clearly syntactically illegal. As a corollary, here is an important property of repr() that i think ought to be documented and preserved: eval(repr(x)) should produce an object with the same value and state as x, or it should cause a SyntaxError. We should avoid ever having it *succeed* and produce the *wrong* x. 
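For example (the file name and address below are made up, and the traceback
is trimmed -- the point is only that the repr() of something that can't be
reconstructed refuses to eval back):

    >>> x = [1, "two", (3.0, None)]
    >>> eval(repr(x)) == x
    1
    >>> f = open("/tmp/scratch", "w")
    >>> repr(f)
    "<open file '/tmp/scratch', mode 'w' at 80f1e50>"
    >>> eval(repr(f))
    Traceback (innermost last):
      ...
    SyntaxError: invalid syntax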
* * * As Tim suggested, i did go back and read the comp.lang.python thread on "__str__ vs. __repr__". Honestly i'm really surprised that such a convoluted hack as the suggestion to "special-case the snot out of strings" would come from Tim, and more surprised that it actually got so much airtime. Doing this special-case mumbo-jumbo would be even worse! Look: (in a hypothetical Python-with-snotless-str()...) >>> a = '\\' >>> b = '\'' ...much later... >>> a '\' >>> '\' File "", line 1 '\' ^ SyntaxError: invalid token (at this point i am envisioning the user screaming, "But that's what YOU said!") >>> b ''' >>> ''' ... Wha...?!! Or, alternatively, if even more effort had been expended removing snot: >>> b "'" >>> "'" "'" >>> print b ' Okay... then: >>> c = '"\'" >>> c '"'' >>> '"'' File "", line 1 '"'' ^ SyntaxError: invalid token Oh, it should print as '"\'', you say? Well then what of: >>> c '"\'' >>> d = '"\\\'' '"\\'' >>> '"\\'' File "", line 1 '"\\'' ^ SyntaxError: invalid token Damned if you do, damned if you don't. Tim's snot-removal algorithm forces the user to *infer* the rules of snot removal, remember them, and tentatively apply them to everything they see (since they still can't be sure whether snot has been removed from what they are seeing). How are the user and the interpreter ever to get along if they can't talk to each other in the same language? * * * As for the suggestion to add an interpreter hook to __builtins__ such that you can supply your own display routine, i'm all for it. Great idea there. * * * I think Donn Cave put it best: there are THREE different kinds of convert-to-string, and we'll only confuse the issue if we try to ignore the distinctions. (a) accurate serialization (b) coerce to string (c) friendly display (a) is taken care of by 'pickle'. (b) is str(). Clearly, coercing a string to a string should not change anything -- thus str(x) is just x if x is already a string. (c) is repr(). repr() is for the human, not for the machine. (a) is for the machine. repr() is: "Please show me as much information as you reasonably can about this object in an accurate and unambiguous way, but if you can't readably show me everything, make it obvious that you're not." repr() must be unambiguous, because the interpreter must help people learn by example. -- ?!ng From gmcm at hypernet.com Fri Apr 7 22:12:29 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 7 Apr 2000 16:12:29 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <1256984142-21537727@hypernet.com> Ka-Ping Yee wrote: > repr() must be unambiguous, because the interpreter must help people > learn by example. Speaking of which: >>> class A: ... def m(self): ... pass ... >>> a = A() >>> a.m >>> m = a.m >>> m >>> m is a.m 0 >>> ambiguated-ly y'rs - Gordon From guido at python.org Fri Apr 7 22:14:53 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 16:14:53 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Your message of "Fri, 07 Apr 2000 15:00:09 CDT." References: Message-ID: <200004072014.QAA27700@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > However, it may be time to switch so that "immediate expression" > > values are printed as str() instead of as repr()... [Ping] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Thanks for reminding me of what my original motivation was for using repr(). 
I am also still annoyed at some extension writers who violate the rule, and design a repr() that is nice to look at but lies about the type. Note that xrange() commits this sin! (I didn't write xrange() and never liked it. ;-) We still have a dilemma though... People using the interactive interpreter to perform some specific task (e.g. NumPy users), rather than to learn about Python, want str(), and actually I agree with them there. How can we give everybody wht they want? > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Maybe this is the solution... --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Apr 7 23:03:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 23:03:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> Message-ID: <38EE4D22.CC43C664@lemburg.com> Guido van Rossum wrote: > > > No problem... :-) Its a simple fix: once %s in an 8-bit string > > sees a Unicode object it will stop processing the string and > > restart using the unicode formatting algorithm. > > But the earlier items might already have incurred side effects > (e.g. when rendering user code)... Unless you save all the strings > you got for reuse, which seems a pain as well. Oh well... I don't think it's worth getting this 100% right. We'd need quite a lot of code to store the intermediate results and then have them reused during the Unicode %-formatting -- just to catch the few cases where str(obj) does have side-effects: the code would have to pass the partially rendered string pasted together with the remaining format string to the Unicode coercion mechanism and then fiddle the arguments right. Which side-effects are you thinking about here ? Perhaps it would be better to simply raise an exception in case '%s' meets Unicode. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Apr 8 00:42:01 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 00:42:01 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> <38EE4D22.CC43C664@lemburg.com> Message-ID: <38EE6439.80847A06@lemburg.com> "M.-A. Lemburg" wrote: > > Guido van Rossum wrote: > > > > > No problem... :-) Its a simple fix: once %s in an 8-bit string > > > sees a Unicode object it will stop processing the string and > > > restart using the unicode formatting algorithm. > > > > But the earlier items might already have incurred side effects > > (e.g. when rendering user code)... Unless you save all the strings > > you got for reuse, which seems a pain as well. > > Oh well... I don't think it's worth getting this 100% right. Never mind -- I have a patch ready now, that doesn't restart, but instead uses what has already been formatted and then continues in Unicode mode. 
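Assuming the patch behaves as described, Guido's mixed examples would then
come out like this (only values that actually get formatted count; an
unused Unicode value in the mapping doesn't trigger the switch):

    >>> '...%d...%s...%s...' % (3, "abc", u"abc")
    u'...3...abc...abc...'
    >>> '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'}
    '...abc...'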
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Sat Apr 8 03:41:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:41:48 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000201bfa0fb$9af44b40$bc2d153f@tim> [Ka-Ping Yee] > ,,, > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, Yes. This was first proved in Jerome Coonen's doctoral dissertation, and is one of the few things IEEE-754 guarantees about fp I/O: that input(output(x)) == x for all finite double x provided that output() produces at least 17 significant decimal digits (and 17 is minimal). In particular, IEEE-754 does *not* guarantee that either I or O are properly rounded, which latter is needed for what *you* want to see here. The std doesn't require proper rounding in this case (despite that it requires it in all other cases) because no efficient method for doing properly rounded I/O was known at the time (and, alas, that's still true). > but we surely know that 17 digits are *not* required when x is A > because i *just typed in* 3.1416 and the best choice of double value > was A. Well, x = 1.0 provides a simpler case . > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It's in the very nature of binary floating-point that the numbers they type in are often not the numbers the system uses. > It breaks a fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality. > (What do you do then, start explaining the IEEE double representation > to your CP4E beginner?) As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be). > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose (and, indeed, I don't believe any libc other than Sun's does do proper rounding here). For background and code, track down "How To Print Floating-Point Numbers Accurately" by Steele & White, and its companion paper (s/Print/Read/) by Clinger. Steele & White were specifically concerned with printing the "shortest" fp representation possible such that proper input could later reconstruct the value exactly. Steele, White & Clinger give relatively simple code for this that relies on unbounded int arithmetic. Excruciatingly difficult and platform-#ifdef'ed "optimized" code for this was written & refined over several years by the numerical analyst David Gay, and is available from Netlib. > I haven't investigated how to do this best yet. 
I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x This merely exposes accidents in the libc on the specific platform you run it. That is, after print smartrepr(x) on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with. Both platforms have to do proper rounding to make this work; there's no way to do proper rounding by using libc; so Python has to do it itself; there's no efficient way to do it regardless; nevertheless, it's a noble goal, and at least a few languages in the Lisp family require it (most notably Scheme, from whence Steele, White & Clinger's interest in the subject). you're-in-over-your-head-before-the-water-touches-your-toes-ly y'rs - tim From billtut at microsoft.com Sat Apr 8 03:45:03 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 7 Apr 2000 18:45:03 -0700 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Suddenly returning a Unicode string from an operation that was an 8-bit string is likely to give some code exterme fits of despondency. Converting to UTF-8 didn't give you any data loss, however it certainly might be unexpected to now find UTF-8 characters in what the user originally thought was a binary string containing whatever they had wanted it to contain. Throwing an exception would at the very least force the user to make a decision one way or the other about what they want to do with the data. They might want to do a codepage translation, or something else. (aka Hey, here's a bug I just found for you!) In what other cases are you suddenly returning a Unicode string object from which previouslly returned a string object? Bill From tim_one at email.msn.com Sat Apr 8 03:49:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:49:03 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000301bfa0fc$9e452e80$bc2d153f@tim> [Guido] > Thanks for reminding me of what my original motivation was for using > repr(). I am also still annoyed at some extension writers who violate > the rule, and design a repr() that is nice to look at but lies about > the type. ... Back when this was a hot topic on c.l.py (there are no new topics <0.1 wink>), it was very clear that many did this to class __repr__ on purpose, precisely because they wanted to get back a readable string at the interactive prompt (where a *correct* repr may yield a megabyte of info -- see my extended examples from that thread with Rationals, and lists of Rationals, and dicts w/ Rationals etc). In fact, at least one Python old-timer argued strongly that the right thing to do was to swap the descriptions of str() and repr() in the docs! 
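The container problem is easy to see at a 1.5.2 prompt: str() of a list or
dict still shows the repr() -- octal escapes and all -- of the strings
inside, roughly like so:

    >>> s = "hello\nworld"
    >>> print s
    hello
    world
    >>> print str([s])
    ['hello\012world']
    >>> print str({'name': s})
    {'name': 'hello\012world'}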
str()-should-also-"pass-str()-down"-ly y'rs - tim From fredrik at pythonware.com Sat Apr 8 07:47:13 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 8 Apr 2000 07:47:13 +0200 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> Message-ID: <002c01bfa11d$e4608ec0$0500a8c0@secret.pythonware.com> Bill Tutt wrote: > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Suddenly returning a Unicode string from an operation that was an 8-bit > string is likely to give some code exterme fits of despondency. why is this different from returning floating point values from operations involving integers and floats? > Converting to UTF-8 didn't give you any data loss, however it certainly > might be unexpected to now find UTF-8 characters in what the user originally > thought was a binary string containing whatever they had wanted it to contain. the more I've played with this, the stronger my opinion that the "now it's an ordinary string, now it's a UTF-8 string, now it's an ordinary string again" approach doesn't work. more on this in a later post. (am I the only one here that has actually tried to write code that handles both unicode strings and ordinary strings? if not, can anyone tell me what I'm doing wrong?) > Throwing an exception would at the very least force the user to make a > decision one way or the other about what they want to do with the data. > They might want to do a codepage translation, or something else. (aka Hey, > here's a bug I just found for you!) > In what other cases are you suddenly returning a Unicode string object from > which previouslly returned a string object? if unicode is ever to be a real string type in python, and not just a nifty extension type, it must be okay to return a unicode string from any operation that involves a unicode argument... From billtut at microsoft.com Sat Apr 8 08:24:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 7 Apr 2000 23:24:06 -0700 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF04@RED-MSG-50> > From: Fredrik Lundh [mailto:fredrik at pythonware.com] > > Bill Tutt wrote: > > > There has been a bug report about the treatment of Unicode > > > objects together with 8-bit format strings. The current > > > implementation converts the Unicode object to UTF-8 and then > > > inserts this value in place of the %s.... > > > > > > I'm inclined to change this to have '...%s...' % u'abc' > > > return u'...abc...' since this is just another case of > > > coercing data to the "bigger" type to avoid information loss. > > > > > > Thoughts ? > > > > Suddenly returning a Unicode string from an operation that > was an 8-bit > > string is likely to give some code exterme fits of despondency. > > why is this different from returning floating point values from > operations involving integers and floats? 
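i.e. the same kind of widening we already take for granted with numbers --
the second result below assumes the coercion rules mal describes:

    >>> 1 + 1.0
    2.0
    >>> u"abc" + "def"
    u'abcdef'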
> > > Converting to UTF-8 didn't give you any data loss, however > it certainly > > might be unexpected to now find UTF-8 characters in what > the user originally > > thought was a binary string containing whatever they had > wanted it to contain. > > the more I've played with this, the stronger my opinion that > the "now it's an ordinary string, now it's a UTF-8 string, now > it's an ordinary string again" approach doesn't work. more on > this in a later post. > Well, unicode string/UTF-8 string, but I definately agree with you. Pick one or the other and make the user convert betwixt the two. > (am I the only one here that has actually tried to write code > that handles both unicode strings and ordinary strings? if not, > can anyone tell me what I'm doing wrong?) > In C++, yes. :) Autoconverting into or out of unicode is bound to lead to trouble for someone. Look at the various messes that misused C++ operator overloading can get you into. Whether its the code that wasn't expecting UTF-8 in a normal string type, or a formatting operation that used to return a normal string type now returning a Unicode string. > > Throwing an exception would at the very least force the > user to make a > > decision one way or the other about what they want to do > with the data. > > They might want to do a codepage translation, or something > else. (aka Hey, > > here's a bug I just found for you!) > > > In what other cases are you suddenly returning a Unicode > string object from > > which previouslly returned a string object? > > if unicode is ever to be a real string type in python, and not just a > nifty extension type, it must be okay to return a unicode string from > any operation that involves a unicode argument... Err. I'm not sure what you're getting at here. If your saying that it'd be nice if we could ditch the current string type and just use the Unicode string type, then I agree with you. However, that doesn't mean you should change the semantics of an operation that existed before unicode came into the picture, since it would break backward compatability. +1 for '%s' % u'\u1234' throwing a TypeError exception. Bill From tim_one at email.msn.com Sat Apr 8 09:23:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 8 Apr 2000 03:23:16 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071935.PAA27541@eric.cnri.reston.va.us> Message-ID: <000001bfa12b$4f5501e0$6b2d153f@tim> [Guido] > Have a look at what Java does; it seems to be doing this right: > > & jpython > JPython 1.1 on java1.2 (JIT: sunwjit) > Copyright (C) 1997-1999 Corporation for National Research Initiatives > >>> import java.lang > >>> x = java.lang.Float(3.1416) > >>> x.toString() > '3.1416' > >>> That Java does this is not an accident: Guy Steele pushed for the same rules he got into Scheme, although a) The Java rules are much tighter than Scheme's. and b) He didn't prevail on this point in Java until version 1.1 (before then Java's double/float->string never produced more precision than ANSI C's default %g format, so was inadequate to preserve equality under I/O). I suspect there was more than a bit of internal politics behind the delay, as the 754 camp has never liked the "minimal width" gimmick(*), and Sun's C and Fortran numerics (incl. their properly-rounding libc I/O routines) were strongly influenced by 754 committee members. > Could it be as simple as converting x +/- one bit and seeing how many > differing digits there were? (Not that +/- one bit is easy to > calculate...) 
Sorry, it's much harder than that. See the papers (and/or David Gay's code) I referenced before. (*) Why the minimal-width gimmick is disliked: If you print a (32-bit) IEEE float with minimal width, then read it back in as a (64-bit) IEEE double, you may not get the same result as if you had converted the original float to a double directly. This is because "minimal width" here is *relative to* the universe of 32-bit floats, and you don't always get the same minimal width if you compute it relative to the universe of 64-bit doubles instead. In other words, "minimal width" can lose accuracy needlessly -- but this can't happen if you print the float to full precision instead. From mal at lemburg.com Sat Apr 8 11:51:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 11:51:32 +0200 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> Message-ID: <38EF0124.F5032CB2@lemburg.com> Bill Tutt wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Suddenly returning a Unicode string from an operation that was an 8-bit > string is likely to give some code exterme fits of despondency. > > Converting to UTF-8 didn't give you any data loss, however it certainly > might be unexpected to now find UTF-8 characters in what the user originally > thought was > a binary string containing whatever they had wanted it to contain. Well, the design is to always coerce to Unicode when 8-bit string objects and Unicode objects meet. This is done for all string methods and that's the reason I'm also implementing this for %-formatting (internally this is just another string method). > Throwing an exception would at the very least force the user to make a > decision one way or the other about what they want to do with the data. > They might want to do a codepage translation, or something else. (aka Hey, > here's a bug I just found for you!) True; but Guido's intention was to have strings and Unicode interoperate without too much user intervention. > In what other cases are you suddenly returning a Unicode string object from > which previouslly returned a string object? All string methods automatically coerce to Unicode when they see a Unicode argument, e.g. " ".join(("abc", u"def")) will return u"abc def". -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 8 13:01:00 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 8 Apr 2000 13:01:00 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EE2B02.1E6F3CB8@tismer.com> from "Christian Tismer" at Apr 07, 2000 08:37:54 PM Message-ID: <200004081101.NAA28756@python.inrialpes.fr> > > > [GvR] > > > ... > > > > The function name is not taken into account for the comparison. Maybe > > > > it should? > > > > > > [CT] > > > Absolutely, please! > > > > [VM] > > Honestly, no. -O is used for speed, so showing the wrong symbols is > > okay. It's the same in C. > > [CT] > Not ok, IMHO. 
If the name is not guaranteed to be valid, why > should it be there at all? If I write code that relies on > inspecting those things, then I'm hosed. I think that you don't want to rely on inspecting the symbol<->code bindings of an optimized program. In general. Python is different in this regard, though, because of the standard introspection facilities. One expects that f.func_code.co_name == 'f' is always true, although it's not for -O. A perfect example of a name `conflict' due to object sharing. The const array optimization is well known. It folds object constants which have the same value. In this particular case, however, they don't have the same value, because of the hardcoded function name. So in the end, it turns out that Chris is right (although not for the same reason ;-) and it would be nice to fix code_compare. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Sun Apr 9 03:26:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 8 Apr 2000 21:26:23 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071714.TAA27347@python.inrialpes.fr> Message-ID: <000001bfa1c2$9e403a80$18a2143f@tim> [Vladimir Marangozov] > I'm not sure that this will be of any interest to you, number crunchers, > but a research team in computer arithmetics here reported some major > results lately: they claim that they "solved" the Table Maker's Dilemma > for most common functions in IEEE-754 double precision arithmetic. > (and no, don't ask me what this means ;-) Back in the old days, some people spent decades making tables of various function values. A common way was to laboriously compute high-precision values over a sparse grid, using e.g. series expansions, then extend that to a fine grid via relatively simple interpolation formulas between the high-precision results. You have to compute the sparse grid to *some* "extra" precision in order to absorb roundoff errors in the interpolated values. The "dilemma" is figuring out how *much* extra precision: too much and it greatly slows the calculations, too little and the interpolated values are inaccurate. The "problem cases" for a function f(x) are those x such that the exact value of f(x) is very close to being exactly halfway between representable numbers. In order to round correctly, you have to figure out which representable number f(x) is closest to. How much extra precision do you need to use to resolve this correctly in all cases? Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. That's not enough to know whether it *should* round to 41 or 42. So you need to try again with more precision. But how much? You might try 5 digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? Might be a waste of effort. Try 20 next? Might *still* not be enough -- or could just as well be that 7 would have been enough and you did 10x the work you needed to do. Etc. It turns out that for most functions there's no general way known to answer the "how much?" question in advance: brute force is the best method known. For various IEEE double precision functions, so far it's turned out that you need in the ballpark of 40-60 extra accurate bits (beyond the native 53) in order to round back correctly to 53 in all cases, but there's no *theory* supporting that. It *could* require millions of extra bits. 
For those wondering "why bother?", the practical answer is this: if a std could require correct rounding, functions would be wholly portable across machines ("correctly rounded" is precisely defined by purely mathematical means). That's where IEEE-754 made its huge break with tradition, by requiring correct rounding for + - * / and sqrt. The places it left fuzzy (like string<->float, and all transcendental functions) are the places your program produces different results when you port it. Irritating one: MS VC++ on Intel platforms generates different code for exp() depending on the optimization level. They often differ in the last bit they compute. This wholly accounts for why Dragon's speech recognition software sometimes produces subtly (but very visibly!) different results depending on how it was compiled. Before I got tossed into this pit, it was assumed for a year to be either a -O bug or somebody fetching uninitialized storage. that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 06:39:09 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:09 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000101bfa1dd$8c382f80$172d153f@tim> [Guido van Rossum] > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... [Ka-Ping Yee] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Relax, nobody wants that. The fact is that neither str() nor repr() is reasonable today for use at the interactive prompt. repr() *appears* adequate only so long as you stick to the builtin types, where the difference between repr() and str() is most often non-existent(!). But repr() has driven me (& not only me) mad for years at the interactive prompt in my own (and extension) types, since a *faithful* representation of a "large" object is exactly what you *don't* want to see scrolling by. You later say (echoing Donn Cave) > repr() is for the human, not for the machine but that contradicts the docs and the design. What you mean to say is "the thing that the interactive prompt uses by default *should* be for the human, not for the machine" -- which repr() is not. That's why repr() sucks here, despite that it's certainly "more for the human" than a pickle is. str() isn't suitable either, alas, despite that (by design and by the docs) it was *intended* to be, at least because str() on a container invokes repr() on the containees. Neither str() nor repr() can be used to get a human-friendly string form of nested objects today (unless, as is increasingly the *practice*, people misuse __repr__() to do what __str__() was *intended* to do -- c.f. Guido's complaint about that). > ... > Have repr() use triple-quotes when strings contain newlines > if you like, but do *not* hide the fact that the thing being > displayed is a string. Nobody wants to hide this (or, if someone does, set yourself up for merciless poking before it's too late). > ... > Getting the representation of objects from the interpreter provides > a very important visual cue: you can usually tell just by looking > at the first character what kind of animal you've got. A digit means > it's a number; a quote means a string; "[" means a list; "(" means a > tuple; "{" means a dictionary; "<" means an instance or a special > kind of object. Switching to str() instead of repr() completely > breaks this property so you have no idea what you are getting. 
> Intuitions go out the window. This is way oversold: str() also supplies "[" for lists, "(" for tuples, "{" for dicts, and "<" for instances of classes that don't override __str__. The only difference between repr() and str() in this listing of faux terror is when they're applied to strings. > Granted, repr() cannot always produce an exact reconstruction of an > object. repr() is not a serialization mechanism! To the contrary, many classes and types implement repr() for that very purpose. It's not universal but doesn't need to be. > We have 'pickle' for that. pickles are unreadable by humans; that's why repr() is often preferred. > ... > As a corollary, here is an important property of repr() that > i think ought to be documented and preserved: > > eval(repr(x)) should produce an object with the same value > and state as x, or it should cause a SyntaxError. > > We should avoid ever having it *succeed* and produce the *wrong* x. Fine by me. > ... > Honestly i'm really surprised that such a convoluted hack as the > suggestion to "special-case the snot out of strings" would come > from Tim, and more surprised that it actually got so much airtime. That thread tapped into real and widespread unhappiness with what's displayed at an interactive prompt today. That's why it got so much airtime -- no mystery there. As above, your objections to str() reduce to its behavior for strings specifically (I have more objections than just that -- str() should "get passed down" too), hence "str() special-casing the snot out of strings" was a direct hack to address that specific complaint. > Doing this special-case mumbo-jumbo would be even worse! Look: > > (in a hypothetical Python-with-snotless-str()...) > > >>> a = '\\' > >>> b = '\'' I'd actually like to use euroquotes for str(string) -- don't throw the Latin-1 away with your outrage . Whatever, examples with backslashes are non-starters, since newbies can't make any sense out of their doubling under repr() today either (if it's not a FAQ, it should be -- I've certainly had to explain it often enough!). > ...much later... > > >>> a > '\' > >>> '\' > File "", line 1 > '\' > ^ > SyntaxError: invalid token > > (at this point i am envisioning the user screaming, "But that's > what YOU said!") Nobody ever promised that eval(str(x)) == x -- if they want that, they should use repr() or backticks. Today they get >>> a '\\' and scream "Huh?! I thought that was only supposed to be ONE backslash!". Or someone in Europe tries to look at a list of strings, or a simple dict keyed by names, and gets back a god-awful mish-mash of octal backslash escapes (and str() can't be used today to stop that either, since str() "isn't passed down"). Compared to that, confusion over explicit backslashes strikes me as trivial. > [various examples of ambiguous output] That's why it's called a hack . Last time I corresponded with Guido about it, he was leaning toward using angle brackets (<>) instead. That would take away the temptation to believe you should be able to type the same thing back in and have it do something reasonable. > Tim's snot-removal algorithm forces the user to *infer* the rules > of snot removal, remember them, and tentatively apply them to > everything they see (since they still can't be sure whether snot > has been removed from what they are seeing). Not at all. "Tim's snot-removal algorithm" didn't remove anything ("removal" is an adjective I don't believe I've seen applied to it before). 
At the time it simply did str() and stuck a pair of quotes around the result. The (passed down) str() was the important part; how it's decorated to say "and, btw, it's a string" is the teensy tail of a flea that's killing the whole dog <0.9 wink>. If we had Latin-1, we could use euroquotes for this. If we had control over the display, we could use a different color or font. If we stick to 7-bit ASCII, we have to do *something* irritating. So here's a different idea for SSCTSOOS: escape quote chars and backslashes (like repr()) as needed, but leave everything else alone (like str()). Then you can have fun stringing N adjacent backslashes together , and other people can use non-ASCII characters without going mad. What I want *most*, though, is for ssctsoos() to get passed down (from container to containee), and for it to be the default action. > ... > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Same here! But I reject going on from there to say "and since Python lets you do it yourself, Python isn't obligated to try harder itself". anything-to-keep-octal-escapes-out-of-a-unicode-world-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 06:39:17 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:17 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000201bfa1dd$90581800$172d153f@tim> [Guido] > ... > We still have a dilemma though... People using the interactive > interpreter to perform some specific task (e.g. NumPy users), rather > than to learn about Python, want str(), and actually I agree with them > there. And if they're using something fancier than NumPy arrays, they want str() to get passed down from containers to containees too. BTW, boosting the number of digits repr displays is likely to make NumPy users even unhappier so long as repr() is used at the prompt (they'll be very happy to be able to transport doubles exactly across machines via repr(), but won't want to see all the noise digits all the time). > How can we give everybody what they want? More than one display function, user-definable and user-settable, + a change in the default setting. From gstein at lyra.org Sun Apr 9 11:28:18 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 9 Apr 2000 02:28:18 -0700 (PDT) Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: On Fri, 7 Apr 2000, Guido van Rossum wrote: > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. If an extension is compiled without threads, well, then it > won't release the interpreter lock, of course, but otherwise there > should be no bad effects. But if you enable "free threading" or "trace refcounts", then the combinations will not work. This is because these two options modify very basic things like Py_INCREF/DECREF. 
To help prevent mismatches, they do some monkey work with redefining a Python symbol (the InitModule thingy). Jack's idea of using PYTHON_API_VERSION is a cleaner approach to preventing imcompatibilities. > The debug issue on Windows is taken care of by a DLL naming > convention: the debug versions are named spam_d.dll (or .pyd). It would be nice to have it at the code level, too. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Sun Apr 9 12:46:41 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:46:41 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000201bfa0fb$9af44b40$bc2d153f@tim> Message-ID: In a previous message, i wrote: > > It's very jarring to type something in, and have the interpreter > > give you back something that looks very different. [...] > > It breaks a fundamental rule of consistency, and that damages the user's > > trust in the system or their understanding of the system. Then on Fri, 7 Apr 2000, Tim Peters replied: > If they're surprised by this, they indeed don't understand the arithmetic at > all! This is an argument for using a different form of arithmetic, not for > lying about reality. This is not lying! If you type in "3.1416" and Python says "3.1416", then indeed it is the case that "3.1416" is a correct way to type in the floating-point number being expressed. So "3.1415999999999999" is not any more truthful than "3.1416" -- it's just more annoying. I just tried this in Python 1.5.2+: >>> .1 0.10000000000000001 >>> .2 0.20000000000000001 >>> .3 0.29999999999999999 >>> .4 0.40000000000000002 >>> .5 0.5 >>> .6 0.59999999999999998 >>> .7 0.69999999999999996 >>> .8 0.80000000000000004 >>> .9 0.90000000000000002 Ouch. I wrote: > > (What do you do then, start explaining the IEEE double representation > > to your CP4E beginner?) Tim replied: > As above. repr() shouldn't be used at the interactive prompt anyway (but > note that I did not say str() should be). What, then? Introduce a third conversion routine and further complicate the issue? I don't see why it's necessary. I wrote: > > What should really happen is that floats intelligently print in > > the shortest and simplest manner possible Tim replied: > This can be done, but only if Python does all fp I/O conversions entirely on > its own -- 754-conforming libc routines are inadequate for this purpose Not "all fp I/O conversions", right? Only repr(float) needs to be implemented for this particular purpose. Other conversions like "%f" and "%g" can be left to libc, as they are now. I suppose for convenience's sake it may be nice to add another format spec so that one can ask for this behaviour from the "%" operator as well, but that's a separate issue (perhaps "%r" to insert the repr() of an argument of any type?). > For background and code, track down "How To Print Floating-Point Numbers > Accurately" by Steele & White, and its companion paper (s/Print/Read/) Thanks! I found 'em. Will read... I suggested: > > def smartrepr(x): > > p = 17 > > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > > return '%%.%df' % p % x Tim replied: > This merely exposes accidents in the libc on the specific platform you run > it. That is, after > > print smartrepr(x) > > on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not > yield the same number platform A started with. That is not repr()'s job. Once again: repr() is not for the machine. It is not part of repr()'s contract to ensure the kind of platform-independent conversion you're talking about. 
It prints out the number in a way that upholds the eval(repr(x)) == x contract for the system you are currently interacting with, and that's good enough. If you wanted platform-independent serialization, you would use something else. As long as the language reference says "These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture and C implementation for the accepted range and handling of overflow." and until Python specifies the exact sizes and behaviours of its floating-point numbers, you can't expect these kinds of cross-platform guarantees anyway. Here are the expectations i've come to have: str()'s contract: - if x is a string, str(x) == x - otherwise, str(x) is a reasonable string coercion from x repr()'s contract: - if repr(x) is syntactically valid, eval(repr(x)) == x - repr(x) displays x in a safe and readable way - for objects composed of basic types, repr(x) reflects what the user would have to say to produce x pickle's contract: - pickle.dumps(x) is a platform-independent serialization of the value and state of object x -- ?!ng From ping at lfw.org Sun Apr 9 12:33:00 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:33:00 -0700 (PDT) Subject: [Python-Dev] str() for interpreter output In-Reply-To: <000101bfa1dd$8c382f80$172d153f@tim> Message-ID: On Sun, 9 Apr 2000, Tim Peters wrote: > You later say (echoing Donn Cave) > > > repr() is for the human, not for the machine > > but that contradicts the docs and the design. What you mean to say > is "the thing that the interactive prompt uses by default *should* be for > the human, not for the machine" -- which repr() is not. No, what i said is what i said. Let's try this again: repr() is not for the machine. The documentation for __repr__ says: __repr__(self) Called by the repr() built-in function and by string conversions (reverse quotes) to compute the "official" string representation of an object. This should normally look like a valid Python expression that can be used to recreate an object with the same value. It only suggests that the output "normally look like a valid Python expression". It doesn't require it, and certainly doesn't imply that __repr__ should be the standard way to turn an object into a platform-independent serialization. > This is way oversold: str() also supplies "[" for lists, "(" for tuples, > "{" for dicts, and "<" for instances of classes that don't override __str__. > The only difference between repr() and str() in this listing of faux terror > is when they're applied to strings. Right, and that is exactly the one thing that breaks everything: because strings are the most dangerous things to display raw, they can appear like anything, and break all the rules in one fell swoop. > > Granted, repr() cannot always produce an exact reconstruction of an > > object. repr() is not a serialization mechanism! > > To the contrary, many classes and types implement repr() for that very > purpose. It's not universal but doesn't need to be. If they want to, that's fine. In general, however, repr() is not for the machine. If you are using repr(), it's because you are expecting a human to look at the thing at some point. > > We have 'pickle' for that. > > pickles are unreadable by humans; that's why repr() is often preferred. Precisely. You just said it yourself: repr() is for humans. That is why repr() cannot be mandated as a serialization mechanism. There are two goals at odds here: readability and serialization. 
You can't have both, so you must prioritize. Pickles are more about serialization than about readability; repr is more about readability than about serialization. repr() is the interpreter's way of communicating with the human. It makes sense that e.g. the repr() of a string that you see printed by the interpreter looks just like what you would type in to produce the same string, because the interpreter and the human should speak and understand the same language as much as possible. > > >>> a = '\\' > > >>> b = '\'' > > I'd actually like to use euroquotes for str(string) -- don't throw the > Latin-1 away with your outrage . And no, even if you argue that we need to have something else, whatever you want to call it, it's not called 'str'. 'str' is "coerce to string". If you coerce an object into the type it's already in, it must not change. So, if x is a string, then str(x) must == x. > Whatever, examples with backslashes > are non-starters, since newbies can't make any sense out of their doubling > under repr() today either (if it's not a FAQ, it should be -- I've certainly > had to explain it often enough!). It may not be easy, but at least it's *consistent*. Eventually, you can't avoid the problem of escaping characters, and you just have to learn how that works, and that's that. Introducing yet a different way of escaping things won't help. Or, to put it another way: to write Python, it is required that you understand how to read and write escaped strings. Either you learn just that, or you learn that plus another, different way to read escaped-strings-as-printed-by-the-interpreter. The second case clearly requires you to learn and remember more. > Nobody ever promised that eval(str(x)) == x -- if they want that, they > should use repr() or backticks. Today they get > > >>> a > '\\' > > and scream "Huh?! I thought that was only supposed to be ONE backslash!". You have to understand this at some point. You can't get around it. Changing the way the interpreter prints things won't save anyone the trouble of learning it. > Or someone in Europe tries to look at a list of strings, or a simple dict > keyed by names, and gets back a god-awful mish-mash of octal backslash > escapes (and str() can't be used today to stop that either, since str() > "isn't passed down"). This is a pretty sensible complaint to me. I don't use characters beyond 0x7f often, but i can empathize with the hassle. As you suggested, this could be solved by having the built-in container types do something nicer with str(), such as repr without escaping characters beyond 0x7f. (However, characters below 0x20 are definitely dangerous to the terminal, and would have to be escaped regardless.) > Not at all. "Tim's snot-removal algorithm" didn't remove anything > ("removal" is an adjective I don't believe I've seen applied to it before). Well, if you "special-case the snot OUT of strings", then you're removing snot, aren't you? :) > What I want *most*, though, is for ssctsoos() to get passed down (from > container to containee), and for it to be the default action. Getting it passed down as str() seems okay to me. Making it the default action, in my (naturally) subjective opinion, is Right Out if it means that eval(what_the_interpreter_prints_for(x)) == x no longer holds for objects composed of the basic built-in types. -- ?!ng From tismer at tismer.com Sun Apr 9 15:07:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 15:07:53 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? 
References: Message-ID: <38F080A9.16DE05B8@tismer.com> Ok, just a word (carefully:) Ka-Ping Yee wrote: ... > I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 Agreed that this is not good. ... > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x This sounds reasonable. BTW my problem did not come up by typing something in, but I just rounded a number down to 3 digits past the dot. Then, as usual, I just let the result drop from the prompt, without prefixing it with "print". repr() was used, and the result was astonishing. Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this? And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not? Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers. Simple operations between exact numbers would keep exactness, higher level functions would probably not. I think we dlved into a very difficult domain here. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping at lfw.org Sun Apr 9 19:24:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 10:24:07 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: On Sun, 9 Apr 2000, Christian Tismer wrote: > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > But how should Python know that you typed it in? > Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? It's okay for the zero to go away, because it doesn't affect the value of the number. (Carrying around a significant-digit count or error range with numbers is another issue entirely, and a very thorny one at that.) I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees: - If you just type a number in and the interpreter prints it back, it will never respond with more junk digits than you typed. - If you type in what the interpreter displays for a float, you can be assured of getting the same value. > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. If you mean a decimal representation, yes, perhaps we need to explore that possibility a little more. -- ?!ng "All models are wrong; some models are useful." -- George Box From tismer at tismer.com Sun Apr 9 20:53:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 20:53:51 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? 
References: Message-ID: <38F0D1BF.E5ECA4E5@tismer.com> Ka-Ping Yee wrote: > > On Sun, 9 Apr 2000, Christian Tismer wrote: > > Here is the problem, as I see it: > > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? > > Same in my case: I just rounded to 3 digits, but how > > should Python know about this? > > > > And what do you expect when you type in 3.14160, do you want > > the trailing zero preserved or not? > > It's okay for the zero to go away, because it doesn't affect > the value of the number. (Carrying around a significant-digit > count or error range with numbers is another issue entirely, > and a very thorny one at that.) > > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: Hmm, I hope I understood. Oh, wait a minute! What is the method? What is the correct value? If I type >>> 0.1 0.10000000000000001 >>> 0.10000000000000001 0.10000000000000001 >>> There is only one value: The one which is in the machine. Would you think it is ok to get 0.1 back, when you actually *typed* 0.10000000000000001 ? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one at email.msn.com Sun Apr 9 21:42:11 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:11 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: <000101bfa25b$b39567e0$812d153f@tim> [Christian Tismer] > ... > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? > Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? > > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. > Simple operations between exact numbers would keep exactness, > higher level functions would probably not. > > I think we dlved into a very difficult domain here. "This kind of thing" is hopeless so long as Python uses binary floating point. Ping latched on to "shortest" conversion because it appeared to solve "the problem" in a specific case. But it doesn't really solve anything -- it just shuffles the surprises around. For example, >>> 3.1416 - 3.141 0.00059999999999993392 >>> Do "shorest conversion" (relative to the universe of IEEE doubles) instead, and it would print 0.0005999999999999339 Neither bears much syntactic resemblance to the 0.0006 the numerically naive "expect". Do anything less than the 16 significant digits shortest conversion happens to produce in this case, and eval'ing the string won't return the number you started with. So "0.0005999999999999339" is the "best possible" string repr can produce (assuming you think "best" == "shortest faithful, relative to the platform's universe of possibilities", which is itself highly debatable). 
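(For concreteness, here is roughly what the two candidate display functions produce for this example at a 1.6-era prompt -- an illustrative session only; repr() effectively uses C's %.17g and str() %.12g, so the exact trailing digits of the first form depend on the platform's libc:)

    >>> 3.1416 - 3.141          # repr(): faithful, but noisy
    0.00059999999999993392
    >>> print 3.1416 - 3.141    # str(): rounds to 12 significant digits
    0.0006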
If you don't want to see that at the interactive prompt, one of two things has to change: A) Give up on eval(repr(x)) == x for float x, even on a single machine. or B) Stop using repr by default. There is *no* advantage to #A over the long haul: lying always extracts a price, and unlike most of you , I appeared to be the lucky email recipient of the passionate gripes about repr(float)'s inadequacy in 1.5.2 and before. Giving a newbie an illusion of comfort at the cost of making it useless for experts is simply nuts. The desire for #B pops up from multiple sources: people trying to use native non-ASCII chars in strings; people just trying to display docstrings without embedded "\012" (newline) and "\011" (tab) escapes; and people using "big" types (like NumPy arrays or rationals) where repr() can produce unboundedly more info than the interactive user typically wants to see. It *so happens* that str() already "does the right thing" in all 3 of the last three points, and also happens to produce "0.0006" for the example above. This is why people leap to: C) Use str by default instead of repr. But str doesn't pass down to containees, and *partly* does a wrong thing when applied to strings, so it's not suitable either. It's *more* suitable than repr, though! trade-off-ing-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 21:42:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:19 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000201bfa25b$b7e7ab00$812d153f@tim> [Ping] > No, what i said is what i said. > > Let's try this again: > > repr() is not for the machine. Ping, believe me, I heard that the first 42 times . If it wasn't clear before, I'll spell it out: we don't agree on this, and I didn't agree with Donn Cave when he first went down this path. repr() is a noble attempt to be usable by both human and machine. > The documentation for __repr__ says: > > __repr__(self) Called by the repr() built-in function and by > string conversions (reverse quotes) to compute the "official" > string representation of an object. This should normally look > like a valid Python expression that can be used to recreate an > object with the same value. Additional docs are in the Built-in Functions section of the Library Ref (for repr() and str()). > It only suggests that the output "normally look like a valid > Python expression". It doesn't require it, and certainly doesn't > imply that __repr__ should be the standard way to turn an object > into a platform-independent serialization. Alas, the docs for repr and str are vague to the point of painfulness. Guido's *intent* is more evident in later c.l.py posts, and especially in what the implementation *does*: for at least all of ints, longs, floats, complex numbers and strings, and dicts, lists and tuples composed of those recursively, the 1.6 repr produces a faithful and platform-independent eval'able string composed of 7-bit ASCII printable characters. For floats and complex numbers, bit-for-bit reproducibility relies on the assumption that the platforms are IEEE-754, but all current Windows, Mac and Unix platforms (even Psion's EPOC32) *are*. So when you later say > There are two goals at odds here: readability and serialization. 
> You can't have both, sorry, but the 1.6 repr() implementation already meets both goals for a great many builtin types (as well as for dozens of classes & types I've implemented, and likely hundreds of classes & types others have implemented -- and there would be twice as many if people weren't abusing repr() to do what str() was intended to do so that the interactive prompt hehaves reasonably). > If you are using repr(), it's because you are expecting a human to > look at the thing at some point. Often, yes. More often it's because I expect a human to *edit* it (dump repr to a text file, fiddle it, then read it back in and eval it -- poor man's database), which they can't reasonably be expected to do with a pickle. Often also it's just a way to send a data structure in email, without needing to attach tedious instructions for how to use pickle to decipher it. >> pickles are unreadable by humans; that's why repr() is often preferred. > Precisely. You just said it yourself: repr() is for humans. *Partly*, yes. You assume an either/or here that I reject: repr() works best when it's designed for both == as Python itself does whenever possible. > That is why repr() cannot be mandated as a serialization mechanism. I haven't suggested to mandate it. It's a goal, and one which is often achievable, and appreciated when it is achieved. Nobody expects repr() to capture the state of an open file object -- but then they don't expect pickle to do that either . > There are two goals at odds here: readability and serialization. > You can't have both, so you must prioritize. Pickles are more > about serialization than about readability; repr is more about > readability than about serialization. Pickles are more about *efficient* machine serialization, sacrificing all readability to run as fast as possible. Sometimes that's the best choice; other times not. > repr() is the interpreter's way of communicating with the human. It is *a* way, sure, but for things like NumPy arrays and Rationals (and probably also for IEEE doubles) it's rarely the *best* way. > It makes sense that e.g. the repr() of a string that you see > printed by the interpreter looks just like what you would type > in to produce the same string, Yes, that's repr's job. But it's often *not* what the interactive user *wants*. You don't want it either! You later say > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. and that implies the shortest string the prompt can display for 3.1416 - 3.141 is 0.0005999999999999339 (see reply to Christian for details on that example). Do you really want to get that string at the prompt? If you have a NumPy array with a million elements, do you really want the interpreter to display all of them -- and in ~17 different widths? If you're using one of my Rational classes, do you really want to see a ratio of multi-thousand digit longs instead of a nice 12-digit floating approximation? I use the interactive prompt a *lot* -- the current behavior plain sucks, starting about 10 minutes after you finish the Python Tutorial <0.7 wink>. > And no, even if you argue that we need to have something else, > whatever you want to call it, it's not called 'str'. Yes, I've said repeatedly that both str() and repr() are unsuitable. That's where SSCTSOOS started, as str() is *more* suitable for more people more of the time than is repr() -- but still isn't enough. > ... 
> Or, to put it another way: to write Python, it is required that > you understand how to read and write escaped strings. Either > you learn just that, or you learn that plus another, different > way to read escaped-strings-as-printed-by-the-interpreter. The > second case clearly requires you to learn and remember more. You need to learn whatever it takes to get the job done. Since the current alternatives do not get the job done, yes, if anything is ever introduced that *does* get the job done, there's more to learn. Complexity isn't necessarily evil; gratuitous complexity is evil. > ... > (However, characters below 0x20 are definitely dangerous to the terminal, > and would have to be escaped regardless.) They're no danger on any platform I use, and at least in MS-DOS they're mapped to useful graphics characters. Python has no way to know what's dangerous, and gets in the way by trying to guess. Even if x does have control characters that are dangerous, the user will get screwed as soon as they do print x unless you want (the implied) str() to start escaping "dangerous" characters too. Safety and usefulness are definitely at odds here, and I favor usefulness. If they want saftey, let 'em use Java . > Getting it passed down as str() seems okay to me. Making it > the default action, in my (naturally) subjective opinion, is > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. Whereas in my daily use, this property is usually a *wrong* thing to shoot for at an interactive prompt (but is a great thing for repr() to shoot for). When I want eval'ability, it's just a pair of backticks away; by default, I'd rather see something *friendly*. If I type "ping" at the prompt, I don't want to see a second-by-second account of your entire life history . the-best-thing-to-do-with-most-info-is-to-suppress-it-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 22:14:17 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:17 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000301bfa260$2f161640$812d153f@tim> [Tim] >> If they're surprised by this, they indeed don't understand the >> arithmetic at all! This is an argument for using a different form of >> arithmetic, not for lying about reality. > This is not lying! Yes, I overstated that. It's not lying, but I defy anyone to explain the full truth of it in a way even Guido could understand <0.9 wink>. "Shortest conversion" is a subtle concept, requiring knowledge not only of the mathematical value, but of details of the HW representation. Plain old "correct rounding" is HW-independent, so is much easier to *fully* understand. And in things floating-point, what you don't fully understand will eventually burn you. Note that in a machine with 2-bit floating point, the "shortest conversion" for 0.75 is the string "0.8": this should suggest the sense in which "shortest conversion" can be actively misleading too. > If you type in "3.1416" and Python says "3.1416", then indeed it is the > case that "3.1416" is a correct way to type in the floating-point number > being expressed. So "3.1415999999999999" is not any more truthful than > "3.1416" -- it's just more annoying. Yes, shortest conversion is *defensible*. But Python has no code to implement that now, so it's not an option today. 
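(Purely to pin down the concept: a rough sketch of what "shortest conversion" would have to compute for an IEEE-754 double -- a variant of Ping's smartrepr above, using %g and an exact round-trip test, and subject to the same libc caveats; it is not code that exists anywhere in 1.6:)

    def shortest_repr(x):
        # fewest significant digits whose round-trip recovers x exactly;
        # 17 digits always suffice for an IEEE-754 double
        for ndigits in range(1, 18):
            s = '%.*g' % (ndigits, x)
            if float(s) == x:
                return s
        return '%.17g' % x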
> I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 > >>> .4 > 0.40000000000000002 > >>> .5 > 0.5 > >>> .6 > 0.59999999999999998 > >>> .7 > 0.69999999999999996 > >>> .8 > 0.80000000000000004 > >>> .9 > 0.90000000000000002 > > Ouch. As shown in my reply to Christian, shortest conversion is not a cure for this "gosh, it printed so much more than I expected it to"; it only appears to "fix it" in the simplest examples. So long as you want eval(what's_diplayed) == what's_typed, this is unavoidable. The only ways to avoid that are to use a different arithmetic, or stop using repr() at the prompt. >> As above. repr() shouldn't be used at the interactive prompt >> anyway (but note that I did not say str() should be). > What, then? Introduce a third conversion routine and further > complicate the issue? I don't see why it's necessary. Because I almost never want current repr() or str() at the prompt, and even you don't want 3.1416-3.141 to display 0.0005999999999999339 (which is the least you can print and have eval return the true answer). >>> What should really happen is that floats intelligently print in >>> the shortest and simplest manner possible >> This can be done, but only if Python does all fp I/O conversions >> entirely on its own -- 754-conforming libc routines are inadequate >> for this purpose > Not "all fp I/O conversions", right? Only repr(float) needs to > be implemented for this particular purpose. Other conversions > like "%f" and "%g" can be left to libc, as they are now. No, all, else you risk %f and %g producing results that are inconsistent with repr(), which creates yet another set of incomprehensible surprises. This is not an area that rewards half-assed hacks! I'm intimately familiar with just about every half-assed hack that's been tried here over the last 20 years -- they never work in the end. The only approach that ever bore fruit was 754's "there is *a* mathematically correct answer, and *that's* the one you return". Unfortunately, they dropped the ball here on float<->string conversions (and very publicly regret that today). > I suppose for convenience's sake it may be nice to add another > format spec so that one can ask for this behaviour from the "%" > operator as well, but that's a separate issue (perhaps "%r" to > insert the repr() of an argument of any type?). %r is cool! I like that. >>> def smartrepr(x): >>> p = 17 >>> while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 >>> return '%%.%df' % p % x >> This merely exposes accidents in the libc on the specific >> platform you run it. That is, after >> >> print smartrepr(x) >> >> on IEEE-754 platform A, reading that back in on IEEE-754 ?> platform B may not yield the same number platform A started with. > That is not repr()'s job. Once again: > > repr() is not for the machine. And once again, I didn't and don't agree with that, and, to save the next seven msgs, never will . > It is not part of repr()'s contract to ensure the kind of > platform-independent conversion you're talking about. It > prints out the number in a way that upholds the eval(repr(x)) == x > contract for the system you are currently interacting with, and > that's good enough. It's not good enough for Java and Scheme, and *shouldn't* be good enough for Python. 
The 1.6 repr(float) is already platform-independent across IEEE-754 machines (it's not correctly rounded on most platforms, but *does* print enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all Python platforms are IEEE-754 (I don't know of an exception -- perhaps Python is running on some ancient VAX?). The std has been around for 15+ years, virtually all platforms support it fully now, and it's about time languages caught up. BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats either, even on a single machine. It is now -- but thanks to the change in repr. > If you wanted platform-independent serialization, you would > use something else. There is nothing else. In 1.5.2 and before, people mucked around with binary dumps hoping they didn't screw up endianness. > As long as the language reference says > > "These represent machine-level double precision floating > point numbers. You are at the mercy of the underlying > machine architecture and C implementation for the accepted > range and handling of overflow." > > and until Python specifies the exact sizes and behaviours of > its floating-point numbers, you can't expect these kinds of > cross-platform guarantees anyway. There's nothing wrong with exceeding expectations . Despite what the reference manual says, virtually all machines use identical fp representations today (this wasn't true when the text above was written). > str()'s contract: > - if x is a string, str(x) == x > - otherwise, str(x) is a reasonable string coercion from x The last is so vague as to say nothing. My counterpart-- at least equally vague --is - otherwise, str(x) is a string that's easy to read and contains a compact summary indicating x's nature and value in general terms > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way I would say instead: - every character c in repr(x) has ord(c) in range(32, 128) - repr(x) should strive to be easily readable by humans > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x Given your first point, does this say something other than "for basic types, repr(x) is syntactically valid"? Also unclear what "basic types" means. > pickle's contract: > - pickle.dumps(x) is a platform-independent serialization > of the value and state of object x Since pickle can't handle all objects, this exaggerates the difference between it and repr. Give a fuller description, like - If pickle.dumps(x) is defined, pickle.loads(pickle.dumps(x)) == x and it's the same as the first line of your repr() contract, modulo s/syntactically valid/is defined/ s/eval/pickle.loads/ s/repr/pickle.dumps/ The differences among all these guys remain fuzzy to me. but-not-surprising-when-talking-about-what-people-like-to-look-at-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 22:14:25 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:25 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000401bfa260$33e6ff40$812d153f@tim> [Ping] > ... > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: > > - If you just type a number in and the interpreter > prints it back, it will never respond with more > junk digits than you typed. Note the example from another reply of a machine with 2-bit floats. 
There the user would see: >>> 0.75 # happens to be exactly representable on this machine 0.8 # because that's the shortest string needed on this machine # to get back 0.75 internally >> This kind of surprise is inherent in the approach, not specific to 2-bit machines . BTW, I don't know that it will never print more digits than you type: did you prove that? It's plausible, but many plausible claims about fp turn out to be false. > - If you type in what the interpreter displays for a > float, you can be assured of getting the same value. This isn't of value for most interactive use -- in general you want to see the range of a number, not enough to get 53 bits exactly (that's beyond the limits of human "number sense"). It also has one clearly bad aspect: when printing containers full of floats, the number of digits printed for each will vary wildly from float to float. Makes for an unfriendly display. If the prompt's display function were settable, I'd probably plug in pprint! From tim_one at email.msn.com Sun Apr 9 22:25:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:25:19 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F0D1BF.E5ECA4E5@tismer.com> Message-ID: <000501bfa261$b9b5f3a0$812d153f@tim> [Christian] > Hmm, I hope I understood. > Oh, wait a minute! What is the method? What is the correct value? > > If I type > >>> 0.1 > 0.10000000000000001 > >>> 0.10000000000000001 > 0.10000000000000001 > >>> > > There is only one value: The one which is in the machine. > Would you think it is ok to get 0.1 back, when you > actually *typed* 0.10000000000000001 ? Yes, this is the kind of surprise I sketched with the "2-bit machine" example. It can get more surprising than the above (where, as you suspect, "shortest conversion" yields "0.1" for both -- which, btw, is why reading it back in to a float type with more precision loses accuracy needlessly, which in turn is why 754 True Believers dislike it). repetitively y'rs - tim From akuchlin at mems-exchange.org Mon Apr 10 00:00:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sun, 9 Apr 2000 18:00:24 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <14573.61191.486890.43591@seahag.cnri.reston.va.us> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> <14573.61191.486890.43591@seahag.cnri.reston.va.us> Message-ID: <14576.64888.59263.386826@newcnri.cnri.reston.va.us> Fred L. Drake, Jr. writes: >maintained code. I would be surprised if Grail is the only large >application which uses "regex" for performance reasons, and we don't Zope is another, and there's even a ts_regex module hiding in Zope which tries to provide thread-safety on top of regex. --amk From tim_one at email.msn.com Mon Apr 10 04:40:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 22:40:03 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071918.PAA27474@eric.cnri.reston.va.us> Message-ID: <000401bfa296$13876e20$7da0143f@tim> [Moshe Zadka] > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you [Guido] > Except I'm not sure what kind of special-casing should be happening. Welcome to the club. > Put quotes around it without worrying if that makes it a valid string > literal is one thought that comes to mind. If nothing else , Ping convinced me the temptation to type that back in will prove overwhelming. 
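(The "user-definable and user-settable display function" being argued for here is essentially what later surfaced as sys.displayhook; a minimal sketch of plugging pprint into such a hook, assuming that API is available:)

    import __builtin__, pprint, sys

    def friendly_display(value):
        # keep the stock conventions (skip None, remember the result in _),
        # but pretty-print instead of dumping repr() to the screen
        if value is None:
            return
        __builtin__._ = value
        pprint.pprint(value)

    sys.displayhook = friendly_display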
> Another approach might be what Tk's text widget does -- pass through > certain control characters (LF, TAB) and all (even non-ASCII) printing > characters, but display other control characters as \x.. escapes > rather than risk putting the terminal in a weird mode. This must be platform-dependent? Just tried this loop in Win95 IDLE, using Courier: >>> for i in range(256): print i, chr(i), Across the whole range, it just showed what Windows always shows in the Courier font (which is usually a (empty or filled) rectangle for most "control characters"). No \x escapes at all. BTW, note that Tk unhelpfully translates a request for "Courier New" into a request for "Courier", which aren't the same fonts under Windows! So if anyone tries this with the IDLE Windows defaults, and doesn't see all the special characters Windows assigns to the range 128-159 in Courier New, that's why -- most of them aren't assigned under Courier. > No quotes though. Hm, I kind of like this: when used as intended, it will > just display the text, with newlines and umlauts etc.; but when printing > binary gibberish, it will do something friendly. Can't be worse than what happens now . > There's also the issue of what to do with lists (or tuples, or dicts) > containing strings. If we agree on this: > > >>> "hello\nworld\n\347" # octal 347 is a cedilla > hello > world > ? > >>> I don't think there is agreement on this, because nothing in the output says "btw, this thing was a string". Is that worth preserving? "It depends" is the only answer I've got to that. > Then what should ("hello\nworld", "\347") show? I've got enough serious > complaints that I don't want to propose that it use repr(): > > >>> ("hello\nworld", "\347") > ('hello\nworld', '\347') > >>> > > Other possibilities: > > >>> ("hello\nworld", "\347") > ('hello > world', '?') > >>> > > or maybe > > >>> ("hello\nworld", "\347") > ('''hello > world''', '?') > >>> I like the last best. > Of course there's also the Unicode issue -- the above all assumes > Latin-1 for stdout. > > Still no closure, I think... It's curious how you invoke "closure" when and only when you don't know what *you* want to do . a-guido-divided-against-himself-cannot-stand-ly y'rs - tim From mhammond at skippinet.com.au Mon Apr 10 06:32:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 10 Apr 2000 14:32:53 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. Message-ID: [Im re-sending as the attachment caused this to be held up for administrative approval. Ive forwarded the attachement to Chris - anyone else just mail me for it] Ive struck a crash in the new trashcan mechanism (so I guess Chris is gunna pay the most attention here). Although I can only provoke this reliably in debug builds, I believe it also exists in release builds, but is just far more insidious. Unfortunately, I also can not create a simple crash case. But I _can_ provide info on how you can reliably cause the crash. Obviously only tested on Windows... * Go to http://lima.mudlib.org/~rassilon/p2c/, and grab the download, and unzip. * Replace "transformer.py" with the attached version (multi-arg append bites :-) * Ensure you have a Windows "debug" build available, built from CVS. * From the p2c directory, Run "python_d.exe gencode.py gencode.py" You will get a crash, and the debugger will show you are destructing a list, with an invalid object. 
The crash occurs about 1000 times after this code is first hit, and I can't narrow the crash condition down :-( If you open object.h, and disable the trashcan mechanism (by changing the "xx", as the comments suggest) then it runs fine. Hope this helps someone - Im afraid I havent a clue :-( Mark. From gstein at lyra.org Mon Apr 10 10:14:59 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:14:59 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils sysconfig.py In-Reply-To: <200004100117.VAA16514@kaluha.cnri.reston.va.us> Message-ID: Why aren't we getting diffs on these things? Is it because of the "distutils" root instead of the Python root? Just curious... thx, -g On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16499 > > Modified Files: > sysconfig.py > Log Message: > Added optional 'prefix' arguments to 'get_python_inc()' and > 'get_python_lib()'. > > > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 10 10:18:20 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:18:20 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils cmd.py In-Reply-To: <200004100130.VAA16590@kaluha.cnri.reston.va.us> Message-ID: [ damn... can't see the code... went and checked it out... ] On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16575 > > Modified Files: > cmd.py > Log Message: > Added a check for the 'force' attribute in '__getattr__()' -- better than > crashing when self.force not defined. This seems a bit silly. Why don't you simply define .force in the __init__ method? Better yet: make the other guys crash -- the logic is bad if they are using something that isn't supposed to be defined on that particular Command object. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Mon Apr 10 11:25:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 10 Apr 2000 11:25:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000001bfa1c2$9e403a80$18a2143f@tim> from "Tim Peters" at Apr 08, 2000 09:26:23 PM Message-ID: <200004100925.LAA03689@python.inrialpes.fr> Tim Peters wrote: > > Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit > arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. > That's not enough to know whether it *should* round to 41 or 42. So you > need to try again with more precision. But how much? You might try 5 > digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? > Might be a waste of effort. Try 20 next? Might *still* not be enough -- or > could just as well be that 7 would have been enough and you did 10x the work > you needed to do. Right. From what I understand, the dilemma is this: In order to round correctly, how much extra precision do we need, so that the range of uncertainity (+-3 in your example) does not contain the middle of two consecutive representable numbers (say 41.49 and 41.501). "Solving" the dilemma is predicting this extra precision so that the ranges of uncertainity does not contain the middle of two consecutive floats. 
Which in turn equals to calculating the min distance between the image of a number and the middle of two consecutive machine numbers. And that's what these guys have calculated for common functions in IEEE-754 double precision, with brute force, using an apparently original algorithm they have proposed. > > that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim > I haven't asked for anything. It was just passive echoing with a good level of uncertainity :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Mon Apr 10 11:53:48 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 02:53:48 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 In-Reply-To: <38F1A430.D70DF89@lemburg.com> Message-ID: On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > The attached patch includes the following fixes and additions: >... > * '...%s...' % u"abc" now coerces to Unicode just like > string methods. Care is taken not to reevaluate already formatted > arguments -- only the first Unicode object appearing in the > argument mapping is looked up twice. Added test cases for > this to test_unicode.py. >... I missed a chance to bring this up on the first round of discussion, but is this really the right thing to do? We never coerce the string on the left based on operands. For example: if the operands are class instances, we call __str__ -- we don't call __coerce__. It seems a bit weird to magically revise the left operand. In many cases, a Unicode used as a string is used as a UTF-8 value. Why is that different in this case? Seems like a wierd special case. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Apr 10 12:55:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 12:55:50 +0200 Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 References: Message-ID: <38F1B336.12B6707@lemburg.com> Greg Stein wrote: > > On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > > The attached patch includes the following fixes and additions: > >... > > * '...%s...' % u"abc" now coerces to Unicode just like > > string methods. Care is taken not to reevaluate already formatted > > arguments -- only the first Unicode object appearing in the > > argument mapping is looked up twice. Added test cases for > > this to test_unicode.py. > >... > > I missed a chance to bring this up on the first round of discussion, but > is this really the right thing to do? We never coerce the string on the > left based on operands. For example: if the operands are class instances, > we call __str__ -- we don't call __coerce__. > > It seems a bit weird to magically revise the left operand. > > In many cases, a Unicode used as a string is used as a UTF-8 value. Why is > that different in this case? Seems like a wierd special case. It's not a special case: % works just like a method call and all string methods auto-coerce to Unicode in case a Unicode object is found among the arguments. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Mon Apr 10 13:19:51 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 10 Apr 2000 13:19:51 +0200 Subject: [Python-Dev] UTF-8 is no fun... 
References: Message-ID: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Greg Stein wrote: > In many cases, a Unicode used as a string is used as a UTF-8 value. Why is > that different in this case? Seems like a wierd special case. the whole "sometimes it's UTF-8, sometimes it's not" concept is one big mess (try using some existing string crunching code with unicode strings if you don't believe me -- using non-US input strings, of course). among other things, it's very hard to get things to work properly when string slicing and indexing no longer works as expected... I see two possible ways to solve this; rough proposals follow: ----------------------------------------------------------------------- 1. a java-like approach ----------------------------------------------------------------------- a) define *character* in python to be a unicode character b) provide two character containers: 8-bit strings and unicode strings. the former can only hold unicode characters in the range 0-255, the latter can hold characters from the full unicode character set (not entirely true for the current implementation, but that's not relevant here) given a string "s" of any string type, s[i] is *always* the i'th character. len(s) is always the number of characters in the string. len(s[i]) is 1. etc. c) string operations involving mixed types use the larger type for the return value. d) they raise TypeError if (c) doesn't make any sense. e) as before, 8-bit strings can also be used to store binary data, hold- ing *bytes* instead of characters. given an 8-bit string "b" used as a buffer, b[i] is always the i'th byte. len(b) is always the number of bytes in the buffer. binary buffers can be used to hold any external unicode encodings (utf-8, utf-16, etc), as well as non-unicode 8-bit encodings (iso-8859-x, cyrillic, far east, etc). there are no implicit conversions from buffers to strings; it's up to the programmer to spell that out when necessary. f) it's up to the programmer to keep track of what a given 8-bit string actually contains (strings, encoded characters, or some other kind of binary data). g) (optionally) change the language definition to say that source code is written in unicode, and provide an "encoding pragma" to tell the com- piler how to interpret any given source file. (maybe in 1.7?) (there are more issues here, but let's start with these) ----------------------------------------------------------------------- 2. a tcl-like approach ----------------------------------------------------------------------- a) change slicing, 8-bit regular expressions (etc) to handle UTF-8 byte sequences as characters. this opens one big can of worms... b) kill the worms. ----------------------------------------------------------------------- comments? (for obvious reasons, I'm especially interested in comments from people using non-ASCII characters on a daily basis...) 
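(A rough sketch of what option 1 looks like from user code, using only the unicode()/encode() API already present in 1.6a1; the file names and encodings are made up for illustration:)

    raw = open("page.txt", "rb").read()   # 8-bit buffer: bytes, not characters
    text = unicode(raw, "utf-8")          # decode explicitly at the boundary
    print len(raw), len(text)             # byte count vs. character count
    first = text[0]                       # slicing/indexing count characters
    out = text.encode("utf-8")            # encode explicitly on the way out
    open("copy.txt", "wb").write(out)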
; from gstein@lyra.org on Mon, Apr 10, 2000 at 01:18:20AM -0700
References: <200004100130.VAA16590@kaluha.cnri.reston.va.us>
Message-ID: <20000410091101.B406@mems-exchange.org>

On 10 April 2000, Greg Stein said:
> On Sun, 9 Apr 2000, Greg Ward wrote:
> > Modified Files:
> > 	cmd.py
> > Log Message:
> > Added a check for the 'force' attribute in '__getattr__()' -- better than
> > crashing when self.force not defined.
>
> This seems a bit silly.
> Why don't you simply define .force in the __init__ method?

Duhh, 'cause I'm stupid?

No, that's not it.  'Cause I was doing this on a lazy Sunday evening and
not really thinking about it?  Yeah, I think that's it.

There, I now define self.force in the Command class constructor.  A wee
bit cheesy (not all Distutils command classes need or use self.force,
and it wouldn't always mean the same thing), but it means minimal code
upheaval for now.

> [ damn... can't see the code... went and checked it out... ]

Oops, that was a CVS config thing.  Fixed now -- I'll go check in that
change and we'll all see if it worked.  Just as well it was off though --
I checked in a couple of big documentation updates this weekend, and who
wants to see 30k of LaTeX patches in their inbox on Monday morning? ;-)

        Greg
--
Greg Ward - software developer                gward at mems-exchange.org
MEMS Exchange / CNRI                          voice: +1-703-262-5376
Reston, Virginia, USA                         fax: +1-703-262-5367

From guido at python.org  Mon Apr 10 16:01:58 2000
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Apr 2000 10:01:58 -0400
Subject: [Python-Dev] "takeuchi": a unicode string on IDLE shell
Message-ID: <200004101401.KAA00238@eric.cnri.reston.va.us>

Can anyone answer this?  I can reproduce the output side of this, and I
believe he's right about the input side.  Where should Python migrate
with respect to Unicode input?  I think that what Takeuchi is getting is
actually better than in Pythonwin or command line (where he gets
Shift-JIS)...

--Guido van Rossum (home page: http://www.python.org/~guido/)

------- Forwarded Message

Date: Mon, 10 Apr 2000 22:49:45 +0900
From: "takeuchi"
To:
Subject: a unicode string on IDLE shell

Dear Guido,

I played with your latest CPython (Python 1.6a1) on the Japanese version
of Win98 and found a strange IDLE shell behavior.  I'm not sure whether
this is a bug or a feature, so I report my story anyway.

When typing a Japanese string on the IDLE shell with an IME, Tk 8.3 seems
to convert it to a UTF-8 representation.  Unfortunately Python does not
know this, so it is dealt with as an ordinary string.

>>> s = raw_input(">>>")
Type Japanese characters with the IME, for example あ
(This is the first character of the Japanese alphabet, Hiragana)
>>> s
'\343\201\202'    # UTF-8 encoded
>>> print s
あ                # the proper glyph appears on the screen

The print statement on the IDLE shell works fine with a UTF-8 encoded
string; however, slicing or len() does not work.
# I know this is the right result

So I have to convert this string with unicode().

>>> u = unicode(s)
>>> u
u'\u3042'
>>> print u
あ                # the proper glyph appears on the screen

Do you think this conversion is uncomfortable?  I think this behavior is
inconsistent with command line Python and PythonWin.  If I want the same
result on the command line Python shell or the PythonWin shell, I have to
code as follows:

>>> s = raw_input(">>>")
Type Japanese characters with the IME, for example あ
>>> s
'\202\240'        # Shift-JIS encoded
>>> print s
あ                # the proper glyph appears on the screen
>>> u = unicode(s,"mbcs")    # if I use unicode(s) then UnicodeError is raised!
>>> print u.encode("mbcs")   # if I use print u then the wrong glyph appears
あ                # the proper glyph appears on the screen

This difference is confusing!!  I do not have the best solution for this
annoyance; I hope at least the IDLE shell and the PythonWin shell will
have the same behavior.

Thank you for reading.
Best Regards,
takeuchi

------- End of Forwarded Message

From tismer at tismer.com  Mon Apr 10 16:24:24 2000
From: tismer at tismer.com (Christian Tismer)
Date: Mon, 10 Apr 2000 16:24:24 +0200
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
References:
Message-ID: <38F1E418.FF191AEE@tismer.com>

About extensions and Trashcan.

Mark Hammond wrote:
...
> I've struck a crash in the new trashcan mechanism (so I guess Chris
> is gunna pay the most attention here). Although I can only provoke
> this reliably in debug builds, I believe it also exists in release
> builds, but is just far more insidious.
>
> Unfortunately, I also cannot create a simple crash case. But I
> _can_ provide info on how you can reliably cause the crash.
> Obviously only tested on Windows...
...
> You will get a crash, and the debugger will show you are destructing
> a list, with an invalid object. The crash occurs about 1000 times
> after this code is first hit, and I can't narrow the crash condition
> down :-(

The trashcan is built in a quite simple manner.  It uses a list to delay
deletions if the nesting level is deep.  The list operations are not
thread safe.

One special case is handled: on destruction of the session it *could*
happen that the trashcan cannot handle errors, since the thread state is
already undefined.  But the general case of no interpreter lock is
undefined and forbidden.

In a discussion with Guido, we first thought that we would need some
thread safe object for the delay.  Later on it turned out that it must
be generally *forbidden* to destroy an object when the interpreter lock
is not held.  Reason: an instance destruction might call __del__, and
that would run an interpreter without the lock.  Forbidden.  For that
reason, I kept the list in place.

I think it is fine that it crashed.  There are obviously extension
modules left where the interpreter lock rule is violated.  The builtin
Python code has been checked; there are most probably no holes,
including tkinter.  Or, I made a mistake in this little code:

void
_PyTrash_deposit_object(op)
	PyObject *op;
{
	PyObject *error_type, *error_value, *error_traceback;

	if (PyThreadState_GET() != NULL)
		PyErr_Fetch(&error_type, &error_value, &error_traceback);

	if (!_PyTrash_delete_later)
		_PyTrash_delete_later = PyList_New(0);
	if (_PyTrash_delete_later)
		PyList_Append(_PyTrash_delete_later, (PyObject *)op);

	if (PyThreadState_GET() != NULL)
		PyErr_Restore(error_type, error_value, error_traceback);
}

void
_PyTrash_destroy_list()
{
	while (_PyTrash_delete_later) {
		PyObject *shredder = _PyTrash_delete_later;
		_PyTrash_delete_later = NULL;
		++_PyTrash_delete_nesting;
		Py_DECREF(shredder);
		--_PyTrash_delete_nesting;
	}
}

ciao - chris

--
Christian Tismer       :^)
Applied Biometrics GmbH :  Have a break! Take a ride on Python's
Kaunstr. 26             :  *Starship* http://starship.python.net
14163 Berlin            :  PGP key -> http://wwwkeys.pgp.net
PGP Fingerprint  E182 71C7 1A9D 66E9 9D15  D3CC D4D7 93E2 1FAE F6DF
     where do you want to jump today?   http://www.stackless.com

From guido at python.org  Mon Apr 10 16:40:19 2000
From: guido at python.org (Guido van Rossum)
Date: Mon, 10 Apr 2000 10:40:19 -0400
Subject: [Python-Dev] Unicode input issues
In-Reply-To: Your message of "Mon, 10 Apr 2000 10:20:34 EDT."
<200004101420.KAA00291@eric.cnri.reston.va.us> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> Message-ID: <200004101440.KAA00324@eric.cnri.reston.va.us> Thinking about entering Japanese into raw_input() in IDLE more, I thought I figured a way to give Takeuchi a Unicode string when he enters Japanese characters. I added an experimental patch to the readline method of the PyShell class: if the line just read, when converted to Unicode, has fewer characters but still compares equal (and no exceptions happen during this test) then return the Unicode version. This doesn't currently work because the built-in raw_input() function requires that the readline() call it makes internally returns an 8-bit string. Should I relax that requirement in general? (I could also just replace __builtin__.[raw_]input with more liberal versions supplied by IDLE.) I also discovered that the built-in unicode() function is not idempotent: unicode(unicode('a')) returns u'\000a'. I think it should special-case this and return u'a' ! Finally, I believe we need a way to discover the encoding used by stdin or stdout. I have to admit I know very little about the file wrappers that Marc wrote -- is it easy to get the encoding out of them? IDLE should probably emulate this, as it's encoding is clearly UTF-8 (at least when using Tcl 8.1 or newer). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 17:16:58 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:16:58 -0400 Subject: [Python-Dev] int division proposal in idle-dev Message-ID: <200004101516.LAA00442@eric.cnri.reston.va.us> David Scherer posted an interesting proposal to the idle-dev list for dealing with the incompatibility issues around int division. Bruce Sherwood also posted an interesting discussion there on how to deal with incompatibilities in general (culminating in a recommendation of David's solution). In brief, David abuses the "global" statement at the module level to implement a pragma. Not ideal, but kind of cute and backwards compatible -- this can be added to Python 1.5 or even 1.4 code without breaking! He proposes that you put "global olddivision" at the top of any file that relies on int/int yielding an int; a newer Python can then default to new division semantics. (He does this by generating a different opcode, which is also smart.) It's time to start thinking about a transition path -- Bruce's discussion and David's proposal are a fine starting point, I think. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Apr 10 17:32:17 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 17:32:17 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> Message-ID: <38F1F401.45535C23@lemburg.com> Guido van Rossum wrote: > > Thinking about entering Japanese into raw_input() in IDLE more, I > thought I figured a way to give Takeuchi a Unicode string when he > enters Japanese characters. > > I added an experimental patch to the readline method of the PyShell > class: if the line just read, when converted to Unicode, has fewer > characters but still compares equal (and no exceptions happen during > this test) then return the Unicode version. 
> > This doesn't currently work because the built-in raw_input() function > requires that the readline() call it makes internally returns an 8-bit > string. Should I relax that requirement in general? (I could also > just replace __builtin__.[raw_]input with more liberal versions > supplied by IDLE.) > > I also discovered that the built-in unicode() function is not > idempotent: unicode(unicode('a')) returns u'\000a'. I think it should > special-case this and return u'a' ! Good idea. I'll fix this in the next round. > Finally, I believe we need a way to discover the encoding used by > stdin or stdout. I have to admit I know very little about the file > wrappers that Marc wrote -- is it easy to get the encoding out of > them? I'm not sure what you mean: the name of the input encoding ? Currently, only the names of the encoding and decoding functions are available to be queried. > IDLE should probably emulate this, as it's encoding is clearly > UTF-8 (at least when using Tcl 8.1 or newer). It should be possible to redirect sys.stdin/stdout using the codecs.EncodedFile wrapper. Some tests show that raw_input() doesn't seem to use the redirected sys.stdin though... >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() ??? >>> s '\344\366\374' >>> s = sys.stdin.read() ??? >>> s '\303\244\303\266\303\274\012' -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 10 17:38:58 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:38:58 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 17:32:17 +0200." <38F1F401.45535C23@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> Message-ID: <200004101538.LAA00486@eric.cnri.reston.va.us> > > Finally, I believe we need a way to discover the encoding used by > > stdin or stdout. I have to admit I know very little about the file > > wrappers that Marc wrote -- is it easy to get the encoding out of > > them? > > I'm not sure what you mean: the name of the input encoding ? > Currently, only the names of the encoding and decoding functions > are available to be queried. Whatever is helpful for a module or program that wants to know what kind of encoding is used. > > IDLE should probably emulate this, as it's encoding is clearly > > UTF-8 (at least when using Tcl 8.1 or newer). > > It should be possible to redirect sys.stdin/stdout using > the codecs.EncodedFile wrapper. Some tests show that raw_input() > doesn't seem to use the redirected sys.stdin though... > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > >>> s = raw_input() > ??? > >>> s > '\344\366\374' > >>> s = sys.stdin.read() > ??? > >>> s > '\303\244\303\266\303\274\012' This deserves more looking into. The code for raw_input() in bltinmodule.c certainly *tries* to use sys.stdin. (I think that because your EncodedFile object is not a real stdio file object, it will take the second branch, near the end of the function; this calls PyFile_GetLine() which attempts to call readline().) Aha! It actually seems that your read() and readline() are inconsistent! 
I don't know your API well enough to know which string is "correct" (\344\366\374 or \303\244\303\266\303\274) but when I call sys.stdin.readline() I get the same as raw_input() returns: >>> from codecs import * >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() ??? >>> s '\344\366\374' >>> s = sys.stdin.read() ??? >>> >>> s '\303\244\303\266\303\274\012' >>> unicode(s) u'\344\366\374\012' >>> s = sys.stdin.readline() ??? >>> s '\344\366\374\012' >>> Didn't you say that your wrapper only wraps read()? Maybe you need to revise that decision! (Note that PyShell doesn't even define read() -- it only defines readline().) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Apr 10 17:45:29 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 10 Apr 2000 11:45:29 -0400 (EDT) Subject: [Python-Dev] test_fork1 on Linux Message-ID: <14577.63257.956728.228174@seahag.cnri.reston.va.us> I've just checked in changes to test_fork1.py make the test a little more sensible on Linux (where the assumption that the thread pids are the same as the controlling process doesn't hold). However, I'm still observing some serious weirdness with this test. As far as I've been able to tell, the os.fork() call always succeeds, but sometimes the parent process segfaults, and sometimes it locks up. It does seem to get to the os.waitpid() call, which isi appearantly where the failure actually occurs. (And sometimes everything works as expected!) If anyone here is particularly familiar with threading on Linux, I'd appreciate a little help, or even a pointer to someone who understands enough of the low-level aspects of threading on Linux that I can communicate with them to figure this out. Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at python.org Mon Apr 10 17:52:43 2000 From: bwarsaw at python.org (Barry Warsaw) Date: Mon, 10 Apr 2000 11:52:43 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods Message-ID: <14577.63691.561040.281577@anthem.cnri.reston.va.us> A number of people have played FAST and loose with function and method docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are handy because they are the one attribute on funcs and methods that are easily writable. But as more people overload the semantics for docstrings, we'll get collisions. I've had a number of discussions with folks about adding attribute dictionaries to functions and methods so that you can essentially add any attribute. Namespaces are one honking great idea -- let's do more of those! Below is a very raw set of patches to add an attribute dictionary to funcs and methods. It's only been minimally tested, but if y'all like the idea, I'll clean it up, sanity check the memory management, and post the changes to patches at python.org. Here's some things you can do: -------------------- snip snip -------------------- Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def a(): pass ... >>> a.publish = 1 >>> a.publish 1 >>> a.__doc__ >>> a.__doc__ = 'a doc string' >>> a.__doc__ 'a doc string' >>> a.magic_string = a.__doc__ >>> a.magic_string 'a doc string' >>> dir(a) ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] >>> class F: ... def a(self): pass ... >>> f = F() >>> f.a.publish Traceback (most recent call last): File "", line 1, in ? 
AttributeError: publish >>> f.a.publish = 1 >>> f.a.publish 1 >>> f.a.__doc__ >>> f.a.__doc__ = 'another doc string' >>> f.a.__doc__ 'another doc string' >>> f.a.magic_string = f.a.__doc__ >>> f.a.magic_string 'another doc string' >>> dir(f.a) ['__dict__', '__doc__', '__name__', 'im_class', 'im_func', 'im_self', 'magic_string', 'publish'] >>> -------------------- snip snip -------------------- -Barry [1] Aycock, "Compiling Little Languages in Python", http://www.foretec.com/python/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html [2] http://classic.zope.org:8080/Documentation/Reference/ORB P.S. I promised to add a little note about setattr and getattr vs. setattro and getattro. There's very little documentation about the differences, and searching on python.org doesn't seem to turn up anything. The differences are simple. setattr/getattr take a char* argument naming the attribute to change, while setattro/getattro take a PyObject* (hence the trailing `o' -- for Object). This stuff should get documented in the C API, but at least now, it'll turn up in a SIG search. :) -------------------- snip snip -------------------- Index: funcobject.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/funcobject.h,v retrieving revision 2.16 diff -c -r2.16 funcobject.h *** funcobject.h 1998/12/04 18:48:02 2.16 --- funcobject.h 2000/04/07 21:30:40 *************** *** 44,49 **** --- 44,50 ---- PyObject *func_defaults; PyObject *func_doc; PyObject *func_name; + PyObject *func_dict; } PyFunctionObject; extern DL_IMPORT(PyTypeObject) PyFunction_Type; Index: classobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/classobject.c,v retrieving revision 2.84 diff -c -r2.84 classobject.c *** classobject.c 2000/04/10 13:03:19 2.84 --- classobject.c 2000/04/10 15:27:15 *************** *** 1550,1577 **** /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, {NULL} /* Sentinel */ }; static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! PyObject *name; { ! char *sname = PyString_AsString(name); ! if (sname[0] == '_') { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! if (strcmp(sname, "__name__") == 0 || ! strcmp(sname, "__doc__") == 0) ! return PyObject_GetAttr(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; } ! return PyMember_Get((char *)im, instancemethod_memberlist, sname); } static void --- 1550,1608 ---- /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, + {"__dict__", T_INT, 0}, {NULL} /* Sentinel */ }; + static int + instancemethod_setattr(im, name, v) + register PyMethodObject *im; + char *name; + PyObject *v; + { + int rtn; + + if (PyEval_GetRestricted() || + strcmp(name, "im_func") == 0 || + strcmp(name, "im_self") == 0 || + strcmp(name, "im_class") == 0) + { + PyErr_Format(PyExc_TypeError, "read-only attribute: %s", name); + return -1; + } + return PyObject_SetAttrString(im->im_func, name, v); + } + + static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! char *name; { ! PyObject *rtn; ! ! if (strcmp(name, "__name__") == 0 || ! 
strcmp(name, "__doc__") == 0) { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! return PyObject_GetAttrString(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return PyObject_GetAttrString(im->im_func, name); + + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyObject_GetAttrString(im->im_func, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static void *************** *** 1662,1669 **** 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! 0, /*tp_getattr*/ ! 0, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ --- 1693,1700 ---- 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! (getattrfunc)instancemethod_getattr, /*tp_getattr*/ ! (setattrfunc)instancemethod_setattr, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ *************** *** 1672,1678 **** (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! (getattrofunc)instancemethod_getattr, /*tp_getattro*/ 0, /*tp_setattro*/ }; --- 1703,1709 ---- (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! 0, /*tp_getattro*/ 0, /*tp_setattro*/ }; Index: funcobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/funcobject.c,v retrieving revision 2.18 diff -c -r2.18 funcobject.c *** funcobject.c 1998/05/22 00:55:34 2.18 --- funcobject.c 2000/04/07 22:15:33 *************** *** 62,67 **** --- 62,68 ---- doc = Py_None; Py_INCREF(doc); op->func_doc = doc; + op->func_dict = PyDict_New(); } return (PyObject *)op; } *************** *** 133,138 **** --- 134,140 ---- {"__name__", T_OBJECT, OFF(func_name), READONLY}, {"func_defaults",T_OBJECT, OFF(func_defaults)}, {"func_doc", T_OBJECT, OFF(func_doc)}, + {"func_dict", T_OBJECT, OFF(func_dict)}, {"__doc__", T_OBJECT, OFF(func_doc)}, {NULL} /* Sentinel */ }; *************** *** 142,153 **** PyFunctionObject *op; char *name; { if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; } ! return PyMember_Get((char *)op, func_memberlist, name); } static int --- 144,167 ---- PyFunctionObject *op; char *name; { + PyObject* rtn; + if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return op->func_dict; + + rtn = PyMember_Get((char *)op, func_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyDict_GetItemString(op->func_dict, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static int *************** *** 156,161 **** --- 170,177 ---- char *name; PyObject *value; { + int rtn; + if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not settable in restricted mode"); *************** *** 178,185 **** } if (value == Py_None) value = NULL; } ! 
return PyMember_Set((char *)op, func_memberlist, name, value); } static void --- 194,214 ---- } if (value == Py_None) value = NULL; + } + else if (strcmp(name, "func_dict") == 0) { + if (value == NULL || !PyDict_Check(value)) { + PyErr_SetString( + PyExc_TypeError, + "func_dict must be set to a dict object"); + return -1; + } + } + rtn = PyMember_Set((char *)op, func_memberlist, name, value); + if (rtn < 0) { + PyErr_Clear(); + rtn = PyDict_SetItemString(op->func_dict, name, value); } ! return rtn; } static void *************** *** 191,196 **** --- 220,226 ---- Py_DECREF(op->func_name); Py_XDECREF(op->func_defaults); Py_XDECREF(op->func_doc); + Py_XDECREF(op->func_dict); PyMem_DEL(op); } From mal at lemburg.com Mon Apr 10 18:01:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 18:01:52 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F1FAF0.4821AE6C@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? > > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. > > > > IDLE should probably emulate this, as it's encoding is clearly > > > UTF-8 (at least when using Tcl 8.1 or newer). > > > > It should be possible to redirect sys.stdin/stdout using > > the codecs.EncodedFile wrapper. Some tests show that raw_input() > > doesn't seem to use the redirected sys.stdin though... > > > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > > >>> s = raw_input() > > ??? > > >>> s > > '\344\366\374' > > >>> s = sys.stdin.read() > > ??? > > >>> s > > '\303\244\303\266\303\274\012' The latter is the "correct" output, BTW. > This deserves more looking into. The code for raw_input() in > bltinmodule.c certainly *tries* to use sys.stdin. (I think that > because your EncodedFile object is not a real stdio file object, it > will take the second branch, near the end of the function; this calls > PyFile_GetLine() which attempts to call readline().) > > Aha! It actually seems that your read() and readline() are > inconsistent! They are because I haven't yet found a way to implement readline() without buffering read-ahead data. The only way I can think of to implement it without buffering would be to read one char at a time which is much too slow. Buffering is hard to implement right when assuming that streams are stacked... every level would have its own buffering scheme and mixing .read() and .readline() wouldn't work too well. Anyway, I'll give it try... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 10 17:56:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:56:26 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:01:52 +0200." 
<38F1FAF0.4821AE6C@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> Message-ID: <200004101556.LAA00578@eric.cnri.reston.va.us> > > Aha! It actually seems that your read() and readline() are > > inconsistent! > > They are because I haven't yet found a way to implement > readline() without buffering read-ahead data. The only way > I can think of to implement it without buffering would be > to read one char at a time which is much too slow. > > Buffering is hard to implement right when assuming that > streams are stacked... every level would have its own > buffering scheme and mixing .read() and .readline() > wouldn't work too well. Anyway, I'll give it try... Since you're calling methods on the underlying file object anyway, can't you avoid buffering by calling the *corresponding* underlying method and doing the conversion on that? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 18:02:36 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 12:02:36 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: Your message of "Mon, 10 Apr 2000 11:52:43 EDT." <14577.63691.561040.281577@anthem.cnri.reston.va.us> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <200004101602.MAA00590@eric.cnri.reston.va.us> > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! > > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, I'll clean it up, sanity check the memory management, and > post the changes to patches at python.org. Here's some things you can > do: > > -------------------- snip snip -------------------- > Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> def a(): pass > ... > >>> a.publish = 1 > >>> a.publish > 1 > >>> a.__doc__ > >>> a.__doc__ = 'a doc string' > >>> a.__doc__ > 'a doc string' > >>> a.magic_string = a.__doc__ > >>> a.magic_string > 'a doc string' > >>> dir(a) > ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] > >>> class F: > ... def a(self): pass > ... > >>> f = F() > >>> f.a.publish > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: publish > >>> f.a.publish = 1 > >>> f.a.publish > 1 Here I have a question. Should this really change F.a, or should it change the method bound to f only? You implement the former, but I'm not sure if those semantics are right -- if I have two instances, f1 and f2, and you change f2.a.spam, I'd be surprised if f1.a.spam got changed as well (since f1.a and f2.a are *not* the same thing -- they are not shared. 
f1.a.im_func and f2.a.im_func are the same thing, but f1.a and f2.a are
distinct!

I would suggest that you only allow setting attributes via the class or
via a function.  (This means that you must still implement the
pass-through on method objects, but reject it if the method is bound to
an instance.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From jeremy at cnri.reston.va.us  Mon Apr 10 18:05:14 2000
From: jeremy at cnri.reston.va.us (Jeremy Hylton)
Date: Mon, 10 Apr 2000 12:05:14 -0400 (EDT)
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
In-Reply-To: <38F1E418.FF191AEE@tismer.com>
References: <38F1E418.FF191AEE@tismer.com>
Message-ID: <14577.64442.47034.907133@goon.cnri.reston.va.us>

>>>>> "CT" == Christian Tismer writes:

CT> I think it is fine that it crashed. There are obviously
CT> extension modules left where the interpreter lock rule is
CT> violated. The builtin Python code has been checked, there are
CT> most probably no holes, including tkinter. Or, I made a mistake
CT> in this little code:

I think I have misunderstood at least one of Mark's bug report and your
response.  Does the problem Mark reported rely on extension code?  I
thought the bug was triggered by running pure Python code.  If that is
the case, then it can never be fine that it crashed.  If the problem
relies on extension code, then there ought to be a way to write the
extension so that it doesn't cause a crash.

Jeremy

PS Mark: Is the transformer.py you attached different from the one in
the nondist/src/Compiler tree?  It looks like the only differences are
with the whitespace.

From pf at artcom-gmbh.de  Mon Apr 10 18:54:09 2000
From: pf at artcom-gmbh.de (Peter Funk)
Date: Mon, 10 Apr 2000 18:54:09 +0200 (MEST)
Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility
In-Reply-To: from David Scherer at "Apr 10, 2000 9:54:35 am"
Message-ID:

Hi!

David Scherer on idle-dev at python.org:
[...]
> in the interpreter* is fast. In principle, one could put THREE operators in
> the language: one with the new "float division" semantics, one that divided
> only integers, and a "backward compatibility" operator with EXACTLY the old
> semantics:
[...]
> An outline of what I did:
[...]

Yes, this is really clever.  I like the ideas.

[me]:
> > 2. What should the new Interpreter do, if he sees a source file without a
> > pragma defining the language level? There are two possibilities:
[...]
> > 2. Assume, it is a new source file and apply language level 2 to it.
> > This has the disadvantage, that it will break any existing code.

> I think the answer is 2. A high-quality script for adding the pragma to
> existing files, with CLI and GUI interfaces, should be packaged with Python.
> Running it on your existing modules would be part of the installation
> process.

Okay.  But what about the Python packages available on the Internet?
Maybe the upcoming dist-utils should handle this?  Or should the Python
core distribution contain a clever installer program, which handles this?

> Long-lived modules should always have a language level, since it makes them
> more robust against changes and also serves as documentation. A version
> statement could be encouraged at the top of any nontrivial script, e.g:
>
> python 1.6
[...]
    global python_1_5    # implies global old_division
or
    global python_1_6    # implies global old_division
or
    global python_1_7    # maybe implies global new_division

Maybe we can solve another issue just discussed on python-dev with

    global source_iso8859_1
or
    global source_utf_8

Cute idea... but we should keep the list of such pragmas short.

> Personally, I think that it makes more sense to talk about ways to
> gracefully migrate individual changes into the language than to put off
> every backward-incompatible change to a giant future "flag day" that will
> break all existing scripts. Versioning of some sort should be encouraged
> starting *now*, and incorporated into 1.6 before it goes final.

Yes.

> Indeed, but Guido has spoken:
>
> > Great ideas there, Bruce! I hope you will post these to an
> > appropriate mailing list (perhaps idle-dev, as there's no official SIG
> > to discuss the Python 3000 transition yet, and python-dev is closed).

Maybe someone can invite you into 'python-dev'?  However, the archives
are open to anyone and writing to the list is also open to anybody.
Only subscription is closed.  I don't know why.

Regards, Peter

P.S.: Redirected Reply-To: to David and python-dev at python.org !
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen)

From mal at lemburg.com  Mon Apr 10 18:39:45 2000
From: mal at lemburg.com (M.-A. Lemburg)
Date: Mon, 10 Apr 2000 18:39:45 +0200
Subject: [Python-Dev] Unicode input issues
References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us>
Message-ID: <38F203D1.4A0038F@lemburg.com>

Guido van Rossum wrote:
>
> > > Aha! It actually seems that your read() and readline() are
> > > inconsistent!
> >
> > They are because I haven't yet found a way to implement
> > readline() without buffering read-ahead data. The only way
> > I can think of to implement it without buffering would be
> > to read one char at a time which is much too slow.
> >
> > Buffering is hard to implement right when assuming that
> > streams are stacked... every level would have its own
> > buffering scheme and mixing .read() and .readline()
> > wouldn't work too well. Anyway, I'll give it try...
>
> Since you're calling methods on the underlying file object anyway,
> can't you avoid buffering by calling the *corresponding* underlying
> method and doing the conversion on that?

The problem here is that Unicode has far more line break characters
than plain ASCII.  The underlying API would break on ASCII lines (or
even worse on those CRLF sequences defined by the C lib), not the ones
I need for Unicode.

BTW, I think that we may need a new Codec class layer here:
.readline() et al. are all text based methods, while the Codec base
classes clearly work on all kinds of binary and text data.
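To illustrate the point about the extra line break characters with a
rough sketch (StringIO stands in here for the wrapped stream, and U+2028
LINE SEPARATOR is just one example of the kind of break presumably
meant): the byte-level readline() only knows the C library's line
endings, so a Unicode-only break passes through unnoticed.

    import StringIO

    # "first" and "second" are separated only by U+2028 LINE SEPARATOR;
    # encoded as UTF-8 that is '\xe2\x80\xa8', which contains no \n or \r.
    data = u"first\u2028second\n".encode("utf-8")
    f = StringIO.StringIO(data)

    # the underlying readline() happily returns both "lines" as one:
    print repr(f.readline())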
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 10 20:04:31 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:04:31 -0700 (PDT) Subject: [Python-Dev] CVS: distutils/distutils cmd.py In-Reply-To: <20000410091101.B406@mems-exchange.org> Message-ID: On Mon, 10 Apr 2000, Greg Ward wrote: > On 10 April 2000, Greg Stein said: >... > > [ damn... can't see the code... went and checked it out... ] > > Oops, that was a CVS config thing. Fixed now -- I'll go checkin that > change and we'll all see if it worked. Just as well it was off though > -- I checked in a couple of big documentation updates this weekend, and > who wants to see 30k of LaTeX patches in their inbox on Monday morning? > ;-) Cool. The CVS diffs appear to work quite fine now! Note: you might not get a 30k patch since the system elides giant diffs. Of course, if you patch 10 files, each with 3k diffs... :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 10 20:13:08 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:13:08 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Barry Warsaw wrote: >... > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, +1 on concept, -1 on the patch :-) >... > P.S. I promised to add a little note about setattr and getattr > vs. setattro and getattro. There's very little documentation about > the differences, and searching on python.org doesn't seem to turn up > anything. The differences are simple. setattr/getattr take a char* > argument naming the attribute to change, while setattro/getattro take > a PyObject* (hence the trailing `o' -- for Object). This stuff should > get documented in the C API, but at least now, it'll turn up in a SIG > search. :) And note that the getattro/setattro is preferred. It is easy to extract the char* from them; the other direction requires construction of an object. >... > + static int > + instancemethod_setattr(im, name, v) > + register PyMethodObject *im; > + char *name; > + PyObject *v; IMO, this should be instancemethod_setattro() and take a PyObject *name. In the function, you can extract the string for comparison. >... > + { > + int rtn; This variable isn't used. >... > static PyObject * > instancemethod_getattr(im, name) > register PyMethodObject *im; > ! char *name; IMO, this should remain a getattro function. (and fix the name) In your update, note how many GetAttrString calls there are. The plain GetAttr is typically faster. >... > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Why do you mask this second error with the AttributeError? Seems that you should just leave whatever is there (typically an AttributeError, but maybe not!). >... 
> --- 144,167 ---- > PyFunctionObject *op; > char *name; > { > + PyObject* rtn; > + > if (name[0] != '_' && PyEval_GetRestricted()) { > PyErr_SetString(PyExc_RuntimeError, > "function attributes not accessible in restricted mode"); > return NULL; > + } > + if (strcmp(name, "__dict__") == 0) > + return op->func_dict; This is superfluous. The PyMember_Get will do this. > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Again, with the masking... >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); This raises an interesting thought. Why not just require the mapping protocol? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 20:11:29 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:11:29 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:39:45 +0200." <38F203D1.4A0038F@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> <38F203D1.4A0038F@lemburg.com> Message-ID: <200004101811.OAA02323@eric.cnri.reston.va.us> > > Since you're calling methods on the underlying file object anyway, > > can't you avoid buffering by calling the *corresponding* underlying > > method and doing the conversion on that? > > The problem here is that Unicode has far more line > break characters than plain ASCII. The underlying API would > break on ASCII lines (or even worse on those CRLF sequences > defined by the C lib), not the ones I need for Unicode. Hm, can't we just use \n for now? > BTW, I think that we may need a new Codec class layer > here: .readline() et al. are all text based methods, > while the Codec base classes clearly work on all kinds of > binary and text data. Not sure what you mean here. Can you explain through an example? --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 20:27:03 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:27:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: <200004101702.NAA01141@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Lib > In directory eric:/projects/python/develop/guido/src/Lib > > Modified Files: > urlparse.py > Log Message: > Some cleanup -- don't use splitfields/joinfields, standardize > indentation (tabs only), rationalize some code in urljoin... Why not use string methods? (the patch still imports from string) Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 20:22:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:22:26 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: Your message of "Mon, 10 Apr 2000 11:27:03 PDT." References: Message-ID: <200004101822.OAA02423@eric.cnri.reston.va.us> > Why not use string methods? 
(the patch still imports from string) I had the patch sitting in my directory for who knows how long -- I just wanted to flush it to the CVS repository. I didn't really want to thing about all the great changes I *could* make to the code... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 20:44:01 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:44:01 -0400 Subject: [Python-Dev] Getting ready for 1.6 alpha 2 Message-ID: <200004101844.OAA02610@eric.cnri.reston.va.us> I'm getting ready for the release of alpha 2. Tomorrow afternoon (around 5:30pm east coast time) I'm going on vacation for the rest of the week, followed by a business trip most of the week after. Obviously, I'm anxious to release a solid alpha tomorrow. Please, send only simple or essential patches between now and the release date! --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 20:57:01 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:57:01 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101844.OAA02610@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > I'm getting ready for the release of alpha 2. Tomorrow afternoon > (around 5:30pm east coast time) I'm going on vacation for the rest of > the week, followed by a business trip most of the week after. > > Obviously, I'm anxious to release a solid alpha tomorrow. > > Please, send only simple or essential patches between now and the > release date! Jeremy reminded me that my new httplib.py is still pending integration. There are two possibilities: 1) My httplib.py uses a new name, or goes into a "net" package. We check it in today, and I follow up with patches to fold in post-1.5.2 compatibility items (such as the SSL stuff). 2) httplib.py will remain in the same place, so the compat changes must happen first. In both cases, I will also need to follow up with test and doc. IMO, we go with "net.httplib" and check it in today. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 21:00:08 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:00:08 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 11:57:01 PDT." References: Message-ID: <200004101900.PAA02692@eric.cnri.reston.va.us> > > Please, send only simple or essential patches between now and the > > release date! > > Jeremy reminded me that my new httplib.py is still pending integration. There will be another alpha release after I'm back -- I think this isn't that urgent. (Plus, just because you're you, you'd have to mail me a wet signature. :-) I am opposed to a net.* package until the reorganization discussion has resulted in a solid design. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 21:19:57 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:19:57 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101900.PAA02692@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > > Please, send only simple or essential patches between now and the > > > release date! > > > > Jeremy reminded me that my new httplib.py is still pending integration. 
> > There will be another alpha release after I'm back -- I think this > isn't that urgent. True, but depending on location, it also has zero impact on the release. In other words: added functionality for testing, with no potential for breakage. > (Plus, just because you're you, you'd have to mail > me a wet signature. :-) You've got one on file already :-) [ I sent it back in December; was it misplaced, and I need to resend? ] > I am opposed to a net.* package until the reorganization discussion > has resulted in a solid design. Not a problem. Mine easily replaces httplib.py in its current location. It is entirely backwards compat. A new class is used to get the new functionality, and a compat "HTTP" class is provided (leveraging the new HTTPConnection class). Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 21:20:31 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:20:31 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 12:19:57 PDT." References: Message-ID: <200004101920.PAA02957@eric.cnri.reston.va.us> > > > Jeremy reminded me that my new httplib.py is still pending integration. > > > > There will be another alpha release after I'm back -- I think this > > isn't that urgent. > > True, but depending on location, it also has zero impact on the release. > In other words: added functionality for testing, with no potential for > breakage. You're just asking for exposure. But unless it's installed as httplib.py, it won't get much more exposure than if you put it on your website and post an announcement to c.l.py, I bet. > > (Plus, just because you're you, you'd have to mail > > me a wet signature. :-) > > You've got one on file already :-) > > [ I sent it back in December; was it misplaced, and I need to resend? ] I was just teasing. Our lawyer believes that you cannot send in a signature for code that you will contribute in the future; but I really don't care enough to force you to send another one... > > I am opposed to a net.* package until the reorganization discussion > > has resulted in a solid design. > > Not a problem. Mine easily replaces httplib.py in its current location. It > is entirely backwards compat. A new class is used to get the new > functionality, and a compat "HTTP" class is provided (leveraging the new > HTTPConnection class). I thought you said there was some additional work on compat changes? I quote: | 2) httplib.py will remain in the same place, so the compat changes must | happen first. Oh well, send it to Jeremy and he'll check it in if it's ready. But not without a test suite and documentation. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Mon Apr 10 21:47:12 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 10 Apr 2000 21:47:12 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <38F1E418.FF191AEE@tismer.com> <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: <38F22FC0.C975290C@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> I think it is fine that it crashed. There are obviously > CT> extension modules left where the interpreter lock rule is > CT> violated. The builtin Python code has been checked, there are > CT> most probably no holes, including tkinter. Or, I made a mistake > CT> in this little code: > > I think have misunderstood at least one of Mark's bug report and your > response. 
Does the problem Mark reported rely on extension code? I > thought the bug was triggered by running pure Python code. If that is > the case, then it can never be fine that it crashed. If the problem > relies on extension code, then there ought to be a way to write the > extension so that it doesn't cause a crash. Oh! If it is so, then there is in fact a problem left in the Kernel. Mark, did you use an extension? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From andy at reportlab.com Mon Apr 10 21:46:25 2000 From: andy at reportlab.com (Andy Robinson) Date: Mon, 10 Apr 2000 20:46:25 +0100 Subject: [Python-Dev] Re: [I18n-sig] "takeuchi": a unicode string on IDLE shell References: <200004101401.KAA00238@eric.cnri.reston.va.us> Message-ID: <008a01bfa325$79b92f00$01ac2ac0@boulder> ----- Original Message ----- From: Guido van Rossum To: Cc: Sent: 10 April 2000 15:01 Subject: [I18n-sig] "takeuchi": a unicode string on IDLE shell > Can anyone answer this? I can reproduce the output side of this, and > I believe he's right about the input side. Where should Python > migrate with respect to Unicode input? I think that what Takeuchi is > getting is actually better than in Pythonwin or command line (where he > gets Shift-JIS)... > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think what he wants, as you hinted, is to be able to specify a 'system wide' default encoding of Shift-JIS rather than UTF8. UTF-8 has a certain purity in that it equally annoys every nation, and is nobody's default encoding. What a non-ASCII user needs is a site-wide way of setting the default encoding used for standard input and output. I think this could be done with something (config file? registry key) which site.py looks at, and wraps stream encoders around stdin, stdout and stderr. To illustrate why it matters, I often used to parse data files and do queries on a Japanese name and address database; I could print my lists and tuples in interactive mode and check they worked, or initialise functions with correct data, since the OS uses Shift-JIS as its native encoding and I was manipulating Shift-JIS strings. I've lost that ability now due to the Unicode stuff and would need to do >>> for thing in mylist: >>> ....print mylist.encode('shift_jis') to see the contents of a database row, rather than just >>> mylist BTW, Pythonwin stopped working in this regard when Scintilla came along; it prints a byte at a time now, although kanji input is fine, as is kanji pasted into a source file, as long as you specify a Japanese font. However, this is fixable - I just need to find a spare box to run Japanese windows on and find out where the printing goes wrong. Andy Robinson ReportLab From gstein at lyra.org Mon Apr 10 21:53:22 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:53:22 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004101920.PAA02957@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: >... > You're just asking for exposure. But unless it's installed as > httplib.py, it won't get much more exposure than if you put it on your > website and post an announcement to c.l.py, I bet. Hmm. Good point :-) >... 
> > > (Plus, just because you're you, you'd have to mail > > > me a wet signature. :-) > > > > You've got one on file already :-) > > > > [ I sent it back in December; was it misplaced, and I need to resend? ] > > I was just teasing. :-) >... > > > I am opposed to a net.* package until the reorganization discussion > > > has resulted in a solid design. > > > > Not a problem. Mine easily replaces httplib.py in its current location. It > > is entirely backwards compat. A new class is used to get the new > > functionality, and a compat "HTTP" class is provided (leveraging the new > > HTTPConnection class). > > I thought you said there was some additional work on compat changes? Oops. Yah. It would become option (2) (add compat stuff first) by dropping it over the current one. Mostly, I'm concerned about the SSL stuff that was added, but there may be other things (need to check the CVS logs). For example, there was all that stuff dealing with the errors (which never went in, I believe?). >... > Oh well, send it to Jeremy and he'll check it in if it's ready. But > not without a test suite and documentation. Ah. Well, then it definitely won't go in now :-). It'll take a bit to set up the tests and docco. Well... thanx for the replies. When I get the stuff ready, I'll speak up again. And yes, I do intend to ensure this stuff is ready in time for 1.6. Cheers, -g p.s. and I retract my request for inclusion of davlib. I think there is still some design work to do on that guy. -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 22:01:16 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 16:01:16 -0400 Subject: [Python-Dev] httplib again In-Reply-To: Your message of "Mon, 10 Apr 2000 12:53:22 PDT." References: Message-ID: <200004102001.QAA03201@eric.cnri.reston.va.us> > p.s. and I retract my request for inclusion of davlib. I think there is > still some design work to do on that guy. But it should at least be available outside the distro! The Vaults of Parnassus don't list it -- so it don't exist! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 22:50:26 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 13:50:26 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004102001.QAA03201@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > p.s. and I retract my request for inclusion of davlib. I think there is > > still some design work to do on that guy. > > But it should at least be available outside the distro! The Vaults of > Parnassus don't list it -- so it don't exist! :-) D'oh! I forgot to bring it over from my alternate plane of reality. ... Okay. I've synchronized the universes. Parnassus now contains a number of records for my Python stuff (well, submitted at least). Thanx for the nag :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Apr 10 22:34:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 22:34:12 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F23AC4.12CBE187@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. 
I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? > > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. Hmm, you mean something like file.encoding ? I'll add some additional attributes holding the encoding names to the wrapper classes (they will then be set by the wrapper constructor functions). BTW, I've just added .readline() et al. to the codecs... all except .readline() are easy to do. For .readline() I simply delegated line breaking to the underlying stream's .readline() method. This is far from optimal, but better than not having the method at all. I also adjusted the interfaces of the .splitlines() methods: they now take a different optional argument: """ S.splitlines([keepends]]) -> list of strings Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. """ This made implementing the above methods very simple and also allows writing codecs working with other basic storage types (UserString.py anyone ;-). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Apr 10 23:00:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 23:00:53 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> <38F203D1.4A0038F@lemburg.com> <200004101811.OAA02323@eric.cnri.reston.va.us> Message-ID: <38F24105.28ADB5EA@lemburg.com> Guido van Rossum wrote: > > > > Since you're calling methods on the underlying file object anyway, > > > can't you avoid buffering by calling the *corresponding* underlying > > > method and doing the conversion on that? > > > > The problem here is that Unicode has far more line > > break characters than plain ASCII. The underlying API would > > break on ASCII lines (or even worse on those CRLF sequences > > defined by the C lib), not the ones I need for Unicode. > > Hm, can't we just use \n for now? > > > BTW, I think that we may need a new Codec class layer > > here: .readline() et al. are all text based methods, > > while the Codec base classes clearly work on all kinds of > > binary and text data. > > Not sure what you mean here. Can you explain through an example? Well, the line concept is really only applicable to text data. Binary data doesn't have lines and e.g. a ZIP codec (probably) couldn't implement this kind of method. As it turns out, only the .writelines() method needs to know what kinds of input/output data objects are used (and then only to be able to specify a joining seperator). I'll just leave things as they are for now: quite shallow w/r to the class hierarchy. 
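A tiny sketch of the .splitlines() interface quoted earlier in this thread (written straight from the docstring, not actually run against the current alpha):

    lines = u"one\ntwo\n".splitlines()     # line breaks stripped
    kept  = u"one\ntwo\n".splitlines(1)    # keepends true: breaks preserved
    assert lines[0] == u"one"
    assert kept[0]  == u"one\n"

With keepends left out or false the line break characters are removed; with a true value they stay attached to each line, which is what makes delegating .readline()/.writelines() to the underlying stream workable.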
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 10 23:34:07 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 14:34:07 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: <200004102114.RAA07027@eric.cnri.reston.va.us> Message-ID: Euh... this is the incorrect fix. The 0 is wrong to begin with. Mark Favas submitted a proper patch for this. See his "Revised Patches for bug report 258" posted to patches at python.org on April 4th. Cheers, -g On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Modules > In directory eric:/projects/python/develop/guido/src/Modules > > Modified Files: > mmapmodule.c > Log Message: > I've had complaints about the comparison "where >= 0" before -- on > IRIX, it doesn't even compile. Added a cast: "where >= (char *)0". > > > Index: mmapmodule.c > =================================================================== > RCS file: /projects/cvsroot/python/dist/src/Modules/mmapmodule.c,v > retrieving revision 2.5 > retrieving revision 2.6 > diff -C2 -r2.5 -r2.6 > *** mmapmodule.c 2000/04/05 14:15:31 2.5 > --- mmapmodule.c 2000/04/10 21:14:05 2.6 > *************** > *** 2,6 **** > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.5 2000/04/05 14:15:31 fdrake Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > --- 2,6 ---- > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.6 2000/04/10 21:14:05 guido Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > *************** > *** 119,123 **** > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= 0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > --- 119,123 ---- > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= (char *)0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 23:43:03 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 17:43:03 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: Your message of "Mon, 10 Apr 2000 14:34:07 PDT." References: Message-ID: <200004102143.RAA07181@eric.cnri.reston.va.us> > Euh... this is the incorrect fix. The 0 is wrong to begin with. > > Mark Favas submitted a proper patch for this. See his "Revised Patches for > bug report 258" posted to patches at python.org on April 4th. Sigh. You're right. I've seen two patches to mmapmodule.c since he posted that patch, and no comments on his patch, so I thought his patch was already incorporated. I was wrong. Note that this module still gives 6 warnings on VC6.0, all C4018: '>' or '>=' signed/unsigned mismatch. I wish someone gave me a patch for that too. Unrelated: _sre.c also has a bunch of VC6 warnings -- all C4761, integral size mismatch in argument: conversion supplied. This is all about the calls to SRE_IS_DIGIT and SRE_IS_SPACE. 
The error occurs 8 times on 4 different lines, and is reported in a cyclic fashion: 106, 108, 110, 112, 106, 108, ..., etc., probably due to sre's recursive self-include tricks? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 11 00:11:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 18:11:26 -0400 Subject: [Python-Dev] 1.6a2 prerelease for Windows Message-ID: <200004102211.SAA07363@eric.cnri.reston.va.us> I've made a prerelease of the Windows installer available through the python.org/1.6 webpage (the link is in the paragraph *below* the a1 downloads). This is mostly to give Mark Hammond an opportunity to prepare win32all build 131, to deal with the changed location of the python16.dll file. Hey, it's still alpha software! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Apr 11 01:00:48 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:00:48 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F22FC0.C975290C@tismer.com> Message-ID: > If it is so, then there is in fact a problem left > in the Kernel. > Mark, did you use an extension? I tried to explain this in private email: This is pure Python code. The parser module is the only extension being used. The crash _always_ occurs as a frame object is being de-allocated, and _always_ happens as a builtin list object (a local variable) is de-alloced by the frame. Always the same line of Python code, always the same line of C code, always the exact same failure. Mark. From mhammond at skippinet.com.au Tue Apr 11 01:41:16 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:41:16 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: [Sorry - missed this bit] > PS Mark: Is the transformer.py you attached different > from the one in > the nondist/src/Compiler tree? It looks like the only > differences are > with the whitespace. The attached version is simply the "release" P2C transformer.py with .append args fixed. I imagine it is very close to the CVS version (and indeed I know for a fact that the CVS version also crashes). My initial testing showed the CVS compiler did _not_ trigger this bug (even though code that uses an identical transformer.py does), so I just dropped back to P2C and stopped when I saw it :-) Mark. From bwarsaw at python.org Tue Apr 11 01:48:51 2000 From: bwarsaw at python.org (bwarsaw at python.org) Date: Mon, 10 Apr 2000 19:48:51 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <14578.26723.857270.63150@anthem.cnri.reston.va.us> > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, >>>>> "GS" == Greg Stein writes: GS> +1 on concept, -1 on the patch :-) Well, that's good, because I /knew/ the patch was a quick hack (which is why I posted it to python-dev and not patches :). Since there's been generally positive feedback on the idea, I think I'll flesh it out a bit. GS> And note that the getattro/setattro is preferred. It is easy GS> to extract the char* from them; the other direction requires GS> construction of an object. Good point. >... 
> + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Why do you mask this second error with the AttributeError? GS> Seems that you should just leave whatever is there (typically GS> an AttributeError, but maybe not!). Good point here, but... > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Again, with the masking... ...here I don't want the KeyError to leak through the getattr() call. If you do "print func.non_existent_attr" wouldn't you want an AttributeError instead of a KeyError? Maybe it should explicitly test for KeyError rather than masking any error coming back from PyDict_GetItemString()? Or better yet (based on your suggestion below), it should do a PyMapping_HasKey() test, raise an AttributeError if not, then just return PyMapping_GetItemString(). >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); GS> This raises an interesting thought. Why not just require the GS> mapping protocol? Good point again. -Barry From gstein at lyra.org Tue Apr 11 03:37:45 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:37:45 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw at python.org wrote: >... > GS> And note that the getattro/setattro is preferred. It is easy > GS> to extract the char* from them; the other direction requires > GS> construction of an object. > > Good point. Oh. Also, I noticed that you removed a handy optimization from the getattr function. Testing a character for '_' *before* calling strcmp() will save a good chunk of time, especially considering how often this function is used. Basically, review whether a quick test can save a strmp() call (and can be easily integrated). Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Tue Apr 11 03:12:10 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:12:10 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw at python.org wrote: >... > >... > > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyObject_GetAttrString(im->im_func, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Why do you mask this second error with the AttributeError? > GS> Seems that you should just leave whatever is there (typically > GS> an AttributeError, but maybe not!). > > Good point here, but... > > > + rtn = PyMember_Get((char *)op, func_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyDict_GetItemString(op->func_dict, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Again, with the masking... > > ...here I don't want the KeyError to leak through the getattr() call. Ah! Subtle difference in the code there :-) I agree with you, on remapping the second one. 
I don't think the first needs to be remapped, however. > If you do "print func.non_existent_attr" wouldn't you want an > AttributeError instead of a KeyError? Maybe it should explicitly test > for KeyError rather than masking any error coming back from > PyDict_GetItemString()? Or better yet (based on your suggestion > below), it should do a PyMapping_HasKey() test, raise an > AttributeError if not, then just return PyMapping_GetItemString(). Seems that you could just do the PyMapping_GetItemString() and remap the error *if* it occurs. Presumably, the exception is the infrequent case and can stand to be a bit slower. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Tue Apr 11 02:58:39 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 17:58:39 -0700 (PDT) Subject: [Python-Dev] transformer.py changes? (was: Crash in new "trashcan" mechanism) In-Reply-To: Message-ID: On Tue, 11 Apr 2000, Mark Hammond wrote: > [Sorry - missed this bit] > > > PS Mark: Is the transformer.py you attached different > > from the one in > > the nondist/src/Compiler tree? It looks like the only > > differences are > > with the whitespace. > > The attached version is simply the "release" P2C transformer.py with > .append args fixed. I imagine it is very close to the CVS version > (and indeed I know for a fact that the CVS version also crashes). > > My initial testing showed the CVS compiler did _not_ trigger this > bug (even though code that uses an identical transformer.py does), > so I just dropped back to P2C and stopped when I saw it :-) Hrm. I fixed those things in the P2C CVS version. Guess I'll have to do a diff to see if there are any other changes... Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at python.org Tue Apr 11 07:08:49 2000 From: bwarsaw at python.org (bwarsaw at python.org) Date: Tue, 11 Apr 2000 01:08:49 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004101602.MAA00590@eric.cnri.reston.va.us> Message-ID: <14578.45921.289078.190085@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Here I have a question. Should this really change F.a, or GvR> should it change the method bound to f only? You implement GvR> the former, but I'm not sure if those semantics are right -- GvR> if I have two instances, f1 and f2, and you change f2.a.spam, GvR> I'd be surprised if f1.a.spam got changed as well (since f1.a GvR> and f2.a are *not* the same thing -- they are not shared. GvR> f1.a.im_func and f2.a.im_func are the same thing, but f1.a GvR> and f2.a are distinct! As are f1.a and f1.a! :) GvR> I would suggest that you only allow setting attributes via GvR> the class or via a function. (This means that you must still GvR> implement the pass-through on method objects, but reject it GvR> if the method is bound to an instance.) Given that, Python should probably raise a TypeError if an attempt is made to set an attribute on a bound method object. However, it should definitely succeed to /get/ an attribute on a bound method object. I'm not 100% sure that setting bound-method-attributes should be illegal, but we can be strict about it now and see if it makes sense to loosen the restriction later. Here's a candidate for Lib/test/test_methattr.py which should print a bunch of `1's. I'll post the revised diffs (taking into account GvR's and GS's suggestions) tomorrow after I've had a night to sleep on it. 
-Barry -------------------- snip snip -------------------- from test_support import verbose class F: def a(self): pass def b(): pass # setting attributes on functions try: b.blah except AttributeError: pass else: print 'did not get expected AttributeError' b.blah = 1 print b.blah == 1 print 'blah' in dir(b) # setting attributes on unbound methods try: F.a.blah except AttributeError: pass else: print 'did not get expected AttributeError' F.a.blah = 1 print F.a.blah == 1 print 'blah' in dir(F.a) # setting attributes on bound methods is illegal f1 = F() try: f1.a.snerp = 1 except TypeError: pass else: print 'did not get expected TypeError' # but accessing attributes on bound methods is fine print f1.a.blah print 'blah' in dir(f1.a) f2 = F() print f1.a.blah == f2.a.blah F.a.wazoo = F f1.a.wazoo is f2.a.wazoo # try setting __dict__ illegally try: F.a.__dict__ = (1, 2, 3) except TypeError: pass else: print 'did not get expected TypeError' F.a.__dict__ = {'one': 111, 'two': 222, 'three': 333} print f1.a.two == 222 from UserDict import UserDict d = UserDict({'four': 444, 'five': 555}) F.a.__dict__ = d try: f2.a.two except AttributeError: pass else: print 'did not get expected AttributeError' print f2.a.four is f1.a.four is F.a.four From tim_one at email.msn.com Tue Apr 11 08:01:15 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:15 -0400 Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <001f01bfa37b$58df5740$27a2143f@tim> [Peter Funk] > ... > May be someone can invite you into 'python-dev'? However the archives > are open to anyone and writing to the list is also open to anybody. > Only subscription is closed. I don't know why. The explanation is to be found at the very start of the list -- before it became public . The idea was to have a much smaller group than c.l.py, and composed of people who had contributed non-trivial stuff to Python's implementation. Also a group that felt comfortable arguing with each other (any heat you may perceive on this list is purely illusory ). So the idea was definitely to discourage participation(!), but never to do things in secret. Keeping subscription closed has served its purposes pretty well, despite that the only mechanism enforcing civility here is the lack of an invitation. Elitist social manipulation at its finest . From tim_one at email.msn.com Tue Apr 11 08:01:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:19 -0400 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <002101bfa37b$5b2acde0$27a2143f@tim> [Peter Funk] > ... > 2. What should the new Interpreter do, if he sees a source file without a > pragma defining the language level? Python has tens of thousands of users now -- if it doesn't default to "Python 1.5.2" (however that's spelled), approximately 79.681% of them will scream. Had the language done this earlier, it would have been much more sellable to default to the current version. However, a default is *just* "a default", and platform-appropriate config mechanism (from Windows registry to cmdline flag) could be introduced to change the default. That is, 1.7 comes out and all my code runs fine without changing a thing. Then I set a global config option to "pretend every module that doesn't say otherwise has a 1.7 pragma in it", and run things again to see what breaks. 
As part of the process of editing the files that need to be fixed, I can take that natural opportunity to dump in a 1.7 pragma in the modules I've changed, or a 1.6 pragma in the broken modules I can't (for whatever reason) alter just yet. Two pleasant minutes later, I'll have 6,834 .py files all saying "1.7" at the top. Hmm! So when 1.8 comes out, not a one of them will use any incompatible 1.8 features. So I'll also need a global config option that says "pretend every module has a 1.8 pragma in it, *regardless* of whether it has some other pragma in it already". But that will also screw up the one .py file I forgot that had a 1.5.2 pragma in it. Iterate this process a half dozen times, and I'm afraid the end result is intractable. Seems it would be much more tractable over the long haul to default to the current version. Then every incompatible change will require changing every file that relied on the old behavior (to dump in a "no, I can't use the current version's semantics" pragma) -- but that's the situation today too. The difference is that the minimal change required to get unstuck would be trivial. A nice user (like me ) would devote their life to keeping up with incompatible changes, so would never ever have a version pragma in any file. So I vote "default to current version" -- but, *wow*, that's going to be hard to sell. Tech note: Python's front end is not structured today in such a way that it's feasible to have the parser deal with a change in the set of keywords keying off a pragma -- any given identifier today is either always or never a keyword, and that choice is hardwired into the generated parse tables. Not a reason to avoid starting this process with 1.6, just a reason to avoid adding new keywords in 1.6 (it will take some real work to overcome the front end's limitations here). go-for-it!-ly y'rs - tim From pf at artcom-gmbh.de Tue Apr 11 12:15:20 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 12:15:20 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function Message-ID: Hi! Currently the wrapper classes UserList und UserString contain the following method: def __repr__(self): return repr(self.data) I wonder about the following alternatives: def __repr__(self): return self.__class__.__name__ + "(" + repr(self.data) + ")" or even more radical (here only for lists as an example): def __repr__(self): result = [self.__class__.__name__, "("] for item in self.data: result.append(repr(item)) result.append(", ") result.append(")") return "".join(result) Just a thought which jumped into my mind during the recent discussion about the purpose of the 'repr' function (float representation). Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mhammond at skippinet.com.au Tue Apr 11 17:15:16 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 01:15:16 +1000 Subject: [Python-Dev] 1.6a2 prerelease for Windows In-Reply-To: <200004102211.SAA07363@eric.cnri.reston.va.us> Message-ID: [Guido wrote] > downloads). This is mostly to give Mark Hammond an opportunity to > prepare win32all build 131, to deal with the changed > location of the > python16.dll file. Thanks! After consideration like that, how could I do anything other than get it out straight away (and if starship wasnt down it would have been a few hours ago :-) 131 is up on starship now. 
Actually, it looks like starship is down again (or at least under serious stress!) so the pages may not reflect this. It should be at http://starship.python.net/crew/mhammond/downloads/win32all-131.exe Mark. From guido at python.org Tue Apr 11 17:33:15 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 11:33:15 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Your message of "Tue, 11 Apr 2000 12:15:20 +0200." References: Message-ID: <200004111533.LAA08163@eric.cnri.reston.va.us> > Currently the wrapper classes UserList und UserString contain the > following method: > > def __repr__(self): return repr(self.data) > > I wonder about the following alternatives: > > def __repr__(self): > return self.__class__.__name__ + "(" + repr(self.data) + ")" Yes and no. It would make them behave less like their "theoretical" base class, but you're right that it's better to be honest in repr(). Their str() could still look like self.data. > or even more radical (here only for lists as an example): > > def __repr__(self): > result = [self.__class__.__name__, "("] > for item in self.data: > result.append(repr(item)) > result.append(", ") > result.append(")") > return "".join(result) What's the advantage of this? It seems designed to be faster, but I doubt that it really is -- have you timed it? I'd go for simple -- how time-critical can repr() be...? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 11 17:48:46 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 17:48:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include unicodeobject.h,2.7,2.8 References: <200004111539.LAA08510@eric.cnri.reston.va.us> Message-ID: <01ed01bfa3cd$6d324f20$34aab5d4@hagrid> > Changed PyUnicode_Splitlines() maxsplit argument to keepends. shouldn't that be "PyUnicode_SplitLines" ? (and TailMatch, IsLineBreak, etc.) From effbot at telia.com Tue Apr 11 17:57:58 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 17:57:58 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Message-ID: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> > comments? (for obvious reasons, I'm especially interested in comments > from people using non-ASCII characters on a daily basis...) nobody? maybe all problems are gone after the last round of checkins? oh well, I'll rebuild again, and see what happens if I remove all kludges in my test code... From tismer at tismer.com Tue Apr 11 18:12:32 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 18:12:32 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F34EF0.73099769@tismer.com> Mark Hammond wrote: > > > Can you perhaps tell me what the call stack says? > > Is it somewhere, or are we in finalization code of the > > interpreter? > > The crash is in _Py_Dealloc - op is a pointer, but all fields > (ob_type, ob_refcnt, etc) are all 0 - hence the crash. > > Next up is list_dealloc - op is also trashed - ob_item != NULL > (hence we are in the if condition, and calling Py_XDECREF() (which > triggers the _Py_Dealloc) - ob_size ==9, but all other fields are 0. > > Next up is Py_Dealloc() > > Next up is _PyTrash_Destroy() > > Next up is frame_dealloc() > > _Py_Dealloc() > > Next up is eval_code2() - the second last line - Py_DECREF(f) to > cleanup the frame it just finished executing. 
> > Up the stack are lots more eval_code2() - we are just running the > code - not shutting down or anything. And you do obviously not have any threads, right? And you are in the middle of a simple, heavy computing application. Nothing with GUI events happening? That can only mean there is a bug in the Python core or in the parser module. That happens to be exposed by trashcan, but isn't trashcan's fault. Well. Trashcan might change the order of destruction a little. This *should* not be a problem. But here is a constructed situation where I can think of a problem, if we have buggy code, somewhere: Assume you have something like a tuple that holds other elements. If there is a bug, like someone is dereferencing an argument in an arg tuple, what is always an error. This error can hide for a million of years: a contains (b, c, d) The C function decref's a first, and erroneously then also one of the contained elements. If b is already deallotted by decreffing a, it has refcount zero, but that doesn't hurt, since the dead object is still there, and no mallcos have taken place (unless there is a __del__ trigered of course). This eror would never be detected. With trashcan, it could happen that destruction of a is deferred, but by chance now the delayed erroneous decref of b might happen before a's decref, and there may be mallocs in between, since I have a growing list. If my code is valid (and it appears so), then I guess we have such a situation somewhere in the core code. I-smell-some-long-nightshifts-again - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From akuchlin at mems-exchange.org Tue Apr 11 18:19:21 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 11 Apr 2000 12:19:21 -0400 (EDT) Subject: [Python-Dev] Extensible library packages Message-ID: <200004111619.MAA05881@amarok.cnri.reston.va.us> For 1.6, the XML-SIG wants to submit a few more things, mostly a small SAX implementation. This currently lives in xml.sax.*. There are other subpackages around such as xml.dom, xml.utils, and so forth, but those aren't being proposed for inclusion (too large, too specialized, or whatever reason). The problem is that, if the Python standard library includes a package named 'xml', that package name can't be extended by add-on modules (unless they install themselves into Python's library directory, which is evil). Let's say Sean McGrath or whoever creates a new subpackage; how can he install it so that the code is accessible as xml.pyxie? One option that comes to mind is to have the xml package in the standard library automatically import all the names and modules from some other package ('xml_ext'? 'xml2') in site-packages. This means that all the third-party products install on top of the same location, $(prefix)/site-packages/xml/, which is only slightly less evil. I can't think of a good way to loop through everything in site-packages/* and detect some set of the available packages as XML-related, short of importing every single package, which isn't going to fly. Can anyone suggest a good solution? Fixing this may not require changing the core in any way, but the cleanest solution isn't obvious. -- A.M. 
Kuchling http://starship.python.net/crew/amk/ The mind of man, though perhaps the most splendid achievement of evolution, is not, surely, that answer to every problem of the universe. Hamlet suffers, but the Gravediggers go right on with their silly quibbles. -- Robertson Davies, "Opera and Humour" From mal at lemburg.com Tue Apr 11 18:35:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:35:23 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> Message-ID: <38F3544B.57AF8C42@lemburg.com> Fredrik Lundh wrote: > > > comments? (for obvious reasons, I'm especially interested in comments > > from people using non-ASCII characters on a daily basis...) > > nobody? FYI, there currently is a discussion emerging about this on the i18n-sig list. > maybe all problems are gone after the last round of checkins? Probably not :-/ ... the last round only fixed some minor things. > oh well, I'll rebuild again, and see what happens if I remove all > kludges in my test code... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 11 18:41:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:41:26 +0200 Subject: [Python-Dev] Extensible library packages References: <200004111619.MAA05881@amarok.cnri.reston.va.us> Message-ID: <38F355B6.DD1FD387@lemburg.com> "Andrew M. Kuchling" wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. This currently lives in xml.sax.*. There are > other subpackages around such as xml.dom, xml.utils, and so forth, but > those aren't being proposed for inclusion (too large, too specialized, > or whatever reason). > > The problem is that, if the Python standard library includes a package > named 'xml', that package name can't be extended by add-on modules > (unless they install themselves into Python's library directory, which > is evil). Let's say Sean McGrath or whoever creates a new subpackage; > how can he install it so that the code is accessible as xml.pyxie? You could make use of the __path__ trick in packages and then redirect the imports of subpackages to look in some predefined other areas as well (e.g. a non-package dir .../site-packages/xml-addons/). Here is how I do this in the compatibility packages for my mx series: DateTime/__init__.py: # Redirect all imports to the corresponding mx package def _redirect(mx_subpackage): global __path__ import os,mx __path__ = [os.path.join(mx.__path__[0],mx_subpackage)] _redirect('DateTime') ... Greg won't like this, but __path__ does have its merrits ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Tue Apr 11 18:33:23 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 18:33:23 +0200 Subject: [Python-Dev] Extensible library packages References: <200004111619.MAA05881@amarok.cnri.reston.va.us> Message-ID: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Andrew M. Kuchling wrote: > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. saxlib.py ? 
(yes, I'm serious) From Vladimir.Marangozov at inrialpes.fr Tue Apr 11 18:37:42 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 11 Apr 2000 18:37:42 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F1E418.FF191AEE@tismer.com> from "Christian Tismer" at Apr 10, 2000 04:24:24 PM Message-ID: <200004111637.SAA01941@python.inrialpes.fr> Christian Tismer wrote: > > About extensions and Trashcan. > ... > Or, I made a mistake in this little code: > > void > _PyTrash_deposit_object(op) > PyObject *op; > { > PyObject *error_type, *error_value, *error_traceback; > > if (PyThreadState_GET() != NULL) > PyErr_Fetch(&error_type, &error_value, &error_traceback); > > if (!_PyTrash_delete_later) > _PyTrash_delete_later = PyList_New(0); > if (_PyTrash_delete_later) > PyList_Append(_PyTrash_delete_later, (PyObject *)op); > > if (PyThreadState_GET() != NULL) > PyErr_Restore(error_type, error_value, error_traceback); > } Maybe unrelated, but this code does not handle the case when PyList_Append fails. If it fails, the object needs to be deallocated as usual. Looking at the macros, I don't see how you can do that because Py_TRASHCAN_SAFE_END, which calls the above function, occurs after the finalization code... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From pf at artcom-gmbh.de Tue Apr 11 18:39:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 18:39:45 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: <200004111533.LAA08163@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 11, 2000 11:33:15 am" Message-ID: Hi! [me:] > > or even more radical (here only for lists as an example): > > > > def __repr__(self): > > result = [self.__class__.__name__, "("] > > for item in self.data: > > result.append(repr(item)) > > result.append(", ") > > result.append(")") > > return "".join(result) Guido van Rossum: > What's the advantage of this? It seems designed to be faster, but I > doubt that it really is -- have you timed it? I'd go for simple -- > how time-critical can repr() be...? I feel sorry: The example above was nonsense. I confused 'str' with 'repr' as I quickly hacked the function above in. I erroneously thought 'repr(some_list)' calls 'str()' on the items. If I only had checked more carefully before, I would have remembered that indeed the opposite is true: Currently lists don't have '__str__' and so fall back to 'repr' on the items when 'str([....])' is used. All this is related to the recent discussion about the new annoying behaviour of Python 1.6 when (mis?)used as a Desktop calculator: Python 1.6a1 (#6, Apr 3 2000, 10:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> print [0.1, 0.2] [0.10000000000000001, 0.20000000000000001] >>> print 0.1 0.1 >>> print (0.1, 0.2) (0.10000000000000001, 0.20000000000000001) >>> print (0.1, 0.2)[0] 0.1 >>> print (0.1, 0.2)[1] 0.2 So if default behaviour of the interactive interpreter would be changed not to use 'repr()' for objects typed at the prompt (I believe Tim Peters suggested that), this wouldn't help to make lists, tuples and dictionaries containing floats more readable. I don't know how to fix this, though. 
:-( Regards, Peter From tismer at tismer.com Tue Apr 11 18:57:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 18:57:09 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004111637.SAA01941@python.inrialpes.fr> Message-ID: <38F35965.CA28C845@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > About extensions and Trashcan. > > ... > > Or, I made a mistake in this little code: > Maybe unrelated, but this code does not handle the case when > PyList_Append fails. If it fails, the object needs to be deallocated > as usual. Looking at the macros, I don't see how you can do that > because Py_TRASHCAN_SAFE_END, which calls the above function, > occurs after the finalization code... Yes, it does not handle this case for the following reasons: Reason 1) If the append does not work, then the system is apparently in a incredibly bad state, most probably broken! Note that these actions only take place when we have a recursion depth of 50 or so. That means, we already freed some memory, and now we have trouble with this probably little list. I won't touch a broken memory management. Reason 2) If the append does not work, then we are not allowed to deallocate the element at all. Trashcan was written in order to avoid crashes for too deeply nested objects. The current nesting level of 20 or 50 is of course very low, but generally I would assume that the limit is choosen for good reasons, and any deeper recursion might cause a machine crash. Under this assumption, the only thing you can do is to forget about the object. Remark ad 1): I had once changed the strategy to use a tuple construct instead. Thinking of memory problems when the shredder list must be grown, this could give an advantage. The optimum would be if the destructor data structure is never bigger than the smallest nested object. This would even allow me to recycle these for the destruction, without any malloc at all. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From Vladimir.Marangozov at inrialpes.fr Tue Apr 11 18:59:07 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 11 Apr 2000 18:59:07 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35965.CA28C845@tismer.com> from "Christian Tismer" at Apr 11, 2000 06:57:09 PM Message-ID: <200004111659.SAA02051@python.inrialpes.fr> Christian Tismer wrote: > > Vladimir Marangozov wrote: > > > > Maybe unrelated, but this code does not handle the case when > > PyList_Append fails. If it fails, the object needs to be deallocated > > as usual. Looking at the macros, I don't see how you can do that > > because Py_TRASHCAN_SAFE_END, which calls the above function, > > occurs after the finalization code... > > Yes, it does not handle this case for the following reasons: > ... Not enough good reasons to segfault. I suggest you move the call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert the condition there. 
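For reference, the kind of object the trashcan mechanism is meant to protect against can be produced with a couple of lines of pure Python (just a sketch; the nesting depth needed to overflow the C stack is platform dependent):

    # Build a deeply nested list.  Without the trashcan, deallocating it
    # recurses once per nesting level inside the C deallocators and can
    # blow the C stack.
    nested = []
    for i in xrange(100000):
        nested = [nested]
    del nested    # the recursive deallocation happens here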
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Tue Apr 11 19:20:36 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 19:20:36 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004111659.SAA02051@python.inrialpes.fr> Message-ID: <38F35EE4.7E741801@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Vladimir Marangozov wrote: > > > > > > Maybe unrelated, but this code does not handle the case when > > > PyList_Append fails. If it fails, the object needs to be deallocated > > > as usual. Looking at the macros, I don't see how you can do that > > > because Py_TRASHCAN_SAFE_END, which calls the above function, > > > occurs after the finalization code... > > > > Yes, it does not handle this case for the following reasons: > > ... > > Not enough good reasons to segfault. I suggest you move the > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > the condition there. Sorry, I don't see what you are suggesting, I'm distracted. Maybe you want to submit a patch, and a few more words on what you mean and why you prefer to core dump with stack overflow? I'm busy seeking a bug in the core, not in that ridiculous code. Somewhere is a real bug, probably the one which I was seeking many time before, when I got weird crashes in the small block heap of Windows. It was never solved, and never clear if it was Python or Windows memory management. Maybe we just found another entrance to this. It smells so very familiar: many many small tuples and we crash. busy-ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From effbot at telia.com Tue Apr 11 19:22:25 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 19:22:25 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> Message-ID: <004d01bfa3da$820c37a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > > nobody? > > FYI, there currently is a discussion emerging about this on the > i18n-sig list. okay, I'll catch up with that one later. > > maybe all problems are gone after the last round of checkins? > > Probably not :-/ ... the last round only fixed some minor > things. hey, aren't you supposed to say "don't worry, the design is rock solid"? ;-) From mal at lemburg.com Tue Apr 11 21:25:28 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 21:25:28 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> <004d01bfa3da$820c37a0$34aab5d4@hagrid> Message-ID: <38F37C28.4E6D99F@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > nobody? > > > > FYI, there currently is a discussion emerging about this on the > > i18n-sig list. > > okay, I'll catch up with that one later. > > > > maybe all problems are gone after the last round of checkins? > > > > Probably not :-/ ... the last round only fixed some minor > > things. > > hey, aren't you supposed to say "don't worry, the design > is rock solid"? 
;-) Things are hard to get right when you have to deal with backward *and* forward compatibility, interoperability and user-friendliness all at the same time... but we'll keep trying ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From collins at seal.aero.org Tue Apr 11 22:03:47 2000 From: collins at seal.aero.org (Jeff Collins) Date: Tue, 11 Apr 2000 13:03:47 -0700 Subject: [Python-Dev] Python for small platforms Message-ID: <14579.25763.434844.257544@malibu.aero.org> I've just had the chance to examine the unicode implementation and was surprised by the size of the code introduced - not just by the size of the database extension module (which I understand Christian Tismer is optimizing and which I assume can be configured away), but in particular by the size of the additional objects (unicodeobject.c, unicodetype.c). These additional objects alone contribute approximately 100K to the resulting executable. On desktop systems, this is not of much concern and suggestions have been made previously to reduce this if necessary (shared extension modules and possibly a shared VM - libpython.so). However, on small embedded systems (eg, PalmIII), this additional code is tremendous. The current size of the python-1.5.2-pre-unicode VM (after removal of float and complex objects with more reductions to come) on the PalmIII is 240K (already huge by Palm standards). (For reference, the size of python-1.5.1 on the PalmIII is 160K, after removal of the compiler, parser, float/long/complex objects.) With the unicode additions, this value jumps to 340K. The upshot of this is that for small platforms on which I am working, unicode support will have to be removed. My immediated concern is that unicode is getting so embedded in python that it will be difficult to extract. The approach I've taken for removing "features" (like float objects): 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, where XXX denotes the removable feature (configurable in config.h) 2) preserves the python API: builtin functions, C API, PyArg_Parse, print format specifiers, etc., raise MissingFeatureError if attempts are made to use them. Of course, the API associated with the removed feature is no longer present. 3) protects the reduced VM: all reads (via marshal, compile, etc.) involving source/compiled python code will fail with a MissingFeatureError if the reduced VM doesn't support it. 4) does not yet support a MissingFeatureError in the tokenizer if, say, 2.2 (for removed floats) is entered on the python command line. This instead results in a SyntaxError indicating a problem with the decimal point. It appears that another error token would have to be added to support this error. Of course, I may have missed something, but if the above appears to be a reasonable approach, I can supply patches (at least for floats and complexes) for further discussion. In the longer term, it would be helpful if developers would follow this (or a similar agreed upon approach) when adding new features. This would reduce the burden of maintaining python for small embedded platforms. 
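As a usage sketch only: code that has to run on both full and reduced builds could probe for a removed feature along these lines. MissingFeatureError is the hypothetical name from the proposal above, and HAVE_COMPLEX is just a made-up flag; on a full interpreter the try block simply succeeds.

    # Probe whether this VM kept complex support.  Per the proposal, the
    # builtin is still present on a reduced build but raises when used.
    try:
        complex(0)
        HAVE_COMPLEX = 1
    except Exception:          # MissingFeatureError on a reduced build
        HAVE_COMPLEX = 0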
Thanks, Jeff From guido at python.org Tue Apr 11 22:29:16 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 16:29:16 -0400 Subject: [Python-Dev] ANNOUNCE: Python 1.6 alpha 2 Message-ID: <200004112029.QAA09762@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 2 to the Python website: http://www.python.org/1.6/ If you missed the announcement for 1.6a1, probably the biggest news is Unicode support. More news is on the above webpage; Unicode is being discussed in the i18n-sig. Most changes since 1.6a1 affect either details of the Unicode support, or details of what the Windows installer installs where. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in an 1.6 final release before June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 11 23:18:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 17:18:02 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils README.txt,1.9,1.10 In-Reply-To: Your message of "Tue, 11 Apr 2000 17:17:01 EDT." <200004112117.RAA02446@thrak.cnri.reston.va.us> References: <200004112117.RAA02446@thrak.cnri.reston.va.us> Message-ID: <200004112118.RAA09957@eric.cnri.reston.va.us> You realize that that README didn't make it into 1.6a2, right? Shouldn't be a problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Apr 11 23:31:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 23:31:46 +0200 Subject: [Python-Dev] Python for small platforms References: <14579.25763.434844.257544@malibu.aero.org> Message-ID: <38F399C2.642C1127@lemburg.com> Jeff Collins wrote: > > The approach I've taken for removing "features" (like float objects): > 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, > where XXX denotes the removable feature (configurable in config.h) > 2) preserves the python API: builtin functions, C API, PyArg_Parse, > print format specifiers, etc., raise MissingFeatureError if > attempts are made to use them. Of course, the API associated > with the removed feature is no longer present. > 3) protects the reduced VM: all reads (via marshal, compile, etc.) > involving source/compiled python code will fail with > a MissingFeatureError if the reduced VM doesn't support it. > 4) does not yet support a MissingFeatureError in the tokenizer > if, say, 2.2 (for removed floats) is entered on the python > command line. This instead results in a SyntaxError > indicating a problem with the decimal point. It appears that > another error token would have to be added to support > this error. Wouldn't it be simpler to replace the parts in question with dummy replacements ? The dummies could then raise appropriate exceptions as needed. This would work for float, complex and Unicode objects which all have a defined API. The advantage of this approach is that you don't need to maintain separate patches for these parts (which is a pain) and that you can provide drop-in archives which are easy to install: simply unzip over the full source tree and recompile. 
> Of course, I may have missed something, but if the above appears to be > a reasonable approach, I can supply patches (at least for floats and > complexes) for further discussion. In the longer term, it would be > helpful if developers would follow this (or a similar agreed upon > approach) when adding new features. This would reduce the burden of > maintaining python for small embedded platforms. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Wed Apr 12 01:28:01 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 16:28:01 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] add string precisions to PyErr_Format calls In-Reply-To: <14579.11701.733010.789688@amarok.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Andrew M. Kuchling wrote: > Greg Stein writes: > >Wouldn't it be best to simply fix PyErr_Format so that we don't have to > >continue to worry about buffer overruns? > > A while ago I suggested using nsprintf() in PyErr_Format, but that > means stealing the implementation from Apache for those platforms > where libc doesn't include nsprintf(). Haven't done it yet... Seems like it would be cake to write one that took *only* the %d and %s (unadorned) modifiers. We wouldn't need anything else, would we? [ ... grep'ing the source ... ] I see the following format codes which would need to change: %.###s -- switch to %s %i -- switch to %d %c -- hrm. probably need to support this (in stringobject.c) %x -- maybe switch to %d? (in stringobject.c) The last two are used once, both in stringobject.c. I could see a case for revising that call use just %s and %d. One pass to count the length, alloc, then one pass to fill in. The second pass could actually be handled by vsprintf() since we know the buffer is large enough. The only tricky part would be determining the max length for %d. For a 32-bit value, it is 10 digits; for 64-bit value, it is 20 digits. I'd say allocate room for 20 digits regardless of platform and be done with it. Maybe support %%, but I didn't see that anywhere. Somebody could add support when the need arises. Last problem: backwards compat for third-party modules using PyErr_Format. IMO, leave PyErr_Format for them (they're already responsible for buffer overruns (or not) since PyErr_Format isn't helping them). The new one would be PyErr_SafeFormat. Recommend the Safe version, deprecate the unsafe one. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Wed Apr 12 01:22:14 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 19:22:14 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes Message-ID: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Here's the second go at adding arbitrary attribute support to function and method objects. Note that this time it's illegal (TypeError) to set an attribute on a bound method object; getting an attribute on a bound method object returns the value on the underlying function object. First the diffs, then the test case and test output. Enjoy, -Barry -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: methdiff.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: test_funcattrs.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_funcattrs URL: From mhammond at skippinet.com.au Wed Apr 12 01:54:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 09:54:50 +1000 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> Message-ID: > > comments? (for obvious reasons, I'm especially > interested in comments > > from people using non-ASCII characters on a daily basis...) > > nobody? Almost certainly not. a) Unicode objects are very new and not everyone has the time to fiddle with them, and b) many of us only speak English. So we need _you_ to tell us what the problems were/are. Dont wait for us to find them - explain them to us. At least we than have a change of sympathizing, even if we can not directly relate the experiences... > maybe all problems are gone after the last round of checkins? > oh well, I'll rebuild again, and see what happens if I remove all > kludges in my test code... OK - but be sure to let us know :-) Mark. From mhammond at skippinet.com.au Wed Apr 12 02:04:22 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 10:04:22 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35EE4.7E741801@tismer.com> Message-ID: To answer Chris' earlier question: No threads, no gui, no events. The "parser" module is the only builtin module (apart from the obvious - ntpath etc) Greg and/or Bill can correct me if I am wrong - it is just P2C, and it is just console based, mainline procedural code. It _is_ highly recursive tho (and I believe this will turn out to be the key factor in the crash) > Somewhere is a real bug, probably the one which I was > seeking many time before, when I got weird crashes in the small > block heap of Windows. It was never solved, and never clear if > it was Python or Windows memory management. I am confident that this problem was my fault, in that I was releasing a different version of the MFC DLLs than I had actually built with. At least everyone with a test case couldnt repro it after the DLL update. This new crash is so predictable and always with the same data that I seriously doubt the problem is in any way related. > Maybe we just found another entrance to this. > It smells so very familiar: many many small tuples and we crash. Lists this time, but I take your point. Ive got a busy next few days, so it is still exists after that I will put some more effort into it. Mark. From mhammond at skippinet.com.au Wed Apr 12 02:07:43 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 10:07:43 +1000 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <38F37C28.4E6D99F@lemburg.com> Message-ID: [Marc] > Things are hard to get right when you have to deal with > backward *and* forward compatibility, interoperability and > user-friendliness all at the same time... but we'll keep > trying ;-) Let me say publically that I think you have done a fine job, and obviously have put lots of thought and effort into it. If parts of the design turn out to be less than ideal (and subsequently changed before 1.6 is real) then this will not detract from your excellent work. Well done! [And also to Fredrik, whose code was the basis for the Unicode object itself - that was a nice piece of code too!] Aww-heck-I-love-all-you-guys--ly, Mark. 
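(Editorial illustration, not part of any message above.) For readers trying to see what the "UTF-8 is no fun" thread is wrestling with: 8-bit strings are assumed to be UTF-8 whenever they meet Unicode objects, which breaks the usual one-item-per-character expectation. The session below is reconstructed from the thread's description of the 1.6a2 behaviour, not pasted from an interpreter, so treat the exact output as approximate:

    >>> u = u"\u00e9"             # one character: LATIN SMALL LETTER E WITH ACUTE
    >>> s = u.encode("utf-8")     # its UTF-8 form is two bytes
    >>> len(u), len(s)
    (1, 2)
    >>> unicode(s, "utf-8") == u  # explicit decoding round-trips
    1
    >>> u + s                     # mixing coerces the 8-bit string as UTF-8
    u'\xe9\xe9'
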
From gward at mems-exchange.org Wed Apr 12 04:10:18 2000 From: gward at mems-exchange.org (Greg Ward) Date: Tue, 11 Apr 2000 22:10:18 -0400 Subject: [Python-Dev] How *does* Python determine sys.prefix? Message-ID: <20000411221018.A2587@mems-exchange.org> Ooh, here's a yucky problem. Last night, I installed Oliver Andrich's Python 1.5.2 RPM on my Linux box at home, so now I have two Python installations there: * my build, in /usr/local/python and /usr/local/python.i86-linux (I need to test Distutils in the prefix != exec_prefix case) * Oliver's RPM, in /usr I have a symlink /usr/local/bin/python pointing to ../../python.i86-linux/bin/python, and /usr/local/bin is first in my path: $ ls -lF `which python` lrwxrwxrwx 1 root root 30 Aug 28 1999 /usr/local/bin/python -> ../python.i86-linux/bin/python* Since I installed the RPM, /usr/local/bin/python reports an incorrect prefix: $ /usr/local/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/bin/../python.i86-linux') Essentially the same thing if I run it directly, not through the symlink: $ /usr/local/python.i86-linux/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/python.i86-linux') /usr/bin/python gets it right, though: $ /usr/bin/python Python 1.5.2 (#1, Apr 18 1999, 16:03:16) [GCC pgcc-2.91.60 19981201 (egcs-1.1.1 on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr') This strikes me as a pretty reasonable and straightforward way to have multiple Python installations; if Python is fooled into getting the wrong sys.prefix, then the Distutils are going to have a much tougher job! Don't tell me I have to write my own prefix-finding code now... (And no, I have not tried this under 1.6 yet.) Damn and blast my last-minute pre-release testing... I should have just released the bloody thing and let the bugs fly. Oh hell, I think I will anyways. Greg -- Greg Ward - software developer gward at mems-exchange.org MEMS Exchange / CNRI voice: +1-703-262-5376 Reston, Virginia, USA fax: +1-703-262-5367 From janssen at parc.xerox.com Wed Apr 12 04:17:38 2000 From: janssen at parc.xerox.com (Bill Janssen) Date: Tue, 11 Apr 2000 19:17:38 PDT Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: Your message of "Tue, 11 Apr 2000 13:29:51 PDT." <200004112029.QAA09762@eric.cnri.reston.va.us> Message-ID: <00Apr11.191729pdt."3438"@watson.parc.xerox.com> ILU seems to work fine with it. Bill From bwarsaw at cnri.reston.va.us Wed Apr 12 04:34:04 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 22:34:04 -0400 (EDT) Subject: [Python-Dev] How *does* Python determine sys.prefix? References: <20000411221018.A2587@mems-exchange.org> Message-ID: <14579.57500.195708.720145@anthem.cnri.reston.va.us> >>>>> "GW" == Greg Ward writes: GW> Ooh, here's a yucky problem. Last night, I installed Oliver GW> Andrich's Python 1.5.2 RPM on my Linux box at home, so now I GW> have two Python installations there: Greg, I don't know why it's finding the wrong landmark. Perhaps the first test for running out of the build directory is tripping up? What happens if you remove /usr/lib/python$VERSION/string.py? 
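(Editorial illustration, not part of the original message.) Stripped of the PATH and PYTHONHOME handling, the landmark search being referred to amounts to roughly the following sketch; the string.py landmark comes from Barry's hint above, and the function name and defaults are invented:

    import os

    def find_prefix(argv0, version="1.5", landmark="string.py"):
        # Rough Python transliteration of the idea (not the actual
        # algorithm): resolve a symlinked argv0, then walk up the
        # directory tree looking for lib/python<version>/<landmark>.
        path = argv0
        if os.path.islink(path):
            path = os.path.join(os.path.dirname(path), os.readlink(path))
        dir = os.path.dirname(os.path.abspath(path))
        while dir != os.path.dirname(dir):
            if os.path.isfile(os.path.join(dir, "lib",
                                           "python" + version, landmark)):
                return dir
            dir = os.path.dirname(dir)
        return None   # caller falls back to the compiled-in prefix

Removing the landmark file, as suggested above, is a quick way to see which directory the search is actually matching.
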
If possible you should step through calculate_path() in getpath.c -- this implements the search through the file system for the landmarks. -Barry From gward at python.net Wed Apr 12 04:34:12 2000 From: gward at python.net (Greg Ward) Date: Tue, 11 Apr 2000 22:34:12 -0400 Subject: [Python-Dev] ANNOUNCE: Distutils 0.8 released Message-ID: <20000411223412.A643@beelzebub> Python Distribution Utilities release 0.8 April 11, 2000 The Python Distribution Utilities, or Distutils for short, are a collection of modules that aid in the development, distribution, and installation of Python modules. (It is intended that ultimately the Distutils will grow up into a system for distributing and installing whole Python applications, but for now their scope is limited to module distributions.) The Distutils are a standard part of Python 1.6; if you are running 1.6, you don't need to install the Distutils separately. This release is primarily so that you can add the Distutils to a Python 1.5.2 installation -- you will then be able to install modules that require the Distutils, or use the Distutils to distribute your own modules. More information is available at the Distutils web page: http://www.python.org/sigs/distutils-sig/ and in the README.txt included in the Distutils source distribution. You can download the Distutils from http://www.python.org/sigs/distutils-sig/download.html Trivial patches can be sent to me (Greg Ward) at gward at python.net. Larger patches should be discussed on the Distutils mailing list: distutils-sig at python.org. Here are the changes in release 0.8, if you're curious: * some incompatible naming changes in the command classes -- both the classes themselves and some key class attributes were renamed (this will break some old setup scripts -- see README.txt) * half-hearted, unfinished moves towards backwards compatibility with Python 1.5.1 (the 0.1.4 and 0.1.5 releases were done independently, and I still have to fold those code changes in to the current code) * added ability to search the Windows registry to find MSVC++ (thanks to Robin Becker and Thomas Heller) * renamed the "dist" command to "sdist" and introduced the "manifest template" file (MANIFEST.in), used to generate the actual manifest * added "build_clib" command to build static C libraries needed by Python extensions * fixed the "install" command -- we now have a sane, usable, flexible, intelligent scheme for doing standard, alternate, and custom installations (and it's even documented!) (thanks to Fred Drake and Guido van Rossum for design help) * straightened out the incompatibilities between the UnixCCompiler and MSVCCompiler classes, and cleaned up the whole mechanism for compiling C code in the process * reorganized the build directories: now build to either "build/lib" or "build/lib.", with temporary files (eg. compiler turds) in "build/temp." * merged the "install_py" and "install_ext" commands into "install_lib" -- no longer any sense in keeping them apart, since pure Python modules and extension modules build to the same place * added --debug (-g) flag to "build_*" commands, and make that carry through to compiler switches, names of extensions on Windows, etc. 
* fixed many portability bugs on Windows (thanks to many people) * beginnings of support for Mac OS (I'm told that it's enough for the Distutils to install itself) (thanks to Corran Webster) * actually pay attention to the "--rpath" option to "build_ext" (thanks to Joe Van Andel for spotting this lapse) * added "clean" command (thanks to Bastien Kleineidam) * beginnings of support for creating built distributions: changes to the various build and install commands to support it, and added the "bdist" and "bdist_dumb" commands * code reorganization: split core.py up into dist.py and cmd.py, util.py into *_util.py * removed global "--force" option -- it's now up to individual commands to define this if it makes sense for them * better error-handling (fewer extravagant tracebacks for errors that really aren't the Distutils' fault -- Greg Ward - just another Python hacker gward at python.net http://starship.python.net/~gward/ All the world's a stage and most of us are desperately unrehearsed. From jon at dgs.monash.edu.au Wed Apr 12 04:40:23 2000 From: jon at dgs.monash.edu.au (Jonathan Giddy) Date: Wed, 12 Apr 2000 12:40:23 +1000 (EST) Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: <"00Apr11.191729pdt.3438"@watson.parc.xerox.com> from "Bill Janssen" at Apr 11, 2000 07:17:38 PM Message-ID: <200004120240.MAA11342@nexus.csse.monash.edu.au> Bill Janssen declared: > >ILU seems to work fine with it. > >Bill Without wishing to jinx this good news, isn't the release of 1.6 the appropriate time to remove the redundant thread.h file? Jon. From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:18:26 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:18:26 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35EE4.7E741801@tismer.com> from "Christian Tismer" at Apr 11, 2000 07:20:36 PM Message-ID: <200004120318.FAA06750@python.inrialpes.fr> Christian Tismer wrote: > > Vladimir Marangozov wrote: > > > > Not enough good reasons to segfault. I suggest you move the > > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > > the condition there. > > Sorry, I don't see what you are suggesting, I'm distracted. I was thinking about the following. Change the macros in object.h from: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ else \ _PyTrash_deposit_object((PyObject*)op);\ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ to: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ where _PyTrash_deposit_object returns 0 on success, -1 on failure. This gives another last chance to the system to finalize the object, hoping that the stack won't overflow. :-) My point is that it is better to control whether _PyTrash_deposit_object succeeds or not (and it may fail because of PyList_Append). If this doesn't sound acceptable (because of the possible stack overflow) it would still be better to abort in _PyTrash_deposit_object with an exception "stack overflow on recursive finalization" when PyList_Append fails. 
Leaving it unchecked is not nice -- especially in such extreme situations. Currently, if something fails, the object is not finalized (leaking memory). Ok, so be it. What's not nice is that this happens silently which is not the kind of tolerance I would accept from the Python runtime. As to the bug: it's curious that, as Mark reported, without the trashcan logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) some erroneous situation. I'd expect that if the trashcan macros are implemented as above, the crash will go away (which wouldn't solve the problem and would obviate the trashcan in the first place :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:34:48 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:34:48 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <200004120318.FAA06750@python.inrialpes.fr> from "Vladimir Marangozov" at Apr 12, 2000 05:18:26 AM Message-ID: <200004120334.FAA06784@python.inrialpes.fr> Of course, this Vladimir Marangozov wrote: > > to: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > was meant to be this: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:54:13 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:54:13 +0200 (CEST) Subject: [Python-Dev] trashcan and PR#7 Message-ID: <200004120354.FAA06834@python.inrialpes.fr> While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Wed Apr 12 06:09:19 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 21:09:19 -0700 (PDT) Subject: [Python-Dev] Extensible library packages In-Reply-To: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Message-ID: On Tue, 11 Apr 2000, Fredrik Lundh wrote: > Andrew M. Kuchling wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > > SAX implementation. > > > Can anyone suggest a good solution? Fixing this may not require > > changing the core in any way, but the cleanest solution isn't obvious. > > saxlib.py ? > > (yes, I'm serious) +1 When we solve the problem of installing items into "core" Python packages, then we can move saxlib.py (along with the rest of the modules in the standard library). Cheers, -g -- Greg Stein, http://www.lyra.org/ From pf at artcom-gmbh.de Wed Apr 12 07:43:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 07:43:59 +0200 (MEST) Subject: [Python-Dev] Extensible library packages In-Reply-To: <200004111619.MAA05881@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Apr 11, 2000 12:19:21 pm" Message-ID: Hi! Andrew M. Kuchling: [...] 
> The problem is that, if the Python standard library includes a package > named 'xml', ... [...] > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. I dislike the idea of having user visible packages in the standard library too. As Fredrik already suggested, putting a file 'saxlib.py' into the lib, which exposes all what a user needs to know about 'sax' seems to be the best solution. Regards, Peter From tim_one at email.msn.com Wed Apr 12 09:52:01 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 03:52:01 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Message-ID: <000401bfa453$fca9f1e0$ae2d153f@tim> [Peter Funk] > ... > So if default behaviour of the interactive interpreter would be changed > not to use 'repr()' for objects typed at the prompt (I believe Tim > Peters suggested that), this wouldn't help to make lists, tuples and > dictionaries containing floats more readable. Or lists, tuples and dicts of anything else either: that's what I'm getting at when I keep saying containers should "pass str() down" to containees. That it doesn't has frustrated me for years; newbies aren't bothered by it because before 1.6 str == repr for almost all builtin types, and newbies (by definition ) don't have any classes of their own overriding __str__ or __repr__. But I do, and their repr is rarely what I want to see in the shell. This is a different issue than (but related to) what the interactive prompt should use by default to format expression results. They have one key conundrum in common, though: if str() is simply passed down with no other change, then e.g. print str({"a:": "b, c", "a, b": "c"}) and (same thing in disguise) print {"a:": "b, c", "a, b": "c"} would display {a:: b, c, a, b: c} and that's darned unreadable. As far as I can tell, the only reason str(container) invokes repr on the containees today is simply to get some string quotes in output like this. That's fine so far as it goes, but leads to miserably bloated displays for containees of many types *other* than the builtin ones -- and even for string containees leads to embedded octal escape sequences all over the place. > I don't know how to fix this, though. :-( Sure you do! And we look forward to your patch . From gstein at lyra.org Wed Apr 12 10:09:30 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 01:09:30 -0700 (PDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes In-Reply-To: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Barry A. Warsaw wrote: > Here's the second go at adding arbitrary attribute support to function > and method objects. Note that this time it's illegal (TypeError) to > set an attribute on a bound method object; getting an attribute on a > bound method object returns the value on the underlying function > object. First the diffs, then the test case and test output. In the instancemethod_setattro function, it might be nice to do the speed optimization and test for sname[0] == 'i' before hitting the strcmp() calls. Oh: policy question: I would think that these attributes *should* be available in restricted mode. They aren't "sneaky" like the builtin attributes. Rather than PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should be used. They apply to mappings and will be faster. 
Note that (internally) the PyMapping_Get/SetItemString use the latter forms (after constructing a string object(!)). ... whoops. I see that the function object doesn't use the ?etattro() variants. hrm. The stuff is looking really good! Cheers, -g -- Greg Stein, http://www.lyra.org/ From andy at reportlab.com Wed Apr 12 10:18:40 2000 From: andy at reportlab.com (Andy Robinson) Date: Wed, 12 Apr 2000 09:18:40 +0100 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <20000412035101.F38D71CE29@dinsdale.python.org> Message-ID: > > Things are hard to get right when you have to deal with > > backward *and* forward compatibility, interoperability and > > user-friendliness all at the same time... but we'll keep > > trying ;-) > > Let me say publically that I think you have done a fine job, and > obviously have put lots of thought and effort into it. If parts of > the design turn out to be less than ideal (and subsequently changed > before 1.6 is real) then this will not detract from your excellent > work. > > Well done! > > [And also to Fredrik, whose code was the basis for the Unicode > object itself - that was a nice piece of code too!] > Mark I've spent a fair bit of time converting strings and files the last few days, and I'd add that what we have now seems both rock solid and very easy to use. The remaining issues are entirely a matter of us end users trying to figure out what we should have asked for in the first place. Whether we achieve that finally before 1.6 is our problem; Marc-Andr\u00C9 and Fredrik have done a great job, and I think we are on track for providing something much more useful and extensible than (say) Java. As proof of this, someone has already contributed Japanese codecs based on the spec. - Andy Robinson From pf at artcom-gmbh.de Wed Apr 12 10:11:23 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 10:11:23 +0200 (MEST) Subject: [Python-Dev] Improving readability of interpreter expression output (was The purpose of 'repr'...) In-Reply-To: <000401bfa453$fca9f1e0$ae2d153f@tim> from Tim Peters at "Apr 12, 2000 3:52: 1 am" Message-ID: Hi! Tim Peters: [...] > This is a different issue than (but related to) what the interactive prompt > should use by default to format expression results. They have one key > conundrum in common, though: if str() is simply passed down with no other > change, then e.g. > > print str({"a:": "b, c", "a, b": "c"}) > and (same thing in disguise) > print {"a:": "b, c", "a, b": "c"} > > would display > > {a:: b, c, a, b: c} > > and that's darned unreadable. Would you please elaborate a bit more, what you have in mind with "other change" in your sentence above? > As far as I can tell, the only reason > str(container) invokes repr on the containees today is simply to get some > string quotes in output like this. That's fine so far as it goes, but leads > to miserably bloated displays for containees of many types *other* than the > builtin ones -- and even for string containees leads to embedded octal > escape sequences all over the place. > > > I don't know how to fix this, though. :-( > > Sure you do! And we look forward to your patch . No. Serious. I don't see how to fix the 'darned unreadable' output. passing 'str' down seems to be simple. But how to fix the problem above isn't obvious to me. Regards, Peter From mal at lemburg.com Wed Apr 12 10:17:02 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 12 Apr 2000 10:17:02 +0200 Subject: [Python-Dev] #pragmas in Python source code Message-ID: <38F430FE.BAF40AB8@lemburg.com> There currently is a discussion about how to write Python source code in different encodings on i18n. The (experimental) solution so far has been to add a command line switch to Python which tells the compiler which encoding to expect for u"...strings..." ("...8-bit strings..." will still be used as is -- it's the user's responsibility to use the right encoding; the Unicode implementation will still assume them to be UTF-8 encoded in automatic conversions). In the end, a #pragma should be usable to tell the compiler which encoding to use for decoding the u"..." strings. What we need now, is a good proposal for handling these #pragmas... does anyone have experience with these ? Any ideas ? Here's a simple strawman for the syntax: # pragma key: value parser = re.compile( '^#\s*pragma\s+' '([a-zA-Z_][a-zA-Z0-9_]*):\s*' '(.+)' ) For the encoding this would be something like: # pragma encoding: unicode-escape The compiler would scan these pragma defs, add them to an internal temporary dictionary and use them for all subsequent code it finds during the compilation process. The dictionary would have to stay around until the original compile() call has completed (spanning recursive calls). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Wed Apr 12 11:24:09 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 02:24:09 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: Sorry, i'm a little behind on this. I'll try to catch up over the next day or two. On Sun, 9 Apr 2000, Tim Peters wrote: > > Note the example from another reply of a machine with 2-bit floats. There > the user would see: > > >>> 0.75 # happens to be exactly representable on this machine > 0.8 # because that's the shortest string needed on this machine > # to get back 0.75 internally > >> > > This kind of surprise is inherent in the approach, not specific to 2-bit > machines . Okay, okay. But on a 2-bit machine you ought to be no more surprised by the above than by >>> 0.1 + 0.1 0.0 >>> 0.4 + 0.4 1.0 In fact, i suppose one could argue that 0.8 is just as honest as 0.75, as you could get 0.8 from anything in (0.625, 0.825)... or even *more* honest than 0.75, since "0.75" shows more significant digits than the precision of machine would justify. It could be argued either way. I don't see this as a fatal flaw of the 'smartrepr' method, though. After looking at the spec for java.lang.Float.toString() and the Clinger paper you mentioned, it appears to me that both essentially describe 'smartrepr', which seems encouraging. > BTW, I don't know that it will never print more digits than you type: did > you prove that? It's plausible, but many plausible claims about fp turn out > to be false. Indeed, fp *is* tricky, but i think in this case the proof actually is pretty evident -- The 'smartrepr' routine i suggested prints the representation with the fewest number of digits which converts back to the actual value. Since the thing that you originally typed converted to that value the first time around, certainly no *more* digits than what you typed are necessary to produce that value again. QED. 
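(Editorial illustration, not part of the original message.) In pure Python, the 'smartrepr' routine being discussed amounts to something like this hypothetical helper -- keep adding significant digits until the printed form converts back to exactly the same double:

    def smartrepr(x):
        # shortest "%g" form that round-trips to the same IEEE double
        for precision in range(1, 18):
            s = '%.*g' % (precision, x)
            if float(s) == x:
                return s
        return repr(x)

So smartrepr(0.1) yields '0.1', where the %.17g-based repr() of the time yields '0.10000000000000001'.
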
> > - If you type in what the interpreter displays for a > > float, you can be assured of getting the same value. > > This isn't of value for most interactive use -- in general you want to see > the range of a number, not enough to get 53 bits exactly (that's beyond the > limits of human "number sense"). What do you mean by "the range of a number"? > It also has one clearly bad aspect: when > printing containers full of floats, the number of digits printed for each > will vary wildly from float to float. Makes for an unfriendly display. Yes, this is something you want to be able to control -- read on. > If the prompt's display function were settable, I'd probably plug in pprint! Since i've managed to convince Guido that such a hook might be nice, i seem to have worked myself into the position of being responsible for putting together a patch to do so... Configurability is good. It won't solve everything, but at least the flexibility provided by a "display" hook will let everybody have the ability to play whatever tricks they want. (Or, equivalently: to anyone who complains about the interpreter display, at least we have plausible grounds on which to tell them to go fix it themselves.) :) Here is what i have in mind: provide two hooks __builtins__.display(object) and __builtins__.displaytb(traceback, exception) that are called when the interpreter needs to display a result or when the top level catches an exception. Protocol is simple: 'display' gets one argument, an object, and can do whatever the heck it wants. 'displaytb' gets a traceback and an exception, and can do whatever the heck it wants. -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From fredrik at pythonware.com Wed Apr 12 11:39:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 11:39:03 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: Message-ID: <007801bfa462$f0f399f0$0500a8c0@secret.pythonware.com> Andy Robinson wrote: > I've spent a fair bit of time converting strings and files the > last few days, and I'd add that what we have now seems both rock solid > and very easy to use. I'm not worried about the core string types or the conversion machinery; what disturbs me is mostly the use of automagic conversions to UTF-8, which breaks the fundamental assumption that a string is a sequence of len(string) characters. "The items of a string are characters. There is no separate character type; a character is represented by a string of one item" (from the language reference) I still think the "all strings are sequences of unicode characters" strawman I posted earlier would simplify things for everyone in- volved (programmers, users, and the interpreter itself). more on this later. gotta ship some code first. From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 11:47:56 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 11:47:56 +0200 (CEST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> from "Barry Warsaw" at Apr 10, 2000 11:52:43 AM Message-ID: <200004120947.LAA02067@python.inrialpes.fr> Barry Warsaw wrote: > > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. 
I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! Barry, I wonder what for... Just because there's a Python entity implemented as a C structure in which we can easily include a dict + access functions? I don't see the purpose of attaching state (vars) to an algorithm (i.e. a function). What are the benefits compared to class instances? And these special assignment rules because of the real overlap with real instances... Grrr, all this is pretty dark, conceptually. Okay, I inderstood: modules become classes, functions become instances, module variables are class variables, and classes become ... 2-nd order instances of modules. The only missing piece of the puzzle is a legal way to instantiate modules for obtaining functions and classes dynamically, because using eval(), the `new' module or The Hook is perceived as very hackish and definitely not OO. Once the puzzle would be solved, we'll discover that there would be only one additional little step towards inheritance for modules. How weird! Sounds like we're going to metaclass again... -1 until P3K. This is no so cute as it is dangerous. It opens the way to mind abuse. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Apr 12 12:04:32 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 12:04:32 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <009601bfa466$807da2c0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). What are the benefits compared to class instances? > > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. > > -1 until P3K. I agree. From tismer at tismer.com Wed Apr 12 14:08:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:08:51 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> Message-ID: <38F46753.3759A7B6@tismer.com> Vladimir Marangozov wrote: > > While I'm at it, maybe the same recursion control logic could be > used to remedy (most probably in PyObject_Compare) PR#7: > "comparisons of recursive objects" reported by David Asher? Hey, what a good idea. You know what's happening? We are moving towards tail recursion. If we do this everywhere, Python converges towards Stackless Python. and-most-probably-a-better-one-than-mince - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul at prescod.net Wed Apr 12 14:20:18 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 07:20:18 -0500 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <38F46A02.3AB10147@prescod.net> Vladimir Marangozov wrote: > > ... > > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). 
A function is also an object. > What are the benefits compared to class instances? If I follow you, you are saying that whenever you need to associate information with a function, you should wrap up the function and object into a class. But the end result of this transformation could be a program in which every single function is a class. That would be incredibly annoying, especially with Python's scoping rules. In general, it may not even be possible. Consider the following cases: * I need to associate a Java-style type declaration with a method so that it can be recognized based on its type during Java method dispatch. How would you do that with instances? * I need to associate a "grammar rule" with a Python method so that the method is invoked when the parser recognizes a syntactic construct in the input data. * I need to associate an IDL declaration with a method so that a COM interface definition can be generated from the source file. * I need to associate an XPath "pattern string" with a Python method so that the method can be invoked when a tree walker discovers a particular pattern in an XML DOM. * I need to associate multiple forms of documentation with a method. They are optimized for different IDEs, environments or languages. > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. I don't understand what you are saying here. > Once the puzzle would be solved, we'll discover that there would be only > one additional little step towards inheritance for modules. How weird! > Sounds like we're going to metaclass again... I don't see what any of this has to do with Barry's extremely simple idea. Functions *are objects* in Python. It's too late to change that. Objects can have properties. Barry is just allowing arbitrary properties to be associated with functions. I don't see where there is anything mysterious here. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From akuchlin at mems-exchange.org Wed Apr 12 14:22:26 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 08:22:26 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <200004120947.LAA02067@python.inrialpes.fr> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <14580.27266.683908.216344@newcnri.cnri.reston.va.us> Vladimir Marangozov writes: >Barry, I wonder what for... In the two quoted examples, docstrings are used to store additional info about a function. SPARK uses them to contain grammar rules and the regular expressions for matching tokens. The object publisher in Zope uses the presence of a docstring to indicate whether a function or method is publicly accessible. As a third example, the optional type info being thrashed over in the Types-SIG would be another annotation for a function (though doing def f(): ... f.type = 'void' would be really clunky. >Once the puzzle would be solved, we'll discover that there would be only >one additional little step towards inheritance for modules. How weird! >Sounds like we're going to metaclass again... No, that isn't why Barry is experimenting with this -- instead, it's simply because annotating functions seems useful, but everyone uses the docstring because it's the only option. 
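(Editorial illustration, not part of the original message.) Concretely, the difference being argued over looks like this for a SPARK-style grammar rule; the attribute name 'rule' and the exact rule syntax are made up:

    # today: the docstring is overloaded to carry the annotation
    def p_expr(self, args):
        ' expr ::= expr + term '

    # with writable function attributes, the docstring can stay
    # documentation and the annotation moves to a named attribute
    def p_expr(self, args):
        "Handle an addition expression."
    p_expr.rule = ' expr ::= expr + term '
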
--amk From tismer at tismer.com Wed Apr 12 14:43:40 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:43:40 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120318.FAA06750@python.inrialpes.fr> Message-ID: <38F46F7C.94D29561@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Vladimir Marangozov wrote: [yup, good looking patch] > where _PyTrash_deposit_object returns 0 on success, -1 on failure. This > gives another last chance to the system to finalize the object, hoping > that the stack won't overflow. :-) > > My point is that it is better to control whether _PyTrash_deposit_object > succeeds or not (and it may fail because of PyList_Append). > If this doesn't sound acceptable (because of the possible stack overflow) > it would still be better to abort in _PyTrash_deposit_object with an > exception "stack overflow on recursive finalization" when PyList_Append > fails. Leaving it unchecked is not nice -- especially in such extreme > situations. You bet that I *would* raise an exception if I could. Unfortunately the destructors have no way to report an error, and they are always called in a context where no error is expected (Py_DECREF macro). I believe this *was* quite ok, until __del__ was introduced. After that, it looks to me like a design flaw. IMHO there should not be a single function in a system that needs heap memory, and cannot report an error. > Currently, if something fails, the object is not finalized (leaking > memory). Ok, so be it. What's not nice is that this happens silently > which is not the kind of tolerance I would accept from the Python runtime. Yes but what can I do? This isn't worse than before. deletion errors die silently, this is the current concept. I don't agree with it, but I'm not the one to change policy. In that sense, trashcan was just compliant to a concept, without saying this is a good concept. :-) > As to the bug: it's curious that, as Mark reported, without the trashcan > logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) > some erroneous situation. I'd expect that if the trashcan macros are > implemented as above, the crash will go away (which wouldn't solve the > problem and would obviate the trashcan in the first place :-) I think trashcan can be made *way* smarter: Much much more better would be to avoid memory allocation in trashcan at all. I'm wondering if that would be possible. The idea is to catch a couple of objects in an earlier recursion level, and use them as containers for later objects-to-be-deleted. Not using memory at all, that's what I want. And it would avoid all messing with errors in this context. I hate Java dieing silently, since it has not enough memory to tell me that it has not enough memory :-) but-before-implementing-this-*I*-will-need-to-become-*way*-smarter - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From fredrik at pythonware.com Wed Apr 12 14:50:21 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 14:50:21 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> Message-ID: <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Paul Prescod wrote: > * I need to associate a Java-style type declaration with a method so > that it can be recognized based on its type during Java method dispatch. class foo: typemap = {} def myfunc(self): pass typemap[myfunc] = typeinfo > * I need to associate a "grammar rule" with a Python method so that the > method is invoked when the parser recognizes a syntactic construct in > the input data. class foo: rules = [] def myfunc(self): pass rules.append(pattern, myfunc) > * I need to associate an IDL declaration with a method so that a COM > interface definition can be generated from the source file. class foo: idl = {} def myfunc(self): pass idl[myfunc] = "declaration" > * I need to associate an XPath "pattern string" with a Python method so > that the method can be invoked when a tree walker discovers a particular > pattern in an XML DOM. class foo: xpath = [] def myfunc(self): pass xpath.append("pattern", myfunc) From tismer at tismer.com Wed Apr 12 15:00:39 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 15:00:39 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: <38F47377.91306DA1@tismer.com> Mark, I know you are very busy. But I have no chance to build a debug version, and probably there are more differences. Can you perhaps try Vlad's patch? and tell me if the outcome changes? This would give me much more insight. The change affects the macros and the function _PyTrash_deposit_object which now must report an error via the return value. 
The macro code should be: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ And the _PyTrash_deposit_object code should be (untested): int _PyTrash_deposit_object(op) PyObject *op; { PyObject *error_type, *error_value, *error_traceback; if (PyThreadState_GET() != NULL) PyErr_Fetch(&error_type, &error_value, &error_traceback); if (!_PyTrash_delete_later) _PyTrash_delete_later = PyList_New(0); if (_PyTrash_delete_later) return PyList_Append(_PyTrash_delete_later, (PyObject *)op); else return -1; if (PyThreadState_GET() != NULL) PyErr_Restore(error_type, error_value, error_traceback); return 0; } The result of this would be really enlighting :-) ciao - chris Vladimir Marangozov wrote: > > Of course, this > > Vladimir Marangozov wrote: > > > > to: > > > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > > { \ > > ++_PyTrash_delete_nesting; \ > > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > > > was meant to be this: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > -- > Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr > http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Wed Apr 12 16:43:30 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 16:43:30 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <38F48B92.94477DE9@tismer.com> Fredrik Lundh wrote: > > Paul Prescod wrote: > > * I need to associate a Java-style type declaration with a method so > > that it can be recognized based on its type during Java method dispatch. > > class foo: > typemap = {} > def myfunc(self): > pass > typemap[myfunc] = typeinfo Yes, I know that nearly everything is possible to be emulated via classes. But what is so bad about an arbitrary function attribute? ciao - chris p.s.: Paul, did you know that you can use *anything* for __doc__? You could use a class instance instead which still serves as a __doc__ but has your attributes and more. Yes I know this is ugly :-)) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Wed Apr 12 16:47:22 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed, 12 Apr 2000 10:47:22 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F430FE.BAF40AB8@lemburg.com> References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <14580.35962.86559.128123@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Here's a simple strawman for the syntax: ... > The compiler would scan these pragma defs, add them to an > internal temporary dictionary and use them for all subsequent > code it finds during the compilation process. The dictionary > would have to stay around until the original compile() call has > completed (spanning recursive calls). Marc-Andre, The problem with this proposal is that the pragmas are embedded in the comments; I'd rather see a new keyword and statement. It could be defined something like: pragma_atom: NAME | NUMBER | STRING+ pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* The biggest problem with embedding it in comments is that it is no longer part of the syntax tree generated by the parser. The pragmas become global to the module on a de-facto basis. While this is probably reasonable for the sorts of pragmas we've thought of so far, this seems an unnecessary restriction; future tools may support scoped pragmas to help out with selection of optimization strategies, for instance, or other applications. If we were to go with a strictly global view of pragmas, we'd need to expose the dictionary created by the parser. The parser module would need to be able to expose the dictionary and accept a dictionary when receiving a parse tree for compilation. The internals just can't be *too* internal! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson at nevex.com Wed Apr 12 16:55:55 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 10:55:55 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: Is there any way to unify Barry's proposal for enriching doc strings with Marc-Andre's proposal for pragmas? I.e., can pragmas be doc dictionary entries on modules that have particular keys? This would make them part of the parse tree (as per Fred Drake's comments), but not require (extra) syntax changes. Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 17:37:06 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 11:37:06 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Functions and methods are first class objects, and they already have attributes, some of which are writable. Why should __doc__ be special? Just because it was the first such attribute to have syntactic support for easily defining? Think about my proposal this way: it actual removes a restriction. What I don't like about /F's approach is that if you were building a framework, you'd now have two conventions you'd have to describe: where to find the mapping, and what keys to use in that mapping. With attributes, you've already got the former: getattr(). Plus, let's say you're handed a method object `x', would you rather do: if x.im_class.typemap[x.im_func] == 'int': ... or if x.__type__ == 'int': ... And what about function objects (as opposed to unbound methods). Where do you stash the typemap? In the module, I supposed. 
And if you can be passed either type of object, do you now have to do this? if hasattr(x, 'im_class'): if hasattr(x.im_class, 'typemap'): if x.im_class.typemap[x.im_func] == 'int': ... elif hasattr(x, 'func_globals'): if x.func_globals.has_key('typemap'): if x.func_globals['typemap'][x] == 'int': ... instead of the polymorphic elegance of if x.__type__ == 'int': ... Finally, think of this proposal as an evolutionary step toward enabling all kinds of future frameworks. At some point, there may be some kind of optional static type system. There will likely be some syntactic support for easily specifying the contents of the __type__ attribute. With the addition of func/meth attrs now, we can start to play with prototypes of this system, define conventions and standards, and then later when there is compiler support, simplify the definitions, but not have to change code that uses them. -Barry From akuchlin at mems-exchange.org Wed Apr 12 17:39:54 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 12 Apr 2000 11:39:54 -0400 (EDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: References: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Ka-Ping Yee writes: >Here is what i have in mind: provide two hooks > __builtins__.display(object) >and > __builtins__.displaytb(traceback, exception) Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't want to add new display() and displaytb() built-ins, do we? --amk From pf at artcom-gmbh.de Wed Apr 12 17:37:05 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 17:37:05 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Apr 12, 2000 10:47:22 am" Message-ID: Hi! Fred L. Drake, Jr.: > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... > > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* This would defeat an important goal: backward compatibility: You can't add 'pragma division: old' or something like this to a source file, which should be able to run with both Python 1.5.2 and Py3k. This would make this mechanism useless for several important applications of pragmas. Here comes David Scherers idea into play. The relevant emails of this thread are in the archive at: > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. [...] IMO this is overkill. 
For all real applications that have been discussed so far, global pragmas are sufficient: - source file character encoding - language level - generated division operator byte codes - generated comparision operators byte codes (comparing strings and numbers) I really like Davids idea to use 'global' at module level for the purpose of pragmas. And this idea has also the advantage that Guido already wrote the idea is "kind of cute and backwards compatible". Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From bwarsaw at cnri.reston.va.us Wed Apr 12 17:56:18 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 11:56:18 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes References: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: <14580.40098.690512.903519@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> In the instancemethod_setattro function, it might be nice to GS> do the speed optimization and test for sname[0] == 'i' before GS> hitting the strcmp() calls. Yeah, you could do that, but it complicates the code and the win seems negligable. GS> Oh: policy question: I would think that these attributes GS> *should* be available in restricted mode. They aren't "sneaky" GS> like the builtin attributes. Hmm, good point. That does simplify the code too. I wonder if the __dict__ itself should be restricted, but that doesn't seem like it would buy you much. We don't need to restrict them in classobject anyway, because they are already restricted in funcobject (which ends up getting the call anyway). It might be reasonable to relax that for arbitrary func attrs. GS> Rather than GS> PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should GS> be used. They apply to mappings and will be faster. Note that GS> (internally) the PyMapping_Get/SetItemString use the latter GS> forms (after constructing a string object(!)). ... whoops. I GS> see that the function object doesn't use the ?etattro() GS> variants. hrm. Okay cool. Made these changes and `attro'd 'em too. GS> The stuff is looking really good! Thanks! -Barry From mal at lemburg.com Wed Apr 12 17:52:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 17:52:34 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F430FE.BAF40AB8@lemburg.com> <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <38F49BC2.9C192C63@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... > > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. 
While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. Fine with me, but this probably isn't going to make it into 1.7 and I don't want to wait until Py3K... perhaps there is another way to implement this without adding a new keyword, e.g. we could first use some kind of hack to implement "# pragma ..." and then later on allow dropping the "#" to make full use of the new mechanism. > If we were to go with a strictly global view of pragmas, we'd need > to expose the dictionary created by the parser. The parser module > would need to be able to expose the dictionary and accept a dictionary > when receiving a parse tree for compilation. The internals just can't > be *too* internal! ;) True :-) BTW, while poking around in the tokenizer/compiler I found a serious bug in the way concatenated strings are implemented: right now the compiler expects to always find string objects, yet it could just as well receive Unicode objects or even mixed string and Unicode objects. Try it: u = (u"abc" u"abc") dumps core ! I'll fix this with the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Apr 12 18:12:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 18:12:59 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4A08B.A855E69D@lemburg.com> Peter Funk wrote: > > Fred L. Drake, Jr.: > > M.-A. Lemburg writes: > > > Here's a simple strawman for the syntax: > > ... > > > The compiler would scan these pragma defs, add them to an > > > internal temporary dictionary and use them for all subsequent > > > code it finds during the compilation process. The dictionary > > > would have to stay around until the original compile() call has > > > completed (spanning recursive calls). > > > > Marc-Andre, > > The problem with this proposal is that the pragmas are embedded in > > the comments; I'd rather see a new keyword and statement. It could be > > defined something like: > > > > pragma_atom: NAME | NUMBER | STRING+ > > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > This would defeat an important goal: backward compatibility: You > can't add 'pragma division: old' or something like this to a source > file, which should be able to run with both Python 1.5.2 and Py3k. > This would make this mechanism useless for several important > applications of pragmas. Hmm, I don't get it: these pragmas would set variabels which make Python behave in a different way -- how do you plan to achieve backward compatibility here ? I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... 
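(A minimal sketch of that point, using nothing but the built-in compile(); the helper name is made up. Any construct the running parser does not know about already fails at compile time, which is why a source-level pragma cannot buy backward compatibility for new syntax:)

    def parser_accepts(source):
        # 1 if the running interpreter can at least parse the code
        try:
            compile(source, "<probe>", "exec")
            return 1
        except SyntaxError:
            return 0

    print parser_accepts('s = "abc"')     # 1 on both 1.5.2 and 1.6
    print parser_accepts('s = u"abc"')    # 0 on 1.5.2, 1 on 1.6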
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at cnri.reston.va.us Wed Apr 12 18:37:20 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 12:37:20 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.42560.713427.885436@goon.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. Why should BAW> __doc__ be special? Just because it was the first such BAW> attribute to have syntactic support for easily defining? I don't have a principled argument about why doc strings should be special, but I think that they should be. I think it's weird that you can change __doc__ at runtime; I would prefer that it be constant. BAW> Think about my proposal this way: it actually removes a BAW> restriction. I think this is really the crux of the matter! The proposal removes a useful restriction. The alternatives /F suggested seem clearer to me that sticking new attributes on functions and methods. Three things I like about the approach: It affords an opportunity to be very clear about how the attributes are intended to be used. I suspect it would be easier to describe with a static type system. It prevents confusion and errors that might result from unprincipled use of function attributes. Jeremy From gmcm at hypernet.com Wed Apr 12 18:56:24 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 12:56:24 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.42560.713427.885436@goon.cnri.reston.va.us> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <1256563909-46814536@hypernet.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. > > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. Having to be explicit about the method <-> regex / rule would severely damage SPARK's elegance. It would make Tim's doctest useless. > It prevents confusion and errors > that might result from unprincipled use of function attributes. While I'm sure I will be properly shocked and horrified when you come up with an example, in my naivety, I can't imagine what it will look like ;-). 
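(For concreteness, a sketch of the docstring convention being defended here: SPARK-style frameworks pull grammar rules out of the docstrings of specially named methods. The class and rule below are invented:)

    class ExprParser:
        def p_expr_plus(self, args):
            ' expr ::= expr + term '
            return args

    # how a docstring-driven framework harvests its rules today
    rules = []
    for name in dir(ExprParser):
        if name[:2] == 'p_':
            rules.append((name, getattr(ExprParser, name).__doc__))
    print rules

With writable function attributes the rule could live in, say, p_expr_plus.rule instead of overloading __doc__ (the attribute name is hypothetical).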
- Gordon From skip at mojam.com Wed Apr 12 19:28:04 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 12:28:04 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.45604.756928.858721@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. (Trying to read Fredrik's mind...) By extension, we should allow writable attributes to work for other objects. To pollute this discussion with an example from another one: i = 3.1416 i.__precision__ = 4 I haven't actually got anything against adding attributes to functions (or numbers, if it's appropriate). Just wondering out loud and playing a bit of a devil's advocate. Skip From ping at lfw.org Wed Apr 12 19:35:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:35:59 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* Wa-wa-wa-wa-wait... i thought the whole point of pragmas was that they were supposed to control the operation of the parser itself (you know, set the source character encoding and so on). So by definition they would have to happen at a different level, above the parsing. Or do we need to separate out two categories of pragmas -- pre-parse and post-parse pragmas? -- ?!ng From tismer at tismer.com Wed Apr 12 19:39:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 19:39:34 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <38F4B4D6.6F954CDF@tismer.com> Skip Montanaro wrote: > > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. > > (Trying to read Fredrik's mind...) takes too long since it isn't countable infinite... > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. please let me join your hexensabbat (de|en)lighted-ly -y'rs - rapunzel -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Wed Apr 12 19:38:26 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed, 12 Apr 2000 13:38:26 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Wa-wa-wa-wa-wait... i thought the whole point of pragmas was > that they were supposed to control the operation of the parser > itself (you know, set the source character encoding and so on). > So by definition they would have to happen at a different level, > above the parsing. Hmm. That's one proposed use, which doesn't seem to fit well with my proposal. But I don't know that I'd think of that as a "pragma" in the general sense. I'll think about this one. I think encoding is a very special case, and I'm not sure I like dealing with it as a pragma. Are there any other (programming) languages that attempt to deal with multiple encodings? Perhaps I missed a message about it. > Or do we need to separate out two categories of pragmas -- > pre-parse and post-parse pragmas? Eeeks! We don't need too many special forms! That's ugly! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From moshez at math.huji.ac.il Wed Apr 12 19:36:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:36:14 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > And voila! Numbers are no longer immutable. Using any numbers as keys in dicts? Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Wed Apr 12 19:45:15 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:45:15 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng From effbot at telia.com Wed Apr 12 19:42:24 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 19:42:24 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: Message-ID: <002401bfa4a6$778fc360$34aab5d4@hagrid> Moshe Zadka wrote: > > To pollute this discussion with an example from another one: > > > > i = 3.1416 > > i.__precision__ = 4 > > And voila! Numbers are no longer immutable. Using any > numbers as keys in dicts? so? you can use methods as keys today, you know... 
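(A quick sketch of Fredrik's claim, with an invented class: bound methods are hashable, and an equal bound method of the same instance finds the same dictionary slot:)

    class Spam:
        def eggs(self):
            return 42

    s = Spam()
    d = {s.eggs: 'bound method used as a key'}
    print d[s.eggs]    # a fresh but equal bound method finds the entry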
From skip at mojam.com Wed Apr 12 19:47:01 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 12:47:01 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.46741.757469.645439@beluga.mojam.com> Moshe> On Wed, 12 Apr 2000, Skip Montanaro wrote: >> To pollute this discussion with an example from another one: >> >> i = 3.1416 >> i.__precision__ = 4 >> Moshe> And voila! Numbers are no longer immutable. Using any numbers as Moshe> keys in dicts? Yes, and I use functions on occasion as dict keys as well. >>> def foo(): pass ... >>> d = {foo: 1} >>> print d[foo] 1 I suspect adding methods to functions won't invalidate their use in that context, nor would adding attributes to numbers. At any rate, it was just an example. Skip From moshez at math.huji.ac.il Wed Apr 12 19:44:50 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:44:50 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > so? you can use methods as keys today, you know... Actually, I didn't know. What hapens if you use a method as a key, and then change it's doc string? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy at cnri.reston.va.us Wed Apr 12 19:51:32 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 13:51:32 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> Message-ID: <14580.47012.646862.615623@goon.cnri.reston.va.us> >>>>> "GMcM" == Gordon McMillan writes: [please imagine that the c is raised] BAW> Think about my proposal this way: it actually removes a BAW> restriction. [Jeremy Hylton wrote:] >> I think this is really the crux of the matter! The proposal >> removes a useful restriction. >> >> The alternatives /F suggested seem clearer to me that sticking >> new attributes on functions and methods. Three things I like >> about the approach: It affords an opportunity to be very clear >> about how the attributes are intended to be used. I suspect it >> would be easier to describe with a static type system. GMcM> Having to be explicit about the method <-> regex / rule would GMcM> severely damage SPARK's elegance. It would make Tim's doctest GMcM> useless. Do either of these examples modify the __doc__ attribute? I am happy to think of both of them as elegant abuses of the doc string. (Not sure what semantics I mean for "elegant abuse" but not pejorative.) I'm not arguing that we should change the language to prevent them from using doc strings. Fred and I were just talking, and he observed that a variant of Python that included a syntactic mechanism to specify more than one attribute (effectively, a multiple doc string syntax) might be less objectionable than setting arbitrary attributes at runtime. Neither of us could imagine just what that syntax would be. >> It prevents confusion and errors that might result from >> unprincipled use of function attributes. GMcM> While I'm sure I will be properly shocked and horrified when GMcM> you come up with an example, in my naivety, I can't imagine GMcM> what it will look like ;-). It would look really, really bad ;-). 
I couldn't think of a good example, so I guess this is a FUD argument. A rough sketch, though, would be a program that assigned attribute X to all functions that were to be used in a certain way. If the assignment is a runtime operation, rather than a syntactic construct that defines a static attribute, it would be possible to accidentally assign attribute X to a function that was not intended to be used that way. This connection between a group of functions and a particular behavior would depend entirely on some runtime magic with settable attributes. Jeremy From mal at lemburg.com Wed Apr 12 19:55:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 19:55:19 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.42560.713427.885436@goon.cnri.reston.va.us> Message-ID: <38F4B887.2C16FF03@lemburg.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. Not sure... I wouldn't mind having the ability to add attributes to all Python objects at my own liking. Ok, maybe a bit far fetched, but the idea would certainly be useful in some cases, e.g. to add new methods to built-in types or to add encoding name information to strings... > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. It prevents confusion and errors > that might result from unprincipled use of function attributes. The nice side-effect of having these function/method instance dictionaries is that they follow class inheritance. Something which is hard to do right with Fredrik's approach. I suspect that in Py3K we'll only have one type of class system: everything inherits from one global base class -- seen in that light, method attributes are really nothing unusual, since all instances would have instance dictionaries anyway (well maybe only on demand, but that's another story). Anyway, more power to you Barry :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Wed Apr 12 19:56:18 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 13:56:18 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4B4D6.6F954CDF@tismer.com> Message-ID: <1256560314-47031192@hypernet.com> Christian Tismer wrote: > > > Skip Montanaro wrote: > > (Trying to read Fredrik's mind...) > > takes too long since it isn't countable infinite... Bounded, however. And therefore, um, dense ... - Gordon From paul at prescod.net Wed Apr 12 19:57:01 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 12:57:01 -0500 Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <38F4B8ED.8BC64F69@prescod.net> About a month ago I wrote (but did not publish) a proposal that combined #pragmas and method attributes. 
The reason I combined them is that in a lot of cases method "attributes" are supposed to be available in the parse tree, before the program begins to run. Here is my rough draft. ---- We've been discussing method attributes for a long time and I think that it might be worth hashing out in more detail, especially for type declaration experimentation. I'm proposing a generalization of the "decl" keyword that has been kicked around in the types-sig. Other applications include Spark grammar strings, XML pattern-trigger strings, multiple language doc-strings, IDE "hints", optimization hints, associated multimedia (down with glass ttys!), IDL definitions, thread locking declarations, method visibility declarations, ... Of course some subset of attributes might migrate into Python's "core language". Decl gives us a place to experiment and get them right before we do that migration. Declarations would always be associated with functions, classes or modules. They would be simple string-keyed values in a dictionary attached to the function, class or module called __decls__. The syntax would be decl { <key>: "value", <key>: "value" } Key would be a Python name. Value would be any Python string. In the case of a type declaration it might be: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() No string interpolation or other runtime-ish evaluation is done by the compiler on those strings. Neither the keys nor the values are evaluated as Python expressions. We could have a feature that would allow values to be dictionary-ish strings themselves: decl {type:"def(myint: int) returns bar", doc : "Bonjour", languages:{ french: "Hello"} } That would presumably be rare (if we allow it at all). Again, there would be no evaluation or interpolation. The left hand must be a name. The right must be a string. Code which depended on the declaration can do whatever it wants...if it has some idea of "execution context" and it wants to (e.g.) do interpolation with things that have percent signs, nobody would stop it. A decl that applies to a function or class immediately precedes the function or class. A decl that applies to a module precedes all other statements other than the docstring (which can be before or after). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From moshez at math.huji.ac.il Wed Apr 12 19:55:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:55:54 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256560314-47031192@hypernet.com> Message-ID: On Wed, 12 Apr 2000, Gordon McMillan wrote: > Bounded, however. And therefore, um, dense ... I sorta imagined it more like the Cantor set. Nowhere dense, but perfect sorry-but-he-started-with-the-maths-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Wed Apr 12 20:00:52 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:00:52 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.47572.794837.109290@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Functions and methods are first class objects, and they BAW> already have attributes, some of which are writable. SM> (Trying to read Fredrik's mind...) SM> By extension, we should allow writable attributes to work for SM> other objects. To pollute this discussion with an example SM> from another one: | i = 3.1416 | i.__precision__ = 4 SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering SM> out loud and playing a bit of a devil's advocate. Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> i = 3.1416 >>> dir(i) [] Floats don't currently have attributes. -Barry From moshez at math.huji.ac.il Wed Apr 12 20:01:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 20:01:13 +0200 (IST) Subject: [Python-Dev] #pragmas and method attributes In-Reply-To: <38F4B8ED.8BC64F69@prescod.net> Message-ID: On Wed, 12 Apr 2000, Paul Prescod wrote: > About a month ago I wrote (but did not publish) a proposal that combined > #pragmas and method attributes. The reason I combined them is that in a > lot of cases method "attributes" are supposed to be available in the > parse tree, before the program begins to run. Here is my rough draft. FWIW, I really really like this. def func(...): decl {zorb: 'visible', spark: 'some grammar rule'} pass Right on! But maybe even def func(...): decl zorb='visible' decl spark='some grammar rule' pass BTW: Why force the value to be a string? Any immutable basic type should do fine, no?? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy at cnri.reston.va.us Wed Apr 12 20:08:29 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 14:08:29 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <38F46753.3759A7B6@tismer.com> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> Message-ID: <14580.48029.512656.911718@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Vladimir Marangozov wrote: >> While I'm at it, maybe the same recursion control logic could be >> used to remedy (most probably in PyObject_Compare) PR#7: >> "comparisons of recursive objects" reported by David Asher? CT> Hey, what a good idea. CT> You know what's happening? We are moving towards tail recursion. CT> If we do this everywhere, Python converges towards Stackless CT> Python. It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. 
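(For reference, a minimal recipe for the crash being discussed in PR#7: two self-referential lists. The comparison itself is left commented out because it takes the interpreter down:)

    a = []
    a.append(a)     # a contains itself
    b = []
    b.append(b)     # b contains itself
    # a == b  (or cmp(a, b)) recurses through a[0][0]... until the C stack is exhausted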
This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one. I have tried to come up with a solution in the same style as the repr solution. repr maintains a list of objects currently being repred. If it encounters a recursive request to repr the same object, it just prints "...". (There are better solutions, but this one is fairly simple.) I always get hung up on a cmp that works this way because at some point you discover a recursive cmp of two objects and you need to decide what to do. You can't just print "..." :-). So the real problem is defining some reasonable semantics for comparison of recursive objects. I checked what Scheme and Common Lisp do, thinking that these languages must have dealt with the issue before. The answer, at least in Scheme, is infinite loop. R5RS notes: "'Equal?' may fail to terminate if its arguments are circular data structures. " http://www-swiss.ai.mit.edu/~jaffer/r5rs_8.html#SEC49 For eq? and eqv?, the answer is #f. The issue was also discussed in some detail by the ANSI committee X3J13. A summary of the discussion is here: http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss143-writeup.html The result was to "Clarify that EQUAL and EQUALP do not descend any structures or data types other than the ones explicitly specified here:" [both descend for cons, bit-vectors, and strings; equalp has some special rules for hashtables and arrays] I believe this means that Common Lisp behaves the same way that Scheme does: comparison of circular data structures does not terminate. I don't think an infinite loop is any better than a core dump. At least with the core dump, you can inspect the core file and figure out what went wrong. In the infinite loop case, you'd wonder for a while why your program doesn't terminate, then kill it and inspect the core file anyway :-). I think the comparison ought to return false or raise a ValueError. I'm not sure which is right. It seems odd to me that comparing two builtin lists could ever raise an exception, but it may be more Pythonic to raise an exception in the face of ambiguity. As the X3J13 committee noted: Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. So, in the end, I propose ValueError. Jeremy From bwarsaw at cnri.reston.va.us Wed Apr 12 20:19:47 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:19:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: <14580.48707.373146.936232@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: >> so? you can use methods as keys today, you know... MZ> Actually, I didn't know. What hapens if you use a method as a MZ> key, and then change it's doc string? Nothing. Python 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): ... 'a doc string' ... >>> d = {} >>> d[foo] = foo >>> foo.__doc__ = 'norwegian blue' >>> d[foo].__doc__ 'norwegian blue' The hash of a function object is hash(func_code) ^ id(func_globals): Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ...
>>> hash(foo) 557536160 >>> hash(foo.func_code) 557215928 >>> id(foo.func_globals) 860952 >>> hash(foo.func_code) ^ id(foo.func_globals) 557536160 So in the words of Mr. Praline: The plumage don't enter into it. :) But you can still get quite evil: Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ... >>> def bar(): print 1 ... >>> d = {} >>> d[foo] = foo >>> d[foo] >>> foo.func_code = bar.func_code >>> d[foo] Traceback (most recent call last): File "", line 1, in ? KeyError: Mwah, ha, ha! Gimme-lists-as-keys-and-who-really-/does/-need-tuples-after-all?-ly y'rs, -Barry From gvwilson at nevex.com Wed Apr 12 20:19:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 14:19:52 -0400 (EDT) Subject: [Python-Dev] Processing XML with Perl (interesting article) (fwd) Message-ID: http://www.xml.com/pub/2000/04/05/feature/index.html is Michael Rodriguez' summary of XML processing modules for Perl. It opens with: "Perl is one of the most powerful (and even the most devout Python zealots will agree here) and widely used text processing languages." Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 20:20:40 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:20:40 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: <14580.48760.957536.805522@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> Fred and I were just talking, and he observed that a variant JH> of Python that included a syntactic mechanism to specify more JH> than one attribute (effectively, a multiple doc string syntax) JH> might be less objectionable than setting arbitrary attributes JH> at runtime. Neither of us could imagine just what that syntax JH> would be. So it's the writability of the attributes that bothers you? Maybe we need WORM-attrs? :) -Barry From skip at mojam.com Wed Apr 12 20:27:38 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 13:27:38 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47572.794837.109290@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> Message-ID: <14580.49178.341131.766028@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering out SM> loud and playing a bit of a devil's advocate. BAW> Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 BAW> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>>> i = 3.1416 >>>> dir(i) BAW> [] BAW> Floats don't currently have attributes. True enough, but why can't they? I see no reason that your writable function attributes proposal requires that functions already have attributes. Modifying my example, how about: >>> l = [1,2,3] >>> l.__type__ = "int" Like functions, lists do have (readonly) attributes. 
Why not allow them to have writable attributes as well? Awhile ago, Paul Prescod proposed something I think he called a super tuple, which allowed you to address tuple elements using attribute names: >>> t = ("x": 1, "y": 2, "z": 3) >>> print t.x 1 >>> print t[1] 2 (or something like that). I'm sure Paul or others will chime in if they think it's relevant. Your observation was that functions have a __doc__ attribute that is being abused in multiple, conflicting ways because it's the only function attribute people have to play with. I have absolutely no quibble with that. See: http://www.python.org/pipermail/doc-sig/1999-December/001671.html (Note that it apparently fell on completely deaf ears... ;-) I like your proposal. I was just wondering out loud if it should be more general. Skip From bwarsaw at cnri.reston.va.us Wed Apr 12 20:31:27 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:31:27 -0400 (EDT) Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> <38F4B8ED.8BC64F69@prescod.net> Message-ID: <14580.49407.807617.750146@anthem.cnri.reston.va.us> >>>>> "PP" == Paul Prescod writes: PP> About a month ago I wrote (but did not publish) a proposal PP> that combined #pragmas and method attributes. The reason I PP> combined them is that in a lot of cases method "attributes" PP> are supposed to be available in the parse tree, before the PP> program begins to run. Here is my rough draft. Very cool. Combine them with Greg Wilson's approach and you've got my +1 on the idea. I still think it's fine that the func attr dictionary is writable. -Barry From mal at lemburg.com Wed Apr 12 20:31:16 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 20:31:16 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4C0F4.E0A8B01@lemburg.com> Ka-Ping Yee wrote: > > On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > > Or do we need to separate out two categories of pragmas -- > > > pre-parse and post-parse pragmas? > > > > Eeeks! We don't need too many special forms! That's ugly! > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). But you're right, i've never > heard of another language that can handle configurable encodings > right in the source code. Is it really necessary to tackle that here? Yes. > Gak, what do Japanese programmers do? Has anyone seen any of that > kind of source code? It's not intended for use by Asian programmers; it must be seen as a way to equally support all those different languages and scripts for which Python provides codecs. Note that Fred's argument is not far-fetched: if you look closely at the way the compiler works it seems that adding a new keyword would indeed be the simplest solution. If done right, we could add some nifty lookup optimizations to the byte code compiler, e.g. a module might declare all globals as being constant, which would allow the compiler to assume that all global lookups return constants, allowing it to cache them or even rewrite the byte code at run-time... But the concepts are still not 100% right -- if we want to add scope to pragmas, we ought to follow the usual Python lookup scheme: locals, globals, built-ins. This would introduce the need to pass locals and globals to all APIs compiling Python source code.
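(Purely illustrative -- no such machinery exists and every name below is made up -- but a scoped pragma lookup could mirror the locals/globals/built-ins chain like this:)

    def lookup_pragma(name, local_pragmas, module_pragmas, default_pragmas):
        # check the innermost scope first, just like local/global/builtin names
        for scope in (local_pragmas, module_pragmas, default_pragmas):
            if scope.has_key(name):
                return scope[name]
        return None

    print lookup_pragma('division',
                        {},                       # per-function pragmas
                        {'division': 'old'},      # per-module pragmas
                        {'division': 'classic'})  # interpreter defaults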
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Apr 12 20:17:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 20:17:25 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4A08B.A855E69D@lemburg.com> from "M.-A. Lemburg" at "Apr 12, 2000 6:12:59 pm" Message-ID: Hi! [me:] > > This would defeat an important goal: backward compatibility: You > > can't add 'pragma division: old' or something like this to a source > > file, which should be able to run with both Python 1.5.2 and Py3k. > > This would make this mechanism useless for several important > > applications of pragmas. M.-A. Lemburg: > Hmm, I don't get it: these pragmas would set variabels which > make Python behave in a different way -- how do you plan to > achieve backward compatibility here ? > > I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... Okay. What I mean is for example changing the behaviour of the division operator: if 1/2 becomes 0.5 instead of 0 in some future version of Python, it is a must to be able to put in a pragma with the meaning "use the old style division in this module" into a source file without breaking the usability of this source file on older versions of Python. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Mike.Da.Silva at uk.fid-intl.com Wed Apr 12 20:37:56 2000 From: Mike.Da.Silva at uk.fid-intl.com (Da Silva, Mike) Date: Wed, 12 Apr 2000 19:37:56 +0100 Subject: [Python-Dev] #pragmas in Python source code Message-ID: Java uses ResourceBundles, which are identified by basename + 2 character locale id (eg "en", "fr" etc). The content of the resource bundle is essentially a dictionary of name value pairs. MS Visual C++ uses pragma code_page(windows_code_page_id) in resource files to indicate what code page was used to generate the subsequent text. In both cases, an application would rely on a fixed (7 bit ASCII) subset to give the well-known key to find the localized text for the current locale. Any "hardcoded" string literals would be mangled when attempting to display them using an alternate locale. So essentially, one could take the view that correct support for localization is a runtime issue affecting the user of an application, not the developer. Hence, myfile.py may contain 8 bit string literals encoded in my current windows encoding (1252) but my user may be using Japanese Windows in code page 932. All I can guarantee is that the first 128 characters (notwithstanding BACKSLASH) will be rendered correctly - other characters will be interpreted as half width Katakana or worse. Any literal strings one embeds in code should be purely for the benefit of the code, not for the end user, who should be seeing properly localized text, pulled back from a localized text resource file _NOT_ python code, and automatically pumped through the appropriate native <--> unicode translations as required by the code. So to sum up, 1 Hardcoded strings are evil in source code unless they use the invariant ASCII (and by extension UTF8) character set. 2 A proper localized resource loading mechanism is required to fetch genuine localized text from a static resource file (ie not myfile.py). 
3 All transformations of 8 bit strings to and from unicode should explicitly specify the 8 bit encoding for the source/target of the conversion, as appropriate. 4 Assume that a Japanese / Chinese programmer will find it easier to code using the invariant ASCII subset than a Western European / American will be able to read hanzi in source code. Regards, Mike da Silva -----Original Message----- From: Ka-Ping Yee [mailto:ping at lfw.org] Sent: Wednesday, April 12, 2000 6:45 PM To: Fred L. Drake, Jr. Cc: Python Developers @ python.org Subject: Re: [Python-Dev] #pragmas in Python source code On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://www.python.org/mailman/listinfo/python-dev From bwarsaw at cnri.reston.va.us Wed Apr 12 20:43:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:43:01 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.50101.747669.794035@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Floats don't currently have attributes. SM> True enough, but why can't they? Skip, didn't you realize I was setting you up to ask that question? :) I don't necessarily think other objects shouldn't have such attributes, but I thought it might be easier to shove this one tiny little pill down peoples' throats first. Once they realize it tastes good, /they'll/ want more :) SM> Awhile ago, Paul Prescod proposed something I think he called SM> a super tuple, which allowed you to address tuple elements SM> using attribute names: >> t = ("x": 1, "y": 2, "z": 3) print t.x | 1 | >>> print t[1] | 2 SM> (or something like that). I'm sure Paul or others will chime SM> in if they think it's relevant. Might be. I thought that was a cool idea too at the time. SM> Your observation was that functions have a __doc__ attribute SM> that is being abused in multiple, conflicting ways because SM> it's the only function attribute people have to play with. I SM> have absolutely no quibble with that. See: SM> SM> http://www.python.org/pipermail/doc-sig/1999-December/001671.html SM> (Note that it apparently fell on completely deaf ears... ;-) I SM> like your proposal. I was just wondering out loud if it SM> should be more general. Perhaps so. 
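(Since the ("x": 1, ...) literal is only a proposal, here is a tiny emulation of the named-plus-positional access it would give; the class is invented for illustration:)

    class SuperTuple:
        def __init__(self, pairs):
            # pairs is a list of (name, value) tuples, so positional order is kept
            self._values = []
            for name, value in pairs:
                self.__dict__[name] = value
                self._values.append(value)
        def __getitem__(self, i):
            return self._values[i]

    t = SuperTuple([('x', 1), ('y', 2), ('z', 3)])
    print t.x     # 1
    print t[1]    # 2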
-Barry From effbot at telia.com Wed Apr 12 20:43:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 20:43:55 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Mike wrote: > Any literal strings one embeds in code should be purely for the benefit of > the code, not for the end user, who should be seeing properly localized > text, pulled back from a localized text resource file _NOT_ python code, and > automatically pumped through the appropriate native <--> unicode > translations as required by the code. that's hardly a CP4E compatible solution, is it? Ping wrote: > > But you're right, i've never heard of another language that can handle > > configurable encodings right in the source code. XML? From glyph at twistedmatrix.com Wed Apr 12 21:46:24 2000 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 14:46:24 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: Language pragmas are all fine and good, BUT ... Here in the Real World(TM) we have to deal with version incompatibilities out the wazoo. I am currently writing a java application that has to run on JDK 1.1, and 1.2, and microsoft's half-way JDK 1.1+1/2 thingy. Python comes installed on many major linux distributions, and the installed base is likely to be even larger than it is now by the time Python 1.6 is ready for the big time. I'd like to tell people who still have RedHat 6.2 installed in six months that they can just download a 40k script and not a 5M interpreter source tarball (which will be incompatible with their previous python installation, which they need for other stuff) when I deploy an end-user application. (Sorry, I can't think of another way to say that, I'm still recovering from java-isms...) :-) What I'm saying is that it would be great if you could write an app that would still function with existing versions of the interpreter, but would be missing certain features that were easier to implement with the new language semantics or required a new core library feature. Backward compatibility is as important to me as forward compatibility, and I'd prefer not to achieve it by writing exclusively to python 1.5.2 for the rest of my life. The way I would like to see this happen is NOT with language pragmas ('global' strikes me as particularly inappropriate, since that already means something else...) but with file-extensions. For example, if you have a python file which uses 1.6 features, I call it 'foo.1_6.py'. I also have a version that will work with 1.5, albeit slightly slower/less featureful: so I call it 'foo.py'. 'import foo' will work correctly. Or, if I only have 'foo.1_6.py' it will break, which I gather would be the desired behavior. As long as we're talking about versioning issues, could we perhaps introduce a slightly more robust introspective facility than assert(sys.version[:3])=='1.5' ? And finally, I appreciate that some physics students may find it confusing that 1/2 yields 0 instead of 0.5, but I think it would be easier to just teach them to do 1./2 rather than changing the semantics of integer constants completely ... I use python to do a lot of GUI work right now (and it's BEAUTIFUL for interfacing with Gtk/Tk/Qt, so I'm looking forward to doing more of it) and when I divide *1* by *2*, that's what I mean. I want integers, because I'm talking about pixels.
It would be a real drag to go through all of my code and insert int(1/2) because there's no way to do integer math in python anymore... (Besides, what about 100000000000000000000L/200000000000000000000L, which I believe will shortly be lacking the Ls...?) Maybe language features that are like this could be handled by a pseudo-module? I.E. import syntax syntax.floating_point_division() or somesuch ... I'm not sure how you'd implement this so it would be automatic in certain contexts (merging it into your 'site' module maybe? that has obvious problems though), but considering that such features may be NOT the behavior desired by everyone, it seems strange to move the language in that direction unilaterally. ______ __ __ _____ _ _ | ____ | \_/ |_____] |_____| |_____| |_____ | | | | @ t w i s t e d m a t r i x . c o m http://www.twistedmatrix.com/~glyph/ From effbot at telia.com Wed Apr 12 20:52:09 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 20:52:09 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr><38F46A02.3AB10147@prescod.net><001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <002f01bfa4b0$39df5440$34aab5d4@hagrid> Barry A. Warsaw wrote: > Finally, think of this proposal as an evolutionary step toward > enabling all kinds of future frameworks. /.../ With the addition > of func/meth attrs now, we can start to play with prototypes > of this system, define conventions and standards /.../ does this mean that this feature will be labelled as "experimental" (and hopefully even "interpreter specific"). if so, +0. From bwarsaw at cnri.reston.va.us Wed Apr 12 20:56:32 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:56:32 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: <14580.50912.543239.347566@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: GL> As long as we're talking about versioning issues, could we GL> perhaps introduce a slightly more robust introspective GL> facility than GL> assert(sys.version[:3])=='1.5' sys.hexversion? Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys >>> sys.hexversion 17170593 >>> hex(sys.hexversion) '0x10600a1' From bwarsaw at cnri.reston.va.us Wed Apr 12 20:57:47 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:57:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <002f01bfa4b0$39df5440$34aab5d4@hagrid> Message-ID: <14580.50987.10065.518955@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> does this mean that this feature will be labelled as FL> "experimental" (and hopefully even "interpreter specific"). Do you mean "don't add it to JPython whenever I actually get around to making it compatible with CPython 1.6"? 
-Barry From tismer at tismer.com Wed Apr 12 21:03:33 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:03:33 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <38F4C885.D75DABF2@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Vladimir Marangozov wrote: > >> While I'm at it, maybe the same recursion control logic could be > >> used to remedy (most probably in PyObject_Compare) PR#7: > >> "comparisons of recursive objects" reported by David Asher? > > CT> Hey, what a good idea. > > CT> You know what's happening? We are moving towards tail recursion. > CT> If we do this everywhere, Python converges towards Stackless > CT> Python. > > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. Well, I actually didn't read PR#7 before replying. Thought it was about comparing deeply nested structures. What about this? For one, we do an improved comparison, which is of course towards tail recursion, since we push part of the work after the "return". Second, we can guess the number of actually existing objects, and limit the number of comparisons by this. If we need more comparisons than we have objects, then we raise an exception. Might still take some time, but a bit less than infinite. ciao - chris (sub-cantor-set-minded) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Wed Apr 12 21:06:00 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:06:00 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> <14580.48760.957536.805522@anthem.cnri.reston.va.us> Message-ID: <38F4C918.A1344D68@tismer.com> bwarsaw at cnri.reston.va.us wrote: > > >>>>> "JH" == Jeremy Hylton writes: > > JH> Fred and I were just talking, and he observed that a variant > JH> of Python that included a syntactic mechanism to specify more > JH> than one attribute (effectively, a multiple doc string syntax) > JH> might be less objectionable than setting arbitrary attributes > JH> at runtime. Neither of us could imagine just what that syntax > JH> would be. > > So it's the writability of the attributes that bothers you? Maybe we > need WORM-attrs? :) Why don't you just use WORM programming style. Write it once (into the CVS) and get many complaints :-) chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From mal at lemburg.com Wed Apr 12 21:02:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 21:02:25 +0200 Subject: [Python-Dev] #pragmas and method attributes References: Message-ID: <38F4C841.7CE3FB32@lemburg.com> Moshe Zadka wrote: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > > FWIW, I really really like this. > > def func(...): > decl {zorb: 'visible', spark: 'some grammar rule'} > pass > > Right on! > > But maybe even > > def func(...): > decl zorb='visible' > decl spark='some grammar rule' > pass Hmm, this is not so far away from simply letting function/method attribute use the compiled-in names of all locals as basis, e.g. def func(x): a = 3 print func.a func.a would look up 'a' in func.func_code.co_names and return the corresponding value found in func.func_code.co_consts. Note that subsequent other assignments to 'a' are not recognized by this technique, since co_consts and co_names are written sequentially. For the same reason, writing things like 'a = 2 + 3' will break this lookup technique. This would eliminate any need for added keywords and probably provide the best programming comfort and the attributes are immutable per se. We would still have to come up with a way to declare these attributes for builtin methods and modules... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at cnri.reston.va.us Wed Apr 12 21:07:41 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 15:07:41 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <14580.51581.31775.233843@goon.cnri.reston.va.us> Just after I sent the previous message, I realized that the "trashcan" approach is needed in addition to some application-specific logic for what to do when recursive traversals of objects occur. This is true for repr and for a compare that fixes PR#7. Current recipe for repr coredump: original = l = [] for i in range(1000000): new = [] l.append(new) l = new l.append(original) repr(l) Jeremy From glyph at twistedmatrix.com Wed Apr 12 22:06:17 2000 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 15:06:17 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Barry A. Warsaw wrote: > sys.hexversion? Thank you! I stand corrected (and embarrassed) but perhaps this could be a bit better documented? a search of Google comes up with only one hit for this on the entire web: http://www.python.org/1.5/NEWS-152b2.txt ... 
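(For the record, a short sketch of how the sys.hexversion bit fields unpack, following the layout in patchlevel.h; the variable names are mine:)

    import sys

    major  = (sys.hexversion >> 24) & 0xff
    minor  = (sys.hexversion >> 16) & 0xff
    micro  = (sys.hexversion >>  8) & 0xff
    level  = (sys.hexversion >>  4) & 0xf    # 0xA alpha, 0xB beta, 0xC gamma, 0xF final
    serial =  sys.hexversion        & 0xf

    print major, minor, micro, hex(level), serial    # 1 6 0 0xa 1 for the 0x10600a1 above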
From gstein at lyra.org Wed Apr 12 21:20:55 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:20:55 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. > > (Trying to read Fredrik's mind...) > > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. Numbers have no attributes right now. Functions have mutable attributes (__doc__). Barry is allowing them to be annotated (without crushing the values into __doc__ in some nasty way). Paul gave some great examples. IMO, the Zope "visibility by use of __doc__" is the worst kind of hack :-) "Let me be a good person and doc all my functions. Oh, crap! Somebody hacked my system!" And the COM thing was great. Here is what we do today: class MyCOMServer: _public_methods_ = ['Hello'] def private_method(self, args): ... def Hello(self, args) ... The _public_methods_ thing is hacky. I'd rather see a "Hello.public = 1" in there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Wed Apr 12 21:16:40 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 15:16:40 -0400 (EDT) Subject: [Python-Dev] re: #pragmas and method attributes Message-ID: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > Moshe Zadka wrote: > BTW: Why force the value to be a string? Any immutable basic type > should do fine, no?? If attributes can be objects other than strings, then programmers can implement hierarchical nesting directly using: def func(...): decl { 'zorb' : 'visible', 'spark' : { 'rule' : 'some grammar rule', 'doc' : 'handle quoted expressions' } 'info' : { 'author' : ('Greg Wilson', 'Allun Smythee'), 'date' : '2000-04-12 14:08:20 EDT' } } pass instead of: def func(...): decl { 'zorb' : 'visible', 'spark-rule' : 'some grammar rule', 'spark-doc' : 'handle quoted expressions' 'info-author' : 'Greg Wilson, Allun Smythee', 'info-date' : '2000-04-12 14:08:20 EDT' } pass In my experience, every system for providing information has eventually wanted/needed to be hierarchical --- code blocks, HTML, the Windows registry, you name it. This can be faked up using some convention like semicolon-separated lists, but processing (and escaping insignificant uses of separator characters) quickly becomes painful. (Note that if Python supported multi-dicts, or if something *ML-ish was being used for decl's, the "author" tag in "info" could be listed twice, instead of requiring programmers to fall back on char-separated lists.) 
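(The same hierarchy could also be expressed with the writable function attributes proposed in the other thread instead of a decl block; this assumes an interpreter with that patch applied, and every name below is illustrative:)

    def func():
        pass

    func.zorb  = 'visible'
    func.spark = {'rule': 'some grammar rule',
                  'doc':  'handle quoted expressions'}
    func.info  = {'author': ('Greg Wilson', 'Allun Smythee'),
                  'date':   '2000-04-12 14:08:20 EDT'}

    print func.spark['rule']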
Just another random, Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 21:21:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 15:21:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <14580.52396.923837.488505@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: BAW> sys.hexversion? GL> Thank you! GL> I stand corrected (and embarrassed) but perhaps this could be GL> a bit better documented? a search of Google comes up with GL> only one hit for this on the entire web: GL> http://www.python.org/1.5/NEWS-152b2.txt ... Yup, it looks like it's missing from Fred's 1.6 doc tree too. Do python-devers think we also need to make the other patchlevel.h constants available through sys? If so, and because sys.hexversion is currently undocumented, I'd propose making sys.hexversion a tuple of (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) or leaving sys.hexversion as is and crafting a new sys variable which is the [1:] of the tuple above. Prolly need to expose PY_RELEASE_LEVEL_{ALPHA,BETA,GAMMA,FINAL} as constants too. -Barry From effbot at telia.com Wed Apr 12 21:21:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 21:21:50 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <007001bfa4b4$6216e780$34aab5d4@hagrid> > sys.hexversion? > > Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import sys > >>> sys.hexversion > 17170593 > >>> hex(sys.hexversion) > '0x10600a1' bitmasks!? (ouch. python is definitely not what it used to be. wonder if the right answer to this is "wouldn't a tuple be much more python-like?" or "I'm outta here...") From gstein at lyra.org Wed Apr 12 21:29:04 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:29:04 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: >... > BAW> Floats don't currently have attributes. > > True enough, but why can't they? I see no reason that your writable > function attributes proposal requires that functions already have > attributes. Modifying my example, how about: > > >>> l = [1,2,3] > >>> l.__type__ = "int" > > Like functions, lists do have (readonly) attributes. Why not allow them to > have writable attributes as well? Lists, floats, etc are *data*. There is plenty of opportunity for creating data structures that contain whatever you want, organized in any fashion. Functions are (typically) not data. Applying these attributes is a way to define program semantics, not record data. There are two entirely separate worlds here. Adding attributes makes great sense, as a way to enhance the definition of your program's semantics and operation. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 12 21:33:18 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:33:18 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Jeremy Hylton wrote: >... 
> It would look really, really bad ;-). I couldn't think of a good > example, so I guess this is a FUD argument. A rough sketch, though, > would be a program that assigned attribute X to all functions that > were to be used in a certain way. If the assignment is a runtime > operation, rather than a syntactic construct that defines a static > attribute, it would be possible to accidentally assign attribute X to > a function that was not intended to be used that way. This connection > between a group of functions and a particular behavior would depend > entirely on some runtime magic with settable attributes. This is a FUD argument also. I could just as easily mis-label a function when using __doc__ strings, when using mappings in a class object, or using some runtime structures to record the attribute. Your "label" can be recorded in any number of ways. It can be made incorrect in all of them. There is nothing intrinsic to function attributes that makes them more prone to error. Being able to place them into function attributes means that you have a better *model* for how you record these values. Why place them into a separate mapping if your intent is to enhance the semantics of a function? If the semantics apply to a function, then bind it right there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Wed Apr 12 21:29:11 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 15:29:11 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> <14580.50912.543239.347566@anthem.cnri.reston.va.us> <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: <14580.52871.763195.168373@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> (ouch. python is definitely not what it used to be. wonder FL> if the right answer to this is "wouldn't a tuple be much more FL> python-like?" or "I'm outta here...") Yeah, pulling the micro version number out of sys.hexversion is ugly and undocumented, hence my subsequent message. The basically idea is pretty cool though, and I've adopted it to Mailman. It allows me to do this: previous_version = last_hex_version() this_version = mm_cfg.HEX_VERSION if previous_version < this_version: # I'm upgrading -Barry From tismer at tismer.com Wed Apr 12 21:37:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:37:27 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: Message-ID: <38F4D077.AEE37C@tismer.com> Greg Stein wrote: ... > Being able to place them into function attributes means that you have a > better *model* for how you record these values. Why place them into a > separate mapping if your intent is to enhance the semantics of a function? > If the semantics apply to a function, then bind it right there. BTW., is then there also a way for the function *itself* so look into its attributes? If it should be able to take special care about its attributes, it would be not nice if it had to know its own name for that? Some self-like surrogate? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From effbot at telia.com Wed Apr 12 21:34:44 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 21:34:44 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> > If so, and because sys.hexversion is currently undocumented, I'd > propose making sys.hexversion a tuple of > > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) thanks. I feel better now ;-) but wouldn't something like (1, 6, 0, "a1") be easier to understand and use? From fdrake at acm.org Wed Apr 12 21:46:07 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 15:46:07 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.53887.525513.603276@seahag.cnri.reston.va.us> Fredrik Lundh writes: > but wouldn't something like (1, 6, 0, "a1") be easier > to understand and use? Yes! (But you knew that....) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Apr 12 22:06:03 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:06:03 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > Ping wrote: > > > But you're right, i've never heard of another language that can handle > > > configurable encodings right in the source code. > > XML? Don't get me started. XML is not a language. It's a serialization format for trees (isomorphic to s-expressions, but five times more verbose). It has no semantics. Anyone who tries to tell you otherwise is probably a marketing drone or has been brainwashed by the buzzword brigade. -- ?!ng From bwarsaw at cnri.reston.va.us Wed Apr 12 22:04:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 16:04:45 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.55005.924001.146052@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> but wouldn't something like (1, 6, 0, "a1") be easier FL> to understand and use? I wasn't planning on splitting PY_VERSION, just in exposing the other #define ints in patchlevel.h -Barry From fdrake at acm.org Wed Apr 12 22:08:35 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 16:08:35 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <14580.55235.6235.662297@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Don't get me started. XML is not a language. It's a serialization And XML was exactly why I asked about *programming* languages. XML just doesn't qualify in any way I can think of as a language. Unless it's also called "Marketing-speak." ;) XML, as you point out, is a syntactic aspect of tree encoding. Harrumph. -Fred -- Fred L. 
Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Wed Apr 12 22:10:21 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:10:21 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <00ad01bfa4bb$24129360$34aab5d4@hagrid> Ka-Ping Yee wrote: > > XML? > > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). call it whatever you want -- my point was that their way of handling configurable encodings in the source code is good enough for python. (briefly, it's all unicode on the inside, and either ASCII/UTF-8 or something compatible enough to allow the parser to find the "en- coding" attribute without too much trouble... except for the de- fault encoding, the same approach should work for python) From effbot at telia.com Wed Apr 12 22:20:26 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:20:26 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> <14580.55235.6235.662297@seahag.cnri.reston.va.us> Message-ID: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Fred L. Drake, Jr. wrote: > > Don't get me started. XML is not a language. It's a serialization > > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. oh, come on. in what way is "Python source code" more expressive than XML, if you don't have anything that inter- prets it? does the Python parser create "better" trees than an XML parser? > XML, as you point out, is a syntactic aspect of tree encoding. just like a Python source file is a syntactic aspect of a Python (parse) tree encoding, right? ;-) ... but back to the real issue -- the point is that XML provides a mechanism for going from an external representation to an in- ternal (unicode) token stream, and that mechanism is good enough for python source code. why invent yet another python-specific wheel? From effbot at telia.com Wed Apr 12 22:25:56 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:25:56 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us><14580.52396.923837.488505@anthem.cnri.reston.va.us><008a01bfa4b6$2baca0c0$34aab5d4@hagrid> <14580.55005.924001.146052@anthem.cnri.reston.va.us> Message-ID: <00c901bfa4bd$4ff82560$34aab5d4@hagrid> Barry wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> but wouldn't something like (1, 6, 0, "a1") be easier > FL> to understand and use? > > I wasn't planning on splitting PY_VERSION, just in exposing the other > #define ints in patchlevel.h neither was I. I just want Python to return those values in a form suitable for a Python programmer, not a C preprocessor. in other words: char release[2+1]; sprintf(release, "%c%c", PY_RELEASE_LEVEL - 0x0A + 'a', PY_RELEASE_SERIAL + '0'); sys.longversion = BuildTuple("iiis", PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, release) (this assumes that the release serial will never exceed 9, but I think that's a reasonable restriction...) 
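One wrinkle with a trailing release string like "a1", related to the serial-digit restriction just noted: plain string comparison decides the ordering of that last element, so its spelling matters if tuple comparisons are to stay monotonic. A quick illustration with hypothetical version tuples of the shape sketched above:

    # 'a2' < 'b1' < 'c1' order as intended...
    assert (1, 6, 0, "a2") < (1, 6, 0, "b1") < (1, 6, 0, "c1")
    # ...but an empty string for the final release would sort before all of them:
    assert (1, 6, 0, "") < (1, 6, 0, "a2")
    # and a two-digit serial would misorder as well ("a10" < "a2" as strings):
    assert (1, 6, 0, "a10") < (1, 6, 0, "a2")
    # spelling the final level out keeps the ordering monotonic:
    assert (1, 6, 0, "c1") < (1, 6, 0, "final")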
From skip at mojam.com Wed Apr 12 22:33:22 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 15:33:22 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.56722.404953.614718@beluga.mojam.com> me> >>> l = [1,2,3] me> >>> l.__type__ = "int" Greg> Lists, floats, etc are *data*. There is plenty of opportunity for Greg> creating data structures that contain whatever you want, organized Greg> in any fashion. Yeah, but there's no reason you wouldn't want to reason about them. They are, after all, first-class objects. If you consider these other attributes as meta-data, allowing data attributes to hang off lists, tuples, ints or regex objects makes perfect sense to me. I believe someone else during this thread suggested that one use of function attributes might be to record the function's return type. My example above is not really any different. Simpleminded, yes. Part of the value of l, no. Skip From ping at lfw.org Wed Apr 12 22:54:49 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:54:49 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Message-ID: Fred L. Drake, Jr. wrote: > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. I'm harumphing right along with you, Fred. :) On Wed, 12 Apr 2000, Fredrik Lundh wrote: > oh, come on. in what way is "Python source code" more > expressive than XML, if you don't have anything that inter- > prets it? does the Python parser create "better" trees than > an XML parser? Python isn't just a parse tree. It has semantics. XML has no semantics. It's content-free content. :) > but back to the real issue -- the point is that XML provides a > mechanism for going from an external representation to an in- > ternal (unicode) token stream, and that mechanism is good > enough for python source code. You have a point. I'll go look at what they do. -- ?!ng From gvwilson at nevex.com Wed Apr 12 23:01:04 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 17:01:04 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: > Ka-Ping Yee wrote: > Python isn't just a parse tree. It has semantics. > XML has no semantics. It's content-free content. :) Python doesn't even have a parse tree (never mind semantics) unless you have a Python parser handy. XML gives my application a way to parse your information, even if I can't understand it, which is a big step over (for example) comments or strings embedded in Python/Perl/Java source files, colon (or is semi-colon?) separated lists in .ini and .rc files, etc. (I say this having wrestled with large Fortran programs in which a sizeable fraction of the semantics was hidden in comment-style pragmas. Having seen the demands this style of coding places on compilers, and compiler writers, I'm willing to walk barefoot through the tundra to get something more structured. Hanging one of Barry's doc dict's off a module ensures that key information is part of the parse tree, and that anyone who wants to extend the mechanism can do so in a structured way. I'd still rather have direct embedding of XML, but I think doc dicts are still a big step forward.) Greg p.s. This has come up as a major issue in the Software Carpentry competition. 
On the one hand, you want (the equivalent of) makefiles to be language neutral, so that (for example) you can write processors in Perl and Java as well as Python. On the other hand, you want to have functions, lists, and all the other goodies associated with a language. From DavidA at ActiveState.com Wed Apr 12 23:10:49 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 14:10:49 -0700 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: > The basically idea is pretty cool though, and I've adopted it to > Mailman. It allows me to do this: > > previous_version = last_hex_version() > this_version = mm_cfg.HEX_VERSION > > if previous_version < this_version: > # I'm upgrading Why can't you do that with tuples? --david From bwarsaw at cnri.reston.va.us Wed Apr 12 23:44:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 17:44:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: <14580.60976.200757.562690@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: >> The basically idea is pretty cool though, and I've adopted it >> to Mailman. It allows me to do this: previous_version = >> last_hex_version() this_version = mm_cfg.HEX_VERSION if >> previous_version < this_version: # I'm upgrading DA> Why can't you do that with tuples? How do you know they aren't tuples? :) (no, Moshe, you do not need to answer :) -Barry From DavidA at ActiveState.com Thu Apr 13 00:51:36 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 15:51:36 -0700 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: Gordon McMillan: > Jeremy Hylton wrote: > > It prevents confusion and errors > > that might result from unprincipled use of function attributes. > > While I'm sure I will be properly shocked and horrified when > you come up with an example, in my naivety, I can't imagine > what it will look like ;-). I'm w/ Gordon & Barry on this one. I've wanted method and function attributes in the past and had to rely on building completely new classes w/ __call__ methods just to 'fake it'. There's a performance cost to having to do that, but most importantly there's a big increase in code complexity, readability, maintanability, yaddability, etc. I'm surprised that Jeremy sees it as such a problem area -- if I wanted to play around with static typing, having a version of Python which let me store method metadata cleanly would make me jump with joy. FWIW, I'm perfectly willing to live in a world where 'unprincipled use of method and function attributes' means that my code can't get optimized, just like I don't expect my code which modifies __class__ to get optimized (as long as someone defines what those principles are!). --david From paul at prescod.net Wed Apr 12 21:33:14 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 14:33:14 -0500 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4CF7A.8F99562F@prescod.net> Ka-Ping Yee wrote: > >... > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). The XML rule is one encoding per file. 
One thing that I think that they did innovate in (I had nothing to do with that part) is that entities encoded in something other than UTF-8 or UTF-16 must start with the declaration: "". This has two benefits: By looking at the first four bytes of the file we can differentiate between several different encoding "families" (Shift-JIS-like, UTF-8-like, UTF-16-like, ...) and then we can tell the *precise* encoding by looking at the encoding attribute. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond at skippinet.com.au Thu Apr 13 02:15:08 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:15:08 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: > Do python-devers think we also need to make the other patchlevel.h > constants available through sys? Can't see why, but also can't see why not! > If so, and because sys.hexversion is currently undocumented, Since when has that ever stopped anyone :-) > I'd > propose making sys.hexversion a tuple of > > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) > > or leaving sys.hexversion as is and crafting a new sys > variable which > is the [1:] of the tuple above. My code already uses sys.hexversion to differentiate between 1.5 and 1.6, so if we do anything I would vote for a new name. Personally however, I think the hexversion gives all the information you need - ie, you either want a printable version - sys.version - or a machine comparable version - sys.hexversion. Can't really think of a reason you would want the other attributes... Mark. From mhammond at skippinet.com.au Thu Apr 13 02:20:12 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:20:12 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: > > >>> hex(sys.hexversion) > > '0x10600a1' > > bitmasks!? Nah - a comparable number :-) if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! Seems perfectly reasonable and understandable to me. And much cleaner than a tuple: if tuple_version[0] > 1 or tuple_version[0] == 1 and tuple_version[6] >= 1: etc Unless Im missing the point - but I can't see any case other than version comparisons in which hexversion is useful - so it seems perfect to me. > (ouch. python is definitely not what it used to be. wonder > if the right answer to this is "wouldn't a tuple be much more > python-like?" or "I'm outta here...") Be sure to let us know. Mark. 
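For what it's worth, the element-by-element check sketched above isn't needed either way: tuples already compare lexicographically, so a tuple-valued version is just as easy to test against as the hex number. A small illustration (sys.versiontuple is the hypothetical name used later in the thread, not an existing attribute):

    # tuples compare element by element, left to right
    assert (1, 6, 1) > (1, 5, 2)
    assert (2, 0, 0) > (1, 6, 1)
    assert (1, 6) < (1, 6, 1)     # equal prefix: the shorter tuple sorts first
    # so the check would collapse to something like
    #   if sys.versiontuple >= (1, 6, 1): ...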
From akuchlin at mems-exchange.org Thu Apr 13 02:46:21 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 20:46:21 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: References: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <14581.6365.234022.976395@newcnri.cnri.reston.va.us> Mark Hammond quoted Barry Warsaw: >> I'd >> propose making sys.hexversion a tuple of >> (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, >> PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) If it's a tuple, the name "hexversion" makes absolutely no sense. Call it version_tuple or something like that. --amk From gstein at lyra.org Thu Apr 13 03:10:54 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:10:54 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4D077.AEE37C@tismer.com> Message-ID: On Wed, 12 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: > ... > > Being able to place them into function attributes means that you have a > > better *model* for how you record these values. Why place them into a > > separate mapping if your intent is to enhance the semantics of a function? > > If the semantics apply to a function, then bind it right there. > > BTW., is then there also a way for the function *itself* > so look into its attributes? If it should be able to take > special care about its attributes, it would be not nice > if it had to know its own name for that? > Some self-like surrogate? Separate problem. Functions can't do that today with their own __doc__ attribute. Feel free to solve this issue, but it is distinct from the attributes-on-functions issue being discussed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Thu Apr 13 03:07:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 11:07:45 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: The trashcan bug turns out to be trivial to describe, but not so trivial to fix. Put simply, the trashcan mechanism conflicts horribly with PY_TRACE_REFS :-( The problem stems from the fact that the trashcan resurrects objects. An object added to the trashcan has its ref count as zero, but is then added to the trash list, transitioning its ref-count back to 1. Deleting the trashcan then does a second deallocation of the object, again taking the ref count back to zero, and this time actually doing the destruction. By pure fluke, this works without Py_DEBUG defined! With Py_DEBUG defined, this first causes problems due to ob_type being NULL. _Py_Dealloc() sets the ob_type element to NULL before it calls the object de-allocater. Thus, the trash object first hits a zero refcount, and its ob_type is zapped. It is then resurrected, but the ob_type value remains NULL. When the second deallocation for the object happens, this NULL type forces the crash. Changing the Py_DEBUG version of _Py_Dealloc() to not zap the type doesnt solve the problem. The whole _Py_ForgetReference() linked-list management also dies. Second time we attempt to deallocate the object the code that removes the object from the "alive objects" linked list fails - the object was already removed first time around. I see these possible solutions: * The trash mechanism is changed to keep a list of (address, deallocator) pairs. 
This is a "cleaner" solution, as the list is not considered holding PyObjects as such, just blocks of memory to be freed with a custom allocator. Thus, we never end up in a position where a Python objects are resurrected - we just defer the actual memory deallocation, rather than attempting a delayed object destruction. This may not be as trivial to implement as to describe :-) * Debug builds disable the trash mechanism. Not desired as the basic behaviour of the interpreter will change, making bug tracking with debug builds difficult! If we went this way, I would (try to :-) insist that the Windows debug builds dropped Py_DEBUG, as I really want to avoid the scenario that switching to a debug build changes the behaviour to this extent. * Perform further hacks, so that Py_ForgetReference() gracefully handles NULL linked-list elements etc. Any thoughts? Mark. From gstein at lyra.org Thu Apr 13 03:25:41 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:25:41 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Mark Hammond wrote: >... > I see these possible solutions: > > * The trash mechanism is changed to keep a list of (address, > deallocator) pairs. This is a "cleaner" solution, as the list is > not considered holding PyObjects as such, just blocks of memory to > be freed with a custom allocator. Thus, we never end up in a > position where a Python objects are resurrected - we just defer the > actual memory deallocation, rather than attempting a delayed object > destruction. This may not be as trivial to implement as to describe > :-) > > * Debug builds disable the trash mechanism. Not desired as the > basic behaviour of the interpreter will change, making bug tracking > with debug builds difficult! If we went this way, I would (try to > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > really want to avoid the scenario that switching to a debug build > changes the behaviour to this extent. > > * Perform further hacks, so that Py_ForgetReference() gracefully > handles NULL linked-list elements etc. > > Any thoughts? Option 4: lose the trashcan mechanism. I don't think the free-threading issue was ever resolved. Cheers, -g -- Greg Stein, http://www.lyra.org/ From esr at thyrsus.com Thu Apr 13 04:56:38 2000 From: esr at thyrsus.com (esr at thyrsus.com) Date: Wed, 12 Apr 2000 22:56:38 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <20000412225638.E9002@thyrsus.com> Ka-Ping Yee : > > XML? > > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Heh. What he said. Squared. Describing XML as a "language" around an old-time LISPer like me (or a new-time one like Ping) is a damn good way to get your eyebrows singed. -- Eric S. Raymond "...quemadmodum gladius neminem occidit, occidentis telum est." [...a sword never kills anybody; it's a tool in the killer's hand.] -- (Lucius Annaeus) Seneca "the Younger" (ca. 
4 BC-65 AD), From tim_one at email.msn.com Thu Apr 13 05:54:15 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 23:54:15 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: <000001bfa4fb$f07e0340$3e2d153f@tim> Lisp systems for 40+ years traditionally supported user-muckable "property lists" on all symbols, which were basically arbitrary dicts w/ a clumsy syntax. No disaster ensued; to the contrary, it was often handy. So +0 from me on the add-attrs-to-funcs idea. The same idea applies to all objects, of course, but attrs-on-funcs has some bang for the buck (adding a dict to e.g. each int object would be a real new burden with little payback). -1 on any notion of restricting attr values to be immutable. [Gordon] > Having to be explicit about the method <-> regex / rule would > severely damage SPARK's elegance. That's why I'm only +0 instead of +1: SPARK won't switch to use the new method anyway, because the beauty of abusing docstrings is that it's syntactically *obvious*. There already exist any number of other ways to associate arbitrary info with arbitrary objects, and it's no mystery why SPARK and Zope avoided all of them in favor of docstring abuse. > It would make Tim's doctest useless. This one not so: doctest is *not* meant to warp docstrings toward testing purposes; it's intended that docstrings remain wholly for human-friendly documentation. What doctest does is give you a way to guarantee that the elucidating examples good docstrings *should have anyway* work exactly as advertised (btw, doctest examples found dozens of places in my modules that needed to be fixed to recover from 1.6 alpha no longer sticking a trailing "L" on str(long) -- if you're not using doctest every day, you're an idiot ). If I could add an attr to funcs, though, *then* I'd think about changing doctest to also run examples in any e.g. func.doctest attrs it could find, and that *new* mechanism would be warped toward testing purposes. Indeed, I think that would be an excellent use for the new facility. namespaces-are-one-honking-great-etc-ly y'rs - tim From tim_one at email.msn.com Thu Apr 13 07:00:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 01:00:29 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <000701bfa505$31008380$4d2d153f@tim> [Jeremy Hylton]> > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. > > ... > > So the real problem is defining some reasonable semantics for > comparison of recursive objects. I think this is exactly a graph isomorphism problem, since Python always compares "by value" (so isomorphism is the natural generalization). This isn't hard (!= tedious, alas) to define or to implement naively, but a straightforward implementation would be very expensive at runtime compared to the status quo. That's why "real languages" would rather suffer an infinite loop. It's expensive because there's no cheap way to know whether you have a loop in an object. 
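A rough sketch of the loop-aware comparison being described, keeping a memo of object-id pairs in the spirit of copy.deepcopy's memo; it only treats lists and tuples structurally and is meant to illustrate the idea, not the interpreter's actual algorithm.

    def cyclic_equal(a, b, memo=None):
        # A pair already being compared higher up the call chain is assumed
        # equal; any genuine difference will surface along some other path.
        if memo is None:
            memo = {}
        key = (id(a), id(b))
        if key in memo:
            return True
        if type(a) is not type(b):
            return False
        if isinstance(a, (list, tuple)):
            if len(a) != len(b):
                return False
            memo[key] = True
            for x, y in zip(a, b):
                if not cyclic_equal(x, y, memo):
                    return False
            return True
        return a == b

    a = []; a.append(a)
    b = []; b.append(b)
    print(cyclic_equal(a, b))   # True, where a plain a == b just recurses until it dies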
An anal compromise would be to run comparisons full speed without trying to detect loops, but if the recursion got "too deep" break out and start over with an expensive alternative that does check for loops. The later requires machinery similar to copy.deepcopy's. > ... > I think the comparison ought to return false or raise a ValueError. After a = [] b = [] a.append(a) b.append(b) it certainly "ought to be" the case that a == b in Python. "false" makes no sense. ValueError makes no sense either unless we incur the expense of proving first that at least one object does contain a loop (as opposed to that it's just possibly nested thousands of levels deep) -- but then we may as well implement an isomorphism discriminator. > I'm not sure which is right. It seems odd to me that comparing two > builtin lists could ever raise an exception, but it may be more > Pythonic to raise an exception in the face of ambiguity. As the > X3J13 committee noted: Lisps have more notions of "equality" than Python 1.6 has flavors of strings . Python has only one notion of equality (conveniently ignoring that it actually has two ). The thing the Lisp people argue about is which of the three or four notions of equality to apply at varying levels when trying to compute one of their *other* notions of equality -- there *can't* be a universally "good" answer to that mess. Python's life is easier here. in-concept-if-not-in-implementation-ly y'rs - tim From effbot at telia.com Thu Apr 13 08:24:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 08:24:17 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: Message-ID: <003101bfa511$06c89920$34aab5d4@hagrid> Mark Hammond wrote: > Nah - a comparable number :-) tuples can also be compared. > if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! if sys.versiontuple >= (1, 6, 1): ... From moshez at math.huji.ac.il Thu Apr 13 09:10:30 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 13 Apr 2000 09:10:30 +0200 (IST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: [Ping] > But you're right, i've never heard of another language that can handle > configurable encodings right in the source code. [The eff-bot] > XML? [Ping] > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Of coursem but "everything is a tree". If you put Python in XML by having the parse-tree serialized, then you can handle any encoding in the source file, by snarfing it from XML. not-in-favour-of-Python-in-XML-but-this-is-sure-to-encourage-Greg-Wilson-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Thu Apr 13 12:50:05 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 12:50:05 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5A65D.5C2666B5@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Mark Hammond wrote: > >... > > I see these possible solutions: > > > > * The trash mechanism is changed to keep a list of (address, > > deallocator) pairs. 
This is a "cleaner" solution, as the list is > > not considered holding PyObjects as such, just blocks of memory to > > be freed with a custom allocator. Thus, we never end up in a > > position where a Python objects are resurrected - we just defer the > > actual memory deallocation, rather than attempting a delayed object > > destruction. This may not be as trivial to implement as to describe > > :-) This one sounds quite hard to implement. > > * Debug builds disable the trash mechanism. Not desired as the > > basic behaviour of the interpreter will change, making bug tracking > > with debug builds difficult! If we went this way, I would (try to > > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > > really want to avoid the scenario that switching to a debug build > > changes the behaviour to this extent. I vote for this one at the moment. > > * Perform further hacks, so that Py_ForgetReference() gracefully > > handles NULL linked-list elements etc. > > > > Any thoughts? > > Option 4: lose the trashcan mechanism. I don't think the free-threading > issue was ever resolved. Option 5: Forget about free threading, change trashcan in a way that it doesn't change the order of destruction, doesn't need memory at all, and therefore does not change anything if it is disabled in debug mode. cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping at lfw.org Thu Apr 13 13:22:56 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:22:56 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Andrew M. Kuchling wrote: > Ka-Ping Yee writes: > >Here is what i have in mind: provide two hooks > > __builtins__.display(object) > >and > > __builtins__.displaytb(traceback, exception) > > Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't > want to add new display() and displaytb() built-ins, do we? Yes, you're right, they belong in sys. For a while i was under the delusion that you could customize more than one sub-interpreter by giving each one a different modified __builtins__, but that's an rexec thing and completely the wrong approach. Looks like the right approach to customizing sub-interpreters is to generalize the interface of code.InteractiveInterpreter and add more options to code.InteractiveConsole. sys.display and sys.displaytb would then be specifically for tweaking the main interactive interpreter only (just like sys.ps1 and sys.ps2). Still quite worth it, i believe, so i'll proceed. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From effbot at telia.com Thu Apr 13 13:06:57 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 13:06:57 +0200 Subject: [Python-Dev] if key in dict? Message-ID: <014901bfa538$637c37e0$34aab5d4@hagrid> now that we have the sq_contains slot, would it make sense to add support for "key in dict" ? after all, if key in dict: ... is a bit more elegant than: if dict.has_key(key): ... and much faster than: if key in dict.keys(): ... (the drawback is that once we add this, some people might ex- pect dictionaries to behave like sequences in others ways too...) 
(and yes, this might break code that looks for tp_as_sequence before looking for tp_as_mapping. haven't found any code like that, but I might have missed something). whaddyathink? From gstein at lyra.org Thu Apr 13 13:14:56 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:14:56 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F5A65D.5C2666B5@tismer.com> Message-ID: On Thu, 13 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > issue was ever resolved. > > Option 5: Forget about free threading, change trashcan in a way > that it doesn't change the order of destruction, doesn't need > memory at all, and therefore does not change anything if it is > disabled in debug mode. hehe... :-) Definitely possible. Seems like you could just statically allocate an array of PyObject* and drop the pointers in there (without an INCREF or anything). Place them there, in order. Dunno about the debug stuff, and how that would affect it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Apr 13 13:19:32 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:19:32 -0700 (PDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <014901bfa538$637c37e0$34aab5d4@hagrid> Message-ID: On Thu, 13 Apr 2000, Fredrik Lundh wrote: > now that we have the sq_contains slot, would it make > sense to add support for "key in dict" ? > > after all, > > if key in dict: > ... The counter has always been, "but couldn't that be read as 'if value in dict' ??" Or maybe 'if (key, value) in dict' ?? People have different impressions of what "in" should mean for a dict. And some people change their impression from one function to the next :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Thu Apr 13 11:22:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 11:22:27 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F591D3.32CD3B2A@lemburg.com> I think we should put the discussion back on track again... We were originally talking about proposals to integrate #pragmas into Python source. These pragmas are (for now) intended to provide information to the Python byte code compiler, so that it can make certain assumptions on a per file basis. So far, there have been numerous proposals for all kinds of declarations and decorations of files, functions, methods, etc. As usual in Python Space, things got generalized to a point where people forgot about the original intent ;-) The current need for #pragmas is really very simple: to tell the compiler which encoding to assume for the characters in u"...strings..." (*not* "...8-bit strings..."). The idea behind this is that programmers should be able to use other encodings here than the default "unicode-escape" one. Perhaps someone has a better idea on how to signify this to the compiler ? Could be that we don't need this pragma discussion at all if there is a different, more elegant solution to this... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Thu Apr 13 13:50:02 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:50:02 -0700 (PDT) Subject: [Python-Dev] if key in dict? 
In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Greg Stein wrote: > On Thu, 13 Apr 2000, Fredrik Lundh wrote: > > now that we have the sq_contains slot, would it make > > sense to add support for "key in dict" ? > > > > after all, > > > > if key in dict: > > ... > > The counter has always been, "but couldn't that be read as 'if value in > dict' ??" I've been quite happy with "if key in dict". I forget if i already made this analogy when it came up in regard to the issue of supporting a "set" type, but if you think of it like a real dictionary -- when someone asks you if a particular word is "in the dictionary", you look it up in the keys of the dictionary, not in the definitions. And it does read much better than has_key, and makes it easier to use dicts like sets. So i think it would be nice, though i've seen this meet opposition before. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From effbot at telia.com Thu Apr 13 13:50:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 13:50:17 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <017b01bfa53e$748cc080$34aab5d4@hagrid> M.-A. Lemburg wrote: > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). why not? why keep on pretending that strings and strings are two different things? it's an artificial distinction, and it only causes problems all over the place. > Could be that we don't need this pragma discussion at all > if there is a different, more elegant solution to this... here's one way: 1. standardize on *unicode* as the internal character set. use an encoding marker to specify what *external* encoding you're using for the *entire* source file. output from the tokenizer is a stream of *unicode* strings. 2. if the user tries to store a unicode character larger than 255 in an 8-bit string, raise an OverflowError. 3. the default encoding is "none" (instead of XML's "utf-8"). in this case, treat the script as an ascii superset, and store each string literal as is (character-wise, not byte-wise). additional notes: -- item (3) is for backwards compatibility only. might be okay to change this in Py3K, but not before that. -- leave the implementation of (1) to 1.7. for now, assume that scripts have the default encoding, which means that (2) cannot happen. -- we still need an encoding marker for ascii supersets (how about ;-). however, it's up to the tokenizer to detect that one, not the parser. the parser only sees unicode strings. From tismer at tismer.com Thu Apr 13 13:56:18 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 13:56:18 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5B5E2.40B20B53@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > > issue was ever resolved. > > > > Option 5: Forget about free threading, change trashcan in a way > > that it doesn't change the order of destruction, doesn't need > > memory at all, and therefore does not change anything if it is > > disabled in debug mode. > > hehe... :-) > > Definitely possible. 
Seems like you could just statically allocate an > array of PyObject* and drop the pointers in there (without an INCREF or > anything). Place them there, in order. Dunno about the debug stuff, and > how that would affect it. I could even better use the given objects-to-be-destroyed as an explicit stack. Similar to what the debug delloc does, I may abuse the type pointer as a stack pointer. Since the refcount is zero, it can be abused to store a type code (we have only 5 types to distinguish here), and there is enough room for some state like a loop counter as well. Given that, I can build a destructor without recursion, but with an explicit stack and iteration. It would not interfere with anything, since it actually does the same thing, just in a different way, but in the same order, without mallocs. Should I try it? (say no and I'll do it anyway:) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip at mojam.com Thu Apr 13 15:34:53 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 08:34:53 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F591D3.32CD3B2A@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <14581.52477.70286.774494@beluga.mojam.com> Marc> We were originally talking about proposals to integrate #pragmas Marc> ... Minor nit... How about we lose the "#" during these discussions so we aren't all subliminally disposed to embed pragmas in comments or to add the C preprocessor to Python? ;-) -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Thu Apr 13 15:39:47 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 08:39:47 -0500 (CDT) Subject: [Python-Dev] if key in dict? In-Reply-To: References: Message-ID: <14581.52771.512393.600949@beluga.mojam.com> Ping> I've been quite happy with "if key in dict". I forget if i Ping> already made this analogy when it came up in regard to the issue Ping> of supporting a "set" type, but if you think of it like a real Ping> dictionary -- when someone asks you if a particular word is "in Ping> the dictionary", you look it up in the keys of the dictionary, not Ping> in the definitions. Also, for many situations, "if value in dict" will be extraordinarily inefficient. In "in" semantics are added to dicts, a corollary move will be to extend this functionality to other non-dict mappings (e.g., file-based mapping objects like gdbm). Implementing "in" for them would be excruciatingly slow if the LHS was "value". To not break the rule of least astonishment when people push large dicts to disk, the only feasible implementation is "if key in dict". -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Thu Apr 13 15:46:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 09:46:44 -0400 (EDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <14581.52771.512393.600949@beluga.mojam.com> References: <14581.52771.512393.600949@beluga.mojam.com> Message-ID: <14581.53188.587479.280569@seahag.cnri.reston.va.us> Skip Montanaro writes: > Also, for many situations, "if value in dict" will be extraordinarily > inefficient. 
In "in" semantics are added to dicts, a corollary move will be > to extend this functionality to other non-dict mappings (e.g., file-based > mapping objects like gdbm). Implementing "in" for them would be > excruciatingly slow if the LHS was "value". To not break the rule of least > astonishment when people push large dicts to disk, the only feasible > implementation is "if key in dict". Skip, Performance issues aside, I can see very valid reasons for the x in "x in dict" to be either the key or (key, value) pair. For this reason, I've come to consider "x in dict" a mis-feature, though I once pushed for it as well. It may be easy to explain that x is just the key, but it's not clearly the only reasonably desirable semantic. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 16:26:01 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 10:26:01 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <017b01bfa53e$748cc080$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <14581.55545.30446.471809@seahag.cnri.reston.va.us> Fredrik Lundh writes: > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. > > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. We shouldn't need to change it then; Unicode editing capabilities will be pervasive by then, right? Oh, heck, it might even be legacy support by then! ;) Seriously, I'd hesitate to change any interpretation of default encoding until Unicode support is pervasive and fully automatic in tools like Notepad, vi/vim, XEmacs, and BBedit/Alpha (or whatever people use on MacOS these days). If I can't use teco on it, we're being too pro-active! ;) > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Agreed here. But shouldn't that be: This is war, I tell you, war! ;) Now, just need to hack the exec(2) call on all the Unices so that is properly recognized and used to run the scripts properly, obviating the need for those nasty shbang lines! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Thu Apr 13 17:22:49 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 17:22:49 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) Message-ID: <200004131522.RAA05137@python.inrialpes.fr> Obviously, the user-attr proposal is not so "simple" as it looks like. I wish we all realize what's really going on here. In all cited use cases for this proposal, functions are no more perceived as functions per se, but as data structures (objects) which are the target of the computation. IOW, functions are just considered as *instances* of a class (inheriting from the builtin "PyFunction" class) with user-attributes, having a state, and eventually a set of operations bound to them. I guess that everybody realized that with this proposal, one could bind not only doc strings, but also functions to the function. def func(): pass def serialize(): ... func.pack = serialize func.pack() What is this? This is manual instance customization. 
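For contrast, the 'legal' route available today, the wrapper class with __call__ that David Ascher mentioned earlier in the thread; a minimal sketch, with made-up attribute names:

    class Function:
        # Wrap a plain function so that arbitrary attributes can hang off it.
        def __init__(self, func, **attrs):
            self.func = func
            self.__dict__.update(attrs)
        def __call__(self, *args, **kwds):
            return self.func(*args, **kwds)

    def _hello(name):
        return "hello, " + name

    hello = Function(_hello, public=1, author="anonymous")
    print(hello("world"))    # acts like the original function...
    print(hello.public)      # ...but carries extra metadata as well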
Since nobody said that this would be done 'exceptionally', but rather on a regular basis for all functions (and generally, for all objects) in a program, the way to customize instances after the fact, makes those instances singletons of user-defined classes. You may say "so what?". Well, this is fine if it were part of the object model from the start. And there's no reason why only functions and methods can have this functionality. Stick the __dict__ slot in the object header and let me bind user-attributes to all objects. I have a favorite number, 7, so I want to label it Vlad's number. seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func to all numbers, n.is_prime(). I want to have a s.zip() method for a set of strings in my particular application, not only the builtin ones. Why is it not allowed to have this today? Think about it! How would you solve your app needs today? Through classes and instances. That's the prescribed `legal' way to do customized objects; no shortcuts. Saying that mucking with functions' __doc__ strings is the only way to implement some functionality is simply not true. In short, there's no way I can accept this proposal in its current state and make the distingo between functions/methods and other kinds of objects (including 3rd party ones). If we're to go down this road, treat all objects as equal citizens in this regard. None or all. The object model must remain consistent. This proposal opens a breach in it. And not the lightest! And this is only part of the reasons why I'm still firmly -1 until P3K. Glad to see that Barry exposed some of the truth about it, after preserving our throats, i.e. he understood that we understood that he fully understood the power of namespaces, but eventually decided to propose a fraction of a significant change reserved for the next major Python release... wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From effbot at telia.com Thu Apr 13 17:36:34 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 17:36:34 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> Message-ID: <007c01bfa55e$0faf0360$34aab5d4@hagrid> > Modified Files: > sysmodule.c > Log Message: > > Define version_info to be a tuple (major, minor, micro, level); level > is a string "a2", "b1", "c1", or '' for a final release. maybe level should be chosen so that version_info for a final release is larger than version_info for the corresponding beta ? From akuchlin at mems-exchange.org Thu Apr 13 17:39:43 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:39:43 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.59967.326442.73539@amarok.cnri.reston.va.us> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); level >> is a string "a2", "b1", "c1", or '' for a final release. >maybe level should be chosen so that version_info for a final >release is larger than version_info for the corresponding beta ? 
'a2' < 'b1' < 'c1' < 'final' --amk From fdrake at acm.org Thu Apr 13 17:41:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:41:32 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.60076.525602.848031@seahag.cnri.reston.va.us> Fredrik Lundh writes: > maybe level should be chosen so that version_info for a final > release is larger than version_info for the corresponding beta ? I thought about that, but didn't like it; should it perhaps be 'final'? If the purpose is to simply make it increase monotonically like sys.hexversion, why not just use sys.hexversion? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Apr 13 17:44:19 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:44:19 -0400 (EDT) Subject: [Python-Dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: Message-ID: <14581.60243.557955.192783@amarok.cnri.reston.va.us> [Cc'ed to python-dev from the zope-dev mailing list; trim your follow-ups appropriately] R. David Murray writes: >So it looks like there is a problem using Zope with a large database >no matter what the platform. Has anyone figured out how to fix this? ... >But given the number of people who have said "use FreeBSD if you want >big files", I'm really wondering about this. What if later I >have an application where I really need a >2GB database? Different system calls are used for large files, because you can no longer use 32-bit ints to store file position. There's a HAVE_LARGEFILE_SUPPORT #define that turns on the use of these alternate system calls; see Python's configure.in for the test used to detect when it should be turned on. You could just hack the generated config.h to turn on large file support and recompile your copy of Python, but if the configure.in test is incorrect, that should be fixed. The test is: AC_MSG_CHECKING(whether to enable large file support) if test "$have_long_long" = yes -a \ "$ac_cv_sizeof_off_t" -gt "$ac_cv_sizeof_long" -a \ "$ac_cv_sizeof_long_long" -ge "$ac_cv_sizeof_off_t"; then AC_DEFINE(HAVE_LARGEFILE_SUPPORT) AC_MSG_RESULT(yes) else AC_MSG_RESULT(no) fi I thought you have to use the loff_t type instead of off_t; maybe this test should check for it instead? Anyone know anything about large file support? -- A.M. Kuchling http://starship.python.net/crew/amk/ When I dream, sometimes I remember how to fly. You just lift one leg, then you lift the other leg, and you're not standing on anything, and you can fly. -- Chloe Russell, in SANDMAN #43: "Brief Lives:3" From effbot at telia.com Thu Apr 13 17:44:57 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 17:44:57 +0200 Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us><007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> Message-ID: <008c01bfa55f$3af9f880$34aab5d4@hagrid> > Fredrik Lundh writes: > > maybe level should be chosen so that version_info for a final > > release is larger than version_info for the corresponding beta ? > > I thought about that, but didn't like it; should it perhaps be > 'final'? 
If the purpose is to simply make it increase monotonically > like sys.hexversion, why not just use sys.hexversion? readability? the sys.hexversion stuff isn't exactly obvious: >>> dir(sys) ... 'hexversion' ... >>> sys.hexversion 17170594 eh? is that version 1.71, or what? "final" is okay, I think. better than "f0", at least ;-) From fdrake at acm.org Thu Apr 13 17:56:38 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:56:38 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <008c01bfa55f$3af9f880$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.60982.631891.629922@seahag.cnri.reston.va.us> Fredrik Lundh writes: > readability? But hexversion retains the advantage that it's been there longer, and that's just too hard to change at this point. (Guido didn't leave the keys to his time machine...) > the sys.hexversion stuff isn't exactly obvious: I didn't say hexversion was pretty or that anyone liked it! Writing the docs, version_info is a *lot* easier to explain. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Apr 13 17:55:08 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 17:55:08 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <38F5EDDC.731E6740@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). > > why not? Because plain old 8-bit strings should work just as before, that is, existing scripts only using 8-bit strings should not break. > why keep on pretending that strings and strings are two > different things? it's an artificial distinction, and it only > causes problems all over the place. Sure. The point is that we can't just drop the old 8-bit strings... not until Py3K at least (and as Fred already said, all standard editors will have native Unicode support by then). So for now we're stuck with Unicode *and* 8-bit strings and have to make the two meet somehow -- which isn't all that easy, since 8-bit strings carry no encoding information. > > Could be that we don't need this pragma discussion at all > > if there is a different, more elegant solution to this... > > here's one way: > > 1. standardize on *unicode* as the internal character set. use > an encoding marker to specify what *external* encoding you're > using for the *entire* source file. output from the tokenizer is > a stream of *unicode* strings. Yep, that would work in Py3K... > 2. if the user tries to store a unicode character larger than 255 > in an 8-bit string, raise an OverflowError. There are no 8-bit strings in Py3K -- only 8-bit data buffers which don't have string methods ;-) > 3. the default encoding is "none" (instead of XML's "utf-8"). in > this case, treat the script as an ascii superset, and store each > string literal as is (character-wise, not byte-wise). Uhm. I think UTF-8 will be the standard for text file formats by then... so why not make it UTF-8 ? > additional notes: > > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. 
> > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. I'd say, leave all this to Py3K. > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Hmm, the tokenizer doesn't do any string -> object conversion. That's a task done by the parser. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Apr 13 18:06:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 18:06:53 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> Message-ID: <38F5F09D.53E323EF@lemburg.com> Skip Montanaro wrote: > > Marc> We were originally talking about proposals to integrate #pragmas > Marc> ... > > Minor nit... How about we lose the "#" during these discussions so we > aren't all subliminally disposed to embed pragmas in comments or to add the > C preprocessor to Python? ;-) Hmm, anything else would introduce a new keyword, I guess. And new keywords cause new scripts to fail in old interpreters even when they don't use Unicode at all and only include per convention. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Thu Apr 13 18:16:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 11:16:55 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.62199.122899.126940@beluga.mojam.com> Marc> Skip Montanaro wrote: >> Minor nit... How about we lose the "#" during these discussions so >> we aren't all subliminally disposed to embed pragmas in comments or >> to add the C preprocessor to Python? ;-) Marc> Hmm, anything else would introduce a new keyword, I guess. And new Marc> keywords cause new scripts to fail in old interpreters even when Marc> they don't use Unicode at all and only include is> per convention. My point was only that using "#pragma" (or even "pragma") sort of implies we have our eye on a solution, but I don't think we're far enough down the path of answering what we want to have any concrete ideas about how to implement it. I think this thread started (more-or-less) when Guido posted an idea that originally surfaced on the idle-dev list about using "global ..." to implement functionality like this. It's not clear to me at this point what the best course might be. Skip From fdrake at acm.org Thu Apr 13 18:31:50 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:31:50 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.63094.538920.187344@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Hmm, anything else would introduce a new keyword, I guess. 
And > new keywords cause new scripts to fail in old interpreters > even when they don't use Unicode at all and only include > per convention. Only if the new keyword is used in the script or anything it imports. This is exactly like using new syntax (u'...') or new library features (unicode('abc', 'iso-8859-1')). I can't think of anything that gets included "by convention" that breaks anything. I don't recall a proposal that we should casually add pragmas to our scripts if there's no need to do so. Adding pragmas to library modules is *not* part of the issue; they'd only be there if the version of Python they're part of supports the syntax. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 18:47:52 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:47:52 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4CF7A.8F99562F@prescod.net> References: <38F4CF7A.8F99562F@prescod.net> Message-ID: <14581.64056.727047.412805@seahag.cnri.reston.va.us> Paul Prescod writes: > The XML rule is one encoding per file. One thing that I think that they > did innovate in (I had nothing to do with that part) is that entities I think an important part of this is that the location of the encoding declaration is completely fixed; it can't start five lines down (after all, it might be hard to know what a line is!). If we say, "The first character of a Python source file must be '#', or assume native encoding.", we go a long way to figuring out what's a line (CR/LF/CRLF can be dealt with in a "universal" fashion), so we can deal with something a little farther down, but I'd hate to be so flexible that it became too tedious to implement. I'd be more accepting of encoding declarations embedded in comments than pragmas. (Not that I *like* abusing comments like that.) So perhaps a Python encoding declaration becomes: #?python encoding="iso-8859-7"?# ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Thu Apr 13 18:51:35 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 18:51:35 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F4CF7A.8F99562F@prescod.net> <14581.64056.727047.412805@seahag.cnri.reston.va.us> Message-ID: <003901bfa568$890a5200$34aab5d4@hagrid> Fred wrote: > #?python encoding="iso-8859-7"?# like in: #!/usr/bin/python #?python encoding="utf-8" tabsize=5 if __name__ == "__main__": print "hello!" I've seen worse... From effbot at telia.com Thu Apr 13 18:52:44 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 18:52:44 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> Message-ID: <003a01bfa568$b190c560$34aab5d4@hagrid> M.-A. Lemburg wrote: > Fredrik Lundh wrote: > > > > M.-A. Lemburg wrote: > > > The current need for #pragmas is really very simple: to tell > > > the compiler which encoding to assume for the characters > > > in u"...strings..." (*not* "...8-bit strings..."). > > > > why not? > > Because plain old 8-bit strings should work just as before, > that is, existing scripts only using 8-bit strings should not break. but they won't -- if you don't use an encoding directive, and don't use 8-bit characters in your string literals, everything works as before. 
(that's why the default is "none" and not "utf-8") if you use 8-bit characters in your source code and wish to add an encoding directive, you need to add the right encoding directive... > > why keep on pretending that strings and strings are two > > different things? it's an artificial distinction, and it only > > causes problems all over the place. > > Sure. The point is that we can't just drop the old 8-bit > strings... not until Py3K at least (and as Fred already > said, all standard editors will have native Unicode support > by then). I discussed that in my original "all characters are unicode characters" proposal. in my proposal, the standard string type will have to roles: a string either contains unicode characters, or binary bytes. -- if it contains unicode characters, python guarantees that methods like strip, lower (etc), and regular expressions work as expected. -- if it contains binary data, you can still use indexing, slicing, find, split, etc. but they then work on bytes, not on chars. it's still up to the programmer to keep track of what a certain string object is (a real string, a chunk of binary data, an en- coded string, a jpeg image, etc). if the programmer wants to convert between a unicode string and an external encoding to use a certain unicode encoding, she needs to spell it out. the codecs are never called "under the hood". (note that if you encode a unicode string into some other encoding, the result is binary buffer. operations like strip, lower et al does *not* work on encoded strings). > So for now we're stuck with Unicode *and* 8-bit strings > and have to make the two meet somehow -- which isn't all > that easy, since 8-bit strings carry no encoding information. in my proposal, both string types hold unicode strings. they don't need to carry any encoding information, because they're not encoded. > > > Could be that we don't need this pragma discussion at all > > > if there is a different, more elegant solution to this... > > > > here's one way: > > > > 1. standardize on *unicode* as the internal character set. use > > an encoding marker to specify what *external* encoding you're > > using for the *entire* source file. output from the tokenizer is > > a stream of *unicode* strings. > > Yep, that would work in Py3K... or 1.7 -- see below. > > 2. if the user tries to store a unicode character larger than 255 > > in an 8-bit string, raise an OverflowError. > > There are no 8-bit strings in Py3K -- only 8-bit data > buffers which don't have string methods ;-) oh, you've seen the Py3K specification? > > 3. the default encoding is "none" (instead of XML's "utf-8"). in > > this case, treat the script as an ascii superset, and store each > > string literal as is (character-wise, not byte-wise). > > Uhm. I think UTF-8 will be the standard for text file formats > by then... so why not make it UTF-8 ? in time for 1.6? or you mean Py3K? sure! I said that in my first "additional note", didn't I: > > additional notes: > > > > -- item (3) is for backwards compatibility only. might be okay to > > change this in Py3K, but not before that. > > > > -- leave the implementation of (1) to 1.7. for now, assume that > > scripts have the default encoding, which means that (2) cannot > > happen. > > I'd say, leave all this to Py3K. do you mean it's okay to settle for a broken design in 1.6, since we can fix it in Py3K? that's scary. fixing the design is not that hard, and can be done now. 
implementing all parts of it is harder, and require extensive changes to the compiler/interpreter architecture. but iirc, such changes are already planned for 1.7... > > -- we still need an encoding marker for ascii supersets (how about > > ;-). however, it's up to > > the tokenizer to detect that one, not the parser. the parser only > > sees unicode strings. > > Hmm, the tokenizer doesn't do any string -> object conversion. > That's a task done by the parser. "unicode string" meant Py_UNICODE*, not PyUnicodeObject. if the tokenizer does the actual conversion doesn't really matter; the point is that once the code has passed through the tokenizer, it's unicode. From bwarsaw at cnri.reston.va.us Thu Apr 13 18:59:03 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 12:59:03 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.64727.928889.239985@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> maybe level should be chosen so that version_info for a final FL> release is larger than version_info for the corresponding beta FL> ? Yes, absolutely. Please don't break the comparability of version_info or the connection with the patchversion.h macros. -Barry From fdrake at acm.org Thu Apr 13 19:05:17 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:05:17 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.64727.928889.239985@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> Message-ID: <14581.65101.110813.343483@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > Yes, absolutely. Please don't break the comparability of version_info > or the connection with the patchversion.h macros. So I'm the only person here today who prefers the release level of a final version to be '' instead of 'final'? Or did I miss all the messages of enthusiastic support for '' from my screaming fans? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Thu Apr 13 19:04:40 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:04:40 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> Message-ID: <14581.65064.13261.43476@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); >> level is a string "a2", "b1", "c1", or '' for a final release. >> maybe level should be chosen so that version_info for a final >> release is larger than version_info for the corresponding beta >> ? AMK> 'a2' < 'b1' < 'c1' < 'final' Another reason I don't like the strings: 'b9' > 'b10' :( I can imagine a remote possibility of more than 9 pre-releases (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has to fit in 4 bits), so at the very least, make that string 'a02', 'a03', etc. -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:07:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Thu, 13 Apr 2000 13:07:54 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.65258.431992.820885@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> readability? Yup. FL> "final" is okay, I think. better than "f0", at least ;-) And I think (but am not 100% positive) that once a final release comes out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead starts incrementing PY_MICRO_VERSION. If that's not the case, then it complicates things a bit. -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:08:51 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:08:51 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> Message-ID: <14581.65315.489980.275044@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I didn't say hexversion was pretty or that anyone liked Fred> it! Writing the docs, version_info is a *lot* easier to Fred> explain. So is it easier to explain that the empty string means a final release or that 'final' means a final release? :) -Barry From fdrake at acm.org Thu Apr 13 19:11:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:11:19 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65064.13261.43476@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <14581.65463.994272.442725@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits), so at the very least, make that string 'a02', > 'a03', etc. Doesn't this further damage the human readability of the value? I thought that was an important reason to break it up from sys.hexversion. (Note also that you're not just saying more than 9 pre-releases, but more than 9 at any one of alpha, beta, or release candidate stages. 1-9 at each stage is already 27 pre-release packages.) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gmcm at hypernet.com Thu Apr 13 19:11:14 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:11:14 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131522.RAA05137@python.inrialpes.fr> Message-ID: <1256476619-52065132@hypernet.com> Vladimir Marangozov wrote: > > > Obviously, the user-attr proposal is not so "simple" as it looks like. This is not obvious to me. Both the concept and implementation appear fairly simple to me. > I wish we all realize what's really going on here. > > In all cited use cases for this proposal, functions are no more > perceived as functions per se, but as data structures (objects) > which are the target of the computation. 
IOW, functions are just > considered as *instances* of a class (inheriting from the builtin > "PyFunction" class) with user-attributes, having a state, and > eventually a set of operations bound to them. I don't see why they aren't still functions. Putting a rack on my bicycle doesn't make it a pickup truck. I think it's a real stretch to say they would become "instances of a class". There's no inheritance, and the "state" isn't visible inside the function (probably unfortunately ). Just like today, they are objects of type PyFunction, and they get called the same old way. You'll be able to hang extra stuff off them, just like today you can hang extra stuff off a module object without the module's knowledge or cooperation. > I guess that everybody realized that with this proposal, one could > bind not only doc strings, but also functions to the function. > > def func(): pass > def serialize(): ... > func.pack = serialize > func.pack() > > What is this? This is manual instance customization. What is "def"? What is f.__doc__ = ... ? > Since nobody said that this would be done 'exceptionally', but rather > on a regular basis for all functions (and generally, for all objects) > in a program, the way to customize instances after the fact, makes > those instances singletons of user-defined classes. Only according to a very loose definition of "instance" and "user-defined class". More accurately, they are objects as they always have been (oops, Barry screwed up the time- machine again; please adjust the tenses of the above). > You may say "so what?". Well, this is fine if it were part of the > object model from the start. And there's no reason why only functions > and methods can have this functionality. Stick the __dict__ slot in > the object header and let me bind user-attributes to all objects. Perceived need is part of this. > I have a favorite number, 7, so I want to label it Vlad's number. > seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func > to all numbers, n.is_prime(). I want to have a s.zip() method for a > set of strings in my particular application, not only the builtin ones. > > Why is it not allowed to have this today? Think about it! This is apparently a reducto ad absurdum argument. It's got the absurdum, but not much reducto. I prefer this one: Adding attributes to functions is immoral. Therefore func.__doc__ is immoral and should be removed. For another thing, we'll need a couple generations to argue about what to do with those 100 singleton int objects . > How would you solve your app needs today? Through classes and instances. > That's the prescribed `legal' way to do customized objects; no shortcuts. > Saying that mucking with functions' __doc__ strings is the only way to > implement some functionality is simply not true. No, it's a matter of convenience. Remember, Pythonistas are from Yorkshire ("You had Python??... You had assembler??.. You had front-panel toggle switches??.. You had wire- wrapping tools??.."). > In short, there's no way I can accept this proposal in its current > state and make the distingo between functions/methods and other kinds > of objects (including 3rd party ones). If we're to go down this road, > treat all objects as equal citizens in this regard. None or all. They are all first class objects already. Adding capabilities to one of them doesn't subtract them from any other. > The object model must remain consistent. This proposal opens a breach in it. > And not the lightest! 
In any sense in which you can apply the word "consistent" to Python's object model, I fail to see how this makes it less so. > And this is only part of the reasons why I'm still firmly -1 until P3K. > Glad to see that Barry exposed some of the truth about it, after preserving > our throats, i.e. he understood that we understood that he fully understood > the power of namespaces, but eventually decided to propose a fraction of > a significant change reserved for the next major Python release... wink > > >>> wink.fraction = 1e+-1 > >>> wink.fraction.precision = 1e-+1 > >>> wink.compute() > 0.0 I don't see anything here but an argument that allowing attributes on function objects makes them vaguely similar to instance objects. To the extent that I can agree with that, I fail to see any harm in it. - Gordon From fdrake at acm.org Thu Apr 13 19:16:15 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:16:15 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65258.431992.820885@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> <14581.65315.489980.275044@anthem.cnri.reston.va.us> <14581.65258.431992.820885@anthem.cnri.reston.va.us> Message-ID: <14582.223.861189.614634@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > And I think (but am not 100% positive) that once a final release comes > out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead > starts incrementing PY_MICRO_VERSION. If that's not the case, then > it complicates things a bit. patchlevel.h includes a comment that indicates serial should be 0 for final releases. > So is it easier to explain that the empty string means a final release > or that 'final' means a final release? :) I think it's the same; either is a special value. The only significant advantage of 'final' is the monotonicity provided by 'final'. I'm not convinced that it's otherwise any better. It also means to create a formatter version number from this that you need to special-case the last item in sys.version_info. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Thu Apr 13 19:15:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 19:15:14 +0200 Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us><007c01bfa55e$0faf0360$34aab5d4@hagrid><14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <007301bfa56b$d8ec1ee0$34aab5d4@hagrid> > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits) or rather, "I can imagine a remote possibility of more than 5 pre-releases (counting from 1), but not more than 9 (since PY_RELEASE_SERIAL has to fit in a single decimal digit"? in the very unlikely case that I'm wrong, feel free to break the glass and install the following patch: #define PY_RELEASE_LEVEL_DESPAIR 0xD #define PY_RELEASE_LEVEL_EXTRAMUNDANE 0xE #define PY_RELEASE_LEVEL_FINAL 0xF /* Serial should be 0 here */ From bwarsaw at cnri.reston.va.us Thu Apr 13 19:17:30 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Thu, 13 Apr 2000 13:17:30 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> Message-ID: <14582.298.938842.466851@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> So I'm the only person here today who prefers the release Fred> level of a final version to be '' instead of 'final'? Or Fred> did I miss all the messages of enthusiastic support for '' Fred> from my screaming fans? I've blocked those messages at your mta, so you would't be fooled into doing the wrong thing. I'll repost them to you, but only after you change it back to 'final' means final. Then you can be rightfully indignant at all of us losers who wanted it the other way, and caused you all that extra work! :) root-of-all-evil-ly y'rs, -Barry From fdrake at acm.org Thu Apr 13 19:20:24 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:20:24 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.298.938842.466851@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> Message-ID: <14582.472.612445.191833@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I've blocked those messages at your mta, so you would't be fooled into > doing the wrong thing. I'll repost them to you, but only after you I don't mind that, just don't stop the groupies! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Thu Apr 13 19:24:35 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:24:35 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> Message-ID: <14582.723.427231.355475@beluga.mojam.com> Gordon> I don't see why they aren't still functions. Putting a rack on Gordon> my bicycle doesn't make it a pickup truck. Though putting a gun in the rack might... ;-) Skip From bwarsaw at cnri.reston.va.us Thu Apr 13 19:25:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:25:13 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> Message-ID: <14582.761.365390.946880@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> Doesn't this further damage the human readability of the Fred> value? A little, but it's a fine compromise between the various constraints. Another way you could structure that tuple is to split the PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more readable if you want, and make the latter a real int. Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), e.g. 
the form is: (major, minor, micro, level, serial) You can't use 'gamma' though because then you break comparability. Maybe use 'candidate' instead? Sigh. Fred> I thought that was an important reason to break it Fred> up from sys.hexversion. (Note also that you're not just Fred> saying more than 9 pre-releases, but more than 9 at any one Fred> of alpha, beta, or release candidate stages. 1-9 at each Fred> stage is already 27 pre-release packages.) Well, Guido hisself must have thought that there was a remote possibility of more than 9 releases at a particular level, otherwise he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no other possible explanation for his choices is there?! :) -Barry From fdrake at acm.org Thu Apr 13 19:31:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:31:04 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1112.750322.6958@seahag.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > A little, but it's a fine compromise between the various constraints. > Another way you could structure that tuple is to split the > PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more > readable if you want, and make the latter a real int. Thus Python > 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), > e.g. the form is: > > (major, minor, micro, level, serial) I've thought of this as well, and certainly prefer it to the 'a01' solution. > You can't use 'gamma' though because then you break comparability. > Maybe use 'candidate' instead? Sigh. Yeah. > Well, Guido hisself must have thought that there was a remote > possibility of more than 9 releases at a particular level, otherwise > he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no > other possible explanation for his choices is there?! :) Clearly. I'll have to break his heart when I release 1.6a16 this afternoon. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 19:32:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:32:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> Message-ID: <14582.1209.471995.242974@seahag.cnri.reston.va.us> Skip Montanaro writes: > Though putting a gun in the rack might... ;-) And make sure that rack is big enough for the dogs, we don't want them to feel left out! (Gosh, I'm feeling like I'm back in south-west Virginia already! ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From skip at mojam.com Thu Apr 13 19:32:37 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:32:37 -0500 (CDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1205.389790.293558@beluga.mojam.com> BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, BAW> 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break comparability. Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Skip From bwarsaw at cnri.reston.va.us Thu Apr 13 19:35:05 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:35:05 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> <14582.472.612445.191833@seahag.cnri.reston.va.us> Message-ID: <14582.1353.482124.111121@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I don't mind that, just don't stop the groupies! ;) Hey, take it from me, groupies are a dime a dozen. They ask you all kinds of boring questions like what kind of strings you use (or how fast your disk drives are). It's the "gropies" you want. 'Course, tappin' away at a keyboard that only makes one kind of annoying clicking sound and isn't midi-fied won't get you any gropies. Even if you're an amazing hunk of a bass god, it's tough (so you know I'm at a /severe/ disadvantage :) -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:38:29 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:38:29 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> <14582.1205.389790.293558@beluga.mojam.com> Message-ID: <14582.1557.128677.346938@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, BAW> 0, 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break BAW> comparability. SM> Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Or how 'bout: "zats the last one yer gonna git, ya peons, now leave me ALONE" ? 
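A quick illustration (editorial, not from the thread) of the two points argued in this sub-thread: the packed hex form is compact but opaque, the short string levels misorder double-digit serials, and the split tuple compares correctly once the level is spelled out.

    # hexversion is hard to read: 17170594 == 0x10600a2, i.e. 1.6.0a2
    assert hex(17170594) == '0x10600a2'

    # short string levels break down past nine pre-releases
    assert 'b9' > 'b10'

    # the split form (major, minor, micro, level, serial): the level names
    # happen to sort alphabetically in release order, so plain tuple
    # comparison orders releases as intended
    assert 'alpha' < 'beta' < 'candidate' < 'final'
    assert (1, 6, 0, 'alpha', 2) < (1, 6, 0, 'beta', 1) \
           < (1, 6, 0, 'candidate', 1) < (1, 6, 0, 'final', 0)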
-Barry From gmcm at hypernet.com Thu Apr 13 19:39:06 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:39:06 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <1256476619-52065132@hypernet.com> Message-ID: <1256474945-52165985@hypernet.com> Skip wrote: > > Gordon> I don't see why they aren't still functions. Putting a rack on > Gordon> my bicycle doesn't make it a pickup truck. > > Though putting a gun in the rack might... ;-) Nah, I live in downeast Maine. I'd need a trailer hitch and snow- plow mount to qualify. - Gordon From skip at mojam.com Thu Apr 13 19:51:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:51:08 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.1209.471995.242974@seahag.cnri.reston.va.us> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> <14582.1209.471995.242974@seahag.cnri.reston.va.us> Message-ID: <14582.2316.638334.342115@beluga.mojam.com> Fred> Skip Montanaro writes: >> Though putting a gun in the rack might... ;-) Fred> And make sure that rack is big enough for the dogs, we don't want Fred> them to feel left out! They fit in the panniers. (They're minature german shorthair pointers...) extending-this-silliness-ly y'rs... Skip From Vladimir.Marangozov at inrialpes.fr Thu Apr 13 20:10:33 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 20:10:33 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <200004131810.UAA05752@python.inrialpes.fr> Gordon McMillan wrote: > > I don't see anything here but an argument that allowing > attributes on function objects makes them vaguely similar to > instance objects. To the extent that I can agree with that, I fail > to see any harm in it. > To the extent it encourages confusion, I think it sucks. >>> def this(): ... sucks = "no" ... >>> this.sucks = "yes" >>> >>> print this.sucks 'yes' Why on earth 'sucks' is not the object defined in the function's namespace? Who made that deliberate decision? Clearly 'this' defines a new namespace, so it'll be also legitimate to get a NameError, or to: >>> print this.sucks 'no' Don't you think? And don't explain to me that this is because there's a code object, different from the function object, which is compiled at the function's definition, then assotiated with the function object, blah, blah, blah... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jeremy at cnri.reston.va.us Thu Apr 13 21:08:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 15:08:12 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000701bfa505$31008380$4d2d153f@tim> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> Message-ID: <14582.5791.148277.87450@walden> >>>>> "TP" == Tim Peters writes: TP> [Jeremy Hylton]> >> So the real problem is defining some reasonable semantics for >> comparison of recursive objects. 
TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization). I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object. My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases). A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also use a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's. It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is. 
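A compact Python rendering (editorial; deep_eq and its helper names are invented) of the bookkeeping described above and done in C in the patch below: keep the pairs of objects currently being compared in a dictionary, and when a comparison reaches a pair that is already in progress, treat it as equal instead of recursing again.

    def deep_eq(v, w, _in_progress=None):
        if v is w:
            return True
        if _in_progress is None:
            _in_progress = {}
        if id(v) <= id(w):
            key = (id(v), id(w))
        else:
            key = (id(w), id(v))
        if key in _in_progress:
            # back-edge: this pair is already being compared higher up the
            # stack; assume equal, any real difference surfaces elsewhere
            return True
        _in_progress[key] = True
        try:
            if isinstance(v, list) and isinstance(w, list):
                if len(v) != len(w):
                    return False
                for a, b in zip(v, w):
                    if not deep_eq(a, b, _in_progress):
                        return False
                return True
            return v == w
        finally:
            del _in_progress[key]

    a = []; a.append(a)
    b = []; b.append(b)
    assert deep_eq(a, b)    # terminates instead of recursing forever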
Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", --- > "__repr__ returned non-string (type %s)", 276c276 < "__str__ returned non-string (type %.200s)", --- > "__str__ returned non-string (type %s)", 300a301,328 > static PyObject *cmp_state_key = NULL; > > static PyObject* > cmp_state_make_pair(v, w) > PyObject *v, *w; > { > PyObject *pair = PyTuple_New(2); > if (pair == NULL) > return NULL; > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > return pair; > } > > void > cmp_state_clear_pair(dict, key) > PyObject *dict, *key; > { > PyDict_DelItem(dict, key); > Py_DECREF(key); > } > > 305a334,336 > PyObject *tstate_dict, *cmp_dict, *pair; > int result; > 311a343,376 > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, > v->ob_type->tp_name, (long)w, w->ob_type->tp_name); > */ > /* XXX should initialize elsewhere */ > if (cmp_state_key == NULL) { > cmp_state_key = PyString_InternFromString("compare_state"); > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } else { > cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); > if (cmp_dict == NULL) > return NULL; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } > > pair = cmp_state_make_pair(v, w); > if (pair == NULL) { > PyErr_BadInternalCall(); > return -1; > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume they're > equal until shown otherwise > */ > Py_DECREF(pair); > return 0; > } 316a382,384 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 317a386 > cmp_state_clear_pair(cmp_dict, pair); 329a399,401 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 344a417 > cmp_state_clear_pair(cmp_dict, pair); 350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } --- > cmp_state_clear_pair(cmp_dict, pair); > if (PyUnicode_Check(v) || PyUnicode_Check(w)) > return PyUnicode_Compare(v, w); 372c433,434 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { > cmp_state_clear_pair(cmp_dict, pair); 374c436,439 < return (*vtp->tp_compare)(v, w); --- > } > result = (*vtp->tp_compare)(v, w); > cmp_state_clear_pair(cmp_dict, pair); > return result; From gstein at lyra.org Thu Apr 13 21:09:02 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 12:09:02 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: It's great that you made this change! 
I hadn't got through my mail, but was going to recommend it... :-) One comment: On Thu, 13 Apr 2000, Fred Drake wrote: >... > --- 409,433 ---- > v = PyInt_FromLong(PY_VERSION_HEX)); > Py_XDECREF(v); > + /* > + * These release level checks are mutually exclusive and cover > + * the field, so don't get too fancy with the pre-processor! > + */ > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_ALPHA > + v = PyString_FromString("alpha"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_BETA > + v = PyString_FromString("beta"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_GAMMA > + v = PyString_FromString("candidate"); > + #endif > #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_FINAL > ! v = PyString_FromString("final"); > ! #endif > PyDict_SetItemString(sysdict, "version_info", > ! v = Py_BuildValue("iiiNi", PY_MAJOR_VERSION, > PY_MINOR_VERSION, > ! PY_MICRO_VERSION, v, > ! PY_RELEASE_SERIAL)); > Py_XDECREF(v); > PyDict_SetItemString(sysdict, "copyright", I would recommend using the "s" format code in Py_BuildValue. It simplifies the code, and it is quite a bit easier for a human to process. When I first saw the code, I thought "the level string leaks!" Then I saw the "N" code, went and looked it up, and realized what is going on. So... to avoid that, the "s" code would be great. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bitz at bitdance.com Thu Apr 13 21:12:34 2000 From: bitz at bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 15:12:34 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Andrew M. Kuchling wrote: > longer use 32-bit ints to store file position. There's a > HAVE_LARGEFILE_SUPPORT #define that turns on the use of these > alternate system calls; see Python's configure.in for the test used to I just looked in my python config.h on my FreeBSD system, and I see: #define HAVE_LARGEFILE_SUPPORT 1 So it looks like it is on, and it seems to me the problem could be in either Python or FileStorage.py in Zope. This is a Zope 2.1.2 system (but I diffed filestorage.py against the 2.1.6 version and didn't see any relevant changes) running on a FreeBSD 3.1 system. A make test in Python passed all tests, but I don't know if large file support is tested by the tests. --RDM PS: anyone from the python list replying to this please CC me as I am not on that list. From gmcm at hypernet.com Thu Apr 13 21:26:13 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 15:26:13 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <1256468519-52554453@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > I don't see anything here but an argument that allowing > > attributes on function objects makes them vaguely similar to > > instance objects. To the extent that I can agree with that, I fail > > to see any harm in it. > > > > To the extent it encourages confusion, I think it sucks. > > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? Because that one is a local. Python allows the same name in different places. Used wisely, it's a handy feature of namespaces. 
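An editorial illustration of the distinction being drawn here (it assumes an interpreter that accepts the proposed function attributes): the local created while the function runs and the attribute stored on the function object live in separate namespaces and never collide.

    def this():
        sucks = "no"        # local: exists only for the duration of a call
        return sucks

    this.sucks = "yes"      # attribute: lives on the function object itself

    assert this() == "no"
    assert this.sucks == "yes"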
> Who made that deliberate decision? What decision? To put a name "sucks" both in the function's locals and as a function attribute? To print something accessed with object.attribute notation in the obvious manner? Deciding not to cause gratuitous UnboundLocalErrors? This is nowhere near as confusing as, say, putting a module named X in a package named X and then saying "from X import *", (hi, Marc-Andre!). > Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? Only if you've done "this.sucks = 'no'". Or are you saying that if functions have attributes, people will all of a sudden expect that function locals will have initialized and maintained state? We certainly get plenty of newbie confusion about namespaces, assignment and scoping; maybe I've seen one or two where people thought function.local should be legal (do Python-tutors see this?). In those cases, is it the existence of function.__doc__ that causes the confusion? If yes, and this is a serious problem, then you should be arguing for the removal of __doc__. If not, why would allowing adding more attributes exacerbate the problem? > And don't explain to me that this is because there's a code object, > different from the function object, which is compiled at the function's > definition, then assotiated with the function object, blah, blah, blah... No problem. [Actually, the best argument against this I can see is that functional-types already try to use function objects where any sane person knows you should use an instance; and since this doesn't further their agenda, the bastard's will just scream louder ]. - Gordon From fdrake at acm.org Thu Apr 13 22:05:10 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 16:05:10 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: References: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: <14582.10358.843823.467467@seahag.cnri.reston.va.us> Greg Stein writes: > I would recommend using the "s" format code in Py_BuildValue. It > simplifies the code, and it is quite a bit easier for a human to process. > When I first saw the code, I thought "the level string leaks!" Then I saw > the "N" code, went and looked it up, and realized what is going on. Good point; 'N' is relatively obscure in my experience as well. I've made the change (and there's probably less code in the binary as well!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bitz at bitdance.com Thu Apr 13 22:04:56 2000 From: bitz at bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 16:04:56 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: OK, some more info. The code in FileStorage.py looks like this: ------------------- def read_index(file, name, index, vindex, tindex, stop='\377'*8, ltid=z64, start=4, maxoid=z64): read=file.read seek=file.seek seek(0,2) file_size=file.tell() print file_size, start if file_size: if file_size < start: raise FileStorageFormatError, file.name [etc] ------------------- I stuck that print statement in there. The results of the print are: -2147248811L 4 So it looks to my uneducated eye like file.tell() is broken. The actual on-disk size of the file, by the way, is indeed 2147718485, so it looks like somebody's not using the right size data structure somewhere. 
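The two numbers quoted above are consistent with the file offset being squeezed through a signed 32-bit value somewhere on the way back from tell(); a quick check (editorial note, not from the thread):

    size = 2147718485                     # actual on-disk size reported above
    assert size - 2**32 == -2147248811    # exactly the value tell() returned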
So, can anyone tell me what to look for, or am I stuck for the moment? --RDM PS: anyone on pthon-dev replying please CC me as I am only on the zope list. From paul at prescod.net Thu Apr 13 22:55:43 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 13 Apr 2000 15:55:43 -0500 Subject: [Python-Dev] OT: XML References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> <20000412225638.E9002@thyrsus.com> Message-ID: <38F6344F.25D344B5@prescod.net> Well, as long as everyone else is going to be off-topic: What definition of "language" are you using? And while you're at it, what definition of "semantics" are you using? As I recall, a string is an ordered list of symbols and a language is an unordered set of strings. I know that Ka-Ping, despite going to a great university was in Engineering, not computer science, so I'll excuse him for not knowing the Chomskian definition of language, :), but what's your excuse Eric? Most XML people will happily admit that XML has no "semantics" but I think that's bullshit too. The mapping from the string to the abstract tree data model *is the semantic content* of the XML specification. Yes, it is a brain-dead simple mapping and so the semantic structure provided by the XML specification is minimal...but that's the whole point. It's supposed to be simple. It's supposed to not get in the way of higher level semantics. It makes as little sense to reject XML out of hand because it is a buzzword but is not innovative as it does for people to embrace it mystically because it is Microsoft's flavor of the week. XML takes simple ideas from the Lisp and document processing communities and popularize them so that they can achieve economies of scale. It sounds exactly like the relationship between Lisp and Python to me... By the way, what data model or text encoding is NOT isomorphic to Lisp S-expressions? Isn't Python code isomorphic to Lisp s-expessions? Paul Prescod From jeremy at cnri.reston.va.us Fri Apr 14 00:06:39 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 18:06:39 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> Message-ID: <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> I did one more round of work on this idea, and I'm satisfied with the results. Most of the performance hit can be eliminated by doing nothing until there are at least N recursive calls to PyObject_Compare, where N is fairly large. (I picked 25000.) Non-circular objects that are not deeply nested only pay for an integer increment, a decrement, and a compare. Background for patches-only readers: This patch appears to fix PR#7. Comments and suggestions solicitied. I think this is worth checking in. 
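In rough Python terms, what the C patch below does looks something like
this (just a sketch of the idea; do_real_compare here is a made-up stand-in
for the type's own tp_compare slot):

    _nesting = 0
    _in_progress = {}                 # pairs of ids currently being compared
    NESTING_LIMIT = 25000

    def compare(v, w):
        global _nesting
        _nesting = _nesting + 1
        try:
            if _nesting > NESTING_LIMIT:
                pair = (min(id(v), id(w)), max(id(v), id(w)))
                if _in_progress.has_key(pair):
                    return 0          # crossing a back-edge: assume equal for now
                _in_progress[pair] = 1
                try:
                    return do_real_compare(v, w)
                finally:
                    del _in_progress[pair]
            return do_real_compare(v, w)
        finally:
            _nesting = _nesting - 1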
Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -r2.52 object.h 286a287,289 > /* tstate dict key for PyObject_Compare helper */ > extern PyObject *_PyCompareState_Key; > Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -r2.91 pythonrun.c 151a152,153 > _PyCompareState_Key = PyString_InternFromString("cmp_state"); > Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 300a301,306 > PyObject *_PyCompareState_Key; > > int _PyCompareState_nesting = 0; > int _PyCompareState_flag = 0; > #define NESTING_LIMIT 25000 > 305a312,313 > int result; > 372c380 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { 374c382,440 < return (*vtp->tp_compare)(v, w); --- > } > ++_PyCompareState_nesting; > if (_PyCompareState_nesting > NESTING_LIMIT) > _PyCompareState_flag = 1; > if (_PyCompareState_flag && > (vtp->tp_as_mapping || (vtp->tp_as_sequence && > !PyString_Check(v)))) > { > PyObject *tstate_dict, *cmp_dict, *pair; > > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); > if (cmp_dict == NULL) { > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, > _PyCompareState_Key, > cmp_dict); > } > > pair = PyTuple_New(2); > if (pair == NULL) { > return -1; > } > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume > they're equal until shown otherwise > */ > Py_DECREF(pair); > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return 0; > } > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } > result = (*vtp->tp_compare)(v, w); > PyDict_DelItem(cmp_dict, pair); > Py_DECREF(pair); > } else { > result = (*vtp->tp_compare)(v, w); > } > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return result; From ping at lfw.org Fri Apr 14 00:41:44 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:41:44 -0500 (CDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > >>>>> "TP" == Tim Peters writes: > > TP> [Jeremy Hylton]> > >> So the real problem is defining some reasonable semantics for > >> comparison of recursive objects. There is a "right" way to do this, i believe, and my friend Mark Miller implemented it in E. He tells me his algorithm is inspired by the method for unification of cyclic structures in Prolog III. It's available in the E source code (in the file elib/tables/Equalizer.java). 
See interesting stuff on equality and cyclic data structures at http://www.erights.org/javadoc/org/erights/e/elib/tables/Equalizer.html http://www.erights.org/elang/same-ref.html http://www.erights.org/elang/blocks/defVar.html http://www.eros-os.org/~majordomo/e-lang/0698.html There is also a thread about equality issues in general at: http://www.eros-os.org/~majordomo/e-lang/0000.html It's long, but worth perusing. Here is my rough Python translation of the code in the E Equalizer. Python 1.4 (Mar 25 2000) [C] Copyright 1991-1997 Stichting Mathematisch Centrum, Amsterdam Python Console v1.4 by Ka-Ping Yee >>> def same(left, right, sofar={}): ... hypothesis = (id(left), id(right)) ... if left is right or sofar.has_key(hypothesis): return 1 ... if type(left) is not type(right): return 0 ... if type(left) is type({}): ... left, right = left.items(), right.items() ... if type(left) is type([]): ... sofar[hypothesis] = 1 ... try: ... for i in range(len(left)): ... if not same(left[i], right[i], sofar): return 0 ... return 1 ... finally: ... del sofar[hypothesis] ... return left == right ... ... >>> same([3],[4]) 0 >>> same([3],[3]) 1 >>> a = [1,2,3] >>> b = [1,2,3] >>> c = [1,2,3] >>> same(a,b) 1 >>> a[1] = a >>> same(a,a) 1 >>> same(a,b) 0 >>> b[1] = b >>> same(a,b) 1 >>> b[1] = c >>> b [1, [1, 2, 3], 3] >>> same(a,b) 0 >>> c[1] = b >>> same(a,b) 1 >>> same(b,c) 1 >>> I would like to see Python's comparisons work this way (i.e. "correct" as opposed to "we give up"). -- ?!ng From ping at lfw.org Fri Apr 14 00:49:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:49:21 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: Message-ID: As a reference, here is the corresponding cyclic-structure-comparison example from a message about E: ? define tight := [1, tight, "x"] # value: [1, ***CYCLE***, x] ? define loose := [1, [1, loose, "x"], "x"] # value: [1, ***CYCLE***, x] ? tight == loose # value: true ? def map := [tight => "foo"] # value: [[1, ***CYCLE***, x] => foo] ? map[loose] # value: foo Internally, tight and loose have very different representations. However, when both cycles are unwound, they represent the same infinite tree. One could say that tight's representation of this tree is more tightly wound than loose's representation. However, this difference is only in the implementation, not in the semantics. The value of tight and loose is only the infinite tree they represent. If these trees are the same, then tight and loose are ==. Notice that loose prints out according to the tightest winding of the tree it represents, not according to the cycle by which it represents this tree. Only the tightest winding is finite and canonical. -- ?!ng From bwarsaw at cnri.reston.va.us Fri Apr 14 01:14:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 19:14:49 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> Message-ID: <14582.21737.387268.332139@anthem.cnri.reston.va.us> JH> Comments and suggestions solicitied. I think this is worth JH> checking in. Please regenerate with unified or context diffs! 
-Barry From jeremy at cnri.reston.va.us Fri Apr 14 01:19:30 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:19:30 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: Message-ID: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important. >>> tight = [1, None, "x"] >>> tight[1] = tight >>> tight [1, [...], 'x'] >>> loose = [1, [1, None, "x"], "x"] >>> loose[1][1] = loose >>> loose [1, [1, [...], 'x'], 'x'] >>> tight [1, [...], 'x'] >>> tight == loose 1 Jeremy From jeremy at cnri.reston.va.us Fri Apr 14 01:30:02 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:30:02 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.21737.387268.332139@anthem.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> <14582.21737.387268.332139@anthem.cnri.reston.va.us> Message-ID: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> Here it is contextified. One small difference from the previous patch is that NESTING_LIMIT is now only 1000. I think this is sufficient to cover commonly occuring nested containers. Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -c -r2.52 object.h *** object.h 2000/03/21 16:14:47 2.52 --- object.h 2000/04/13 21:50:10 *************** *** 284,289 **** --- 284,292 ---- extern DL_IMPORT(int) Py_ReprEnter Py_PROTO((PyObject *)); extern DL_IMPORT(void) Py_ReprLeave Py_PROTO((PyObject *)); + /* tstate dict key for PyObject_Compare helper */ + extern PyObject *_PyCompareState_Key; + /* Flag bits for printing: */ #define Py_PRINT_RAW 1 /* No string quotes etc. */ Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -c -r2.91 pythonrun.c *** pythonrun.c 2000/03/10 23:03:54 2.91 --- pythonrun.c 2000/04/13 21:50:25 *************** *** 149,154 **** --- 149,156 ---- /* Init Unicode implementation; relies on the codec registry */ _PyUnicode_Init(); + _PyCompareState_Key = PyString_InternFromString("cmp_state"); + bimod = _PyBuiltin_Init_1(); if (bimod == NULL) Py_FatalError("Py_Initialize: can't initialize __builtin__"); Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -c -r2.67 object.c *** object.c 2000/04/10 13:42:33 2.67 --- object.c 2000/04/13 21:44:42 *************** *** 298,308 **** --- 298,316 ---- return PyInt_FromLong(c); } + PyObject *_PyCompareState_Key; + + int _PyCompareState_nesting = 0; + int _PyCompareState_flag = 0; + #define NESTING_LIMIT 1000 + int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *vtp, *wtp; + int result; + if (v == NULL || w == NULL) { PyErr_BadInternalCall(); return -1; *************** *** 369,377 **** /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! 
if (vtp->tp_compare == NULL) return (v < w) ? -1 : 1; ! return (*vtp->tp_compare)(v, w); } long --- 377,443 ---- /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! if (vtp->tp_compare == NULL) { return (v < w) ? -1 : 1; ! } ! ++_PyCompareState_nesting; ! if (_PyCompareState_nesting > NESTING_LIMIT) ! _PyCompareState_flag = 1; ! if (_PyCompareState_flag && ! (vtp->tp_as_mapping || (vtp->tp_as_sequence && ! !PyString_Check(v)))) ! { ! PyObject *tstate_dict, *cmp_dict, *pair; ! ! tstate_dict = PyThreadState_GetDict(); ! if (tstate_dict == NULL) { ! PyErr_BadInternalCall(); ! return -1; ! } ! cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); ! if (cmp_dict == NULL) { ! cmp_dict = PyDict_New(); ! if (cmp_dict == NULL) ! return -1; ! PyDict_SetItem(tstate_dict, ! _PyCompareState_Key, ! cmp_dict); ! } ! ! pair = PyTuple_New(2); ! if (pair == NULL) { ! return -1; ! } ! if ((long)v <= (long)w) { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); ! } else { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); ! } ! if (PyDict_GetItem(cmp_dict, pair)) { ! /* already comparing these objects. assume ! they're equal until shown otherwise ! */ ! Py_DECREF(pair); ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return 0; ! } ! if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { ! return -1; ! } ! result = (*vtp->tp_compare)(v, w); ! PyDict_DelItem(cmp_dict, pair); ! Py_DECREF(pair); ! } else { ! result = (*vtp->tp_compare)(v, w); ! } ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return result; } long From ping at lfw.org Fri Apr 14 04:09:49 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 21:09:49 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. Very, very cool. Well done. Say, when did printing get fixed? > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] -- ?!ng From jeremy at cnri.reston.va.us Fri Apr 14 04:14:11 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 22:14:11 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: <14582.32499.38092.53395@bitdiddle.cnri.reston.va.us> >>>>> "KPY" == Ka-Ping Yee writes: KPY> On Thu, 13 Apr 2000, Jeremy Hylton wrote: >> Looks like the proposed changed to PyObject_Compare matches E for >> your example. The printed representation doesn't match, but I'm >> not sure that is as important. KPY> Very, very cool. Well done. Say, when did printing get fixed? Looks like the repr checkin was pre-1.5.1. I glanced at the sameness code in E, and it looks like it is doing exactly the same thing. It keeps a mapping of comparisons seen sofar and returns true for them. It seems that E's types don't define their own methods for sameness, though. The same methods seem to understand the internals of the various E types. Or is it just a few special ones. 
Jeremy From tim_one at email.msn.com Fri Apr 14 04:32:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:48 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256468519-52554453@hypernet.com> Message-ID: <000a01bfa5b9$b99a6760$182d153f@tim> [Gordon McMillan] > ... > Or are you saying that if functions have attributes, people will > all of a sudden expect that function locals will have initialized > and maintained state? I expect that they'll expect exactly what happens in JavaScript, which supports function attributes too, and where it's often used as a nicer-than-globals way to get the effect of C-like mutable statics (conceptually) local to the function. BTW, viewing this all in OO terms would make compelling sense only if Guido viewed everything in OO terms -- but he doesn't. To the extent that people must , Python doesn't stop you from adding arbitrary unique attrs to class instances today either. consistent-in-inconsistency-ly y'rs - tim From tim_one at email.msn.com Fri Apr 14 04:32:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:44 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: <000901bfa5b9$b7f6c980$182d153f@tim> [Jeremy Hylton] > I'm not familiar with any algorithms for the graph isomorphism > problem, Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children). > but I took a stab at a simple comparison algorithm. The idea > is to detect comparisons that would cross back-edges in the object > graphs. Instead of starting a new comparison, assume they are the > same. If, in fact, the objects are not the same, they must differ in > some other way; some other part of the comparison will fail. Bingo! That's the key trick. From effbot at telia.com Fri Apr 14 06:58:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 06:58:50 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> Message-ID: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Tim Peters wrote: > [Gordon McMillan] > > ... > > Or are you saying that if functions have attributes, people will > > all of a sudden expect that function locals will have initialized > > and maintained state? > > I expect that they'll expect exactly what happens in JavaScript, which > supports function attributes too, and where it's often used as a > nicer-than-globals way to get the effect of C-like mutable statics > (conceptually) local to the function. so it's no longer an experimental feature, it's a "static variables" thing? umm. I had nearly changed my mind to a "okay, if you insist +1", but now it's back to -1 again. maybe in Py3K... From bwarsaw at cnri.reston.va.us Fri Apr 14 07:23:40 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 01:23:40 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <14582.43868.600655.132428@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so it's no longer an experimental feature, it's a "static FL> variables" thing? FL> umm. 
I had nearly changed my mind to a "okay, if you insist FL> +1", but now it's back to -1 again. maybe in Py3K... C'mon! Most people are still going to just use module globals for function statics because they're less to type (notwithstanding the sometimes-optional global decl). You can't worry about all the novel abuses people will think up for this feature -- they're already doing it with all sorts of other things Pythonic, e.g. docstrings, global as pragma, etc. Can I get at least a +0? :) -Barry From tim_one at email.msn.com Fri Apr 14 09:34:46 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 03:34:46 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <000401bfa5e3$e8c5ce60$612d153f@tim> [Tim] >> I expect that they'll expect exactly what happens in JavaScript, which >> supports function attributes too, and where it's often used as a >> nicer-than-globals way to get the effect of C-like mutable statics >> (conceptually) local to the function. [/F] > so it's no longer an experimental feature, it's a "static variables" > thing? Yes, of course people will use it to get the effect of function statics. OK by me. People do the same thing today with class data attributes (i.e., to get the effect of mutable statics w/o polluting the module namespace). They'll use it for all sorts of other stuff too -- it's mechanism, not policy. BTW, I don't think any "experimental feature" has ever been removed -- only features that weren't experimental. So if you want to see it go away ... > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. maybe in Py3K... Greg gave the voting rule as: > -1 "Veto. And is my reasoning." Vladimir has done some reasoning, but the basis of your objection remains a mystery. We should be encouraging our youth to submit patches with their crazy ideas . From gansevle at cs.utwente.nl Fri Apr 14 09:46:08 2000 From: gansevle at cs.utwente.nl (Fred Gansevles) Date: Fri, 14 Apr 2000 09:46:08 +0200 Subject: [Python-Dev] cvs-server out of sync with mailing-list ? Message-ID: <200004140746.JAA05473@localhost.localdomain> I try to keep up-to-date with the cvs-tree at cvs.python.org and receive the python-checkins at python.org mailing-list. Just now I discovered that the cvs-server and the checkins-list are out of sync. For example: according to the checkins-list the latest version of src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest version is 2.59 Am I missing something or is there some kind of a problem ? ____________________________________________________________________________ Fred Gansevles Phone: +31 53 489 4613 >>> Your one-stop-shop for Linux/WinNT/NetWare <<< Org.: Twente University, Fac. of CS, Box 217, 7500 AE Enschede, Netherlands "Bill needs more time to learn Linux" - Steve B. From mal at lemburg.com Fri Apr 14 01:05:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 01:05:12 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM <1256468519-52554453@hypernet.com> Message-ID: <38F652A8.B2F8C822@lemburg.com> Gordon McMillan wrote: > ... > This is nowhere near as confusing as, say, putting a module > named X in a package named X and then saying "from X > import *", (hi, Marc-Andre!). 
Users shouldn't bother looking into packages... only at the documented interface ;-) The hack is required to allow sibling submodules to import the packages main module (I could have also written import __init__ everywhere but that wouldn't have made things clearer), BTW. It turned out to be very convenient during development of all those mx packages. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 10:46:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:46:15 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> Message-ID: <38F6DAD7.BBAF72E5@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Fredrik Lundh wrote: > > > > > > M.-A. Lemburg wrote: > > > > The current need for #pragmas is really very simple: to tell > > > > the compiler which encoding to assume for the characters > > > > in u"...strings..." (*not* "...8-bit strings..."). > > > > > > why not? > > > > Because plain old 8-bit strings should work just as before, > > that is, existing scripts only using 8-bit strings should not break. > > but they won't -- if you don't use an encoding directive, and > don't use 8-bit characters in your string literals, everything > works as before. > > (that's why the default is "none" and not "utf-8") > > if you use 8-bit characters in your source code and wish to > add an encoding directive, you need to add the right encoding > directive... Fair enough, but this would render all the auto-coercion code currently in 1.6 useless -- all string to Unicode conversions would have to raise an exception. > > > why keep on pretending that strings and strings are two > > > different things? it's an artificial distinction, and it only > > > causes problems all over the place. > > > > Sure. The point is that we can't just drop the old 8-bit > > strings... not until Py3K at least (and as Fred already > > said, all standard editors will have native Unicode support > > by then). > > I discussed that in my original "all characters are unicode > characters" proposal. in my proposal, the standard string > type will have to roles: a string either contains unicode > characters, or binary bytes. > > -- if it contains unicode characters, python guarantees that > methods like strip, lower (etc), and regular expressions work > as expected. > > -- if it contains binary data, you can still use indexing, slicing, > find, split, etc. but they then work on bytes, not on chars. > > it's still up to the programmer to keep track of what a certain > string object is (a real string, a chunk of binary data, an en- > coded string, a jpeg image, etc). if the programmer wants > to convert between a unicode string and an external encoding > to use a certain unicode encoding, she needs to spell it out. > the codecs are never called "under the hood". > > (note that if you encode a unicode string into some other > encoding, the result is binary buffer. operations like strip, > lower et al does *not* work on encoded strings). Huh ? If the programmer already knows that a certain string uses a certain encoding, then he can just as well convert it to Unicode by hand using the right encoding name. 
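(For what it's worth, the "by hand" conversion being talked about here is
just the explicit spelling, using the 1.6 constructor mentioned elsewhere
in this thread; the literal is only an example:

    s = "Bienvenue \340 Python!"       # 8-bit data known to be Latin-1
    u = unicode(s, 'iso-8859-1')       # explicit: the programmer names the encoding

as opposed to having the implementation pick an encoding on its own.)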
The whole point we are talking about here is that when having the implementation convert a string to Unicode all by itself it needs to know which encoding to use. This is where we have decided long ago that UTF-8 should be used. The pragma discussion is about a totally different issue: pragmas could make it possible for the programmer to tell the *compiler* which encoding to use for literal u"unicode" strings -- nothing more. Since "8-bit" strings currently don't have an encoding attached to them we store them as-is. I don't want to get into designing a completely new character container type here... this can all be done for Py3K, but not now -- it breaks things at too many ends (even though it would solve the issues with strings being used in different contexts). > > > -- we still need an encoding marker for ascii supersets (how about > > > ;-). however, it's up to > > > the tokenizer to detect that one, not the parser. the parser only > > > sees unicode strings. > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > That's a task done by the parser. > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > > if the tokenizer does the actual conversion doesn't really matter; > the point is that once the code has passed through the tokenizer, > it's unicode. The tokenizer would have to know which parts of the input string to convert to Unicode and which not... plus there are different encodings to be applied, e.g. UTF-8, Unicode-Escape, Raw-Unicode-Escape, etc. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 10:24:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:24:30 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> Message-ID: <38F6D5BE.924F4D62@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Hmm, anything else would introduce a new keyword, I guess. And > > new keywords cause new scripts to fail in old interpreters > > even when they don't use Unicode at all and only include > > per convention. > > Only if the new keyword is used in the script or anything it > imports. This is exactly like using new syntax (u'...') or new > library features (unicode('abc', 'iso-8859-1')). Right, but I would guess that people would then start using these keywords in all files per convention (so as not to trip over bugs due to wrong encodings). Perhaps I'm overcautious here... > I can't think of anything that gets included "by convention" that > breaks anything. I don't recall a proposal that we should casually > add pragmas to our scripts if there's no need to do so. Adding > pragmas to library modules is *not* part of the issue; they'd only be > there if the version of Python they're part of supports the syntax. 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paul at prescod.net Fri Apr 14 11:12:08 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 04:12:08 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <38F6E0E8.E336F6C6@prescod.net> Fredrik Lundh wrote: > > so it's no longer an experimental feature, it's a "static variables" > thing? > > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. maybe in Py3K... I think that we get 95% of the benefit without any of the "dangers" (though I don't agree with the arguments against) if we allow the attachment of properties only at compile time and disallow mutation of them at runtime. That will allow Spark, EventDOM, multi-lingual docstrings etc., but disallow static variables. I'm not agreeing that using function properties as static variables is a bad thing...I'm just saying that we might be able to agree on a less powerful mechanism and then revisit the more general one in Py3K. Let's not forget that Py3K is going to be a very hard exercise in trying to combine everyone's ideas "all at once". Experience gained now is golden. We should probably be more amenable to "experimental ideas" now -- secure in the knowledge that they can be killed off in Py3K. If we put ideas we are not 100% comfortable with in Py3K we will be stuck with them forever. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond at skippinet.com.au Fri Apr 14 15:01:33 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:01:33 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: > Can I get at least a +0? :) Im quite amazed this is contentious! Definately a +1 from me! Mark. From skip at mojam.com Fri Apr 14 15:05:44 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 08:05:44 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.6056.362378.834649@beluga.mojam.com> Mark> Im quite amazed this is contentious! Definately a +1 from me! +1 from the skippi in Chicago as well... Skip From mhammond at skippinet.com.au Fri Apr 14 15:11:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:11:39 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F6E0E8.E336F6C6@prescod.net> Message-ID: > I think that we get 95% of the benefit without any of the > "dangers" > (though I don't agree with the arguments against) if we allow the > attachment of properties only at compile time and > disallow mutation of > them at runtime. AFAIK, this would be a pretty serious change. The compiler just generates (basically)PyObject_SetAttr() calls. 
There is no way in the current runtime to differentiate between "compile time" and "runtime" attribute references... If this was done, it would simply be ugly hacks to support what can only be described as unpythonic in the first place! [Unless of course Im missing something...] Mark. From fredrik at pythonware.com Fri Apr 14 15:34:48 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 15:34:48 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Barry Wrote: > > Can I get at least a +0? :) okay, I'll retract. here's today's opinion: +1 on an experimental future, which is not part of the language definition, and not necessarily supported by all implementations. (and where supported, not necessarily very efficient). -1 on static function variables implemented as attributes on function or method objects. def eff(): "eff" print "eff", eff.__doc__ def bot(): "bot" print "bot", bot.__doc__ eff() bot() eff, bot = bot, eff eff() bot() # or did your latest patch solve this little dilemma? # if so, -1 on your patch ;-) From fdrake at acm.org Fri Apr 14 15:46:11 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:46:11 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F6D5BE.924F4D62@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> <38F6D5BE.924F4D62@lemburg.com> Message-ID: <14583.8483.628361.523059@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Right, but I would guess that people would then start using these > keywords in all files per convention (so as not to trip over > bugs due to wrong encodings). I don't imagine the new keywords would be used by anyone that wasn't specifically interested in their effect. Code that isn't needed tends not to get written! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Apr 14 15:55:36 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:55:36 -0400 (EDT) Subject: [Python-Dev] cvs-server out of sync with mailing-list ? In-Reply-To: <200004140746.JAA05473@localhost.localdomain> References: <200004140746.JAA05473@localhost.localdomain> Message-ID: <14583.9048.857826.186107@seahag.cnri.reston.va.us> Fred Gansevles writes: > Just now I discovered that the cvs-server and the checkins-list are out of > sync. For example: according to the checkins-list the latest version of > src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest > version is 2.59 > > Am I missing something or is there some kind of a problem ? There's a problem, but it's highly isolated. We're updating the public CVS using rsync tunnelled through ssh, which worked greate until some of us switched to Linux workstations, where OpenSSH behaves a little differently with some private keys files. I've not figured out how to work around it yet, but will keep playing with it. I've synced the public CVS from a Solaris box for now, so all the recent changes should be visible. Until I get things fixed, I'll try to remember to sync it before I head home in the evenings. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Apr 14 15:57:34 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 14 Apr 2000 09:57:34 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils unixccompiler.py,1.21,1.22 In-Reply-To: <200004141353.JAA04309@thrak.cnri.reston.va.us> References: <200004141353.JAA04309@thrak.cnri.reston.va.us> Message-ID: <14583.9166.166905.476276@seahag.cnri.reston.va.us> Greg Ward writes: > ! # Not many Unices required ranlib anymore -- SunOS 4.x is, I > ! # think the only major Unix that does. Maybe we need some You're saying that SunOS 4.x *is* a major Unix???? Not for a while, now.... -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Fri Apr 14 16:15:37 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 14 Apr 2000 10:15:37 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000401bfa5e3$e8c5ce60$612d153f@tim> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> Message-ID: <14583.10249.322298.959083@amarok.cnri.reston.va.us> >Yes, of course people will use it to get the effect of function statics. OK >by me. People do the same thing today with class data attributes (i.e., to Wait, the attributes added to a function are visible inside the function? (I haven't looked that closely at the patch?) That strikes me as a much more significant change to Python's scoping, making it local, function attribute, then global scope. a I thought of the attributes as labels that could be attached to a callable object for the convenience of some external system, but the function would remain blissfully unaware of the external meaning attached to itself. -1 from me if a function's attributes are visible to code inside the function; +0 if they're not. -- A.M. Kuchling http://starship.python.net/crew/amk/ The paradox of money is that when you have lots of it you can manage life quite cheaply. Nothing so economical as being rich. -- Robertson Davies, _The Rebel Angels_ From skip at mojam.com Fri Apr 14 16:39:27 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 09:39:27 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <14583.11679.107267.727484@beluga.mojam.com> >> Yes, of course people will use it to get the effect of function >> statics. OK by me. People do the same thing today with class data >> attributes (i.e., to AMK> Wait, the attributes added to a function are visible inside the AMK> function? (I haven't looked that closely at the patch?) No, they aren't. There is no change of Python's scoping rules using Barry's function attributes patch. In fact, they are *only* available to the function itself via the function's name in the module globals. That's why Fredrik's "eff, bot = bot, eff" trick worked as it did. 
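A tiny sketch of the gotcha in question (names made up, and assuming the
experimental function-attribute patch): the attribute reference inside the
body is an ordinary lookup of the function's *name* in the module globals
at call time, so rebinding that name pulls the rug out:

    def counter():
        counter.n = counter.n + 1    # "counter" looked up in module globals here
        return counter.n
    counter.n = 0

    other = counter
    counter = None                   # rebind the global name...
    other()                          # ...and this now raises an AttributeError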
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 16:41:39 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 16:41:39 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <200004141441.QAA02162@python.inrialpes.fr> Mark Hammond wrote: > > > Can I get at least a +0? :) > > Im quite amazed this is contentious! Definately a +1 from me! > > Mark. > Amazed or not, it is contentious. I have the responsability to remove my veto once my concerns are adressed. So far, I have the impression that all I get (if I get anything at all -- see above) is "conveniency" from Gordon, which is nothing else but laziness about creating instances. As long as we discuss customization of objects with builtin types, the "inconsistency" stays bound to classes and instances. Add modules if you wish, but they are just namespaces. This proposal expands the customization inconsistency to functions and methods. And I am reluctant to see this happening "under the hood", without a global vision of the problem, just because a couple of people have abused unprotected attributes and claim that they can't do what they want because Python doesn't let them to. As to the object model, together with naming and binding, I say: KISS or do it right the first time. add-more-oil-to-the-fire-and-you'll-burn-your-house--ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Apr 14 17:04:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 10:04:51 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: <200004141441.QAA02162@python.inrialpes.fr> Message-ID: <14583.13203.900930.294033@beluga.mojam.com> Vladimir> So far, I have the impression that all I get (if I get Vladimir> anything at all -- see above) is "conveniency" from Gordon, Vladimir> which is nothing else but laziness about creating instances. No, you get function metadata. Barry's original reason for creating the patch was that the only writable attribute for functions or methods is the doc string. Multiple people are using it now to mean different things, and this leads to problems when those different uses clash. I submit that if I have to wrap methods (not functions) in classes and instantiate them to avoid being "lazy", then my code is going to look pretty horrible after applying this more than once or twice. Both Zope and John Aycock's system (SPARK?) demonstrate the usefulness of being able to attach metadata to functions and methods. All Barry is suggesting is that Python support that capability better. Finally, it's not clear to my feeble brain just how I would go about instantiating a method to get this capability today. Suppose I have class Spam: def eggs(self, a): return a and I want to attach an attribute to Spam.eggs that tells me if it is public/private in the Zope sense. Zope requires you to add a doc string to a method to declare that it's public: class Spam: def eggs(self, a): "doc" return a Fine, except that effectively prevents you from adding doc strings to your "private" methods as Greg Stein pointed out. 
Barry's proposal would allow the Spam.eggs author to attach an attribute to it: class Spam: def eggs(self, a): "doc" return a eggs.__zope_access__ = "private" I think the solution you're proposing is class Spam: class EggsMethod: def __call__(self, a): "doc" return a __zope_access__ = "private" eggs = EggsMethod() This seems to work, but also seems like a lot of extra baggage (and a performance hit to boot) to arrive at what seems like a very simple concept. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 17:30:31 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 17:30:31 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.13203.900930.294033@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 10:04:51 AM Message-ID: <200004141530.RAA02277@python.inrialpes.fr> Skip Montanaro wrote: > > Barry's proposal would allow the Spam.eggs author to attach an attribute to > it: > > class Spam: > def eggs(self, a): > "doc" > return a > eggs.__zope_access__ = "private" > > I think the solution you're proposing is > > class Spam: > class EggsMethod: > def __call__(self, a): > "doc" > return a > __zope_access__ = "private" > eggs = EggsMethod() > > This seems to work, but also seems like a lot of extra baggage (and a > performance hit to boot) to arrive at what seems like a very simple concept. > If you prefer embedded definitions, among other things, you could do: __zope_access__ = { 'Spam' : 'public' } class Spam: __zope_access__ = { 'eggs' : 'private', 'eats' : 'public' } def eggs(self, ...): ... def eats(self, ...): ... or have a completely separate class/structure for access control (which is what you would do it in C, btw, for existing objects to which you can't add slots, ex: file descriptors, mem segments, etc). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw at cnri.reston.va.us Fri Apr 14 17:52:17 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 11:52:17 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <38F6E0E8.E336F6C6@prescod.net> Message-ID: <14583.16049.933693.237302@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> AFAIK, this would be a pretty serious change. The compiler MH> just generates (basically)PyObject_SetAttr() calls. There is MH> no way in the current runtime to differentiate between MH> "compile time" and "runtime" attribute references... If this MH> was done, it would simply be ugly hacks to support what can MH> only be described as unpythonic in the first place! MH> [Unless of course Im missing something...] You're not missing anything Mark! Remember Python's /other/ motto: "we're all consenting adults here". If you don't wanna mutate your function attrs at runtime... just don't! 
:) -Barry From bwarsaw at cnri.reston.va.us Fri Apr 14 17:59:55 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 14 Apr 2000 11:59:55 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <14583.16507.268456.950881@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> # or did your latest patch solve this little dilemma? No, definitely not. >>>>> "AMK" == Andrew M Kuchling writes: AMK> Wait, the attributes added to a function are visible inside AMK> the function? My patch definitely does not change Python's scoping rules in any way. This was a 1/2 hour hack, for Guido's sake! :) -Barry From tim_one at email.msn.com Fri Apr 14 18:04:32 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 12:04:32 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <000501bfa62b$1ef8f560$d82d153f@tim> [Tim] > Yes, of course people will use it to get the effect of function > statics. OK by me. People do the same thing today with class data > attributes (i.e., to [Andrew M. Kuchling] > Wait, the attributes added to a function are visible inside the > function? No, same as in JavaScript, you need funcname.attr, just as you need classname.attr in Python today to fake the effect of mutable class statics (in the C++ sense). > [hysteria deleted ] > ... > +0 if they're not. From paul at prescod.net Fri Apr 14 18:21:31 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 11:21:31 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <38F7458B.17F72652@prescod.net> Mark Hammond wrote: > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. I posted a proposal a few days back that does not use the "." SetAttr syntax and is clearly distinguisable (visually and by the compiler) from runtime property assignment. http://www.python.org/pipermail/python-dev/2000-April/004875.html The response was light but positive... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From skip at mojam.com Fri Apr 14 18:29:34 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 11:29:34 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F7458B.17F72652@prescod.net> References: <38F7458B.17F72652@prescod.net> Message-ID: <14583.18286.67371.157754@beluga.mojam.com> Paul> I posted a proposal a few days back that does not use the "." Paul> SetAttr syntax and is clearly distinguisable (visually and by the Paul> compiler) from runtime property assignment. Paul> http://www.python.org/pipermail/python-dev/2000-April/004875.html Paul> The response was light but positive... Paul, I have a question. Given the following example from your note: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() how is the compiler supposed to associate a particular "decl {...}" with a particular function? Is it just by order in the file? 
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From gmcm at hypernet.com Fri Apr 14 18:32:42 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <1256392532-57122808@hypernet.com> Vladimir Marangozov wrote: > Amazed or not, it is contentious. I have the responsability to > remove my veto once my concerns are adressed. So far, I have the > impression that all I get (if I get anything at all -- see above) > is "conveniency" from Gordon, which is nothing else but laziness > about creating instances. I have the impression that majority of changes to Python are conveniences. > As long as we discuss customization of objects with builtin types, > the "inconsistency" stays bound to classes and instances. Add modules > if you wish, but they are just namespaces. This proposal expands > the customization inconsistency to functions and methods. And I am > reluctant to see this happening "under the hood", without a global > vision of the problem, just because a couple of people have abused > unprotected attributes and claim that they can't do what they want > because Python doesn't let them to. Can you please explain how "consistency" is violated? - Gordon From gmcm at hypernet.com Fri Apr 14 18:32:42 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <1256392531-57122875@hypernet.com> Fredrik Lundh wrote: > -1 on static function variables implemented as > attributes on function or method objects. > > def eff(): > "eff" > print "eff", eff.__doc__ > > def bot(): > "bot" > print "bot", bot.__doc__ > > eff() > bot() > > eff, bot = bot, eff > > eff() > bot() > > # or did your latest patch solve this little dilemma? > # if so, -1 on your patch ;-) To belabor the obvious (existing Python allows obsfuction), I present: class eff: "eff" def __call__(self): print "eff", eff.__doc__ class bot: "bot" def __call__(self): print "bot", bot.__doc__ e = eff() b = bot() e() b() eff, bot = bot, eff e = eff() b = bot() e() b() There's nothing new here. Why does allowing the ability to obsfucate suddenly warrant a -1? - Gordon From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 19:15:09 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 19:15:09 +0200 (CEST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 13, 2000 07:30:02 PM Message-ID: <200004141715.TAA02492@python.inrialpes.fr> Jeremy Hylton wrote: > > Here it is contextified. One small difference from the previous patch > is that NESTING_LIMIT is now only 1000. I think this is sufficient to > cover commonly occuring nested containers. > > Jeremy > > [patch omitted] Nice. I think you don't need the _PyCompareState_flag. Like in trashcan, _PyCompareState_nesting is enough to enter the sections of the code that depend on _PyCompareState_flag. 
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From moshez at math.huji.ac.il Fri Apr 14 19:46:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 14 Apr 2000 19:46:12 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> Message-ID: On Thu, 13 Apr 2000, Vladimir Marangozov wrote: > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? > Who made that deliberate decision? Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? No. >>> def this(turing_machine): ... if stops(turing_machine): ... confusing = "yes" ... else: ... confusing = "no" ... >>> print this.confusing -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Fri Apr 14 20:19:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 14:19:42 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > Can I get at least a +0? :) I want function attributes. (There are all sorts of occasions i need cues to classify functions for executives that map and apply them, and this seems like the perfect way to couple that information with the object. Much nicer than having to mangle the names of the functions, or create some external registry with the classifications.) And i think i'd want them even more if they were visible within the function, so i could do static variables. Why is that a bad thing? So i guess that means i'd give a +1 for the proposal as stands, with the understanding that you'd get *another* +1 for the additional feature - yielding a bigger, BETTER +1. Metadata, static vars, frameworks ... oh my!-) (Oh, and i'd suggest up front that documentation for this feature recommend people not use "__*__" names for their own object attributes, to avoid collisions with eventual use of them by python.) Ken klm at digicool.com From bwarsaw at cnri.reston.va.us Fri Apr 14 20:21:11 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 14 Apr 2000 14:21:11 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.24983.768952.870567@anthem.cnri.reston.va.us> >>>>> "KM" == Ken Manheimer writes: KM> (Oh, and i'd suggest up front that documentation for this KM> feature recommend people not use "__*__" names for their own KM> object attributes, to avoid collisions with eventual use of KM> them by python.) Agreed. From fdrake at acm.org Fri Apr 14 20:25:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 14 Apr 2000 14:25:46 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Ken Manheimer writes: > (Oh, and i'd suggest up front that documentation for this feature > recommend people not use "__*__" names for their own object attributes, to > avoid collisions with eventual use of them by python.) Isn't that a standing recommendation for all names? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Fri Apr 14 20:29:43 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 20:29:43 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256392531-57122875@hypernet.com> Message-ID: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> > To belabor the obvious (existing Python allows obsfuction), I > present: > > class eff: > "eff" > def __call__(self): > print "eff", eff.__doc__ > > class bot: > "bot" > def __call__(self): > print "bot", bot.__doc__ > > e = eff() > b = bot() > e() > b() > > eff, bot = bot, eff > e = eff() > b = bot() > e() > b() > > There's nothing new here. Why does allowing the ability to > obsfucate suddenly warrant a -1? since when did Python grow full lexical scoping? does anyone that has learned about the LGB rule expect the above to work? in contrast, my example used a name which appears to be defined in the same scope as the other names introduced on the same line of source code -- but isn't. def foo(x): foo.x = x here, "foo" doesn't refer to the same namespace as the argument "x", but to instead whatever happens to be in an entirely different namespace at the time the function is executed. in other words, this feature cannot really be used to store statics -- it only looks that way... From effbot at telia.com Fri Apr 14 20:32:26 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 20:32:26 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes onfuncs and methods) References: Message-ID: <001901bfa63f$ca0c08c0$34aab5d4@hagrid> Ken Manheimer wrote: > I want function attributes. (There are all sorts of occasions i need cues > to classify functions for executives that map and apply them, and this > seems like the perfect way to couple that information with the > object. Much nicer than having to mangle the names of the functions, or > create some external registry with the classifications.) how do you expect to find all methods that has a given attribute? > And i think i'd want them even more if they were visible within the > function, so i could do static variables. Why is that a bad thing? because it doesn't work, unless you change python in a backwards incompatible way. that's okay in py3k, it's not okay in 1.6. From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 21:07:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 21:07:15 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <200004141907.VAA02670@python.inrialpes.fr> Gordon McMillan wrote: > > [VM] > > As long as we discuss customization of objects with builtin types, > > the "inconsistency" stays bound to classes and instances. 
Add modules > > if you wish, but they are just namespaces. This proposal expands > > the customization inconsistency to functions and methods. And I am > > reluctant to see this happening "under the hood", without a global > > vision of the problem, just because a couple of people have abused > > unprotected attributes and claim that they can't do what they want > > because Python doesn't let them to. > > Can you please explain how "consistency" is violated? > Yes, I can. To start with and to save me typing, please reread the 1st section of Demo/metaclasses/meta-vladimir.txt about Classes. ------- Now, whenever there are two instances 'a' and 'b' of the class A, the first inconsistency is that we're allowed to assign attributes to these instances dynamically, which are not declared in the class A. Strictly speaking, if I say: >>> a.author = "Guido" and if 'author' is not an attribute of 'a' after the instantiation of A (i.e. after a = A() completes), we should get a NameError. It's an inconsistency because whenever the above assignment succeeds, 'a' is no more an instance of A. It's an instance of some other class, because A prescribes what *all* instances of A have in *common*. So from here, we have to find our way in the object model and live with this 1st inconsistency. Problem: What is the class of the singleton 'a' then? Say, I need this class after the fact to build another society of objects, i.e. "clone" 'a' a hundred of times, because 'a' has dozens of attributes different than 'b'. To make a long story short, it turns out that we can build a Python class A1, having those attributes declared, then instantiate A1 hundreds of times and hopefully, let 'a' find its true identity with: >>> a.__class__ = A1 This is the key of the story. We *can* build, for a given singleton, its Python class, after the fact. And this is the only thing which still makes the Python class model 'relatively consistent'! If it weren't possible to build that class A1, it would have been better to stop talking about classes and a class model in Python. ("associations of typed structures with per-type binding rules" would have probably been a better term). Now to the question: how "consistency" is violated by the proposal? It is violated, because actually we *can't* build and restore the class, after the fact, of a builtin object (a funtion 'f') to which we add user attributes. We can't do it for 2 reasons, which we hope to solve in Py3K: 1) the class of 'f' is implemented in C 2) we still can't inherit from builtin classes (at least in CPython) As a consequence, we can't actually build hundreds of "clones" of 'f' by instantiating a class object. We can build them by adding manually the same attribute, but this is not OO, this is just 'binding to a namespace'. This is the true reason on why this fragile consistency is violated. Please, save me the trouble to expose the details you're missing, to each of you, where those details are omitted for simplicity. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From effbot at telia.com Fri Apr 14 21:17:23 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:17:23 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> Message-ID: <005401bfa646$123ef2a0$34aab5d4@hagrid> M.-A. 
Lemburg wrote: > > but they won't -- if you don't use an encoding directive, and > > don't use 8-bit characters in your string literals, everything > > works as before. > > > > (that's why the default is "none" and not "utf-8") > > > > if you use 8-bit characters in your source code and wish to > > add an encoding directive, you need to add the right encoding > > directive... > > Fair enough, but this would render all the auto-coercion > code currently in 1.6 useless -- all string to Unicode > conversions would have to raise an exception. I though it was rather clear by now that I think the auto- conversion stuff *is* useless... but no, that doesn't mean that all string to unicode conversions need to raise exceptions -- any 8-bit unicode character obviously fits into a 16-bit unicode character, just like any integer fits in a long integer. if you convert the other way, you might get an OverflowError, just like converting from a long integer to an integer may give you an exception if the long integer is too large to be represented as an ordinary integer. after all, i = int(long(v)) doesn't always raise an exception... > > > > why keep on pretending that strings and strings are two > > > > different things? it's an artificial distinction, and it only > > > > causes problems all over the place. > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > strings... not until Py3K at least (and as Fred already > > > said, all standard editors will have native Unicode support > > > by then). > > > > I discussed that in my original "all characters are unicode > > characters" proposal. in my proposal, the standard string > > type will have to roles: a string either contains unicode > > characters, or binary bytes. > > > > -- if it contains unicode characters, python guarantees that > > methods like strip, lower (etc), and regular expressions work > > as expected. > > > > -- if it contains binary data, you can still use indexing, slicing, > > find, split, etc. but they then work on bytes, not on chars. > > > > it's still up to the programmer to keep track of what a certain > > string object is (a real string, a chunk of binary data, an en- > > coded string, a jpeg image, etc). if the programmer wants > > to convert between a unicode string and an external encoding > > to use a certain unicode encoding, she needs to spell it out. > > the codecs are never called "under the hood". > > > > (note that if you encode a unicode string into some other > > encoding, the result is binary buffer. operations like strip, > > lower et al does *not* work on encoded strings). > > Huh ? If the programmer already knows that a certain > string uses a certain encoding, then he can just as well > convert it to Unicode by hand using the right encoding > name. I thought that was what I said, but the text was garbled. let's try again: if the programmer wants to convert between a unicode string and a buffer containing encoded text, she needs to spell it out. the codecs are never called "under the hood" > The whole point we are talking about here is that when > having the implementation convert a string to Unicode all > by itself it needs to know which encoding to use. This is > where we have decided long ago that UTF-8 should be > used. does "long ago" mean that the decision cannot be questioned? what's going on here? face it, I don't want to guess when and how the interpreter will convert strings for me. after all, this is Python, not Perl. 
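A short sketch of what that explicit style would look like in practice, assuming the unicode() builtin and the .encode() method as they appear in the 1.6 alphas:

    s = "Bienvenue"              # an ordinary 8-bit string, e.g. read from a file
    u = unicode(s, "latin-1")    # the programmer names the source encoding
    b = u.encode("utf-8")        # and names the target encoding, just as explicitly
    # nothing is converted behind the programmer's back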
if I want to convert from a "string of characters" to a byte buffer using a certain character encoding, let's make that explicit. Python doesn't convert between other data types for me, so why should strings be a special case? > The pragma discussion is about a totally different > issue: pragmas could make it possible for the programmer > to tell the *compiler* which encoding to use for literal > u"unicode" strings -- nothing more. Since "8-bit" strings > currently don't have an encoding attached to them we store > them as-is. what do I have to do to make you read my proposal? shout? okay, I'll try: THERE SHOULD BE JUST ONE INTERNAL CHARACTER SET IN PYTHON 1.6: UNICODE. for consistency, let this be true for both 8-bit and 16-bit strings (as well as Py3K's 31-bit strings ;-). there are many possible external string encodings, just like there are many possible external integer encodings. but for integers, that's not something that the core implementation cares much about. why are strings different? > I don't want to get into designing a completely new > character container type here... this can all be done for Py3K, > but not now -- it breaks things at too many ends (even though > it would solve the issues with strings being used in different > contexts). you don't need to -- you only need to define how the *existing* string type should be used. in my proposal, it can be used in two ways: -- as a string of unicode characters (restricted to the 0-255 subset, by obvious reasons). given a string 's', len(s) is always the number of characters, s[i] is the i'th character, etc. or -- as a buffer containing binary bytes. given a buffer 'b', len(b) is always the number of bytes, b[i] is the i'th byte, etc. this is one flavour less than in the 1.6 alphas -- where strings sometimes contain UTF-8 (and methods like upper etc doesn't work), sometimes an 8-bit character set (and upper works), and sometimes binary buffers (for which upper doesn't work). (hmm. I've said all this before, haven't I?) > > > > -- we still need an encoding marker for ascii supersets (how about > > > > ;-). however, it's up to > > > > the tokenizer to detect that one, not the parser. the parser only > > > > sees unicode strings. > > > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > > That's a task done by the parser. > > > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > > > > if the tokenizer does the actual conversion doesn't really matter; > > the point is that once the code has passed through the tokenizer, > > it's unicode. > > The tokenizer would have to know which parts of the > input string to convert to Unicode and which not... plus there > are different encodings to be applied, e.g. UTF-8, Unicode-Escape, > Raw-Unicode-Escape, etc. sigh. why do you insist on taking a very simple thing and making it very very complicated? will anyone out there ever use an editor that supports different encodings for different parts of the file? why not just assume that the *ENTIRE SOURCE FILE* uses a single encoding, and let the tokenizer (or more likely, a conversion stage before the tokenizer) convert the whole thing to unicode. let the rest of the compiler work on Py_UNICODE* strings only, and all your design headaches will just disappear. ... frankly, I'm beginning to feel like John Skaller. do I have to write my own interpreter to get this done right? 
:-( From klm at digicool.com Fri Apr 14 21:18:18 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 15:18:18 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: > since when did Python grow full lexical scoping? > > does anyone that has learned about the LGB rule expect > the above to work? Not sure what LGB stands for. "Local / Global / Built-in"? > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x = x > > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Huh. ?? I'm assuming your hypothetical foo.x means the attribute 'x' of the function 'foo' in the global namespace for the function 'foo' - which, conveniently, is the module where foo is defined! 8<--- foo.py --->8 def foo(): # Return the object named 'foo'. return foo 8<--- end foo.py --->8 8<--- bar.py --->8 from foo import * print foo() 8<--- end bar.py --->8 % python bar.py % I must be misapprehending what you're suggesting - i know you know this stuff better than i do - but it seems to me that foo.x would work, were foo to have an x. (And that foo.x would, in my esteem, be a suboptimal way to get at x from within foo, but that's besides the fact.) Ken klm at digicool.com From jeremy at cnri.reston.va.us Fri Apr 14 21:18:53 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Fri, 14 Apr 2000 15:18:53 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <200004141715.TAA02492@python.inrialpes.fr> References: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> <200004141715.TAA02492@python.inrialpes.fr> Message-ID: <14583.28445.105079.446201@bitdiddle.cnri.reston.va.us> >>>>> "VM" == Vladimir Marangozov writes: VM> Jeremy Hylton wrote: >> Here it is contextified. One small difference from the previous >> patch is that NESTING_LIMIT is now only 1000. I think this is >> sufficient to cover commonly occuring nested containers. >> >> Jeremy >> >> [patch omitted] VM> Nice. VM> I think you don't need the _PyCompareState_flag. Like in VM> trashcan, _PyCompareState_nesting is enough to enter the VM> sections of the code that depend on _PyCompareState_flag. Right. Thanks for the suggestion, and thanks to Barry & Fred for theirs. I've checked in the changes. Jeremy From effbot at telia.com Fri Apr 14 21:28:09 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:28:09 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <006201bfa647$92f3c000$34aab5d4@hagrid> Ken Manheimer wrote: > > does anyone that has learned about the LGB rule expect > > the above to work? > > Not sure what LGB stands for. "Local / Global / Built-in"? certain bestselling python books are known to use this acronym... > I'm assuming your hypothetical foo.x means the attribute 'x' of the > function 'foo' in the global namespace for the function 'foo' - which, > conveniently, is the module where foo is defined! did you run the eff() bot() example?
> I must be misapprehending what you're suggesting - i know you know this > stuff better than i do - but it seems to me that foo.x would work, were > foo to have an x. sure, it seems to be working. but not for the right reason. > (And that foo.x would, in my esteem, be a suboptimal > way to get at x from within foo, but that's besides the fact.) fwiw, I'd love to see a good syntax for this. might even change my mind... From effbot at telia.com Fri Apr 14 21:32:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:32:54 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000401bfa5e3$e8c5ce60$612d153f@tim> Message-ID: <007e01bfa648$3c701de0$34aab5d4@hagrid> TimBot wrote: > Greg gave the voting rule as: > > > -1 "Veto. And is my reasoning." sorry, I must have missed that post, since I've interpreted the whole thing as: if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): implement(feature) (probably because I've changed the eff-bot script to use 'sre' instead of 're'...) can you repost the full set of rules? From gmcm at hypernet.com Fri Apr 14 21:36:53 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 15:36:53 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: <1256381481-212228@hypernet.com> Fredrik Lundh wrote: > > To belabor the obvious (existing Python allows obsfuction), I > > present: > > > > class eff: > > "eff" > > def __call__(self): > > print "eff", eff.__doc__ > > > > class bot: > > "bot" > > def __call__(self): > > print "bot", bot.__doc__ > > > > e = eff() > > b = bot() > > e() > > b() > > > > eff, bot = bot, eff > > e = eff() > > b = bot() > > e() > > b() > > > > There's nothing new here. Why does allowing the ability to > > obsfucate suddenly warrant a -1? > > since when did Python grow full lexical scoping? I know that's not Swedish, but I haven't the foggiest what you're getting at. Where did lexical scoping enter? > does anyone that has learned about the LGB rule expect > the above to work? You're the one who did "eff, bot = bot, eff". The only intent I can infer is obsfuction. The above works the same as yours, for whatever your definition of "work". > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x = x I guess I'm missing something. -------snip------------ def eff(): "eff" print "eff", eff.__doc__ def bot(): "bot" print "bot", bot.__doc__ eff() bot() eff, bot = bot, eff eff() bot() -----------end----------- I guess we're not talking about the same example. > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Again, I'm mystified. After "eff, bot = bot, eff", I don't see why 'bot() == "eff bot"' is a wrong result. Put it another way: are you reporting a bug in 1.5.2? If it's a bug, why is my example not a bug? If it's not a bug, why would the existence of other attributes besides __doc__ be a problem? - Gordon From akuchlin at mems-exchange.org Fri Apr 14 21:37:01 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Fri, 14 Apr 2000 15:37:01 -0400 (EDT) Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <005401bfa646$123ef2a0$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Fredrik Lundh writes: > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Watching the successive weekly Unicode patchsets, each one fixing some obscure corner case that turned out to be buggy -- '%s' % ustr, concatenating literals, int()/float()/long(), comparisons -- I'm beginning to agree with Fredrik. Automatically making Unicode strings and regular strings interoperate looks like it requires many changes all over the place, and I worry if it's possible to catch them all in time. Maybe we should consider being more conservative, and just having the Unicode built-in type, the unicode() built-in function, and the u"..." notation, and then leaving all responsibility for conversions up to the user. On the other hand, *some* default conversion seems needed, because it seems draconian to make open(u"abcfile") fail with a TypeError. (While I want to see Python 1.6 expedited, I'd also not like to see it saddled with a system that proves to have been a mistake, or one that's a maintenance burden. If forced to choose between delaying and getting it right, the latter wins.) >why not just assume that the *ENTIRE SOURCE FILE* uses a single >encoding, and let the tokenizer (or more likely, a conversion stage >before the tokenizer) convert the whole thing to unicode. To reinforce Fredrik's point here, note that XML only supports encodings at the level of an entire file (or external entity). You can't tell an XML parser that a file is in UTF-8, except for this one element whose contents are in Latin1. -- A.M. Kuchling http://starship.python.net/crew/amk/ Dream casts a human shadow, when it occurs to him to do so. -- From SANDMAN: "Season of Mists", episode 0 From effbot at telia.com Fri Apr 14 21:53:35 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:53:35 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256381481-212228@hypernet.com> Message-ID: <009401bfa64b$202f6c00$34aab5d4@hagrid> > > > There's nothing new here. Why does allowing the ability to > > > obsfucate suddenly warrant a -1? > > > > since when did Python grow full lexical scoping? > > I know that's not Swedish, but I haven't the foggiest what > you're getting at. Where did lexical scoping enter? > > > does anyone that has learned about the LGB rule expect > > the above to work? > > You're the one who did "eff, bot = bot, eff". The only intent I > can infer is obsfuction. The above works the same as yours, > for whatever your definition of "work". okay, I'll try again: in your example, the __call__ function refers to a name that is defined several levels up. in my example, the "foo" function refers to a name that *looks* like it's in the same scope as the "x" argument (etc), but isn't. for the interpreter, the examples are identical. for the reader, they're not. > Put it another way: are you reporting a bug in 1.5.2? If it's a > bug, why is my example not a bug? 
If it's not a bug, why > would the existence of other attributes besides __doc__ be a > problem? because people isn't likely to use __doc__ to store static variables? From skip at mojam.com Fri Apr 14 22:03:41 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 15:03:41 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <006201bfa647$92f3c000$34aab5d4@hagrid> References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.31133.851143.570161@beluga.mojam.com> >> (And that foo.x would, in my esteem, be a suboptimal way to get at x >> from within foo, but that's besides the fact.) Fredrik> fwiw, I'd love to see a good syntax for this. might even Fredrik> change my mind... Could we overload "_"'s meaning yet again (assuming it doesn't already have a special meaning within functions)? That way def bar(): print _.x def foo(): print _.x foo.x = "public" bar.x = "private" bar, foo = foo, bar foo() would display private on stdout. *Note* - I would not advocate this use be extended to do a more general lookup of attributes - it should just refer to attributes of the function of which the executing code object is an attribute. (It may not even be possible.) (I've never used _ for anything, so I don't know all its current (ab)uses. This is just a thought that occurred to me...) Skip From gmcm at hypernet.com Fri Apr 14 22:18:56 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 16:18:56 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> References: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <1256378958-363995@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > Can you please explain how "consistency" is violated? > > > > Yes, I can. > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. Ah. I see. Quite simply, you're arguing from First Principles in an area where I have none. I used to, but I found that all systems built from First Principles (Eiffel, Booch's methodology...) yielded 3 headed monsters. It can be entertaining (in the WWF sense). Just trick some poor sucker into saying "class method" in the C++ sense and then watch Jim Fulton deck him, the ref and half the front row. Personally, I regard (dynamic instance.attribute) as a handy feature, not as a flaw in the object model. - Gordon From moshez at math.huji.ac.il Fri Apr 14 22:19:50 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 14 Apr 2000 22:19:50 +0200 (IST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000901bfa5b9$b7f6c980$182d153f@tim> Message-ID: On Thu, 13 Apr 2000, Tim Peters wrote: > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). Ordered? What about dictionaries? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Fri Apr 14 22:49:41 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Fri, 14 Apr 2000 16:49:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.33893.192967.369037@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> fwiw, I'd love to see a good syntax for this. might even FL> change my mind... def foo(x): self.x = x ? :) -Barry From bwarsaw at cnri.reston.va.us Fri Apr 14 23:03:25 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:03:25 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256381481-212228@hypernet.com> <009401bfa64b$202f6c00$34aab5d4@hagrid> Message-ID: <14583.34717.128345.245459@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> because people isn't likely to use __doc__ to store FL> static variables? Okay, let's really see how much we can abuse __doc__ today. I'm surprised neither Zope nor SPARK are this evil. Why must I add the extra level of obfuscating indirection? Or are we talking about making __doc__ read-only in 1.6, or restricting it to strings only? -Barry -------------------- snip snip -------------------- import sys print sys.version def decorate(func): class D: pass doc = func.__doc__ func.__doc__ = D() func.__doc__.__doc__ = doc def eff(): "eff" print "eff", eff.__doc__.__doc__ decorate(eff) def bot(): "bot" print "bot", bot.__doc__.__doc__ decorate(bot) eff.__doc__.publish = 1 bot.__doc__.publish = 0 eff() bot() eff, bot = bot, eff eff() bot() for f in (eff, bot): print 'Can I publish %s? ... %s' % (f.__name__, f.__doc__.publish and 'yes' or 'no') -------------------- snip snip -------------------- % python /tmp/scary.py 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] eff eff bot bot bot eff eff bot Can I publish bot? ... no Can I publish eff? ... yes From bwarsaw at cnri.reston.va.us Fri Apr 14 23:05:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:05:43 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> <14583.31133.851143.570161@beluga.mojam.com> Message-ID: <14583.34855.459510.161223@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> (I've never used _ for anything, so I don't know all its SM> current (ab)uses. This is just a thought that occurred to SM> me...) One place it's used is in localized applications. See Tools/i18n/pygettext.py. -Barry From gstein at lyra.org Fri Apr 14 23:20:27 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:20:27 -0700 (PDT) Subject: [Python-Dev] voting (was: Object customization) In-Reply-To: <007e01bfa648$3c701de0$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > TimBot wrote: > > Greg gave the voting rule as: > > > > > -1 "Veto. And is my reasoning." > > sorry, I must have missed that post, since I've > interpreted the whole thing as: > > if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): > implement(feature) As in all cases, that "and" should be an "or" :-) > (probably because I've changed the eff-bot script > to use 'sre' instead of 're'...) > > can you repost the full set of rules? 
http://www.python.org/pipermail/python-dev/2000-March/004312.html Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:23:50 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:23:50 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14583.33893.192967.369037@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> fwiw, I'd love to see a good syntax for this. might even > FL> change my mind... > > def foo(x): > self.x = x > > ? :) Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The above syntax creates too much of an expectation to look for "self". There would, of course, be problems that self.x doesn't work in a method while _.x could. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Apr 14 23:18:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 17:18:48 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: <14583.35640.746399.601030@seahag.cnri.reston.va.us> R. David Murray writes: > So it looks to my uneducated eye like file.tell() is broken. The actual > on-disk size of the file, by the way, is indeed 2147718485, so it looks > like somebody's not using the right size data structure somewhere. > > So, can anyone tell me what to look for, or am I stuck for the moment? Hmm. What is off_t defined to be on your platform? In config.h, is HAVE_FTELLO or HAVE_FTELL64 defined? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 23:21:59 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 23:21:59 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256378958-363995@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 04:18:56 PM Message-ID: <200004142121.XAA03202@python.inrialpes.fr> Gordon McMillan wrote: > > Ah. I see. Quite simply, you're arguing from First Principles Exactly. I think that these principles play an important role in the area of computer programming, because they put the markers in the evolution of our thoughts when we're trying to transcript the real world through formal computer terms. No kidding :-) So we need to put some limits before loosing completely these driving markers. No kidding. > in an area where I have none. too bad for you > I used to, but I found that all systems built from First Principles > (Eiffel, Booch's methodology...) yielded 3 headed monsters. Yes. This is the state Python tends to reach, btw. I'd like to avoid this madness. Put simply, if we loose the meaning of the notion of a class of objects, there's no need to have a 'class' keyword, because it would do more harm than good. > Personally, I regard (dynamic instance.attribute) as a handy feature Gordon, I know that it's handy! > not as a flaw in the object model. if we still pretend there is one... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 14 23:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 14 Apr 2000 23:22:08 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <38F78C00.7BAE1C12@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > but they won't -- if you don't use an encoding directive, and > > > don't use 8-bit characters in your string literals, everything > > > works as before. > > > > > > (that's why the default is "none" and not "utf-8") > > > > > > if you use 8-bit characters in your source code and wish to > > > add an encoding directive, you need to add the right encoding > > > directive... > > > > Fair enough, but this would render all the auto-coercion > > code currently in 1.6 useless -- all string to Unicode > > conversions would have to raise an exception. > > I though it was rather clear by now that I think the auto- > conversion stuff *is* useless... > > but no, that doesn't mean that all string to unicode conversions > need to raise exceptions -- any 8-bit unicode character obviously > fits into a 16-bit unicode character, just like any integer fits in a > long integer. > > if you convert the other way, you might get an OverflowError, just > like converting from a long integer to an integer may give you an > exception if the long integer is too large to be represented as an > ordinary integer. after all, > > i = int(long(v)) > > doesn't always raise an exception... This is exactly the same as proposing to change the default encoding to Latin-1. I don't have anything against that (being a native Latin-1 user :), but I would assume that other native language writer sure do: e.g. all programmers not using Latin-1 as native encoding (and there are lots of them). > > > > > why keep on pretending that strings and strings are two > > > > > different things? it's an artificial distinction, and it only > > > > > causes problems all over the place. > > > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > > strings... not until Py3K at least (and as Fred already > > > > said, all standard editors will have native Unicode support > > > > by then). > > > > > > I discussed that in my original "all characters are unicode > > > characters" proposal. in my proposal, the standard string > > > type will have to roles: a string either contains unicode > > > characters, or binary bytes. > > > > > > -- if it contains unicode characters, python guarantees that > > > methods like strip, lower (etc), and regular expressions work > > > as expected. > > > > > > -- if it contains binary data, you can still use indexing, slicing, > > > find, split, etc. but they then work on bytes, not on chars. > > > > > > it's still up to the programmer to keep track of what a certain > > > string object is (a real string, a chunk of binary data, an en- > > > coded string, a jpeg image, etc). if the programmer wants > > > to convert between a unicode string and an external encoding > > > to use a certain unicode encoding, she needs to spell it out. > > > the codecs are never called "under the hood". > > > > > > (note that if you encode a unicode string into some other > > > encoding, the result is binary buffer. operations like strip, > > > lower et al does *not* work on encoded strings). > > > > Huh ? 
If the programmer already knows that a certain > > string uses a certain encoding, then he can just as well > > convert it to Unicode by hand using the right encoding > > name. > > I thought that was what I said, but the text was garbled. let's > try again: > > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Again and again... The orginal intent of the Unicode integration was trying to make Unicode and 8-bit strings interoperate without too much user intervention. At a cost (the UTF-8 encoding), but then if you do use this encoding (and this is not far fetched since there are input sources which do return UTF-8, e.g. TCL), the Unicode implementation will apply all its knowledge in order to get you satisfied. If you don't like this, you can always apply explicit conversion calls wherever needed. Latin-1 and UTF-8 are not compatible, the conversion is very likely to cause an exception, so the user will indeed be informed about this failure. > > The whole point we are talking about here is that when > > having the implementation convert a string to Unicode all > > by itself it needs to know which encoding to use. This is > > where we have decided long ago that UTF-8 should be > > used. > > does "long ago" mean that the decision cannot be > questioned? what's going on here? > > face it, I don't want to guess when and how the interpreter > will convert strings for me. after all, this is Python, not Perl. > > if I want to convert from a "string of characters" to a byte > buffer using a certain character encoding, let's make that > explicit. Hey, there's nothing which prevents you from doing so explicitly. > Python doesn't convert between other data types for me, so > why should strings be a special case? Sure it does: 1.5 + 2 == 3.5, 2L + 3 == 5L, etc... > > The pragma discussion is about a totally different > > issue: pragmas could make it possible for the programmer > > to tell the *compiler* which encoding to use for literal > > u"unicode" strings -- nothing more. Since "8-bit" strings > > currently don't have an encoding attached to them we store > > them as-is. > > what do I have to do to make you read my proposal? > > shout? > > okay, I'll try: > > THERE SHOULD BE JUST ONE INTERNAL CHARACTER > SET IN PYTHON 1.6: UNICODE. Please don't shout... simply read on... Note that you are again argueing for using Latin-1 as default encoding -- why don't you simply make this fact explicit ? > for consistency, let this be true for both 8-bit and 16-bit > strings (as well as Py3K's 31-bit strings ;-). > > there are many possible external string encodings, just like there > are many possible external integer encodings. but for integers, > that's not something that the core implementation cares much > about. why are strings different? > > > I don't want to get into designing a completely new > > character container type here... this can all be done for Py3K, > > but not now -- it breaks things at too many ends (even though > > it would solve the issues with strings being used in different > > contexts). > > you don't need to -- you only need to define how the *existing* > string type should be used. in my proposal, it can be used in two > ways: > > -- as a string of unicode characters (restricted to the > 0-255 subset, by obvious reasons). given a string 's', > len(s) is always the number of characters, s[i] is the > i'th character, etc. 
> > or > > -- as a buffer containing binary bytes. given a buffer 'b', > len(b) is always the number of bytes, b[i] is the i'th > byte, etc. > > this is one flavour less than in the 1.6 alphas -- where strings sometimes > contain UTF-8 (and methods like upper etc doesn't work), sometimes an > 8-bit character set (and upper works), and sometimes binary buffers (for > which upper doesn't work). Strings always contain data -- there's no encoding attached to them. If the user calls .upper() on a binary string the output will most probably no longer be usable... but that's the programmers fault, not the string type's fault. > (hmm. I've said all this before, haven't I?) You know as well as I do that the existing string type is used for both binary and text data. You cannot simply change this by introducing some new definition of what should be stored in buffers and what in strings... not until we officially redefined these things say in Py3K ;-) > frankly, I'm beginning to feel like John Skaller. do I have to write my > own interpreter to get this done right? :-( No, but you should have started this discussion in late November last year... not now, when everything has already been implemented and people are starting to the use the code that's there with great success. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 23:29:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 23:29:48 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: <38F78DCC.C630F32@lemburg.com> "Andrew M. Kuchling" wrote: > > >why not just assume that the *ENTIRE SOURCE FILE* uses a single > >encoding, and let the tokenizer (or more likely, a conversion stage > >before the tokenizer) convert the whole thing to unicode. > > To reinforce Fredrik's point here, note that XML only supports > encodings at the level of an entire file (or external entity). You > can't tell an XML parser that a file is in UTF-8, except for this one > element whose contents are in Latin1. Hmm, this would mean that someone who writes: """ #pragma script-encoding utf-8 u = u"\u1234" print u """ would suddenly see "\u1234" as output. If that's ok, fine with me... it would make things easier on the compiler side (even though I'm pretty sure that people won't like this). BTW: I will be offline for the next week... I'm looking forward to where this dicussion will be heading. Have fun, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Fri Apr 14 23:43:16 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:43:16 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > Tim Peters wrote: > > [Gordon McMillan] > > > ... 
> > > Or are you saying that if functions have attributes, people will > > > all of a sudden expect that function locals will have initialized > > > and maintained state? > > > > I expect that they'll expect exactly what happens in JavaScript, which > > supports function attributes too, and where it's often used as a > > nicer-than-globals way to get the effect of C-like mutable statics > > (conceptually) local to the function. > > so it's no longer an experimental feature, it's a "static variables" > thing? Don't be so argumentative. Tim suggested a possible use. Not what it really means or how it really works. I look at it as labelling a function with metadata about that function. I use globals or class attrs for "static" data. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:45:51 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:45:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: Message-ID: On Fri, 14 Apr 2000, Mark Hammond wrote: > > I think that we get 95% of the benefit without any of the > > "dangers" > > (though I don't agree with the arguments against) if we allow the > > attachment of properties only at compile time and > > disallow mutation of > > them at runtime. > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. There is no way in > the current runtime to differentiate between "compile time" and > "runtime" attribute references... If this was done, it would simply > be ugly hacks to support what can only be described as unpythonic in > the first place! > > [Unless of course Im missing something...] You aren't at all! Paul hit his head, or he is assuming some additional work to allow the compiler to know more. I agree with you: compilation in Python is just code execution; there is no way Python can disallow runtime changes. (from a later note, it appears he is referring to introducing "decl", which I don't think is on the table for 1.6) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:48:27 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:48:27 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... > > or have a completely separate class/structure for access control > (which is what you would do it in C, btw, for existing objects > to which you can't add slots, ex: file descriptors, mem segments, etc). This is uglier than attaching the metadata directly to the target that you are describing! If you want to apply metadata to functions, then apply them to the function! Don't shove them off in a separate structure. You're the one talking about cleanliness, yet you suggest something that is very poor from a readability, maintainability, and semantic angle. Ick. 
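A side-by-side sketch of the two styles under discussion; the second class is hypothetical and assumes the proposed function-attribute patch is in:

    # the registry style, which works today:
    class Spam:
        __zope_access__ = {'eggs': 'private', 'eats': 'public'}
        def eggs(self): pass
        def eats(self): pass

    # the style the proposal would allow (hypothetical):
    class Spam2:
        def eggs(self): pass
        eggs.access = 'private'     # the metadata sits on the function itself
        def eats(self): pass
        eats.access = 'public'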
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:52:22 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:52:22 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Fred L. Drake, Jr. wrote: > Ken Manheimer writes: > > (Oh, and i'd suggest up front that documentation for this feature > > recommend people not use "__*__" names for their own object attributes, to > > avoid collisions with eventual use of them by python.) > > Isn't that a standing recommendation for all names? Yup. Personally, I use "_*" for private variables or other "hidden" type things that shouldn't be part of an object's normal interface. For example, all the stuff that the Python/COM interface uses is prefixed by "_" to denote that it is metadata about the classes rather than part of its interface. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:56:37 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:56:37 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > Now, whenever there are two instances 'a' and 'b' of the class A, > the first inconsistency is that we're allowed to assign attributes > to these instances dynamically, which are not declared in the class A. > > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. I'll repeat what Gordon said: the current Python behavior is entirely correct, entirely desirable, and should not (can not) change. Your views on what an object model should be are not Python's views. If the person who writes "a.author =" wants to do that, then let them. Python does not put blocks in people's way, it simply presumes that people are intelligent and won't do Bad Things. There are innumerable times where I've done the following: class _blank: pass data = _blank() data.item = foo data.extra = bar func(data) It is a tremendously easy way to deal with arbitrary data on an attribute basis, rather than (say) a dictionary's key-based basis. >... arguments about alternate classes and stuff ... Sorry. That just isn't Python. Not in practice, nor in intent. Applying metadata to the functions is an entirely valid, Pythonic idea. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Apr 15 00:01:51 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:01:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004142121.XAA03202@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > Ah. I see. Quite simply, you're arguing from First Principles > > Exactly. > > I think that these principles play an important role in the area > of computer programming, because they put the markers in the > evolution of our thoughts when we're trying to transcript the > real world through formal computer terms. No kidding :-) > So we need to put some limits before loosing completely these > driving markers. No kidding. In YOUR opinion. In MY opinion, they're bunk.
Python provides me with the capabilities that I want: objects when I need them, and procedural flow when that is appropriate. It avoids obstacles and gives me freedom of expression and ways to rapidly develop code. I don't have to worry about proper organization unless and until I need it. Formalisms be damned. I want something that works for ME. Give me code, make it work, and get out of my way. That's what Python is good for. I could care less about "proper programming principles". Pragmatism. That's what I seek. >... > > I used to, but I found that all systems built from First Principles > > (Eiffel, Booch's methodology...) yielded 3 headed monsters. > > Yes. This is the state Python tends to reach, btw. I'd like to avoid > this madness. Does not. There are many cases where huge systems have been built using Python, built well, and are quite successful. And yes, there have also been giant, monster-sized Bad Python Programs out there, too. But that can be done in ANY language. Python doesn't *tend* towards that at all. Certainly, Perl does, but we aren't talking about that (until now :-) > Put simply, if we loose the meaning of the notion of a class of objects, > there's no need to have a 'class' keyword, because it would do more harm > than good. Huh? What the heck do you mean by this? >... > > not as a flaw in the object model. > > if we still pretend there is one... It *DOES* have one. To argue there isn't one is simply insane and argumentative. Python just doesn't have YOUR object model. Live with it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 00:00:19 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:00:19 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:56:37 PM Message-ID: <200004142200.AAA03409@python.inrialpes.fr> Greg Stein wrote: > > Your views on what an object model should be are not Python's views. Ehm, could you explain to me what are Python's views? Sorry, I don't see any worthy argument in your posts that would make me switch from -1 to -0. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Sat Apr 15 00:00:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 17:00:30 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: <14583.38142.520596.804466@beluga.mojam.com> Barry said "_" is effectively taken because it means something (at least when used a function?) to pygettext. How about "__" then? def bar(): print __.x def foo(): print __.x foo.x = "public" bar.x = "private" ... It has the added benefit that this usage adheres to the "Python gets to stomp on __-prefixed variables" convention. my-underscore-key-works-better-than-yours-ly y'rs, Skip From gstein at lyra.org Sat Apr 15 00:13:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:13:25 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? 
> Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. "We're all adults here." Python says that you can do what you want. It won't get in your way. Badness is not defined. If somebody wants to write "a.author='Guido'" then they can. There are a number of objects that can have arbitrary attributes. Classes, modules, and instances are a few (others?). Function objects are a proposed additional one. In all cases, attaching new attributes is fine and dandy -- no restriction. (well, you can implement __setattr__ on a class instance) Python's object model specifies a number of other behaviors, but nothing really material here. Of course, all these "views" are simply based on Guido's thoughts and the implementation. Implementation, doc, current practice, and Guido's discussions over the past eight years of Python's existence have also contributed to the notion of "The Python Way". Some of that may be very hard to write down, although I've attempted to write a bit of that above. After five years of working with Python, I'd like to think that I've absorbed and understand the Python Way. Can I state it? No. "We're all adults here" is a good one for this discussion. If you think that function attributes are bad for your programs, then don't use them. There are many others who find them tremendously handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Apr 15 00:19:24 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:19:24 -0700 (PDT) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? > Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. Note that all votes are important, but only a signal to Guido about our individual feelings on the matter. Every single person on this list could vote -1, and Guido can still implement the feature (at his peril :-). Conversely, we could all vote +1 and he can refuse to implement it. In this particular case, your -1 vote says that you really dislike this feature. Great. And you've provided a solid explanation why. Even better! Now, people can respond to your vote and attempt to get you to change it. This is goodness because maybe you voted -1 based on a misunderstanding or something unclear in the proposal (I'm talking general now; I don't believe that is the case here). After explanation and enlightenment, you could change the vote. The discussion about *why* you voted -1 is also instructive to Guido. It may raise an issue that he hadn't considered. In addition, people attempting to change your mind are also providing input to Guido. [ maybe too much input is flying around, but the principle is there :-) ] Basically, we can call them vetoes or votes. Either way, this is still Guido's choice :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 00:54:24 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:54:24 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:13:25 PM Message-ID: <200004142254.AAA03535@python.inrialpes.fr> Greg Stein wrote: > > Python says that you can do what you want. 'Python' says nothing. 
Or are you The Voice of Python? If custom object attributes are convenient for you, then I'd suggest to generalize the concept, because I perceived it as a limitation too, but not for functions and methods. I'll repeat myself: >>> wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 Has anybody noticed that 'fraction' is a float I wanted to qualify with a 'precision' attribute? Again: if we're about to go that road, let's do it in one shot. *This* is what would change my vote. I'll leave Guido to cut the butter, or to throw it all out the window. You're right Greg: I hardly can contribute more in this case, even if I wanted to. Okay, +53 :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Sat Apr 15 01:01:14 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 18:01:14 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> References: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: <14583.41786.144784.114440@beluga.mojam.com> Vladimir> I'll repeat myself: >>>> wink Vladimir> >>>> wink.fraction = 1e+-1 >>>> wink.fraction.precision = 1e-+1 >>>> wink.compute() Vladimir> 0.0 Vladimir> Has anybody noticed that 'fraction' is a float I wanted to Vladimir> qualify with a 'precision' attribute? Quick comment before I rush home... There is a significant cost to be had by adding attributes to numbers (ints at least). They can no longer be shared in the int cache. I think the runtime size increase would be pretty huge, as would the extra overhead in creating all those actual (small) IntObjects instead of sharing a single copy. On the other hand, functions are already pretty heavyweight objects and occur much less frequently than numbers in common Python programs. They aren't shared (except for instance methods, which Barry's patch already excludes), so there's no risk of stomping on attributes that are shared by more than one function. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From gstein at lyra.org Sat Apr 15 01:14:01 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 16:14:01 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Python says that you can do what you want. > > 'Python' says nothing. Or are you The Voice of Python? Well, yah. You're just discovering that?! :-) I meant "The Python Way" says that you can do what you want. It doesn't speak often, but if you know how to hear it... it is a revelation :-) > If custom object attributes are convenient for you, then I'd suggest to Custom *function* attributes. Functions are one of the few objects in Python that are "structural" in their intent and use, yet have no way to record data. Modules and classes have a way to, but not functions. [ by "structure", I mean something that contributes to the structure, organization, and mechanics of your program. as opposed to data, such as lists, dicts, instances. ] And ditto what Skip said about attaching attributes to ints and other immutables. 
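To make that concrete, a rough sketch (the attribute names are invented; the render.author line assumes Barry's proposed patch, the rest is plain 1.5.2 behaviour):

    class Config:
        pass

    Config.version = "1.0"        # classes already take arbitrary attributes
    c = Config()
    c.author = "Guido"            # so do instances (barring a __setattr__ hook);
                                  # modules behave the same way

    def render(text):
        return text

    render.author = "Guido"       # proposed: functions would too

    i = 42
    # i.author = "Guido"          # but small ints come out of a shared cache, so
                                  # an int attribute would show up on every other
                                  # 42 in the program -- or force every int to
                                  # become a separate, bigger object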
Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Sat Apr 15 03:45:27 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 15 Apr 2000 11:45:27 +1000 Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: I can see the dilemma, but... > Maybe we should consider being more conservative, and > just having the > Unicode built-in type, the unicode() built-in function, > and the u"..." > notation, and then leaving all responsibility for > conversions up to > the user. Win32 and COM has been doing exactly this for the last couple of years. And it sucked. > On the other hand, *some* default conversion > seems needed, > because it seems draconian to make open(u"abcfile") fail with a > TypeError. For exactly this reason. The end result is that the first thing you ever do with a Unicode object is convert it to a string. > (While I want to see Python 1.6 expedited, I'd also not > like to see it > saddled with a system that proves to have been a mistake, or one > that's a maintenance burden. If forced to choose between > delaying and > getting it right, the latter wins.) Agreed. I thought this implementation stemmed from Guido's desire to do it this way in the 1.x family, and move towards Fredrik's proposal for Py3k. As a geneal comment: Im a little confused and dissapointed here. We are all bickering like children while our parents are away. All we are doing is creating a _huge_ pile of garbage for Guido to ignore when he returns. We are going to be presenting Guido with around 400 messages at my estimate. He can't possibly read them all. So the end result is that all the posturing and flapping going on here is for naught, and he is just going to do whatever he wants anyway - as he always has done, and as has worked so well for Python. Sheesh - we should all consider how we can be the most effective, not the most loud or aggressive! Mark. From moshez at math.huji.ac.il Sat Apr 15 07:06:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 15 Apr 2000 07:06:00 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... This solution is close to what the eff-bot suggested. In this case it is horrible because of "editing effort": the meta-data and code of a function are better off together physically, so you would change it to class Spam: __zope_access__ = {} def eggs(self): pass __zope_access__['eggs'] = 'private' def eats(self): pass __zope_access__['eats'] = 'public' Which is way too verbose. Especially, if the method gets thrown around, you find yourself doing things like meth.im_class.__zope_access__[meth.im_func.func_name] Instead of meth.__zope_access__ And sometimes you write a function: def do_something(self): pass And the infrastructure adds the method to a class of its choice. Where would you stick the attribute then? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tim_one at email.msn.com Sat Apr 15 07:51:57 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 01:51:57 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan andPR#7) In-Reply-To: Message-ID: <000701bfa69e$b5d768e0$092d153f@tim> [Tim] > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). [Moshe Zadka] > Ordered? What about dictionaries? An ordering of a dict's kids is forced in the context of comparison (see dict_compare in dictobject.c). From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 08:56:44 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 08:56:44 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: <14583.41786.144784.114440@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 06:01:14 PM Message-ID: <200004150656.IAA03994@python.inrialpes.fr> Skip Montanaro wrote: > > Vladimir> Has anybody noticed that 'fraction' is a float I wanted to > Vladimir> qualify with a 'precision' attribute? > > Quick comment before I rush home... There is a significant cost to be had > by adding attributes to numbers (ints at least). They can no longer be > shared in the int cache. I think the runtime size increase would be pretty > huge, as would the extra overhead in creating all those actual (small) > IntObjects instead of sharing a single copy. I know that. Believe it or not, I have a good image of the cost it would infer, better than yours. because I've thought about this problem (as well as other related problems yet to be 'discovered'), and have spent some time in the past trying to find a couple of solutions to them. However, I eventually decided to stop my time machine and wait for these issues to show up, then take a stance on them. And this is what I did in this case. I'm tired to lack good arguments and see incoming capitalized words. This makes no sense here. Go to c.l.py and repeat "we're all adults here" *there*, please. To close this chapter, I think that if this gets in, Python's user base will get more confused and would have to swallow yet another cheap gimmick. You won't be able to explain it well to them. They won't really understand it, because their brains are still young, inexperienced, looking for logical explanations where all notions coexist peacefully. In the long term, what you're pushing for to get your money quickly, isn't a favor. And that's why I maintain my vote. call-me-again-if-you-need-more-than-53'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Sat Apr 15 11:28:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 15 Apr 2000 11:28:15 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: Message-ID: <38F8362F.51C2F7EC@lemburg.com> Mark Hammond wrote: > > I thought this implementation stemmed from Guido's desire > to do it this way in the 1.x family, and move towards Fredrik's > proposal for Py3k. Right. Let's do this step by step and get some experience first. With that gained experience we can still polish up the design towards a compromise which best suits all our needs. 
The integration of Unicode into Python is comparable to the addition of floats to an interpreter which previously only understood integers -- things are obviously going to be a little different than before. Our goal should be to make it as painless as possible and at least IMHO this can only be achieved by gaining practical experience in this new field first. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Sat Apr 15 12:14:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 12:14:54 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <38F78C00.7BAE1C12@lemburg.com> Message-ID: <002101bfa6c3$73e6c3c0$34aab5d4@hagrid> > This is exactly the same as proposing to change the default > encoding to Latin-1. no, it isn't. here's what I'm proposing: -- the internal character set is unicode, and nothing but unicode. in 1.6, this applies to strings. in 1.7 or later, it applies to source code as well. -- the default source encoding is "unknown" -- the is no other default encoding. all strings use the unicode character set. to give you some background, let's look at section 3.2 of the existing language definition: [Sequences] represent finite ordered sets indexed by natural numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i]. An object of an immutable sequence type cannot change once it is created. The items of a string are characters. There is no separate character type; a character is represented by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and ord() convert between characters and nonnegative integers representing the byte values. Bytes with the values 0-127 usually represent the corre- sponding ASCII values, but the interpretation of values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file. (in other words, given a string s, len(s) is the number of characters in the string. s[i] is the i'th character. len(s[i]) is 1. etc. the existing string type doubles as byte arrays, where given an array b, len(b) is the number of bytes, b[i] is the i'th byte, etc). my proposal boils down to a few small changes to the last three sentences in the definition. basically, change "byte value" to "character code" and "ascii" to "unicode": The built-in functions chr() and ord() convert between characters and nonnegative integers representing the character codes. Character codes usually represent the corresponding unicode values. The 8-bit string data type is also used to represent arrays of bytes, e.g., to hold data read from a file. that's all. the rest follows from this. ... just a few quickies to sort out common misconceptions: > I don't have anything against that (being a native Latin-1 > user :), but I would assume that other native language > writer sure do: e.g. all programmers not using Latin-1 > as native encoding (and there are lots of them). the unicode folks have already made that decision. 
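(to spell out what that decision buys us, a tiny sketch -- assuming ord() accepts a one-character unicode string and that the unicode() builtin takes an encoding name; both are assumptions for illustration, not claims about the current alpha:)

    # the first 256 unicode code points coincide with ISO 8859-1 (latin-1):
    c = u"\u00e5"                            # code point 0xE5, a-with-ring
    assert ord(c) == 0xE5                    # same number as the latin-1 byte
    assert unicode(chr(0xE5), "latin-1") == c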
I find it very strange that we should use *another* model for the first 256 characters, just to "equally annoy everyone". (if people have a problem with the first 256 unicode characters having the same internal representation as the ISO 8859-1 set, tell them to complain to the unicode folks). > (and this is not far fetched since there are input sources > which do return UTF-8, e.g. TCL), the Unicode implementation > will apply all its knowledge in order to get you satisfied. there are all sorts of input sources. major platforms like windows and java use 16-bit unicode. and Tcl has an internal unicode string type, since they realized that storing UTF-8 in 8-bit strings was horridly inefficient (they tried to do it right, of course). the internal type looks like this: typedef unsigned short Tcl_UniChar; typedef struct String { int numChars; size_t allocated; size_t uallocated; Tcl_UniChar unicode[2]; } String; (Tcl uses dual-ported objects, where each object can have an UTF-8 string representation in addition to the internal representation. if you change one of them, the other is recalculated on demand) in fact, it's Tkinter that converts the return value to UTF-8, not Tcl. that can be fixed. > > Python doesn't convert between other data types for me, so > > why should strings be a special case? > > Sure it does: 1.5 + 2 == 3.5, 2L + 3 == 5L, etc... but that's the key point: 2L and 3 are both integers, from the same set of integers. if you convert a long integer to an integer, it still contains an integer from the same set. (maybe someone can fill me in here: what's the formally correct word here? set? domain? category? universe?) also, if you convert every item in a sequence of long integers to ordinary integers, all items are still members of the same integer set. in contrast, the UTF-8 design converts between strings of characters, and arrays of bytes. unless you change the 8-bit string type to know about UTF-8, that means that you change string items from one domain (characters) to another (bytes). > Note that you are again argueing for using Latin-1 as > default encoding -- why don't you simply make this fact > explicit ? nope. I'm standardizing on a character set, not an encoding. character sets are mapping between integers and characters. in this case, we use the unicode character set. encodings are ways to store strings of text as bytes in a byte array. > not now, when everything has already been implemented and > people are starting to the use the code that's there with great > success. the positive reports I've seen all rave about the codec frame- work. that's a great piece of work. without that, it would have been impossible to do what I'm proposing. (so what are you complaining about? it's all your fault -- if you hadn't done such a great job on that part of the code, I wouldn't have noticed the warts ;-) if you look at my proposal from a little distance, you'll realize that it doesn't really change much. all that needs to be done is to change some of the conversion stuff. if we decide to do this, I can do the work for you, free of charge. From effbot at telia.com Sat Apr 15 12:45:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 12:45:15 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F8362F.51C2F7EC@lemburg.com> Message-ID: <005801bfa6c7$b22783a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > Right. Let's do this step by step and get some experience first. 
> With that gained experience we can still polish up the design > towards a compromise which best suits all our needs. so practical experience from other languages, other designs, and playing with the python alphas doesn't count? > The integration of Unicode into Python is comparable to the > addition of floats to an interpreter which previously only > understood integers. use "long integers" instead of "floats", and you'll get closer to the actual case. but where's the problem? python has solved this problem for numbers, and what's more important: the language reference tells us how strings are supposed to work: "The items of a string are characters." (see previous mail) "Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters." this solves most of the issues. to handle the rest, look at the language reference description of integer: [Integers] represent elements from the mathematical set of whole numbers. Borrowing the "elements from a single set" concept, define characters as Characters represent elements from the unicode character set. and let all mixed-string operations use string coercion, just like numbers. can it be much simpler? From effbot at telia.com Sat Apr 15 13:19:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 13:19:14 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> <38F78DCC.C630F32@lemburg.com> Message-ID: <00c901bfa6cd$62259580$34aab5d4@hagrid> M.-A. Lemburg wrote: > > To reinforce Fredrik's point here, note that XML only supports > > encodings at the level of an entire file (or external entity). You > > can't tell an XML parser that a file is in UTF-8, except for this one > > element whose contents are in Latin1. > > Hmm, this would mean that someone who writes: > > """ > #pragma script-encoding utf-8 > > u = u"\u1234" > print u > """ > > would suddenly see "\u1234" as output. not necessarily. consider this XML snippet: ሴ if I run this through an XML parser and write it out as UTF-8, I get: ?^? in other words, the parser processes "&#x" after decoding to unicode, not before. I see no reason why Python cannot do the same. From effbot at telia.com Sat Apr 15 14:02:02 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 14:02:02 +0200 Subject: [Python-Dev] Object customization References: Message-ID: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Greg Stein wrote: > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > >>>>> "FL" == Fredrik Lundh writes: > > > > FL> fwiw, I'd love to see a good syntax for this. might even > > FL> change my mind... > > > > def foo(x): > > self.x = x > > > > ? :) > > Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The > above syntax creates too much of an expectation to look for "self". There > would, of course, be problems that self.x doesn't work in a method while > _.x could. how about the obvious one: adding the name of the function to the local namespace? def foo(x): foo.x = x (in theory, this might of course break some code. but is that a real problem?) after all, my concern is that the above appears to work, but mostly by accident: >>> def foo(x): >>> foo.x = x >>> foo(10) >>> foo.x 10 >>> # cool. 
now let's try this on a method >>> class Foo: >>> def bar(self, x): >>> bar.x = x >>> foo = Foo() >>> foo.bar(10) Traceback (most recent call first): NameError: bar >>> # huh? maybe making it work in both cases would help? ... but on third thought, maybe it's sufficient to just keep the "static variable" aspect out of the docs. I just browsed a number of javascript manuals, and I couldn't find a trace of this feature. so how about this? -0.90000000000000002 on documenting this as "this can be used to store static data in a function" +1 on the feature itself. From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 16:33:51 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 16:33:51 +0200 (CEST) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:19:24 PM Message-ID: <200004151433.QAA04376@python.inrialpes.fr> Greg Stein wrote: > > Note that all votes are important, but only a signal to Guido ... > [good stuff deleted] Very true. I think we've made good progress here. > Now, people can respond to your vote and attempt to get you to change it. > ... After explanation and enlightenment, you could change the vote. Or vice-versa :-) Fredrik has been very informative about the evolution of his opinions as the discussion evolved. As was I, but I don't count ;-) It would be nice if we adopt his example and send more signals to Guido, emitted with a fixed (positive or negative) or with a sinusoidal frequency. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw at cnri.reston.va.us Sat Apr 15 18:45:27 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 15 Apr 2000 12:45:27 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> <14583.38142.520596.804466@beluga.mojam.com> Message-ID: <14584.40103.511210.929864@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Barry said "_" is effectively taken because it means something SM> (at least when used a function?) to pygettext. How about "__" SM> then? oops, yes, only when used as a function. so _.x would be safe. From bwarsaw at cnri.reston.va.us Sat Apr 15 18:52:56 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 15 Apr 2000 12:52:56 -0400 (EDT) Subject: [Python-Dev] Object customization References: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: <14584.40552.13918.707019@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so how about this? FL> -0.90000000000000002 on documenting this as "this FL> can be used to store static data in a function" FL> +1 on the feature itself. Works for me! I think function attrs would be a lousy place to put statics anyway. -Barry From tim_one at email.msn.com Sat Apr 15 19:43:05 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 13:43:05 -0400 Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: <000801bfa702$0e1b3320$8a2d153f@tim> [/F] > -0.90000000000000002 on documenting this as "this > can be used to store static data in a function" -1 on that part from me. I never recommended to do it, I merely predicted that people *will* do it. And, they will. > +1 on the feature itself. I remain +0. [Barry] > Works for me! 
I think function attrs would be a lousy place to put > statics anyway. Yes, but the alternatives are also lousy: a global, or abusing default args. def f(): f.n = f.n + 1 return 42 f.n = 0 ... print "f called", f.n, "times" vs _f_n = 0 def f(): global _f_n _f_n = _f_n + 1 return 42 ... print "f called", _f_n, "times" vs def f(n=[0]): n[0] = n[0] + 1 return 42 ... print "f called ??? times" As soon as s person bumps into the first way, they're likely to adopt it, simply because it's less lousy than the others on first sight. From moshez at math.huji.ac.il Sat Apr 15 19:44:25 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 15 Apr 2000 19:44:25 +0200 (IST) Subject: [Python-Dev] Object customization In-Reply-To: <000801bfa702$0e1b3320$8a2d153f@tim> Message-ID: On Sat, 15 Apr 2000, Tim Peters wrote: [Barry] > Works for me! I think function attrs would be a lousy place to put > statics anyway. [Tim Peters] > Yes, but the alternatives are also lousy: a global, or abusing default > args. Personally I kind of like the alternative of a class: class _Foo: def __init__(self): self.n = 0 def f(self): self.n = self.n+1 return 42 f = _Foo().f getting-n-out-of-f-is-left-as-an-exercise-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sun Apr 16 17:52:20 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 16 Apr 2000 17:52:20 +0200 Subject: [Python-Dev] Python source code encoding Message-ID: <38F9E1B4.F664D173@lemburg.com> [Fredrik]: > [MAL]: > > > To reinforce Fredrik's point here, note that XML only supports > > > encodings at the level of an entire file (or external entity). You > > > can't tell an XML parser that a file is in UTF-8, except for this one > > > element whose contents are in Latin1. > > > > Hmm, this would mean that someone who writes: > > > > """ > > #pragma script-encoding utf-8 > > > > u = u"\u1234" > > print u > > """ > > > > would suddenly see "\u1234" as output. > > not necessarily. consider this XML snippet: > > > ሴ > > if I run this through an XML parser and write it > out as UTF-8, I get: > > ?^? > > in other words, the parser processes "&#x" after > decoding to unicode, not before. > > I see no reason why Python cannot do the same. Sure, and this is what I meant when I said that the compiler has to deal with several different encodings. Unicode escape sequences are currently handled by a special codec, the unicode-escape codec which reads all characters with ordinal < 256 as-is (meaning Latin-1, since the first 256 Unicode ordinals map to Latin-1 characters (*)) except a few escape sequences which it processes much like the Python parser does for 8-bit strings and the new \uXXXX escape. Perhaps we should make this processing use two levels... the escape codecs would need some rewriting to process Unicode-> Unicode instead of 8-bit->Unicode as they do now. -- To move along the method Fredrik is proposing I would suggest (for Python 1.7) to introduce a preprocessor step which gets executed even before the tokenizer. The preprocessor step would then translate char* input into Py_UNICODE* (using an encoding hint which would have to appear in the first few lines of input using some special format). The tokenizer could then work on Py_UNICODE* buffer and the parser would then take care of the conversion from Py_UNICODE* back to char* for Python's 8-bit strings. 
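In rough Python terms, the front end would do something like this (a sketch only -- the helper is invented, and the pragma line just follows the #pragma form quoted above):

    import re, codecs

    def decode_source(raw):
        # look for an encoding hint in the first few lines of the file
        m = re.search(r"#\s*pragma\s+script-encoding\s+(\S+)", raw[:256])
        if m:
            encoding = m.group(1)
        else:
            encoding = "latin-1"               # or whatever default we agree on
        decode = codecs.lookup(encoding)[1]    # (encoder, decoder, reader, writer)
        source, consumed = decode(raw)
        return source                          # Unicode, handed to the tokenizer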
It should shout out loud in case it sees input data outside Unicode range(256) in what is supposed to be a 8-bit string. To make this fully functional we would have to change the 8-bit string to Unicode coercion mechanism, though. It would have to make a Latin-1 assumption instead of the current UTF-8 assumption. In contrast to the current scheme, this assumption would be correct for all constant strings appearing in source code given the above preprocessor logic. For strings constructed from file or user input the programmer would have to assure proper encoding or do the Unicode conversion himself. Sidenote: The UTF-8->Latin-1 change would probably also have to be propogated to all other Unicode in/output logic -- perhaps Latin-1 is the better default encoding after all... A programmer could then write a Python script completely in UTF-8, UTF-16 or Shift-JIS and the above logic would convert the input data to Unicode or Latin-1 (which is 8-bit Unicode) as appropriate and it would warn about impossible conversions to Latin-1 in the compile step. The programmer would still have to make sure that file and user input gets converted using the proper encoding, but this can easily be done using the stream wrappers in the standard codecs module. Note that in this discussion we need to be very careful not to mangle encodings used for source code and ones used when reading/writing to files or other streams (including stdin/stdout). BTW, to experiment with all this you can use the codecs.EncodedFile stream wrapper. It allows specifying both data and stream side encodings, e.g. you can redirect a UTF-8 stdin stream to Latin-1 returning file object which can then be used as source of data input. (*) The conversion from Unicode to Latin-1 is similar to converting a 2-byte unsigned short to an unsigned byte with some extra logic to catch data loss. Latin-1 is comparable to 8-bit Unicode... this is where all this talk about Latin-1 originates from :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Sun Apr 16 22:28:41 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 16 Apr 2000 22:28:41 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:48:27 PM Message-ID: <200004162028.WAA06045@python.inrialpes.fr> [Skip, on attaching access control rights to function objects] > [VM] > >... > > If you prefer embedded definitions, among other things, you could do: > > > > __zope_access__ = { 'Spam' : 'public' } > > > > class Spam: > > __zope_access__ = { 'eggs' : 'private', > > 'eats' : 'public' } > > def eggs(self, ...): ... > > def eats(self, ...): ... [Greg] > This is uglier than attaching the metadata directly to the target that you > are describing! If you want to apply metadata to functions, then apply > them to the function! Don't shove them off in a separate structure. > > You're the one talking about cleanliness, yet you suggest something that > is very poor from a readability, maintainability, and semantic angle. Ick. [Moshe] > This solution is close to what the eff-bot suggested. In this case it > is horrible because of "editing effort": the meta-data and code of a > function are better off together physically, so you would change it > to ... 
> [equivalent solution deleted] In this particular use case, we're discussing access control rights which are part of some protection policy. A protection policy is a matrix Objects/Rights. It can be impemented in 3 ways, depending on the system: 1. Attach the Rights to the Objects 2. Attach the Objects to the Rights 3. Have a separate structure which implements the matrix. I agree that in this particular case, it seems handy to attach the rights to the objects. But in other cases, it's more appropriate to attach the objects to the rights. However, the 3rd solution is the one to be used when the objects (respectively, the rights) are fixed from the start and cannot be modified, and solution 2 (resp, 3) is not desirable/optimal/plausible... That's what I meant with: [VM] > > or have a completely separate class/structure for access control > > (which is what you would do it in C, btw, for existing objects > > to which you can't add slots, ex: file descriptors, mem segments, etc). Which presents an advantage: the potential to change completely the protection policy of the system in future versions of the software, because the protection implementation is decoupled from the objects' and the rights' implementation. damned-but-persistent-first-principles-again-'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From klm at digicool.com Sun Apr 16 23:45:00 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 17:45:00 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: On Sun, 16 Apr 2000 Vladimir.Marangozov at inrialpes.fr wrote: > > [Skip, on attaching access control rights to function objects] > > > [VM] > > >... > > > If you prefer embedded definitions, among other things, you could do: > > > > > > __zope_access__ = { 'Spam' : 'public' } > > > > > > class Spam: > > > __zope_access__ = { 'eggs' : 'private', > > > 'eats' : 'public' } > > > def eggs(self, ...): ... > > > def eats(self, ...): ... > > [Greg] > > This is uglier than attaching the metadata directly to the target that you > > are describing! If you want to apply metadata to functions, then apply > > them to the function! Don't shove them off in a separate structure. > [...] > In this particular use case, we're discussing access control rights > which are part of some protection policy. > A protection policy is a matrix Objects/Rights. It can be impemented > in 3 ways, depending on the system: > 1. Attach the Rights to the Objects > 2. Attach the Objects to the Rights > 3. Have a separate structure which implements the matrix. > [...] > [VM] > > > or have a completely separate class/structure for access control > > > (which is what you would do it in C, btw, for existing objects > > > to which you can't add slots, ex: file descriptors, mem segments, etc). > > Which presents an advantage: the potential to change completely the > protection policy of the system in future versions of the software, > because the protection implementation is decoupled from the objects' > and the rights' implementation. It may well make sense to have the system *implement* the rights somewhere else. (Distributed system, permissions caches in an object system, etc.) However it seems to me to make exceeding sense to have the initial intrinsic settings specified as part of the object! 
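Just to put the two styles side by side (a sketch; the attribute name echoes the zope example above, and the function-attribute lines assume Barry's proposed patch):

    # settings in a side table, away from the code they govern:
    class Spam:
        __zope_access__ = {'eggs': 'private', 'eats': 'public'}
        def eggs(self): pass
        def eats(self): pass

    # intrinsic settings declared on the functions themselves:
    class Spam2:
        def eggs(self): pass
        eggs.access = 'private'
        def eats(self): pass
        eats.access = 'public'

    # a system that *implements* rights elsewhere can still harvest them:
    policy = {}
    for name in ('eggs', 'eats'):
        policy[name] = getattr(Spam2.__dict__[name], 'access', None)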
More generally, it is the ability to associate intrinsic metadata that is the issue, not the designs of systems that employ the metadata. Geez. And, in the case of functions, it seems to me to be outstandingly consistent with python's treatment of objects. I'm mystified about why you would reject that so adamantly! That said, i can entirely understand concerns about whether or how to express references to the metadata from within the function's body. We haven't even seen a satisfactory approach to referring to the function, itself, from within the function. Maybe it's not even desirable to be able to do that - that's an interesting question. (I happen to think it's a good idea, just requiring a suitable means of expression.) But being able to associate metadata with functions seems like a good idea, and i've seen no relevant clues in your "first principles" about why it would be bad. Ken klm at digicool.com Return-Path: Delivered-To: python-dev at python.org Received: from merlin.codesourcery.com (merlin.codesourcery.com [206.168.99.1]) by dinsdale.python.org (Postfix) with SMTP id 7312F1CD5A for ; Sat, 15 Apr 2000 12:50:20 -0400 (EDT) Received: (qmail 17758 invoked by uid 513); 15 Apr 2000 16:57:54 -0000 Mailing-List: contact sc-publicity-help at software-carpentry.com; run by ezmlm Precedence: bulk X-No-Archive: yes Delivered-To: mailing list sc-publicity at software-carpentry.com Delivered-To: moderator for sc-publicity at software-carpentry.com Received: (qmail 16214 invoked from network); 15 Apr 2000 16:19:12 -0000 Date: Sat, 15 Apr 2000 12:11:52 -0400 (EDT) From: To: sc-announce at software-carpentry.com, sc-publicity at software-carpentry.com Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: [Python-Dev] Software Carpentry entries now on-line Sender: python-dev-admin at python.org Errors-To: python-dev-admin at python.org X-BeenThere: python-dev at python.org X-Mailman-Version: 2.0beta3 List-Id: Python core developers First-round entries in the Software Carpentry design competition are now available on-line at: http://www.software-carpentry.com/entries/index.html Our thanks to everyone who entered; we look forward to some lively discussion on the "sc-discuss" list. Best regards, Greg Wilson Software Carpentry Project Coordinator From skip at mojam.com Mon Apr 17 00:06:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 16 Apr 2000 17:06:08 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14586.14672.949500.986951@beluga.mojam.com> Ken> We haven't even seen a satisfactory approach to referring to the Ken> function, itself, from within the function. Maybe it's not even Ken> desirable to be able to do that - that's an interesting question. I hereby propose that within a function the special name __ refer to the function. You could have def fact(n): if n <= 1: return 1 return __(n-1) * n You could also refer to function attributes through __ (presuming Barry's proposed patch gets adopted): def pub(*args): if __.access == "private": do_private_stuff(*args) else: do_public_stuff(*args) ... if validate_permissions(): pub.access = "private" else: pub.access = "public" When in a bound method, __ should refer to the bound method, not the unbound method, which is already accessible via the class name. As far as lexical scopes are concerned, this won't change anything. 
I think it could be implemented by adding a reference to the function called __ in the local vars of each function. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From klm at digicool.com Mon Apr 17 00:09:18 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:09:18 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: On Sat, 15 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > > >>>>> "FL" == Fredrik Lundh writes: > > > > > > FL> fwiw, I'd love to see a good syntax for this. might even > > > FL> change my mind... > > > > > > def foo(x): > > > self.x = x > > > > > > ? :) > > > > Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The > > above syntax creates too much of an expectation to look for "self". There > > would, of course, be problems that self.x doesn't work in a method while > > _.x could. > > how about the obvious one: adding the name of the > function to the local namespace? > > def foo(x): > foo.x = x 'self.x' would collide profoundly with the convention of using 'self' for the instance-argument in bound methods. Here, foo.x assumes that 'foo' is not rebound in the context of the def - the class, module, function, or wherever it's defined. That seems like an unnecessarily too strong an assumption. Both of these things suggest to me that we don't want to use a magic variable name, but rather some kind of builtin function to get the object (lexically) containing the block. It's tempting to name it something like 'this()', but that would be much too easily confused in methods with 'self'. Since we're looking for the lexically containing object, i'd call it something like current_object(). class Something: """Something's cooking, i can feel it.""" def help(self, *args): """Spiritual and operational guidance for something or other. Instructions for using help: ...""" print self.__doc__ print current_object().__doc__ if args: self.do_mode_specific_help(args) I think i'd be pretty happy with the addition of __builtins__.current_object, and the allowance of arbitrary metadata with functions (and other funtion-like objects like methods). Ken klm at digicool.com From klm at digicool.com Mon Apr 17 00:12:29 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:12:29 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: On Sat, 15 Apr 2000 bwarsaw at cnri.reston.va.us wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> so how about this? > > FL> -0.90000000000000002 on documenting this as "this > FL> can be used to store static data in a function" > > FL> +1 on the feature itself. > > Works for me! I think function attrs would be a lousy place to put > statics anyway. Huh? Why? (I don't have a problem with omitting mention of this use - seems like encouraging the use of globals, often a mistake.) Ken klm at digicool.com From klm at digicool.com Mon Apr 17 00:21:59 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:21:59 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> Message-ID: On Sun, 16 Apr 2000, Skip Montanaro wrote: > > Ken> We haven't even seen a satisfactory approach to referring to the > Ken> function, itself, from within the function. 
Maybe it's not even > Ken> desirable to be able to do that - that's an interesting question. > > I hereby propose that within a function the special name __ refer to the > function. You could have > > def fact(n): > if n <= 1: return 1 > return __(n-1) * n > > You could also refer to function attributes through __ (presuming Barry's > proposed patch gets adopted): At first i thought you were kidding about using '__' because '_' was taken - on lots of terminals that i use, there is no intervening whitespace separating the two '_'s, so it's pretty hard to tell the difference between it and '_'! Now, i wouldn't mind using '_' if it's available, but guido was pretty darned against using it in my initial designs for packages - i wanted to use it to refer to the package containing the current module, like unix '..'. I gathered that a serious part of the objection was in using a character to denote non-operation syntax - python just doesn't do that. I also like the idea of using a function instead of a magic variable - most of python's magic variables are in packages, like os.environ. Ken klm at digicool.com From Vladimir.Marangozov at inrialpes.fr Mon Apr 17 04:30:48 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 17 Apr 2000 04:30:48 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Ken Manheimer" at Apr 16, 2000 05:45:00 PM Message-ID: <200004170230.EAA06492@python.inrialpes.fr> Ken Manheimer wrote: > > However it seems to me to make exceeding sense to have the initial > intrinsic settings specified as part of the object! Indeed. It makes perfect sense to have _intial_, intrinsic attributes. The problem is that currently they can't be specified for builtin objects. Skip asked for existing solutions, so I've made a quick tour on the problem, pointing him to 3). > And, in the case of functions, it seems to me to be outstandingly > consistent with python's treatment of objects. Oustandingly consistent isn't my opinion, but that's fine with both of us. If functions win this cause, the next imminent wish of all (Zope) users will be to attach (protection, or other) attributes to *all* objects: class Spam(...): """ Spam product""" zope_product_version = "2.51" zope_persistency = 0 zope_cache_limit = 64 * 1024 def eggs(self): ... def eats(self): ... How would you qualify the zope_* attributes so that only the zope_product version is accessible? (without __getattr__ tricks, since we're talking about `metadata'). Note that I don't expect an answer :-). The issue is crystal clear already. Be prepared to answer cool questions like this one to your customers. > > I'm mystified about why you would reject that so adamantly! Oops, I'll demystify you instantly, here, by summing up my posts: I'm not rejecting anything adamantly! To the countrary, I've suggested more. Greg said it quite well: Barry's proposal made me sending you signals about different issues you've probably not thought about before, yet we'd better sort them out before adopting his patch. As a member of this list, I feel obliged to share with you my concerns whenever I have them. My concerns in this case are: a) consistency of the class model. Apparently this signal was lost in outerspace, because my interpretation isn't yours. Okay, fine by me. This one will come back in Py3K. I'm already curious to see what will be on the table at that time. :-) b) confusion about the namespaces associated with a function object. You've been more receptive to this one. It's currently being discussed. 
c) generalize user-attributes for all builtin objects. You'd like to, but it looks expensive. This one is a compromise: it's related with sharing, copy on write builtin objects with modified user-attr, etc. In short, it doesn't seem to be on the table, because this signal hasn't been emitted before, nor it was really decrypted on python-dev. Classifying objects as light and heavy, and attributing them specific functionality only because of their "weight" looks very hairy. That's all for now. Discussing these issues in prime time here is goodness for Python and its users! Adopting the proposal in a hurry, because of the tight schedule for 1.6, isn't. It needs more maturation. Witness the length of the thread. it's-vacation-time-for-me-so-see-you-all-after-Easter'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Mon Apr 17 10:14:51 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 01:14:51 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading Message-ID: A couple months ago, I exchanged a few emails with Guido about doing the free-threading work. In particular, for the 1.6 release. At that point (and now), I said that I wouldn't be starting on it until this summer, which means it would miss the 1.6 release. However, there are some items that could go into 1.6 *today* that would make it easier down the road to add free-threading to Python. I said that I'd post those in the hope that somebody might want to look at developing the necessary patches. It fell off my plate, so I'm getting back to that now... Python needs a number of basic things to support free threading. None of these should impact its performance or reliability. For the most part, they just provide a platform for the later addition. 1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. This mechanism will be used to store PyThreadState structure pointers, rather than _PyThreadState_Current. The latter variable must go away. Rationale: two threads will be operating simultaneously. An inherent conflict arises if _PyThreadState_Current is used. The TLS-like mechanism is used by the threads to look up "their" state. There will be a ripple effect on PyThreadState_Swap(); dunno offhand what. It may become empty. 2) Python needs a lightweight, short-duration, internally-used critical section type. The current lock type is used at the Python level and internally. For internal operations, it is rather heavyweight, has unnecessary semantics, and is slower than a plain crit section. Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's mutex type. A spinlock mechanism would be coolness. Rationale: Python needs critical sections to protect data from being trashed by multiple, simultaneous access. These crit sections need to be as fast as possible since they'll execute at all key points where data is manipulated. 3) Python needs an atomic increment/decrement (internal) operation. Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this. Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this. 
4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today) Rationale: duh 5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed. I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc. Rationale: any globals which are mutable must be made thread-safe. The fewer non-const globals to examine, the fewer to analyze for race conditions and thread-safety requirements. Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul. I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections. Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-) Depending upon Guido's desire, the various schedules, and how well the development goes, Python 1.6.1 could incorporate the free-threading option in the base distribution. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Mon Apr 17 03:54:41 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:54:41 -0700 (PDT) Subject: [Python-Dev] OT: XML In-Reply-To: <38F6344F.25D344B5@prescod.net> Message-ID: I'll begin with my conclusion, so you can get the high-order bit and skip the rest of the message if you like: XML is useful, but it's not a language. On Thu, 13 Apr 2000, Paul Prescod wrote: > > What definition of "language" are you using? And while you're at it, > what definition of "semantics" are you using? > > As I recall, a string is an ordered list of symbols and a language is an > unordered set of strings. I use the word "language" to mean an expression medium that carries semantics at a usefully high level. The computer-science definition you gave would include, say, the set of all pairs of integers (not a language to most people), but not include classical music [1] (indeed a language to many people). I admit that the boundary of "enough" semantics to qualify as a language is fuzzy, but some things do seem quite clearly to fall on either side of the line for me. For example, saying that XML has semantics is roughly equivalent to saying that ASCII has semantics. 
Well, sure, i suppose 65 has the "semantics" of the uppercase letter A, but that's not semantics at any level high enough to be useful. That is why most people would probably not normally call ASCII a "language". It has to express something to be a language. Granted, you can pick nits and say that XML has semantics as you did, but to me that essentially amounts to calling the syntax the semantics. > I know that Ka-Ping, despite going to a great university was in > Engineering, not computer science Cute. (I'm glad i was in engineering; at least we got a little design and software engineering background there, and i didn't see much of that in CS, unfortunately.) > Most XML people will happily admit that XML has no "semantics" but I > think that's bullshit too. The mapping from the string to the abstract > tree data model *is the semantic content* of the XML specification. Okay, fine. Technically, it has semantics; they're just very minimal semantics (so minimal that i felt quite comfortable in saying that it has none). But that doesn't change my point -- for "it has no semantics and therefore doesn't qualify as a language" just read "it has far too minimal semantics to qualify as a language". > It makes as little sense to reject XML out of hand because it is a > buzzword but is not innovative as it does for people to embrace it > mystically because it is Microsoft's flavor of the week. Before you get the wrong impression, i don't intend to reject XML out of hand, or to suggest that people do. It has its uses, just as ASCII has its uses. As a way of serializing trees, it's quite acceptable. I am, however, reacting to the sudden onslaught of hype that gives people the impression that XML can do anything. It's this sort of attitude that "oh, all of our representation problems will go away if we throw XML at it" that makes me cringe; that's only avoiding the issue. (I'm not saying that you are this clueless, Paul! -- just that some people seem to be.) As long as we recognize XML as exactly what it is, no more and no less -- a generic mechanism for serializing trees, with associated technologies for manipulating those trees -- there's no problem. > By the way, what data model or text encoding is NOT isomorphic to Lisp > S-expressions? Isn't Python code isomorphic to Lisp s-expessions? No! You can run Python code. The code itself, of course, can be interpreted as a stream of bytes, or arranged into a tree of LISP s-expressions. But if s-expressions that were *all* that constituted Python, Python would be pretty useless indeed! The entity we call Python includes real content: the rules for deriving the expected behaviour of a Python program from its parse tree, as variously specified in the reference manual, the library manual, and in our heads. LISP itself is a great deal more than just s-expressions. The language system specifies the behaviour you expect from a given piece of LISP code, and *that* is the part i call semantics. "real" semantics: Python LISP English MIDI minimal or no semantics: ASCII lists alphabet bytes The things in the top row are generally referred to as "languages"; the things in the bottom row are not. Although each thing in the top row is constructed from its corresponding thing in the bottom row, the difference between the two is what i am calling "semantics". If the top row says A and the bottom row says B, you can look at the B-type things that constitute the A and say, "if you see this particular B, it means foo". XML belongs in the bottom row, not the top row. 
Python: "If you see 'a = 3' in a function, it means you take the integer object 3 and bind it to the name 'a' in the local namespace." XML: "If you see the tag , it means... well... uh, nothing. Sorry. But you do get to decide that 'spam' and 'eggs' and 'boiled' mean whatever you want." That is why i am unhappy with XML being referred to as a "language": it is a misleading label that encourages people to make the mistake of imagining that XML has more semantic power than it really does. Why is this a fatal mistake? Because using XML will no more solve your information interchange problems than writing Japanese using the Roman alphabet will suddenly cause English speakers to be able to read Japanese novels. It may *help*, but there's a lot more to it than serialization. Thus: XML is useful, but it's not a language. And, since that reasonably summarizes my views on the issue, i'll say no more on this topic on the python-dev list -- any further blabbing i'll do in private e-mail. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton [1] I anticipate an objection such as "but you can encode a piece of classical music as accurately as you like as a sequence of symbols." But the music itself doesn't fit the Chomskian definition of "language" until you add that symbolic mapping and the rules to arrange those symbols in sequence. At that point the thing you've just added *is* the language: it's the mapping from symbols to the semantics of e.g. "and at time 5.36 seconds the first violinist will play an A-flat at medium volume". From ping at lfw.org Mon Apr 17 03:06:40 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:06:40 -0700 (PDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. > > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] > >>> loose = [1, [1, None, "x"], "x"] > >>> loose[1][1] = loose > >>> loose > [1, [1, [...], 'x'], 'x'] > >>> tight > [1, [...], 'x'] > >>> tight == loose > 1 Actually, i thought about this a little more and realized that the above *is* exactly the correct behaviour. In E, [] makes an immutable list. To make it mutable you then have to "flex" it. A mutable empty list is written "[] flex" (juxtaposition means a method call). In the above, the identities of the inner and outer lists of "loose" are different, and so should be printed separately. They are equal but not identical: >>> loose == loose[1] 1 >>> loose is loose[1] 0 >>> loose is loose[1][1] 1 >>> loose.append(4) >>> loose [1, [1, [...], 'x'], 'x', 4] -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From paul at prescod.net Tue Apr 18 16:58:08 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:58:08 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <38FC7800.D88E14D7@prescod.net> "M.-A. Lemburg" wrote: > > ... > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). 
The idea > behind this is that programmers should be able to use other > encodings here than the default "unicode-escape" one. I'm totally confused about this. Are we going to allow UCS-2 sequences in the middle of Python programs that are otherwise ASCII? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself From paul at prescod.net Mon Apr 17 15:37:19 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 17 Apr 2000 08:37:19 -0500 Subject: [Python-Dev] Unicode and XML References: Message-ID: <38FB138F.1DAF3891@prescod.net> Let's presume that we agreed that XML is not a language because it doesn't have semantics. What does that have to do with the applicability of its Unicode-handling model? Here is a list of a hundred specifications which we can probably agree have "useful semantics" that are all based on XML and thus have the same Unicode model: http://www.xml.org/xmlorg_registry/index.shtml XML's unicode model seems mostly appropriate to me. I can only see one reason it might not apply: which comes first the #! line or the #encoding line? We could say that the #! line can only be used in encodings that are direct supersets of ASCII (e.g. UTF-8 but not UTF-16). That shouldnt' cause any problems with Unix because as far as I know, Unix can only read the first line if it is in an ASCII superset anyhow! Then the second line could describe the precise ASCII superset in use (8859-1, 8859-2, UTF-8, raw ASCII, etc.). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From jeremy at cnri.reston.va.us Mon Apr 17 17:41:26 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:41:26 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14587.12454.7542.709571@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: KLM> It may well make sense to have the system *implement* the KLM> rights somewhere else. (Distributed system, permissions caches KLM> in an object system, etc.) However it seems to me to make KLM> exceeding sense to have the initial intrinsic settings KLM> specified as part of the object! It's not clear to me that the person writing the code is or should be the person specifying the security policy. I believe the CORBA security model separates policy definition into three parts -- security attributes, required rights, and policy domains. The developer would only be responsible for the first part -- the security attributes, which describe methods in a general way so that a security administrators can develop an effective policy for it. I suppose that function attributes would be a sensible way to do this, but it might also be accomplished with a separate wrapper object. I'm still not thrilled with the idea of using regular attribute access to describe static properties on code. To access the properties, yes, to define and set them, probably not. 
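To make that concrete, here is a rough sketch of the two flavors -- the names and the particular attributes are invented purely for illustration, not a proposal for an actual API:

    # (1) attributes hung directly on the function, if functions grow them:
    #
    #         def transfer(account, amount): ...
    #         transfer.security = {"required_right": "debit", "audit": 1}
    #
    # (2) the same information kept in a separate wrapper object, so the
    #     function object itself stays untouched:

    class MethodDescription:
        def __init__(self, func, security_attributes):
            self.func = func
            self.security = security_attributes

    def transfer(account, amount):
        account.balance = account.balance - amount

    described = MethodDescription(transfer,
                                  {"required_right": "debit", "audit": 1})

Either way the developer only supplies the attributes; the policy machinery that consumes them lives somewhere else.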
Jeremy From jeremy at cnri.reston.va.us Mon Apr 17 17:49:11 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:49:11 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> References: <200004162028.WAA06045@python.inrialpes.fr> <14586.14672.949500.986951@beluga.mojam.com> Message-ID: <14587.12919.488508.522746@goon.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: Ken> We haven't even seen a satisfactory approach to referring to Ken> the function, itself, from within the function. Maybe it's not Ken> even desirable to be able to do that - that's an interesting Ken> question. SM> I hereby propose that within a function the special name __ SM> refer to the function. I think the syntax is fairly obscure. I'm neurtral on the whole idea of having a special way to get at the function object from within the body of the code. Also, the proposal to handle security policies using attributes attached to the function seems wrong. The access control decision depends on the security policy defined for the object *and* the authorization of the caller. You can't decide based solely on some attribute of the function, nor can you assume that every call of a function object will be made with the same authorization (from the same protection domain). Jeremy From gstein at lyra.org Mon Apr 17 22:28:18 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 13:28:18 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14587.12919.488508.522746@goon.cnri.reston.va.us> Message-ID: On Mon, 17 Apr 2000, Jeremy Hylton wrote: > >>>>> "SM" == Skip Montanaro writes: > > Ken> We haven't even seen a satisfactory approach to referring to > Ken> the function, itself, from within the function. Maybe it's not > Ken> even desirable to be able to do that - that's an interesting > Ken> question. > > SM> I hereby propose that within a function the special name __ > SM> refer to the function. > > I think the syntax is fairly obscure. I'm neurtral on the whole idea > of having a special way to get at the function object from within the > body of the code. I agree. > Also, the proposal to handle security policies using attributes > attached to the function seems wrong. This isn't the only application of function attributes. Can't throw them out because one use seems wrong :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Mon Apr 17 23:37:16 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 14:37:16 -0700 Subject: [Python-Dev] Encoding of code in XML Message-ID: Lots of projects embed scripting & other code in XML, typically as CDATA elements. For example, XBL in Mozilla. As far as I know, no one ever bothers to define how one should _encode_ code in a CDATA segment, and it appears that at least in the Mozilla world the 'encoding' used is 'cut & paste', and it's the XBL author's responsibility to make sure that ]]> is nowhere in the JavaScript code. That seems suboptimal to me, and likely to lead to disasters down the line. The only clean solution I can think of is to define a standard encoding/decoding process for storing program code (which may very well contain occurences of ]]> in CDATA, which effectively hides that triplet from the parser. While I'm dreaming, it would be nice if all of the relevant language communities (JS, Python, Perl, etc.) could agree on what that encoding is. 
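To make the failure mode concrete, a made-up illustration (the element name is invented; this is not from any real XBL file):

    script = "if a[b[0]]>c: print 'spam'"     # perfectly ordinary code
    xml = "<handler><![CDATA[" + script + "]]></handler>"
    # the ']]>' hiding inside 'a[b[0]]>c' ends the CDATA section right
    # after "if a[b[0", and the rest of the script spills out into the
    # surrounding markup as a well-formedness error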
I'd love to hear of a recommendation on the topic by the XML folks, but I haven't been able to find any such document. Any thoughts? --david ascher From ping at lfw.org Mon Apr 17 23:47:40 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 16:47:40 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts Message-ID: One gripe that i hear a lot is that it's really difficult to cut and paste chunks of Python code when you're working with the interpreter because the ">>> " and "... " prompts keep getting in the way. Does anyone else often have or hear of this problem? Here is a suggested solution: for interactive mode only, the console maintains a flag "dropdots", initially false. After line = raw_input(">>> "): if line[:4] in [">>> ", "... "]: dropdots = 1 line = line[4:] else: dropdots = 0 interpret(line) After line = raw_input("... "): if dropdots and line[:4] == "... ": line = line[4:] interpret(line) The above solution depends on the fact that ">>> " and "... " are always invalid at the beginning of a bit of Python. So, if sys.ps1 is not ">>> " or sys.ps2 is not "... ", all dropdots behaviour is disabled. I realize it's not going to handle all cases (in particular mixing pasted text with typed-in text), but at least it makes it *possible* to paste code, and it's quite a simple rule. I suppose it all depends on whether or not you guys often experience this particular little irritation. Any thoughts on this? -- ?!ng From skip at mojam.com Mon Apr 17 23:47:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 17 Apr 2000 16:47:30 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: References: Message-ID: <14587.34418.633570.133957@beluga.mojam.com> Ping> One gripe that i hear a lot is that it's really difficult to cut Ping> and paste chunks of Python code when you're working with the Ping> interpreter because the ">>> " and "... " prompts keep getting in Ping> the way. Does anyone else often have or hear of this problem? First time I encountered this and complained about it Guido responded with import sys sys.ps1 = sys.ps2 = "" Crude, but effective... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Apr 18 00:07:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:07:47 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <14587.34418.633570.133957@beluga.mojam.com> Message-ID: On Mon, 17 Apr 2000, Skip Montanaro wrote: > > First time I encountered this and complained about it Guido responded with > > import sys > sys.ps1 = sys.ps2 = "" > > Crude, but effective... Yeah, i tried that, but it's suboptimal (no feedback), not the default behaviour, and certainly non-obvious to the beginner. -- ?!ng From ping at lfw.org Tue Apr 18 00:34:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:34:54 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurences of ]]> in CDATA, which effectively hides that triplet > from the parser. Hmm. I think the way everybody does it is to use the language to get around the need for ever saying "]]>". 
For example, in Python, if that was outside of a string, you could insert some spaces without changing the meaning, or if it was inside a string, you could add two strings together etc. You're right that this seems a bit ugly, but i think it could be even harder to get all the language communities to swallow something like "replace all occurrences of ]]> with some ugly escape string" -- since the above (hackish) method has the advantage that you can just run code directly copied from a piece of CDATA, and now you're asking them all to run the CDATA through some unescaping mechanism beforehand. Although i'm less optimistic about the success of such a standard, i'd certainly be up for it, if we had a good answer to propose. Here is one possible answer (to pick "@@" as a string very unlikely to occur much in most scripting languages): @@ --> @@@ ]]> --> @@> def escape(text): cdata = replace(text, "@@", "@@@") cdata = replace(cdata, "]]>", "@@>") return cdata def unescape(cdata): text = replace(cdata, "@@>", "]]>") text = replace(text, "@@@", "@@") return text The string "@@" occurs nowhere in the Python standard library. Another possible solution: <] --> <]> ]]> --> <][ etc. Generating more solutions is left as an exercise to the reader. :) -- ?!ng From DavidA at ActiveState.com Tue Apr 18 00:51:21 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 15:51:21 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: > Hmm. I think the way everybody does it is to use the language > to get around the need for ever saying "]]>". For example, in > Python, if that was outside of a string, you could insert some > spaces without changing the meaning, or if it was inside a string, > you could add two strings together etc. > You're right that this seems a bit ugly, but i think it could be > even harder to get all the language communities to swallow > something like "replace all occurrences of ]]> with some ugly > escape string" -- since the above (hackish) method has the > advantage that you can just run code directly copied from a piece > of CDATA, and now you're asking them all to run the CDATA through > some unescaping mechanism beforehand. But it has the bad disadvantages that it's language-specific and modifies code rather than encode it. It has the even worse disadvantage that it requires you to parse the code to encode/decode it, something much more expensive than is really necessary! > Although i'm less optimistic about the success of such a standard, > i'd certainly be up for it, if we had a good answer to propose. I'm thinking that if we had a good answer, we can probably get it into the core libraries for a few good languages, and document it as 'the standard', if we could get key people on board. > Here is one possible answer Right, that's the sort of thing I was looking for. > def escape(text): > cdata = replace(text, "@@", "@@@") > cdata = replace(cdata, "]]>", "@@>") > return cdata > > def unescape(cdata): > text = replace(cdata, "@@>", "]]>") > text = replace(text, "@@@", "@@") > return text (the above fails on @@>, but that's the general idea I had in mind). --david I know!: "]]>" <==> "Microsoft engineers are puerile weenies!" From ping at lfw.org Tue Apr 18 01:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:01:58 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > (the above fails on @@>, but that's the general idea I had in mind). 
Oh, that's stupid of me. I used the wrong test harness. Okay, well the latter example works (i tested it): <] --> <]> ]]> --> <][ And this also works: @@ --> @@] ]]> --> @@> -- ?!ng From ping at lfw.org Tue Apr 18 01:08:53 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:08:53 -0500 (CDT) Subject: [Python-Dev] Escaping CDATA Message-ID: Here's what i'm playing with, if you want to mess with it too: import string def replace(text, old, new, join=string.join, split=string.split): return join(split(text, old), new) la, ra = "@@", "@@]" lb, rb = "]]>", "@@>" la, ra = "<]", "<]>" lb, rb = "]]>", "<][" def escape(text): cdata = replace(text, la, ra) cdata = replace(cdata, lb, rb) return cdata def unescape(cdata): text = replace(cdata, rb, lb) text = replace(text, ra, la) return text chars = "" for ch in la + ra + lb + rb: if ch not in chars: chars = chars + ch if __name__ == "__main__": class Tester: def __init__(self): self.failed = [] self.count = 0 def test(self, s, find=string.find): cdata = escape(s) text = unescape(cdata) print "%s -e-> %s -u-> %s" % (s, cdata, text) if find(cdata, "]]>") >= 0: print "EXPOSURE!" self.failed.append(s) elif s != text: print "MISMATCH!" self.failed.append(s) self.count = self.count + 1 tester = Tester() test = tester.test for a in chars: for b in chars: for c in chars: for d in chars: for e in chars: for f in chars: for g in chars: for h in chars: test(a+b+c+d+e+f+g+h) print if tester.failed == []: print "All tests succeeded." else: print "Failed %d of %d tests." % (len(tester.failed), tester.count) for t in tester.failed: tester.test(t) From moshez at math.huji.ac.il Tue Apr 18 08:55:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 18 Apr 2000 08:55:20 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Mon, 17 Apr 2000, Ka-Ping Yee wrote: > One gripe that i hear a lot is that it's really difficult > to cut and paste chunks of Python code when you're working > with the interpreter because the ">>> " and "... " prompts > keep getting in the way. Does anyone else often have or > hear of this problem? > > Here is a suggested solution: for interactive mode only, > the console maintains a flag "dropdots", initially false. > > After line = raw_input(">>> "): > if line[:4] in [">>> ", "... "]: > dropdots = 1 > line = line[4:] > else: > dropdots = 0 > interpret(line) Python 1.5.2 (#1, Feb 21 2000, 14:52:33) [GCC 2.95.2 19991024 (release)] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a=[] >>> a[ ... ... ... ] Traceback (innermost last): File "", line 1, in ? TypeError: sequence index must be integer >>> Sorry. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Tue Apr 18 10:01:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 01:01:54 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > >>> a=[] > >>> a[ > ... ... > ... ] > Traceback (innermost last): > File "", line 1, in ? > TypeError: sequence index must be integer > >>> > > Sorry. What was your point? 
-- ?!ng From effbot at telia.com Tue Apr 18 09:44:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 09:44:50 +0200 Subject: [Python-Dev] Pasting interpreter prompts References: Message-ID: <00e301bfa909$fbb053a0$34aab5d4@hagrid> Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > > > > >>> a=[] > > >>> a[ > > ... ... > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>> > > > > Sorry. > > What was your point? a[...] is valid syntax, and not the same thing as a[]. From moshez at math.huji.ac.il Tue Apr 18 11:29:04 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 18 Apr 2000 11:29:04 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > > > > >>> a=[] > > >>> a[ > > ... ... > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>> > > > > Sorry. > > What was your point? That "... " in the beginning of the line is not a syntax error. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Apr 18 00:01:38 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:01:38 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> Message-ID: <38FB89C2.26817F97@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > > ... > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). The idea > > behind this is that programmers should be able to use other > > encodings here than the default "unicode-escape" one. > > I'm totally confused about this. Are we going to allow UCS-2 sequences > in the middle of Python programs that are otherwise ASCII? The idea is to make life a little easier for programmers who's native script is not easily writable using ASCII, e.g. the whole Asian world. While originally only the encoding used within the quotes of u"..." was targetted (on the i18n sig), there has now been some discussion on this list about whether to move forward in a whole new direction: that of allowing whole Python scripts to be encoded in many different encodings. The compiler will then convert the scripts first to Unicode and then to 8-bit strings as needed. Using this technique which was introduced by Fredrik Lundh we could in fact have Python scripts which are encoded in UTF-16 (two bytes per character) or other more obscure encodings. The Python interpreter would only see Unicode and Latin-1. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 18 00:10:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:10:12 +0200 Subject: [Python-Dev] Unicode and XML References: <38FB138F.1DAF3891@prescod.net> Message-ID: <38FB8BC4.96678FD6@lemburg.com> Paul Prescod wrote: > > Let's presume that we agreed that XML is not a language because it > doesn't have semantics. What does that have to do with the applicability > of its Unicode-handling model? 
> > Here is a list of a hundred specifications which we can probably agree > have "useful semantics" that are all based on XML and thus have the same > Unicode model: > > http://www.xml.org/xmlorg_registry/index.shtml > > XML's unicode model seems mostly appropriate to me. I can only see one > reason it might not apply: which comes first the #! line or the > #encoding line? We could say that the #! line can only be used in > encodings that are direct supersets of ASCII (e.g. UTF-8 but not > UTF-16). That shouldnt' cause any problems with Unix because as far as I > know, Unix can only read the first line if it is in an ASCII superset > anyhow! > > Then the second line could describe the precise ASCII superset in use > (8859-1, 8859-2, UTF-8, raw ASCII, etc.). Sounds like a good idea... how would such a line look like ? #!/usr/bin/env python # version: 1.6, encoding: iso-8859-1 ... Meaning: the module script needs Python version >=1.6 and uses iso-8859-1 as source file encoding. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Apr 18 12:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 06:35:33 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Your message of "Tue, 18 Apr 2000 00:01:38 +0200." <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <200004181035.GAA12526@eric.cnri.reston.va.us> > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts > to be encoded in many different encodings. The compiler will > then convert the scripts first to Unicode and then to 8-bit > strings as needed. > > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. Wouldn't it make more sense to have the Python compiler *always* see UTF-8 and to use a simple preprocessor to deal with encodings? (Disclaimer: there are about 300 unread python-dev messages in my inbox still.) --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 18 12:56:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 12:56:55 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <200004181035.GAA12526@eric.cnri.reston.va.us> Message-ID: <001001bfa925$021cf700$34aab5d4@hagrid> Guido van Rossum wrote: > > Using this technique which was introduced by Fredrik Lundh > > we could in fact have Python scripts which are encoded in > > UTF-16 (two bytes per character) or other more obscure > > encodings. The Python interpreter would only see Unicode > > and Latin-1. > > Wouldn't it make more sense to have the Python compiler *always* see > UTF-8 and to use a simple preprocessor to deal with encodings? 
to some extent, this depends on what the "everybody" in CP4E means -- if you were to do user-testing on non-americans, I suspect "why cannot I use my own name as a variable name" might be as common as "why are SPAM and spam two different variables?". and if you're willing to address both issues in Py3K, it's much easier to use a simple internal representation, and handle en- codings on the way in and out. and PY_UNICODE* strings are easier to process than UTF-8 encoded char* strings... From ping at lfw.org Tue Apr 18 13:59:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 04:59:34 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > What was your point? > > That "... " in the beginning of the line is not a syntax error. So? You can put "... " at the beginning of a line in a string, too: >>> a = """ ... ... spam spam""" >>> a '\012... spam spam' That isn't a problem with the suggested mechanism, since dropdots only comes into effect when the *first* line entered at a >>> begins with ">>> " or "... ". -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From guido at python.org Tue Apr 18 15:01:47 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 09:01:47 -0400 Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Your message of "Tue, 18 Apr 2000 04:59:34 PDT." References: Message-ID: <200004181301.JAA12697@eric.cnri.reston.va.us> Has anybody noticed that this is NOT a problem in IDLE? It will eventually go away, especially for the vast masses. So I don't think a solution is necessary -- and as was shown, the simple hacks don't really work. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 18 15:38:36 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Apr 2000 09:38:36 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <14588.25948.202273.502469@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts I had thought this was still an issue for interpretation of string contents, and really only meaningful when converting the source representations of Unicode strings to the internal represtenation. I see no need to change the language definition in general. Unless we *really* want to impose those evil trigraph sequences from C! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Tue Apr 18 16:27:53 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 16:27:53 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com><38FC7800.D88E14D7@prescod.net><38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> Message-ID: <004401bfa942$49bcaa20$34aab5d4@hagrid> Fred Drake wrote: > > While originally only the encoding used within the quotes of > > u"..." 
was targetted (on the i18n sig), there has now been > > some discussion on this list about whether to move forward > > in a whole new direction: that of allowing whole Python scripts > > I had thought this was still an issue for interpretation of string > contents, and really only meaningful when converting the source > representations of Unicode strings to the internal represtenation. why restrict the set of possible source encodings to ASCII compatible 8-bit encodings? (or are there really authoring systems out there that can use different encodings for different parts of the file?) > I see no need to change the language definition in general. Unless > we *really* want to impose those evil trigraph sequences from C! ;) sorry, but I don't see the connection. From fdrake at acm.org Tue Apr 18 16:35:37 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Apr 2000 10:35:37 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <004401bfa942$49bcaa20$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> <004401bfa942$49bcaa20$34aab5d4@hagrid> Message-ID: <14588.29369.39945.849489@seahag.cnri.reston.va.us> Fredrik Lundh writes: > why restrict the set of possible source encodings to ASCII > compatible 8-bit encodings? I'm not suggesting that. I just don't see any call to change the language definition (such as allowing additional characters in NAME tokens). I don't mind whatsoever if the source is stored in UCS-2, and the tokenizer does need to understand that to create the right value for Unicode strings specified as u'...' literals. > (or are there really authoring systems out there that can use > different encodings for different parts of the file?) Not that I know of, and I doubt I'd want to see the result! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From paul at prescod.net Tue Apr 18 16:42:32 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:42:32 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <38FC7458.EA90F085@prescod.net> My vote is all or nothing. Either the whole file is in UCS-2 (for example) or none of it is. I'm not sure if we really need to allow multiple file encodings in version 1.6 but we do need to allow that ultimately. If we agree to allow the whole file to be in another encoding then we should use the XML trick of having a known start-sequence for encodings other than UTF-8. It doesn't matter much whether it is syntactically a comment or a pragma. I am still in favor of compile time pragmas but they can probably wait for Python 1.7. > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. In what sense is Latin-1 not Unicode? Isn't it just the first 256 characters of Unicode or something like that? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself [In retrospect] the story of a Cold War that was the scene of history's only nuclear arms race will be very different from the story of a Cold War that turned out to be only the first of many interlocking nuclear arms races in many parts of the world. 
The nuclear, question, in sum, hangs like a giant question mark over our waning century. - The Unfinished Twentieth Century by Jonathan Schell Harper's Magazine, January 2000 From effbot at telia.com Tue Apr 18 16:56:28 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 16:56:28 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <38FC7458.EA90F085@prescod.net> Message-ID: <00b301bfa946$47ebbde0$34aab5d4@hagrid> Paul Prescod wrote: > My vote is all or nothing. Either the whole file is in UCS-2 (for > example) or none of it is. agreed. > In what sense is Latin-1 not Unicode? Isn't it just the first 256 > characters of Unicode or something like that? yes. ISO Latin-1 is unicode. what MAL really meant was that the interpreter would only deal with 8-bit (traditional) or 16-bit (unicode) strings. (in my string type proposals, the same applies to text strings manipulated by the user. if it's not unicode, it's a byte array, and methods expecting text don't work) From jeremy at cnri.reston.va.us Tue Apr 18 17:40:10 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 11:40:10 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce Message-ID: <14588.33242.799629.934118@goon.cnri.reston.va.us> As many of you have probably noticed, the moderators of comp.lang.python.announce do not deal with pending messages in a timely manner. There have been no new posts since Mar 27, and delays of several weeks were common before then. I wanted to ask a smallish group of potential readers of this group what we should do about the problem. I have tried to contact the moderators several times, but haven't heard a peep from them since late February, when the response was: "Sorry. Temporary problem. It's all fixed now." Three possible solutions come to mind: - Get more moderators. It appears that Marcus Fleck is the only active moderator. I have never received a response to private email sent to Vladimir Ulogov. I suggested to Marcus that we get more moderators, but he appeared to reject the idea. Perhaps some peer pressure from other unsatisfied readers would help. - De-couple the moderation of comp.lang.python.announce and of python-annouce at python.org. We could keep the gateway between the lists going, but have different moderators for the mailing list. This would be less convenient for people who prefer to read news, but would at least get announcement out in a timely fashion. - Give up on comp.lang.python.announce. Since moderation has been so spotty, most people have reverted to making all anouncements to comp.lang.python anyway. This option is unfortunate, because it makes it harder for people who don't have time to read comp.lang.python to keep up with announcements. Any other ideas? Suggestions on how to proceed? Jeremy From skip at mojam.com Tue Apr 18 18:04:29 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 18 Apr 2000 11:04:29 -0500 (CDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.33242.799629.934118@goon.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.34701.354301.740696@beluga.mojam.com> Jeremy> Any other ideas? Suggestions on how to proceed? How about decouple the python-announce mailing list from the newsgroup (at least partially), manage the mailing list from Mailman (it probably already is), then require moderator approval to post? 
With a handful of moderators (5-10), the individual effort should be fairly low. You can set up the default reject message to be strongly related to the aims of the list so that most of the time the moderator needs only to click the approve or drop buttons or make a slight edit to the response and click the reject button. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Apr 18 19:03:35 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 10:03:35 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <200004181301.JAA12697@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: > Has anybody noticed that this is NOT a problem in IDLE? Certainly. This was one of the first problems i solved when writing my console script, too. (Speaking of that, i still can't find auto-completion in IDLE -- is it in there?) But: startup time, startup time. I'm not going to wait to start IDLE every time i want to ask Python a quick question. Hey, i just tried it and actually it doesn't work. I mean, yes, sys.ps2 is missing, but that still doesn't mean you can select a whole line and paste it. You have to aim very carefully to start dragging from the fourth column. > So I don't think a solution is necessary -- and as was shown, the > simple hacks don't really work. I don't think this was shown at all. -- ?!ng From guido at python.org Tue Apr 18 18:50:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 12:50:49 -0400 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: Your message of "Tue, 18 Apr 2000 11:04:29 CDT." <14588.34701.354301.740696@beluga.mojam.com> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> Message-ID: <200004181650.MAA12894@eric.cnri.reston.va.us> I vote to get more moderators for the newsgroup. If Marcus and Gandalf don't moderate quickly the community can oust them. --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 18 18:54:40 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 18:54:40 +0200 Subject: [Python-Dev] comp.lang.python.announce References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <007201bfa956$d1311680$34aab5d4@hagrid> Jeremy Hylton wrote: > As many of you have probably noticed, the moderators of > comp.lang.python.announce do not deal with pending messages in a > timely manner. There have been no new posts since Mar 27, and > delays of several weeks were common before then. and as noted on c.l.py, those posts didn't make it to many servers, since they use "00" instead of "2000". I haven't seen any announcements on any local news- server since last year. > Any other ideas? Suggestions on how to proceed. post to comp.lang.python, and tell people who don't want to read the newsgroup to watch the python.org news page and/or the daily python URL? 
<0.5 wink> From jeremy at cnri.reston.va.us Tue Apr 18 18:58:54 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 12:58:54 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: <14588.37966.825565.8871@goon.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> I vote to get more moderators for the newsgroup. That seems like the simplest mechanism. We just need volunteers (I am one), and we need to get Marcus to notify the Usenet powers-that-be of the new moderators. GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. A painful process. Vladimir/Gandalf seems to have disappeared completely. (The original message in this thread bounced when I sent it to him.) The only way to add new moderators without Marcus's help is to have a new RFD/CFV process. It would be like creating the newsgroup all over again, except we'd have to convince the moderator of news.announce.newsgroups that the current moderator was unfit first. Jeremy From bwarsaw at cnri.reston.va.us Tue Apr 18 20:24:30 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 18 Apr 2000 14:24:30 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.43102.864339.347892@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> - De-couple the moderation of comp.lang.python.announce and of JH> python-annouce at python.org. We could keep the gateway between JH> the lists going, but have different moderators for the mailing JH> list. This would be less convenient for people who prefer to JH> read news, but would at least get announcement out in a timely JH> fashion. We could do this -- and in fact, this was the effective set up until a couple of weeks ago. We'd set it up as a moderated group, so that /every/ message is held for approval. I'd have to investigate, but we probably don't want to hold messages that originate on Usenet. Of course, gating back to Usenet will still be held up for c.l.py.a's moderators. Still, I'd rather not do this. It would be best to get more moderators helping out with the c.l.py.a content. -Barry From guido at python.org Tue Apr 18 20:25:11 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 14:25:11 -0400 Subject: [Python-Dev] baby steps for free-threading In-Reply-To: Your message of "Mon, 17 Apr 2000 01:14:51 PDT." References: Message-ID: <200004181825.OAA13261@eric.cnri.reston.va.us> > A couple months ago, I exchanged a few emails with Guido about doing the > free-threading work. In particular, for the 1.6 release. At that point > (and now), I said that I wouldn't be starting on it until this summer, > which means it would miss the 1.6 release. However, there are some items > that could go into 1.6 *today* that would make it easier down the road to > add free-threading to Python. I said that I'd post those in the hope that > somebody might want to look at developing the necessary patches. It fell > off my plate, so I'm getting back to that now... > > Python needs a number of basic things to support free threading. None of > these should impact its performance or reliability. For the most part, > they just provide a platform for the later addition. 
I agree with the general design sketched below. > 1) Create a portable abstraction for using the platform's per-thread state > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. There are at least 7 other platform specific thread implementations -- probably an 8th for the Mac. These all need to support this. (One solution would be to have a portable implementation that uses the thread-ID to index an array.) > This mechanism will be used to store PyThreadState structure pointers, > rather than _PyThreadState_Current. The latter variable must go away. > > Rationale: two threads will be operating simultaneously. An inherent > conflict arises if _PyThreadState_Current is used. The TLS-like > mechanism is used by the threads to look up "their" state. > > There will be a ripple effect on PyThreadState_Swap(); dunno offhand > what. It may become empty. Cool. > 2) Python needs a lightweight, short-duration, internally-used critical > section type. The current lock type is used at the Python level and > internally. For internal operations, it is rather heavyweight, has > unnecessary semantics, and is slower than a plain crit section. > > Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's > mutex type. A spinlock mechanism would be coolness. > > Rationale: Python needs critical sections to protect data from being > trashed by multiple, simultaneous access. These crit sections need to > be as fast as possible since they'll execute at all key points where > data is manipulated. Agreed. > 3) Python needs an atomic increment/decrement (internal) operation. > > Rationale: these are used in INCREF/DECREF to correctly increment or > decrement the refcount in the face of multiple threads trying to do > this. > > Win32: InterlockedIncrement/Decrement. pthreads would use the > lightweight crit section above (on every INC/DEC!!). Some other > platforms may have specific capabilities to keep this fast. Note that > platforms (outside of their threading libraries) may have functions to > do this. I'm worried here that since INCREF/DECREF are used so much this will slow down significantly, especially on platforms that don't have safe hardware instructions for this. So it should only be enabled when free threading is turned on. > 4) Python's configuration system needs to be updated to include a > --with-free-thread option since this will not be enabled by default. > Related changes to acconfig.h would be needed. Compiling in the above > pieces based on the flag would be nice (although Python could switch to > the crit section in some cases where it uses the heavy lock today) > > Rationale: duh Maybe there should be more fine-grained choices? As you say, some stuff could be used without this flag. But in any case this is trivial to add. > 5) An analysis of Python's globals needs to be performed. Any global that > can safely be made "const" should. If a global is write-once (such as > classobject.c::getattrstr), then these are marginally okay (there is a > race condition, with an acceptable outcome, but a mem leak occurs). > Personally, I would prefer a general mechanism in Python for creating > "constants" which can be tracked by the runtime and freed. They are almost all string constants, right? How about a macro Py_CONSTSTROBJ("value", variable)? > I would also like to see a generalized "object pool" mechanism be built > and used for tuples, ints, floats, frames, etc. Careful though -- generalizing this will slow it down. 
(Here I find myself almost wishing for C++ templates :-) > Rationale: any globals which are mutable must be made thread-safe. The > fewer non-const globals to examine, the fewer to analyze for race > conditions and thread-safety requirements. > > Note: making some globals "const" has a ripple effect through Python. > This is sometimes known as "const poisoning". Guido has stated an > acceptance to adding "const" throughout the interpreter, but would > prefer a complete (rather than ripple-based, partial) overhaul. Actually, it's okay to do this on an "as-neeed" basis. I'm also in favor of changing all the K&R code to ANSI, and getting rid of Py_PROTO and friends. Cleaner code! > I think that is all for now. Achieving these five steps within the 1.6 > timeframe means that the free-threading patches will be *much* smaller. It > also creates much more visibility and testing for these sections. Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough testing of some of these changes, the extensive nature of some of the changes, and my other obligations during those 6 weeks, I don't see how it can be done for 1.6. I would prefer to do an accellerated 1.7 or 1.6.1 release that incorporates all this. (It could be called 1.6.1 only if it'nearly identical to 1.6 for the Python user and not too different for the extension writer.) > Post 1.6, a patch set to add critical sections to lists and dicts would be > built. In addition, a new analysis would be done to examine the globals > that are available along with possible race conditions in other mutable > types and structures. Not all structures will be made thread-safe; for > example, frame objects are used by a single thread at a time (I'm sure > somebody could find a way to have multiple threads use or look at them, > but that person can take a leap, too :-) It is unacceptable to have thread-unsafe structures that can be accessed in a thread-unsafe way using pure Python code only. > Depending upon Guido's desire, the various schedules, and how well the > development goes, Python 1.6.1 could incorporate the free-threading option > in the base distribution. Indeed. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Apr 18 23:03:32 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 18 Apr 2000 14:03:32 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: > I vote to get more moderators for the newsgroup. If Marcus and > Gandalf don't moderate quickly the community can oust them. FWIW, I think they should step down now. They've not held up their end of the bargain, even though several folks have offered to help repeatedly throughout the 'problem period', which includes most of the life of c.l.p.a. As a compromise solution, and only if it's effective, we can add moderators. I'll volunteer, as long as someone gives me hints as to the mechanisms (it's been a while since I was doing usenet for real). --david PS: I think decoupling the mailing list from the newsgroup is a bad precedent and a political trouble zone. From gstein at lyra.org Tue Apr 18 23:16:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 14:16:44 -0700 (PDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Jeremy Hylton wrote: > >>>>> "GvR" == Guido van Rossum writes: >... 
> GvR> If Marcus and Gandalf don't moderate quickly the community can > GvR> oust them. > > A painful process. Vladimir/Gandalf seems to have disappeared > completely. (The original message in this thread bounced when I sent > it to him.) The only way to add new moderators without Marcus's help > is to have a new RFD/CFV process. It would be like creating the > newsgroup all over again, except we'd have to convince the moderator > of news.announce.newsgroups that the current moderator was unfit > first. Nevertheless, adding more moderators is the "proper" answer to the problem. Even if it is difficult to get more moderators into the system, there doesn't seem to be a better alternative. Altering the mailing list gateway will simply serve to create divergent announcement forums. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Tue Apr 18 23:30:01 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 17:30:01 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: References: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: <14588.54233.528093.55997@goon.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. JH> A painful process. Vladimir/Gandalf seems to have disappeared JH> completely. (The original message in this thread bounced when I JH> sent it to him.) The only way to add new moderators without JH> Marcus's help is to have a new RFD/CFV process. It would be like JH> creating the newsgroup all over again, except we'd have to JH> convince the moderator of news.announce.newsgroups that the JH> current moderator was unfit first. GS> Nevertheless, adding more moderators is the "proper" answer to GS> the problem. Even if it is difficult to get more moderators into GS> the system, there doesn't seem to be a better alternative. Proper is not necessarily the same as possible. We may fail in an attempt to add a moderator without cooperation from Marcus. GS> Altering the mailing list gateway will simply serve to create GS> divergent announcement forums. If only one of the forums works, this isn't a big problem. Jeremy From gstein at lyra.org Wed Apr 19 02:05:01 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 17:05:01 -0700 (PDT) Subject: [Python-Dev] switch to ANSI C (was: baby steps for free-threading) In-Reply-To: <14588.58817.746201.456992@anthem.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Barry A. Warsaw wrote: > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Actually, it's okay to do this on an "as-neeed" basis. I'm > GvR> also in favor of changing all the K&R code to ANSI, and > GvR> getting rid of Py_PROTO and friends. Cleaner code! > > I agree, and here's yet another plea for moving to 4-space indents in > the C code. For justification, look at the extended call syntax hacks > in ceval.c. They essentially /use/ 4si because they have no choice! > > Let's clean it up in one fell swoop! Obviously not for 1.6. I > volunteer to do all three mutations. Why not for 1.6? These changes are pretty brain-dead ("does it compile?") and can easily be reviewed. If somebody out there happens to have the time to work up ANSI C patches, then why refuse them? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 19 08:46:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 23:46:56 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading In-Reply-To: <200004181825.OAA13261@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: >... > > 1) Create a portable abstraction for using the platform's per-thread state > > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. > > There are at least 7 other platform specific thread implementations -- > probably an 8th for the Mac. These all need to support this. (One > solution would be to have a portable implementation that uses the > thread-ID to index an array.) Yes. As the platforms "come up to speed", they can replace the fallback, portable implementation. "Users" of the TLS mechanism would allocate indices into the per-thread arrays. Another alternative is to only manage a mapping of thread-ID to ThreadState structures. The TLS code can then get the ThreadState and access the per-thread dict. Of course, the initial impetus is to solve the lookup of the ThreadState rather than a general TLS mechanism :-) Hmm. I'd say that we stick with defining a Python TLS API (in terms of the platform when possible). The fallback code would be the per-thread arrays design. "thread dict" would still exist, but is deprecated. >... > > 3) Python needs an atomic increment/decrement (internal) operation. > > > > Rationale: these are used in INCREF/DECREF to correctly increment or > > decrement the refcount in the face of multiple threads trying to do > > this. > > > > Win32: InterlockedIncrement/Decrement. pthreads would use the > > lightweight crit section above (on every INC/DEC!!). Some other > > platforms may have specific capabilities to keep this fast. Note that > > platforms (outside of their threading libraries) may have functions to > > do this. > > I'm worried here that since INCREF/DECREF are used so much this will > slow down significantly, especially on platforms that don't have safe > hardware instructions for this. This definitely slows Python down. If an object is known to be visible to only one thread, then you can avoid the atomic inc/dec. But that leads to madness :-) > So it should only be enabled when free threading is turned on. Absolutely. No question. Note to readers: the different definitions of INCREF/DECREF has an impact on mixing modules in the same way Py_TRACE_REFS does. > > 4) Python's configuration system needs to be updated to include a > > --with-free-thread option since this will not be enabled by default. > > Related changes to acconfig.h would be needed. Compiling in the above > > pieces based on the flag would be nice (although Python could switch to > > the crit section in some cases where it uses the heavy lock today) > > > > Rationale: duh > > Maybe there should be more fine-grained choices? As you say, some > stuff could be used without this flag. But in any case this is > trivial to add. Sure. For example, something like the Python TLS API could be keyed off --with-threads. Replacing _PyThreadState_Current with a TLS-based mechanism should be keyed on free threads. The "critical section" stuff could be keyed on threading -- they would be nice for Python to use internally for its standard threading operation. > > 5) An analysis of Python's globals needs to be performed. Any global that > > can safely be made "const" should. 
If a global is write-once (such as > > classobject.c::getattrstr), then these are marginally okay (there is a > > race condition, with an acceptable outcome, but a mem leak occurs). > > Personally, I would prefer a general mechanism in Python for creating > > "constants" which can be tracked by the runtime and freed. > > They are almost all string constants, right? Yes, I believe so. (Analysis needed) > How about a macro Py_CONSTSTROBJ("value", variable)? Sure. Note that the variable name can usually be constructed from the value. > > I would also like to see a generalized "object pool" mechanism be built > > and used for tuples, ints, floats, frames, etc. > > Careful though -- generalizing this will slow it down. (Here I find > myself almost wishing for C++ templates :-) :-) This is a desire, but not a requirement. Same with the write-once stuff. A general pool mechanism would reduce code duplication for lock management, and possibly clarify some operation. >... > > Note: making some globals "const" has a ripple effect through Python. > > This is sometimes known as "const poisoning". Guido has stated an > > acceptance to adding "const" throughout the interpreter, but would > > prefer a complete (rather than ripple-based, partial) overhaul. > > Actually, it's okay to do this on an "as-neeed" basis. I'm also in > favor of changing all the K&R code to ANSI, and getting rid of > Py_PROTO and friends. Cleaner code! Yay! :-) > > I think that is all for now. Achieving these five steps within the 1.6 > > timeframe means that the free-threading patches will be *much* smaller. It > > also creates much more visibility and testing for these sections. > > Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough > testing of some of these changes, the extensive nature of some of the [ aside: most of these changes are specified with the intent of reducing the impact on Python. most are additional behavior rather than changing behavior. ] > changes, and my other obligations during those 6 weeks, I don't see > how it can be done for 1.6. I would prefer to do an accellerated 1.7 > or 1.6.1 release that incorporates all this. (It could be called > 1.6.1 only if it'nearly identical to 1.6 for the Python user and not > too different for the extension writer.) Ah. That would be nice. It also provides some focus on what would need to occur for the extension writer: *) Python TLS API *) critical sections *) WITH_FREE_THREAD from the configure process The INCREF/DECREF and const-ness is hidden from the extension writer. Adding integrity locks to list/dict/etc is also hidden. > > Post 1.6, a patch set to add critical sections to lists and dicts would be > > built. In addition, a new analysis would be done to examine the globals > > that are available along with possible race conditions in other mutable > > types and structures. Not all structures will be made thread-safe; for > > example, frame objects are used by a single thread at a time (I'm sure > > somebody could find a way to have multiple threads use or look at them, > > but that person can take a leap, too :-) > > It is unacceptable to have thread-unsafe structures that can be > accessed in a thread-unsafe way using pure Python code only. Hmm. I guess that I can grab a frame object reference via a traceback object. The frame and traceback objects can then be shared between threads. 
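Something along these lines, just to show that it is reachable from pure Python (an illustrative sketch, nothing more):

    import sys, thread, time

    def grab_frame():
        try:
            raise RuntimeError
        except RuntimeError:
            # tb_frame is this frame; f_back is the caller's frame
            return sys.exc_info()[2].tb_frame.f_back

    def peek(frame):
        # a second thread poking at a frame the first thread still owns
        print frame.f_lineno, frame.f_locals

    thread.start_new_thread(peek, (grab_frame(),))
    time.sleep(1)        # give the second thread a chance to run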
Now the question arises: if the original thread resumes execution and starts modifying these objects (inside the interpreter since both are readonly to Python), then the passed-to thread might see invalid data. I'm not sure whether these objects have multi-field integrity constraints. Conversely: if they don't, then changing a single field will simply create a race condition with the passed-to thread. Oh, and assuming that we remove a value from the structure before DECREF'ing it. By your "pure Python" statement, I'm presuming that you aren't worried about PyTuple_SET_ITEM() and similar. However, do you really want to start locking up the frame and traceback objects? (and code objects and ...) Cheers, -g -- Greg Stein, http://www.lyra.org/ From sjoerd at oratrix.nl Wed Apr 19 11:51:53 2000 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 11:51:53 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Mon, 17 Apr 2000 14:37:16 -0700. References: Message-ID: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> What is wrong with encoding ]]> in the XML way by using an extra CDATA. In other words split up the CDATA section into two in the middle of the ]]> sequence: import string def encode_cdata(str): return '<![CDATA[' + string.join(string.split(str, ']]>'), ']]]]><![CDATA[>') + ']]>' On Mon, Apr 17 2000 "David Ascher" wrote: > Lots of projects embed scripting & other code in XML, typically as CDATA > elements. For example, XBL in Mozilla. As far as I know, no one ever > bothers to define how one should _encode_ code in a CDATA segment, and it > appears that at least in the Mozilla world the 'encoding' used is 'cut & > paste', and it's the XBL author's responsibility to make sure that ]]> is > nowhere in the JavaScript code. > > That seems suboptimal to me, and likely to lead to disasters down the line. > > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurrences of ]]>) in CDATA, which effectively hides that triplet > from the parser. > > While I'm dreaming, it would be nice if all of the relevant language > communities (JS, Python, Perl, etc.) could agree on what that encoding is. > I'd love to hear of a recommendation on the topic by the XML folks, but I > haven't been able to find any such document. > > Any thoughts? > > --david ascher > > -- Sjoerd Mullender From SalzR at CertCo.com Wed Apr 19 16:57:16 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 10:57:16 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >This definitely slows Python down. If an object is known to be visible to >only one thread, then you can avoid the atomic inc/dec. But that leads to >madness :-) I would much rather see the language extended to indicate that a particular variable is "shared" across free-threaded interpreters. The hit of taking a mutex on every incref/decref is way bad. From gvwilson at nevex.com Wed Apr 19 17:03:17 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 19 Apr 2000 11:03:17 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: Message-ID: > Rich Salz wrote: > I would much rather see the language extended to indicate that a > particular variable is "shared" across free-threaded interpreters. The > hit of taking a mutex on every incref/decref is way bad. In my experience, allowing/requiring programmers to specify sharedness is a very rich source of hard-to-find bugs.
(Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg From petrilli at amber.org Wed Apr 19 17:09:04 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Wed, 19 Apr 2000 11:09:04 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: ; from SalzR@CertCo.com on Wed, Apr 19, 2000 at 10:57:16AM -0400 References: Message-ID: <20000419110904.C6107@trump.amber.org> Salz, Rich [SalzR at CertCo.com] wrote: > >This definitely slows Python down. If an object is known to be visible to > >only one thread, then you can avoid the atomic inc/dec. But that leads to > >madness :-) > > I would much rather see the language extended to indicate that a particular > variable is "shared" across free-threaded interpreters. The hit of taking > a mutex on every incref/decref is way bad. I wonder if the energy is better spent in a truly highly-optimized implementation on the major platforms rather than trying to conditional this. This may mean writing x86 assembler, and a few others, but then again, once written, it shouldn't need much modification. I wonder if the conditional mutexing might be slower because of the check and lack of focus on bringing the core implementation up to speed. Chris -- | Christopher Petrilli | petrilli at amber.org From ping at lfw.org Wed Apr 19 17:40:08 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 08:40:08 -0700 (PDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: On Wed, 19 Apr 2000, Sjoerd Mullender wrote: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. In other words split up the CDATA section into two in the > middle of the ]]> sequence: Brilliant. Now that i've seen it, this has to be the right answer. -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From SalzR at CertCo.com Wed Apr 19 18:04:48 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 12:04:48 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >In my experience, allowing/requiring programmers to specify sharedness is >a very rich source of hard-to-find bugs. My experience is the opposite, since most objects aren't shared. :) You could probably do something like add an "owning thread" to each object structure, and on refcount throw an exception if not shared and the current thread isn't the owner. Not sure if space is a concern, but since the object is either shared or needs its own mutex, you make them a union: bool shared; union { python_thread_id_type id; python_mutex_type m; }; (Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg _______________________________________________ Thread-SIG maillist - Thread-SIG at python.org http://www.python.org/mailman/listinfo/thread-sig From ping at lfw.org Wed Apr 19 19:07:36 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:07:36 -0700 (PDT) Subject: [Python-Dev] Generic notifier module Message-ID: I think it would be very nice for the Python standard library to provide a messaging mechanism (you may know it as signals/slots, publish/subscribe, listen/notify, etc.). This could be very useful, especially for interactive applications where various components need to keep each other up to date about things. 
I know of several Tkinter programs where i'd like to use this mechanism. The proposed interface is: To add notification ability, mix in class notifier.Notifier. object.notify(message, callback) - Set up notification for message. object.denotify(message[, callback]) - Turn off notification. object.send(message, **args) - Call all callbacks registered on object for message, in reverse order of registration, passing along message and **args as arguments to each callback. If a callback returns notifier.BREAK, no further callbacks are called. (Alternatively, we could use signals/slots terminology: connect/disconnect/emit. I'm not aware of anything the signals/slots mechanism has that the above lacks.) Two kinds of messages are supported: 1. The 'message' passed to notify/denotify may be a class, and the 'message' passed to send may be a class or instance of a message class. In this case callbacks registered on that class and all its bases are called. 2. The 'message' passed to all three methods may be any other hashable object, in which case it is looked up by its hash, and callbacks registered on a hash-equal object are called. Thoughts and opinions are solicited (especially from those who have worked with messaging-type things before, and know the gotchas!). I haven't run into many tricky problems with these things in general, and i figure that the predictable order of callbacks should reduce complication. (I chose reverse ordering so that you always have the ability to add a callback that overrides existing ones.) A straw-man implementation follows. The callback registry is maintained in the notifier module so you don't have to worry about it messing up the attributes of your objects. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial = serial + 1 def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. 
The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From ping at lfw.org Wed Apr 19 19:25:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:25:12 -0700 (PDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. One revision to the above: callbacks should get the sender of the message passed in as well as the message. The tweaked module follows. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: """Mix in this class to provide notifier functionality on your objects. On a notifier object, use the 'notify' and 'denotify' methods to register or unregister callbacks on messages, and use the 'send' method to send a message from the object.""" def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(self, message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." 
-- Roople Unia From effbot at telia.com Wed Apr 19 19:15:28 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 19 Apr 2000 19:15:28 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <001901bfaa22$e202af60$34aab5d4@hagrid> Ka-Ping Yee wrote: > I think it would be very nice for the Python standard library to > provide a messaging mechanism (you may know it as signals/slots, > publish/subscribe, listen/notify, etc.). your notifier looks like a supercharged version of the "Observer" pattern [1]. here's a minimalistic observer mixin from "(the eff- bot guide to) Python Patterns and Idioms" [2]. class Observable: __observers = None def addobserver(self, observer): if not self.__observers: self.__observers = [] self.__observers.append(observer) def removeobserver(self, observer): self.__observers.remove(observer) def notify(self, event): for o in self.__observers or (): o(event) notes: -- in the GOF pattern, to "notify" is to tell observers that something happened, not to register an observer. -- GOF uses "attach" and "detach" to install and remove observers; the pattern book version uses slightly more descriptive names. -- the user is expected to use bound methods and event instances (or classes) to associate data with the notifier and events. earlier implementations were much more elaborate, but we found that the standard mechanisms was more than sufficient in real life... 1) "Design Patterns", by Gamma et al. 2) http://www.pythonware.com/people/fredrik/patternbook.htm From DavidA at ActiveState.com Wed Apr 19 19:43:26 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 10:43:26 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. In other words split up the CDATA section into two in the > middle of the ]]> sequence: > > import string > def encode_cdata(str): > return ' string.join(string.split(str, ']]>'), ']]]]>')) + \ > ']]>' If I understand what you're proposing, you're splitting a single bit of Python code into N XML elements. This requires smarts not on the decode function (where they should be, IMO), but on the XML parsing stage (several leaves of the tree have to be merged). Seems like the wrong direction to push things. Also, I can imagine cases where the app puts several scripts in consecutive CDATA elements (assuming that's legal XML), and where a merge which inserted extra ]]> would be very surprising. Maybe I'm misunderstanding you, though.... --david ascher From effbot at telia.com Wed Apr 19 19:50:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 19 Apr 2000 19:50:17 +0200 Subject: [Python-Dev] Encoding of code in XML References: Message-ID: <000701bfaa27$c3546e00$34aab5d4@hagrid> David Ascher wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. nope. CDATA sections are used to encode data, they're not elements: XML 1.0, section 2.7: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as mark- up. 
you can put each data character in its own CDATA section, if you like. if the parser cannot handle that, it's broken. (if you've used xmllib, think handle_data, not start_cdata). From sjoerd at oratrix.nl Wed Apr 19 21:24:31 2000 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 21:24:31 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Wed, 19 Apr 2000 10:43:26 -0700. References: Message-ID: <20000419192432.F2A19301CF9@bireme.oratrix.nl> On Wed, Apr 19 2000 "David Ascher" wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. This requires smarts not on the decode > function (where they should be, IMO), but on the XML parsing stage (several > leaves of the tree have to be merged). Seems like the wrong direction to > push things. Also, I can imagine cases where the app puts several scripts > in consecutive CDATA elements (assuming that's legal XML), and where a merge > which inserted extra ]]> would be very surprising. > > Maybe I'm misunderstanding you, though.... I think you're not misunderstanding me, but maybe you are misunderstanding XML. :-) [Of course, it is also conceivable that I misunderstand XML. :-] First of all, I don't propose to split up the single bit of Python into multiple XML elements. CDATA sections are not XML elements. The XML standard says this: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. [http://www.w3.org/TR/REC-xml#sec-cdata-sect] In other words, according to the XML standard wherever you are allowed to put character data (such as in this case Python code), you are allowed to use CDATA sections. Their purpose is to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA sections are not part of the markup, so the XML parser is allowed to coallese the multiple CDATA sections and other character data into one string before it gives it to the application. So, yes, this requires smarts on the XML parsing stage, but I think those smarts need to be there anyway. If an application put several pieces of Python code in one character data section, it is basically on its own. I don't think XML guarantees that those pieces aren't merged into one string by the XML parser before it gets to the application. As I said already, this is my interpretation of XML, and I could be misinterpreting things. -- Sjoerd Mullender From ping at lfw.org Wed Apr 19 22:14:39 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 15:14:39 -0500 (CDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <001901bfaa22$e202af60$34aab5d4@hagrid> Message-ID: On Wed, 19 Apr 2000, Fredrik Lundh wrote: > > your notifier looks like a supercharged version of the "Observer" > pattern [1]. here's a minimalistic observer mixin from "(the eff- > bot guide to) Python Patterns and Idioms" [2]. Oh, yeah, "observer". That was the other name for this mechanism that i forgot. > class Observable: I'm not picky about names... anything is fine. 
> def notify(self, event): > for o in self.__observers or (): > o(event) *Some* sort of dispatch would be nice, i think, rather than having to check the kind of event you're getting in every callback. Here are the three sources of "more stuff" in Notifier as opposed to Observer: 1. Dispatch. You register callbacks for particular messages rather than on the whole object. 2. Ordering. The callbacks are always called in reverse order of registration, which makes BREAK meaningful. 3. Inheritance. You can use a class hierarchy of messages. I think #1 is fairly essential, and i know i've come across situations where #3 is useful. The need for #2 is only a conjecture on my part. Does anyone care about the order in which callbacks get called? If not (and no one needs to use BREAK), we can throw out #2 and make Notifier simpler: callbacks = {} def send(key, message, **args): if callbacks.has_key(key): for callback in callbacks[key]: callback(key[0], message, **args) if hasattr(key[1], "__bases__"): for base in key[1].__bases__: send((key[0], base), message, **args) class Notifier: def send(self, message, **args): if hasattr(message, "__class__"): send((self, message.__class__), message, **args) else: send((self, message), message, **args) def notify(self, message, callback): key = (self, message) if not callbacks.has_key(key): callbacks[key] = [] callbacks[key].append(callback) def denotify(self, message, callback=None): key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] else: callbacks[key].remove(callback) -- ?!ng From paul at prescod.net Wed Apr 19 22:19:31 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:19:31 -0500 Subject: [Python-Dev] Encoding of code in XML References: Message-ID: <38FE14D3.AC05DAE0@prescod.net> David Ascher wrote: > > ... > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. No, a CDATA section is not an element. But the question of whether boundary placements are meaningful is sepearate. This comes back to the "semantics question". Most tools will not differentiate between two adjacent CDATA sections and one. The XML specification does not say whether they should or should not but in practice tools that consume XML and then throw it away typically do NOT care about CDATA section boundaries and tools that edit XML *do* care. This "break it into to two sections" solution is the typical one but it is god-awful ugly, even in XML editors. Many stream-based XML tools (e.g. mostSAX parsers, xmllib) *will* report two separate CDATA sections as two different character events. Application code must be able to handle this situation. It doesn't only occur with CDATA sections. XML parsers could equally break up long text chunks based on 1024-byte block boundaries or line breaks or whatever they feel like. In my opinion these variances in behvior stem from the myth that XML has no semantics but that's another off-topic topic. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. 
- http://www.nj.com/page1/ledger/e2efc7.html From gstein at lyra.org Wed Apr 19 22:27:11 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:27:11 -0700 (PDT) Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >In my experience, allowing/requiring programmers to specify sharedness is > >a very rich source of hard-to-find bugs. > > My experience is the opposite, since most objects aren't shared. :) > You could probably do something like add an "owning thread" to each object > structure, and on refcount throw an exception if not shared and the current > thread isn't the owner. Not sure if space is a concern, but since the object > is either shared or needs its own mutex, you make them a union: > bool shared; > union { > python_thread_id_type id; > python_mutex_type m; > }; > > > (Not saying I have an answer to > the performance hit of locking on incref/decref, just saying that the > development cost of 'shared' is very high.) Regardless of complexity or lack thereof, any kind of "specified sharedness" cannot be implemented. Consider the case where a programmer forgets to note the sharedness. He passes the object to another thread. At certain points: BAM! The interpreter dumps core. Guido has specifically stated that *nothing* should ever allow that (in terms of pure Python code; bad C extension coding is all right). Sharedness has merit, but it cannot be used :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From SalzR at CertCo.com Wed Apr 19 22:27:10 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 16:27:10 -0400 Subject: [Python-Dev] RE: [Thread-SIG] marking shared-ness (was: baby steps for free-th reading) Message-ID: >Consider the case where a programmer forgets to note the sharedness. He >passes the object to another thread. At certain points: BAM! The >interpreter dumps core. No. Using the "owning thread" idea prevents coredumps and allows the interpreter to throw an exception. Perhaps my note wasn't clear enough? /r$ From paul at prescod.net Wed Apr 19 22:25:42 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:25:42 -0500 Subject: [Python-Dev] Encoding of code in XML References: <20000419192432.F2A19301CF9@bireme.oratrix.nl> Message-ID: <38FE1646.B29CAA8A@prescod.net> Sjoerd Mullender wrote: > > ... > > CDATA sections are not part of the markup, so the XML parser > is allowed to coallese the multiple CDATA sections and other character > data into one string before it gives it to the application. Allowed but not required. Most SAX parsers will not. Some DOM parsers will and some won't. :( > So, yes, this requires smarts on the XML parsing stage, but I think > those smarts need to be there anyway. I don't follow this part. Typically those "smarts" are not there. At the end of one CDATA section you get an event and at the beginning of the next you get a different event. It's the application's job to glue them together. :( Fixing this is one of the goals of EventDOM. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. 
- http://www.nj.com/page1/ledger/e2efc7.html From tismer at tismer.com Wed Apr 19 22:38:31 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 22:38:31 +0200 Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) References: Message-ID: <38FE1947.70FC6AEE@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Salz, Rich wrote: > > >In my experience, allowing/requiring programmers to specify sharedness is > > >a very rich source of hard-to-find bugs. > > > > My experience is the opposite, since most objects aren't shared. :) > > You could probably do something like add an "owning thread" to each object > > structure, and on refcount throw an exception if not shared and the current > > thread isn't the owner. Not sure if space is a concern, but since the object > > is either shared or needs its own mutex, you make them a union: > > bool shared; > > union { > > python_thread_id_type id; > > python_mutex_type m; > > }; > > > > > > (Not saying I have an answer to > > the performance hit of locking on incref/decref, just saying that the > > development cost of 'shared' is very high.) > > Regardless of complexity or lack thereof, any kind of "specified > sharedness" cannot be implemented. > > Consider the case where a programmer forgets to note the sharedness. He > passes the object to another thread. At certain points: BAM! The > interpreter dumps core. > > Guido has specifically stated that *nothing* should ever allow that (in > terms of pure Python code; bad C extension coding is all right). > > Sharedness has merit, but it cannot be used :-( Too bad that we don't have incref/decref as methods. The possible mutables which have to be protected could in fact carry a thread handle of their current "owner" (probably the one who creted them), and incref would check whether the owner is still same. If it is not same, then the owner field would be wiped, and that turns the (higher cost) shared refcounting on, and all necessary protection as well. (Maybe some extra care is needed to ensure that this info isn't changed while we are testing it). Without inc/dec-methods, something similar could be done, but every inc/decref will be a bit more expensive since we must figure out wether we have a mutable or not. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From gstein at lyra.org Wed Apr 19 22:52:12 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:52:12 -0700 (PDT) Subject: [Python-Dev] RE: marking shared-ness In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >Consider the case where a programmer forgets to note the sharedness. He > >passes the object to another thread. At certain points: BAM! The > >interpreter dumps core. > > No. Using the "owning thread" idea prevents coredumps and allows the > interpreter to throw an exception. Perhaps my note wasn't clear > enough? INCREF and DECREF cannot throw exceptions. Are there other points where you could safely detect erroneous sharing of objects? (in a guaranteed fashion) For example: what are all the ways that objects can be transported between threads. Can you erect tests at each of those points? I believe "no" since there are too many ways (func arg or an item in a shared ob). 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 19 23:15:39 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:15:39 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE1947.70FC6AEE@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: >... > Too bad that we don't have incref/decref as methods. This would probably impose more overhead than some of the atomic inc/dec mechanisms. > The possible mutables which have to be protected could Non-mutable objects must be protected, too. An integer can be shared just as easily as a list. > in fact carry a thread handle of their current "owner" > (probably the one who creted them), and incref would > check whether the owner is still same. > If it is not same, then the owner field would be wiped, > and that turns the (higher cost) shared refcounting on, > and all necessary protection as well. > (Maybe some extra care is needed to ensure that this info > isn't changed while we are testing it). Ah. Neat. "Automatic marking of shared-ness" Could work. That initial test for the thread id could be expensive, though. What is the overhead of getting the current thread id? [ ... thinking about the code ... ] Nope. Won't work at all. There is a race condition when an object "becomes shared". DECREF: if ( object is not shared ) /* whoops! it just became shared! */ --(op)->ob_refcnt; else atomic_decrement(op) To prevent the race, you'd need an interlock which is more expensive than an atomic decrement. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Wed Apr 19 23:25:45 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 23:25:45 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FE2459.E0300B5@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > >... > > Too bad that we don't have incref/decref as methods. > > This would probably impose more overhead than some of the atomic inc/dec > mechanisms. > > > The possible mutables which have to be protected could > > Non-mutable objects must be protected, too. An integer can be shared just > as easily as a list. Uhh, right. Everything is mutable, since me mutate the refcount :-( ... > Ah. Neat. "Automatic marking of shared-ness" > > Could work. That initial test for the thread id could be expensive, > though. What is the overhead of getting the current thread id? Zero if we cache it in the thread state. > [ ... thinking about the code ... ] > > Nope. Won't work at all. @#$%?!!-| yes-you-are-right - gnnn! > There is a race condition when an object "becomes shared". > > DECREF: > if ( object is not shared ) > /* whoops! it just became shared! */ > --(op)->ob_refcnt; > else > atomic_decrement(op) > > To prevent the race, you'd need an interlock which is more expensive than > an atomic decrement. Really, sad but true. Are atomic decrements really so cheap, meaning "are they mapped to the atomic dec opcode"? Then this is all ok IMHO. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gstein at lyra.org Wed Apr 19 23:34:19 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:34:19 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE2459.E0300B5@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Ah. Neat. "Automatic marking of shared-ness" > > > > Could work. That initial test for the thread id could be expensive, > > though. What is the overhead of getting the current thread id? > > Zero if we cache it in the thread state. You don't have the thread state at incref/decref time. And don't say "_PyThreadState_Current" or I'll fly to Germany and personally kick your ass :-) >... > > There is a race condition when an object "becomes shared". > > > > DECREF: > > if ( object is not shared ) > > /* whoops! it just became shared! */ > > --(op)->ob_refcnt; > > else > > atomic_decrement(op) > > > > To prevent the race, you'd need an interlock which is more expensive than > > an atomic decrement. > > Really, sad but true. > > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? On some platforms and architectures, they *might* be. On Win32, we call InterlockedIncrement(). No idea what that does, but I don't think that it is a macro or compiler-detected thingy to insert opcodes. I believe there is a function call involved. pthreads do not define atomic inc/dec, so we must use a critical section + normal inc/dec operators. Linux has a kernel macro for atomic inc/dec, but it is only valid if __SMP__ is defined in your compilation context. etc. Platforms that do have an API (as Donn stated: BeOS has one; Win32 has one), they will be cheaper than an interlock. Therefore, we want to take advantage of an "atomic inc/dec" semantic when possible (and fallback to slower stuff when not). Cheers, -g -- Greg Stein, http://www.lyra.org/ From fleck at triton.informatik.uni-bonn.de Wed Apr 19 23:32:42 2000 From: fleck at triton.informatik.uni-bonn.de (Markus Fleck) Date: Wed, 19 Apr 2000 23:32:42 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: from "Greg Stein" at Apr 18, 2000 02:16:44 PM Message-ID: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Greg Stein: > Nevertheless, adding more moderators is the "proper" answer to the > problem. Even if it is difficult to get more moderators into the system, > there doesn't seem to be a better alternative. I agree with this. What would be helpful would be (i) a web interface for multiple-moderator moderation (which I believe Mailman already provides), and (ii) some rather simple changes to the list-to-newsgroup gateway to do some header manipulations before posting each approved message to c.l.py.a. I've been more or less "off the Net" for almost two months now, while getting started at my new job, and I will try to do some (summary-style) retro-moderation of the ca. 50 c.l.py.a submissions that I missed during this time. Automating the submission process and getting additional moderators would make c.l.py.a less dependent on me and avoid such moderation lags in the future. (And yes, of course, I'm sorry for the lag. But now I'm back, and I'm willing to help change the process so that such lags won't happen again in the future. Getting additional moderators would likely help with this.) Yours, Markus. 
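Returning to the incref/decref thread: the reason unsynchronized reference-count updates need an atomic operation or a lock is the classic lost-update race. A small pure-Python sketch of that race (illustrative only, not from any of the messages above; the names and counts are merely what one would typically observe, since CPython can switch threads between the read and the write of the counter):

    import thread, time

    count = 0
    lock = thread.allocate_lock()
    finished = []

    def bump(n, use_lock):
        # "count = count + 1" is a read-modify-write; without the lock, two
        # threads can read the same old value and one of the updates is lost.
        global count
        for i in range(n):
            if use_lock:
                lock.acquire()
                count = count + 1
                lock.release()
            else:
                count = count + 1
        finished.append(1)

    def run(use_lock):
        global count, finished
        count = 0
        finished = []
        for i in range(4):
            thread.start_new_thread(bump, (100000, use_lock))
        while len(finished) < 4:
            time.sleep(0.1)
        return count

    print run(0), "without a lock (usually comes up short of 400000)"
    print run(1), "with a lock (always 400000)"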
From gstein at lyra.org Wed Apr 19 23:42:34 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:42:34 -0700 (PDT) Subject: [Python-Dev] optimize atomic inc/dec? (was: baby steps for free-threading) In-Reply-To: <20000419110904.C6107@trump.amber.org> Message-ID: On Wed, 19 Apr 2000, Christopher Petrilli wrote: > Salz, Rich [SalzR at CertCo.com] wrote: > > >This definitely slows Python down. If an object is known to be visible to > > >only one thread, then you can avoid the atomic inc/dec. But that leads to > > >madness :-) > > > > I would much rather see the language extended to indicate that a particular > > variable is "shared" across free-threaded interpreters. The hit of taking > > a mutex on every incref/decref is way bad. > > I wonder if the energy is better spent in a truly highly-optimized > implementation on the major platforms rather than trying to > conditional this. This may mean writing x86 assembler, and a few > others, Bill Tutt had a good point -- we can get a bunch of assembler fragments from the Linux kernel for atomic inc/dec. On specific compiler and processor architecture combinations, we could drop to assembly to provide an atomic dec/inc. For example, when we see we're using GCC on an x86 processor (whether FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > but then again, once written, it shouldn't need much > modification. I wonder if the conditional mutexing might be slower > because of the check and lack of focus on bringing the core > implementation up to speed. Won't work anyhow. See previous email. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Wed Apr 19 23:32:17 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 19 Apr 2000 17:32:17 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.9697.366632.708503@goon.cnri.reston.va.us> Glad to hear from you, Marcus! I'm willing to help with both (a) and (b). I'll talk to Barry about the Mailman issues tomorrow. Jeremy From DavidA at ActiveState.com Wed Apr 19 23:46:49 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 14:46:49 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: I can help moderate as well. --david ascher From billtut at microsoft.com Wed Apr 19 23:28:12 2000 From: billtut at microsoft.com (Bill Tutt) Date: Wed, 19 Apr 2000 14:28:12 -0700 Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF71@RED-MSG-50> > From: Christian Tismer [mailto:tismer at tismer.com] > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? > Then this is all ok IMHO. > On x86en they are mapped to an "atomic assembly fragment" i.e. params to some registers and then stick in a "lock add" instruction or something. (please forgive me if I've botched the details). So in that respect they are cheap, its a hardware level feature. On the otherhand though, given the effect that these instructions have on the CPU (its caches, buses, and so forth) it is by for no means free. My recollection vaguely recalls someone saying that all the platforms NT has supported so far has had at the minimum an InterlockedInc/Dec. 
InterlockCompareExchange() is where I think not all of the Intel family (386) and some of the other platforms may not have had the appropriate instructions. InterlockCompareExchange() is useful for creating your own spinlocks. The GCC list might be a good place for enquiring about the feasability of InterlockedInc/Dec on various platforms. Bill From bwarsaw at cnri.reston.va.us Thu Apr 20 02:51:55 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 19 Apr 2000 20:51:55 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.21675.154913.979353@anthem.cnri.reston.va.us> >>>>> "MF" == Markus Fleck writes: MF> I agree with this. What would be helpful would be (i) a web MF> interface for multiple-moderator moderation (which I believe MF> Mailman already provides), and (ii) some rather simple changes MF> to the list-to-newsgroup gateway to do some header MF> manipulations before posting each approved message to MF> c.l.py.a. This is doable in Mailman, but I'm not so sure how much it will help, unless we make a further refinement. I don't know enough about the Usenet moderation process to know if this will work, but let me outline things here. There's two ways a message can get announced, first via email or first via Usenet. Here's what happens in each case: - A message is sent to python-announce at python.org. This is the preferred email address to post to. These messages get forwarded to clpa at python.net, which I believe is just a simple exploder to Markus and Vladimir. Obviously with the Starship current dead, this is broken too. I don't know what happens to these messages once Markus and Vladimir get it, but I assume that Markus adds a magic approval header and forwards the message to Usenet. Perhaps Markus can explain this process in more detail. - A message is sent to python-announce-list at python.org. This is not the official place to send announcements, but this specific alias simply forwards to python-announce at python.org so see above. Note that the other standard Mailman python-announce-list-* aliases are in place, and python-announce-list is a functioning Mailman mailing list. This list gates from Usenet, but not /to/ Usenet because of the forwarding described above. When it sees a message on c.l.py.a, it sucks the messages off the newsgroup and forwards it to all list members. Obviously those messages must have already been approved by the Usenet moderators. - A message is sent directly to c.l.py.a. From what I understand, the Usenet software itself forwards to the moderators, who again, do their magic and forwards the message to Usenet. So, given this arrangement, the messages never arrive unapproved at a mailing list. What it sounds like Markus is proposing is that the official Usenet moderator address would be a mailing list. It would be a closed mailing list whose members are approved moderators, with a shared Mailman alias. Any message posted there would be held for approval, and once approved, it would be injected directly into Usenet, with the appropriate magic header. I think I know what I'd need to add to Mailman to support this, though it'll be a little tricky. I need to know exactly how approved messages should be posted to Usenet. Does someone have a URL reference to this procedure, or is it easy enough to explain? 
-Barry From gstein at lyra.org Thu Apr 20 06:09:13 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 21:09:13 -0700 (PDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? Message-ID: Hey guys, You're the Smart Guys that I know, and it seems this is also the forum where I once heard a long while back that DB can occasionally corrupt its files. True? Was it someone here that mentioned that? (Skip?) Or maybe it was bsddb? (or is that the same as the Berkeley DB, now handled by Sleepycat) A question just came up elsewhere about DB and I seemed to recall somebody mentioning the occasional corruption. Oh, maybe it was related to multiple threads. Any help appreciated! Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Thu Apr 20 08:09:45 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 09:09:45 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.denotify(message[, callback]) - Turn off notification. You need to be a bit more careful here. What if callback is foo().function? It's unique, so I could never denotify it. A better way, and more popular (at least in the signal/slot terminology), is to return a cookie on connect, and have disconnect requests by a cookie. > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. > If a callback returns notifier.BREAK, no further callbacks > are called. When I implemented that mechanism, I just used a special exception (StopCommandExecution). I prefer that, since it allows the programmer much more flexibility (which I used) > (Alternatively, we could use signals/slots terminology: > connect/disconnect/emit. I'm not aware of anything the signals/slots > mechanism has that the above lacks.) Me neither. Some offer a variety of connect-methods: connect after, connect-before (this actually has some uses). Have a short look at the Gtk+ signal mechanism -- it has all these. > 1. The 'message' passed to notify/denotify may be a class, and > the 'message' passed to send may be a class or instance of > a message class. In this case callbacks registered on that > class and all its bases are called. This seems a bit unneccessary, but YMMV. In all cases I've needed this, a simple string sufficed (i.e., method 2) Implementation nit: I usually use class _BREAK: pass BREAK=_BREAK() That way it is gurranteed that BREAK is unique. Again, I use this mostly with exceptions. All in all, great idea Ping! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fleck at triton.informatik.uni-bonn.de Thu Apr 20 09:02:33 2000 From: fleck at triton.informatik.uni-bonn.de (Markus Fleck) Date: Thu, 20 Apr 2000 09:02:33 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14590.21675.154913.979353@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at Apr 19, 2000 08:51:55 PM Message-ID: <200004200702.JAA14939@hera.informatik.uni-bonn.de> Barry A. Warsaw: > What it sounds like Markus is proposing is that the official Usenet > moderator address would be a mailing list. It would be a closed > mailing list whose members are approved moderators, with a shared > Mailman alias. 
Any message posted there would be held for approval, > and once approved, it would be injected directly into Usenet, with the > appropriate magic header. Exactly. (In fact, each approved message could be both posted to Usenet and forwarded to the subscription-based shadow mailing list at the same time.) > I think I know what I'd need to add to Mailman to support this, though > it'll be a little tricky. I need to know exactly how approved messages > should be posted to Usenet. Does someone have a URL reference to this > procedure, or is it easy enough to explain? Basically, you need two headers: Newsgroups: comp.lang.python.announce Approved: python-announce at python.org The field contents of the "Approved:" header are in fact never checked for validity; it only has to be non-empty for the message to be successfully posted to a moderated newsgroup. (BTW, posting to the "alt.hackers" newsgroup actually relies on posters inserting "Approved: whatever" headers on their own, because "alt.hackers" is a moderated newsgroup without a moderator. You need to "hack" the Usenet moderation mechanism to be able to post there. :-) Because of the simplicity of this mechanism, no cross-posting to another moderated newsgroup should occur when posting an approved message to Usenet; e.g. if someone cross-posts to comp.lang.python, comp.lang.python.announce, comp.os.linux.misc and comp.os.linux.announce, the posting will go to the moderation e-mail address of the first moderated newsgroup in the "Newsgroups:" header supplied by the author's Usenet posting agent. (I.e., in this case, clpa at starship.skyport.net, if the header enumerates newsgroups in the above-mentioned order, "c.l.py,c.l.py.a,c.o.l.a,c.o.l.m".) Ideally, the moderators (or moderation software) of this first moderated newsgroup should split the posting up accordingly: a) remove names of newsgroups that we want to handle ourselves (e.g. c.l.py.a, possibly also c.l.py if cross-posted), and re-post the otherwise unchanged message to Usenet with only a changed "Newsgroups:" header (Headers: "Newsgroups: c.o.l.a,c.o.l.m" / no "Approved:" header added) -> this is necessary for the message to ever reach c.o.l.a and c.o.l.m -> the message will get forwarded by the Usenet server software to the moderation address of c.o.l.a, which is the first moderated newsgroup in the remaining list of newsgroups c) approve (or reject) posting to c.l.py.a and/or c.l.py (Headers: "Newsgroups: c.l.py.a" or "Newsgroups: c.l.py.a,c.l.py" or "Newsgroups: c.l.py" / an "Approved: python-announce at python.org" header may always be added, but is only necessary if also posting to c.l.py.a) According to the c.l.py.a posting guidelines, a "Followup-To:" header, will be added, if it doesn't exist yet, pointing to c.l.py for follow-up messages ("Follow-Up: c.l.py"). While a) may always happen automatically, prior to moderation, and needs to be custom-tailored for our c.l.py.a/c.l.py use case, the moderation software for b), i.e. Mailman, should allow moderators to adjust the "Newsgroups:" header while approving a message. It might also be nice to have an "X-Original-Newsgroups:" line in Mailman with a copy of the original "Newsgroups:" line. 
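A rough sketch of that header fix-up as it might look in Python (hypothetical helper, not actual Mailman or gateway code; the group list and defaults are examples only):

    import string

    def prepare_clpa_post(headers):
        # headers: a dictionary mapping header names to values
        groups = map(string.strip,
                     string.split(headers.get('Newsgroups', ''), ','))
        headers['X-Original-Newsgroups'] = string.join(groups, ',')
        # keep only the groups this moderator actually handles
        ours = filter(lambda g: g in ('comp.lang.python',
                                      'comp.lang.python.announce'), groups)
        headers['Newsgroups'] = string.join(ours, ',')
        if 'comp.lang.python.announce' in ours:
            # any non-empty Approved: header satisfies the news server
            headers['Approved'] = 'python-announce@python.org'
        if not headers.has_key('Followup-To'):
            headers['Followup-To'] = 'comp.lang.python'
        return headers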
Regarding headers, usually e-mail will allow and forward almost any non-standard header field (a feature that is used to preserve the "Newsgroups:" header even when forwarding a posting to an e-mail address), but the Usenet server software may not accept all kinds of headers, so that just before posting, only known "standard" header fields should be preserved; any "X-*:" headers, for example, might be candidates for removal prior to posting, because some Usenet servers return strange errors when a message is posted that contains certain special "X-*:" headers. OTOH, AFAIK, the posting agent should generate and add a unique "Message-ID:" header for each Usenet posting itself. But if you have a Usenet forwarding agent already running, much of this should be implemented there already. Okay, now some links to resources and FAQs on that subject: Moderated Newsgroups FAQ http://www.swcp.com/~dmckeon/mod-faq.html USENET Moderators Archive http://www.landfield.com/moderators/ NetNews Moderators Handbook - 5.2.1 Approved: Line http://www.landfield.com/usenet/moderators/handbook/mod05.html#5.2.1 Please e-mail me if you have any further questions. Yours, Markus. From harri.pasanen at trema.com Thu Apr 20 09:42:29 2000 From: harri.pasanen at trema.com (Harri Pasanen) Date: Thu, 20 Apr 2000 09:42:29 +0200 Subject: [Python-Dev] Re: [Thread-SIG] optimize atomic inc/dec? (was: baby steps for free-threading) References: Message-ID: <38FEB4E5.47DCD834@trema.com> Greg Stein wrote, talking about optimizing atomic inc/dec: > > For example, when we see we're using GCC on an x86 processor (whether > FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > The same applies for Sparc. In our C++ software we have currently the atomic increment as inlined assembly for x86, sparc and sparc-v9, using GCC. It is a function though, so there is a function call involved. -Harri From tismer at tismer.com Thu Apr 20 15:23:31 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 15:23:31 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FF04D3.4CE2067E@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Ah. Neat. "Automatic marking of shared-ness" > > > > > > Could work. That initial test for the thread id could be expensive, > > > though. What is the overhead of getting the current thread id? > > > > Zero if we cache it in the thread state. > > You don't have the thread state at incref/decref time. > > And don't say "_PyThreadState_Current" or I'll fly to Germany and > personally kick your ass :-) A real temptation to see whether I can really get you to Germany :-)) ... Thanks for all the info. > Linux has a kernel macro for atomic inc/dec, but it is only valid if > __SMP__ is defined in your compilation context. Well, and while it looks cheap, it is for sure expensive since several caches are flushed, and the system is stalled until the modified value is written back into the memory bank. Could it be that we might want to use another thread design at all? I'm thinking of running different interpreters in the same process space, but with all objects really disjoint, invisible between the interpreters. This would perhaps need some internal changes, in order to make all the builtin free-lists disjoint as well. Now each such interpreter would be running in its own thread without any racing condition at all so far. 
To make this into threading and not just a flavor of multitasking, we now need of course shared objects, but only those objects which we really want to share. This could reduce the cost for free threading to nearly zero, except for the (hopefully) few shared objects. I think, instead of shared globals, it would make more sense to have some explicit shared resource pool, which controls every access via mutexes/semas/whateverweneed. Maybe also that we would prefer to copy objects into it over sharing, in order to minimize collisions. I hope the need for true sharing can be minimized to a few variables. Well, I hope. "freethreads" could even coexist with the current locking threads, we would not even need a special build for them, but to rethink threading. Like "the more free threading is, the more disjoint threads are". are-you-now-convinced-to-come-and-kick-my-ass-ly y'rs - chris :-) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip at mojam.com Thu Apr 20 15:29:37 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 08:29:37 -0500 (CDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? In-Reply-To: References: Message-ID: <14591.1601.125779.714243@beluga.mojam.com> Greg> You're the Smart Guys that I know, and it seems this is also the Greg> forum where I once heard a long while back that DB can Greg> occasionally corrupt its files. ... Greg> A question just came up elsewhere about DB and I seemed to recall Greg> somebody mentioning the occasional corruption. Oh, maybe it was Greg> related to multiple threads. Yes, Berkeley DB 1.85 (exposed through the bsddb module in Python) has bugs in the hash implementation. They never fixed them (well maybe in 1.86?), but moved on to version 2.x. Of course, they changed the function call interface and the file format, so many people didn't follow. They do provide a 1.85-compatible API but you have to #include db_185.h instead of db.h. As far as I know, if you stick to the btree interface with 1.85 you should be okay. Unfortunately, both the anydbm and dbhash modules both use the hash interface, so if you're trying to be more or less portable and not modify your Python sources, you've also got buggy db files... Someone did create a libdb 2.x-compatible module that exposes more of the underlying functionality. Check the VoP for it. libdb == Berkeley DB == Sleepycat... Skip From skip at mojam.com Thu Apr 20 15:40:06 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 08:40:06 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> Message-ID: <14591.2230.630609.500780@beluga.mojam.com> Chris> I think, instead of shared globals, it would make more sense to Chris> have some explicit shared resource pool, which controls every Chris> access via mutexes/semas/whateverweneed. Tuple space, anyone? Check out http://www.snurgle.org/~pybrenda/ It's a Linda implementation for Python. Linda was developed at Yale by David Gelernter. Unfortunately, he's better known to the general public as being one of the Unabomber's targets. 
You can find out more about Linda at http://www.cs.yale.edu/Linda/linda.html Skip From fredrik at pythonware.com Thu Apr 20 15:55:52 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 20 Apr 2000 15:55:52 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Moshe Zadka" wrote: > > object.denotify(message[, callback]) - Turn off notification. > > You need to be a bit more careful here. What if callback is > foo().function? It's unique, so I could never denotify it. if you need a value later, the usual approach is to bind it to a name. works in all other situations, so why not use it here? > A better way, and more popular (at least in the signal/slot terminology), > is to return a cookie on connect, and have disconnect requests by a cookie. in which way is "harder to use in all common cases" better? ... as for the "break" functionality, I'm not sure it really belongs in a basic observer class (in GOF terms, that's a "chain of responsibility"). but if it does, I sure prefer an exception over a magic return value. From tismer at tismer.com Thu Apr 20 16:23:56 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 16:23:56 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> Message-ID: <38FF12FC.32052356@tismer.com> Skip Montanaro wrote: > > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > > Tuple space, anyone? Check out > > http://www.snurgle.org/~pybrenda/ Very interesting, indeed. > It's a Linda implementation for Python. Linda was developed at Yale by > David Gelernter. Unfortunately, he's better known to the general public as > being one of the Unabomber's targets. You can find out more about Linda at > > http://www.cs.yale.edu/Linda/linda.html Many broken links. The most activity appears to have stopped around 94/95, the project looks kinda dead. But this doesn't mean that we cannot learn from them. Will think more when the starship problem is over... ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at math.huji.ac.il Thu Apr 20 16:24:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 17:24:49 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Message-ID: On Thu, 20 Apr 2000, Fredrik Lundh wrote: > > A better way, and more popular (at least in the signal/slot terminology), > > is to return a cookie on connect, and have disconnect requests by a cookie. > > in which way is "harder to use in all common cases" > better? I'm not sure I agree this is harder to use in all common cases, but YMMV. Strings are prone to collisions, etc. And usually the code which connects the callback is pretty close (flow-control wise) to the code that would disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, and it might not be a bad idea, now that I think of it. 
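To make the two disconnect styles in this thread concrete, here is a toy dispatcher that supports both removal by the original callback and removal by a returned cookie; the names are invented for illustration and are not the proposed notifier module's API (nor Gtk+'s):

    class Dispatcher:
        def __init__(self):
            self.__observers = {}        # cookie -> callback
            self.__next_cookie = 0

        def add(self, callback):
            cookie = self.__next_cookie
            self.__next_cookie = self.__next_cookie + 1
            self.__observers[cookie] = callback
            return cookie                # only needed for cookie-style removal

        def remove_callback(self, callback):
            # disconnect by passing the same callback object again
            for cookie, cb in self.__observers.items():
                if cb == callback:
                    del self.__observers[cookie]

        def remove_cookie(self, cookie):
            # disconnect by the cookie that add() returned
            del self.__observers[cookie]

        def send(self, message):
            for cb in self.__observers.values():
                cb(message)

The "what if callback is foo().function?" case only bites the first style: a bound method of a temporary object can only be matched again if you kept a reference to that object or to the method, which is exactly Fredrik's "bind it to a name" advice.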
> as for the "break" functionality, I'm not sure it really > belongs in a basic observer class (in GOF terms, that's ^^^ TLA overload! What's GOF? > a "chain of responsibility"). but if it does, I sure prefer > an exception over a magic return value. I don't know if it belongs or not, but I do know that it is sometimes needed, and is very hard and ugly to simulate otherwise. That's one FAQ I don't want to answer -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Thu Apr 20 16:38:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 09:38:08 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> <38FF12FC.32052356@tismer.com> Message-ID: <14591.5712.162339.740646@beluga.mojam.com> >> http://www.cs.yale.edu/Linda/linda.html Chris> Many broken links. The most activity appears to have stopped Chris> around 94/95, the project looks kinda dead. But this doesn't mean Chris> that we cannot learn from them. Yes, I think Linda mostly lurks under the covers these days. Their Piranha project, which aims to soak up spare CPU cycles to do parallel computing, uses Linda. I suspect Linda is probably hidden somewhere inside Lifestreams as well. As a correction to my original note, Nicholas Carriero was the other primary lead on Linda. I no longer recall the details, but he may have been on of Gelernter's grad students in the late 80's. Skip From gvwilson at nevex.com Thu Apr 20 16:40:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 20 Apr 2000 10:40:48 -0400 (EDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <14591.2230.630609.500780@beluga.mojam.com> Message-ID: > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > Skip wrote: > Tuple space, anyone? Check out > http://www.snurgle.org/~pybrenda/ > It's a Linda implementation for Python. You can find out more about > Linda at > http://www.cs.yale.edu/Linda/linda.html Linda is also the inspiration for Sun's JavaSpaces, an easier-to-use layer on top of Jini: http://java.sun.com/products/javaspaces/ http://cseng.aw.com/bookpage.taf?ISBN=0-201-30955-6 On the plus side: 1. It's much (much) easier to use than mutex, semaphore, or monitor models: students in my parallel programming course could start writing C-Linda programs after (literally) five minutes of instruction. 2. If you're willing/able to do global analysis of access patterns, its simplicity doesn't have a significant performance penalty. 3. (Bonus points) It integrates very well with persistence schemes. On the minus side: 1. Some things that "ought" to be simple (e.g. barrier synchronization) are surprisingly difficult to get right, efficiently, in vanilla Linda-like systems. Some VHLL derivates (based on SETL and Lisp dialects) solved this in interesting ways. 2. It's different enough from hardware-inspired shared-memory + mutex models to inspire the same "Huh, that looks weird" reaction as Scheme's parentheses, or Python's indentation. On the other hand, Bill Joy and company are now backing it... Personal opinion: I've felt for 15 years that something like Linda could be to threads and mutexes what structured loops and conditionals are to the "goto" statement. 
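For anyone who has not met Linda before, the whole model is a shared bag of tuples with a put, a blocking take, and a non-destructive read. Here is a toy, single-process sketch using the standard threading module; it is not PyBrenda's interface, just an illustration of the idea:

    import threading

    class TupleSpace:
        def __init__(self):
            self.__tuples = []
            self.__cond = threading.Condition()

        def out(self, tup):
            # "out": drop a tuple into the space
            self.__cond.acquire()
            self.__tuples.append(tup)
            self.__cond.notifyAll()
            self.__cond.release()

        def in_(self, pattern):
            # "in": remove and return a matching tuple, blocking until one
            # arrives; None fields in the pattern act as wildcards
            self.__cond.acquire()
            try:
                while 1:
                    for tup in self.__tuples:
                        if self.__match(pattern, tup):
                            self.__tuples.remove(tup)
                            return tup
                    self.__cond.wait()
            finally:
                self.__cond.release()

        def __match(self, pattern, tup):
            if len(pattern) != len(tup):
                return 0
            for p, t in map(None, pattern, tup):
                if p is not None and p != t:
                    return 0
            return 1

A shared counter is then just a ("counter", n) tuple that a worker in_()s, increments, and out()s again, which is why it behaves exactly like a semaphore.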
Were it not for the "Huh" effect, I'd recommend hanging "Danger!" signs over threads and mutexes, and making tuple spaces the "standard" concurrency mechanism in Python. I'd also recommend calling the system "Carol", after Monty Python regular Carol Cleveland. The story is that Linda itself was named after the 70s porn star Linda Lovelace, in response to the DoD naming its language "Ada" after the other Lovelace... Greg p.s. I talk a bit about Linda, and the limitations of the vanilla approach, in http://mitpress.mit.edu/book-home.tcl?isbn=0262231867. From mlh at swl.msd.ray.com Thu Apr 20 17:02:30 2000 From: mlh at swl.msd.ray.com (Milton L. Hankins) Date: Thu, 20 Apr 2000 11:02:30 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: > Skip Montanaro wrote: > > > > Tuple space, anyone? Check out > > > > http://www.snurgle.org/~pybrenda/ > > Very interesting, indeed. *Steps out of the woodwork and bows* PyBrenda doesn't have a thread implementation, but it could be adapted to do so. It might be prudent to eliminate the use of TCP/IP in that case as well. In case anyone is interested, I just created a mailing list for PyBrenda at egroups: http://www.egroups.com/group/pybrenda-users -- Milton L. Hankins \\ ><> Ephesians 5:2 ><> http://www.snurgle.org/~mhankins // These are my opinions, not Raytheon's. \\ W. W. J. D. ? From effbot at telia.com Thu Apr 20 19:14:08 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 20 Apr 2000 19:14:08 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <018101bfaaec$65e56740$34aab5d4@hagrid> Moshe Zadka wrote: > > in which way is "harder to use in all common cases" > > better? > > I'm not sure I agree this is harder to use in all common cases, but YMMV. > Strings are prone to collisions, etc. not sure what you're talking about here, so I suppose we're talking past each other. what I mean is that: model.addobserver(view.notify) model.removeobserver(view.notify) works just fine without any cookies. having to do: view.cookie = model.addobserver(view.notify) model.removeobserver(view.cookie) is definitely no improvement. and if you have an extraordinary case (like a function pointer extracted from an object returned from a factory function), you just have to assign the function pointer to a local variable: self.callback = strangefunction().notify model.addobserver(self.callback) model.removeobserver(self.callback) in this case, you would probably keep a pointer to the object returned by the function anyway: self.viewer = getviewer() model.addobserver(viewer.notify) model.removeobserver(viewer.notify) > And usually the code which connects > the callback is pretty close (flow-control wise) to the code that would > disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. you really hate keeping things as simple as possible, don't you? ;-) what are these 3-4 "disconnects" doing? > > as for the "break" functionality, I'm not sure it really > > belongs in a basic observer class (in GOF terms, that's > ^^^ TLA overload! What's GOF? http://www.hillside.net/patterns/DPBook/GOF.html > > a "chain of responsibility"). but if it does, I sure prefer > > an exception over a magic return value. > > I don't know if it belongs or not, but I do know that it is sometimes > needed, and is very hard and ugly to simulate otherwise. 
That's one FAQ > I don't want to answer yeah, but the two patterns have different uses. From moshez at math.huji.ac.il Thu Apr 20 21:31:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 22:31:05 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <018101bfaaec$65e56740$34aab5d4@hagrid> Message-ID: [Fredrik Lundh] > not sure what you're talking about here, so I suppose > we're talking past each other. Nah, I guess it was a simple case of you being right and me being wrong. (In other words, you've convinced me) [Moshe] > FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. [Fredrik Lundh] > you really hate keeping things as simple as possible, > don't you? ;-) > > what are these 3-4 "disconnects" doing? gtk_signal_disconnect -- disconnect by cookie gtk_signal_disconnect_by_func -- disconnect by function pointer gtk_signal_disconnect_by_data -- disconnect by the void* pointer passed Hey, you asked just-preparing-for-my-lecture-next-friday-ly y'rs, Z. (see www.linux.org.il for more) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Thu Apr 20 22:43:24 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 20 Apr 2000 13:43:24 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: >... > > Linux has a kernel macro for atomic inc/dec, but it is only valid if > > __SMP__ is defined in your compilation context. > > Well, and while it looks cheap, it is for sure expensive > since several caches are flushed, and the system is stalled > until the modified value is written back into the memory bank. Yes, Bill mentioned that yesterday. Important fact, but there isn't much you can do -- they must be atomic. > Could it be that we might want to use another thread design > at all? I'm thinking of running different interpreters in > the same process space, but with all objects really disjoint, > invisible between the interpreters. This would perhaps need > some internal changes, in order to make all the builtin > free-lists disjoint as well. > Now each such interpreter would be running in its own thread > without any racing condition at all so far. > To make this into threading and not just a flavor of multitasking, > we now need of course shared objects, but only those objects > which we really want to share. This could reduce the cost for > free threading to nearly zero, except for the (hopefully) few > shared objects. > I think, instead of shared globals, it would make more sense > to have some explicit shared resource pool, which controls > every access via mutexes/semas/whateverweneed. Maybe also that > we would prefer to copy objects into it over sharing, in order > to minimize collisions. I hope the need for true sharing > can be minimized to a few variables. Well, I hope. > "freethreads" could even coexist with the current locking threads, > we would not even need a special build for them, but to rethink > threading. > Like "the more free threading is, the more disjoint threads are". No. Now you're just talking processes with IPC. Yes, they happen to run in threads, but you got none of the advantages of a threaded application. Threading is about sharing an address space. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Thu Apr 20 22:40:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 20 Apr 2000 13:40:54 -0700 Subject: [Python-Dev] String issues -- see the JavaScript world Message-ID: Just an FYI to those discussing Unicode issues. There is currently a big debate over in Mozilla-land looking at how XPIDL (their interface definition language) should deal with the various kinds of string types. Someone who cares may want to follow up on that to see if some of their issues apply to Python as well. News server: news.mozilla.org Newsgroup: netscape.public.mozilla.xpcom Thread: Encoding wars -- more in the Big String Story Cheers, --david From tismer at tismer.com Fri Apr 21 14:38:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 14:38:27 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <39004BC3.1DD108D0@tismer.com> Greg Stein wrote: > > On Thu, 20 Apr 2000, Christian Tismer wrote: [me, about free threading with less sharing] > No. Now you're just talking processes with IPC. Yes, they happen to run in > threads, but you got none of the advantages of a threaded application. Are you shure that every thread user shares your opinion? I see many people using threads just in order to have multiple tasks in parallel, with none or quite few shared variables. > Threading is about sharing an address space. This is part of the truth. There are a number of other reasons to use threads, too. Since Python has nothing really private, this implies in fact to protect every single object for free threading, although nobody wants this in the first place to happen. Other languages have much fewer problems here (I mean C, C++, Delphi...), they are able to do the right thing in the right place. Python is not designed for that. Why do you want to enforce the impossible, letting every object pay a high penalty to become completely thread-safe? Sharing an address space should not mean to share everything, but something. If Python does not support this, we should think of a redesign of its threading model, instead of loosing so much of efficiency. You end up in a situation where all your C extensions can run free threaded at high speed, just Python is busy all the time to fight the threading. That is not Python. You know that I like to optimize things. For me, optimization mut give an overall gain, not just in one area, where others get worse. If free threading cannot be optimized in a way that gives better overall performance, then it is a wrong optimization to me. Well, this is all speculative until we did some measures. Maybe I'm just complaining about 1-2 percent of performance loss, then I'd agree to move my complaining into /dev/null :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mlh at swl.msd.ray.com Fri Apr 21 18:36:40 2000 From: mlh at swl.msd.ray.com (Milton L. Hankins) Date: Fri, 21 Apr 2000 12:36:40 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: > Are you shure that every thread user shares your opinion? 
> I see many people using threads just in order to have > multiple tasks in parallel, with none or quite few shared > variables. About the only time I use threads is when 1) I'm doing something asynchronous in an event loop-driven paradigm (such as Tkinter) or 2) I'm trying to emulate fork() under win32 > Since Python has nothing really private, this implies in > fact to protect every single object for free threading, > although nobody wants this in the first place to happen. How does Java solve this problem? (Is this analagous to native vs. green threads?) > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Hmm, how about declaring only certain builtins as free-thread safe? Or is "the impossible" necessary because of the nature of incref/decref? -- Milton L. Hankins :: ><> Ephesians 5:2 ><> Software Engineer, Raytheon Systems Company :: http://amasts.msd.ray.com/~mlh :: RayComNet 7-225-4728 From billtut at microsoft.com Fri Apr 21 18:50:47 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 21 Apr 2000 09:50:47 -0700 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF9F@RED-MSG-50> > From: Milton L. Hankins [mailto:mlh at swl.msd.ray.com] > > On Fri, 21 Apr 2000, Christian Tismer wrote: > > > Are you shure that every thread user shares your opinion? > > I see many people using threads just in order to have > > multiple tasks in parallel, with none or quite few shared > > variables. > > About the only time I use threads is when > 1) I'm doing something asynchronous in an event loop-driven > paradigm > (such as Tkinter) or > 2) I'm trying to emulate fork() under win32 > 3) I'm doing something that would block in an asynchronous FSM. (e.g. Medusa, or an NT I/O completion port driven system) > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to > native vs. green > threads?) > Java allows you to specifically mention whether something should be seralized or not, and no, this doesn't have anything to do with native vs. green threads) > > Python is not designed for that. Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread > safe? incref/decref are not type object specific, they're global macros. Making them methods on the type object would be the sensible thing to do, but would definately be non-backward compatible. Bill From seanj at speakeasy.org Fri Apr 21 18:55:29 2000 From: seanj at speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 09:55:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to native vs. green > threads?) > > > Python is not designed for that. Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread safe? 
Or is > "the impossible" necessary because of the nature of incref/decref? http://www.javacats.com/US/articles/MultiThreading.html I would like sync foo: bloc of code here maybe we could merge in some Occam while were at it. B^) sync would be a most excellent operator in python. From seanj at speakeasy.org Fri Apr 21 19:16:29 2000 From: seanj at speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 10:16:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: http://www.cs.bris.ac.uk/~alan/javapp.html Take a look at the above link. It merges the Occam model with Java and uses 'channel based' interfaces (not sure exactly what this is). But they seem pretty exicted. I vote for using InterlockedInc/Dec as it is available as an assembly instruction on almost everyplatform. Could be then derive all other locking schemantics from this? And our portability problem is solved if it comes in the box with gcc. On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > > > Since Python has nothing really private, this implies in > > > fact to protect every single object for free threading, > > > although nobody wants this in the first place to happen. > > > > How does Java solve this problem? (Is this analagous to native vs. green > > threads?) > > > > > Python is not designed for that. Why do you want to enforce > > > the impossible, letting every object pay a high penalty > > > to become completely thread-safe? > > > > Hmm, how about declaring only certain builtins as free-thread safe? Or is > > "the impossible" necessary because of the nature of incref/decref? > > http://www.javacats.com/US/articles/MultiThreading.html > > I would like > > sync foo: > bloc of code here > > maybe we could merge in some Occam while were at it. B^) > > > sync would be a most excellent operator in python. > > > > > _______________________________________________ > Thread-SIG maillist - Thread-SIG at python.org > http://www.python.org/mailman/listinfo/thread-sig > From gvwilson at nevex.com Fri Apr 21 19:27:49 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Fri, 21 Apr 2000 13:27:49 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > http://www.cs.bris.ac.uk/~alan/javapp.html > Take a look at the above link. It merges the Occam model with Java and uses > 'channel based' interfaces (not sure exactly what this is). Channel-based programming has been called "the revenge of the goto", as in, "Where the hell does this channel go to?" Programmers must manage conversational continuity manually (i.e. keep track of the origins of messages, so that they can be replied to). It also doesn't really help with the sharing problem that started this thread: if you want a shared integer, you have to write a little server thread that knows how to act like a semaphore, and then it read/write requests that are exactly equivalent to P and V operations (and subject to all the same abuses). Oh, and did I mention the joys of trying to draw a semi-accurate diagram of the plumbing in your program after three months of upgrade work? 
*shudder* Greg From guido at python.org Fri Apr 21 19:29:06 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 13:29:06 -0400 Subject: [Python-Dev] Inspiration Message-ID: <200004211729.NAA16454@eric.cnri.reston.va.us> http://www.perl.com/pub/2000/04/whatsnew.html --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Apr 21 21:52:06 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 12:52:06 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: >... > > No. Now you're just talking processes with IPC. Yes, they happen to run in > > threads, but you got none of the advantages of a threaded application. > > Are you shure that every thread user shares your opinion? Now you're just being argumentative. I won't respond to this. >... > Other languages have much fewer problems here (I mean > C, C++, Delphi...), they are able to do the right thing > in the right place. > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Existing Python semantics plus free-threading places us in this scenario. Many people have asked for free-threading, and the number of inquiries that I receive have grown over time. (nobody asked in 1996 when I first published my patches; I get a query every couple months now) >... > You know that I like to optimize things. For me, optimization > mut give an overall gain, not just in one area, where others > get worse. If free threading cannot be optimized in > a way that gives better overall performance, then > it is a wrong optimization to me. > > Well, this is all speculative until we did some measures. > Maybe I'm just complaining about 1-2 percent of performance > loss, then I'd agree to move my complaining into /dev/null :-) It is more than this. In my last shot at this, pystone ran about half as fast. There are a few things that will be different this time around, but it certainly won't in the "few percent" range. Presuming you can keep your lock contention low, then your overall performances *goes up* once you have a multiprocessor machine. Sure, each processor runs Python (say) 10% slower, but you have *two* of them going. That is 180% compared to a central-lock Python on an MP machine. Lock contention: my last patches had really high contention. It didn't scale across processors well. This round will have more fine-grained locks than the previous version. But it will be interesting to measure the contention. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Apr 21 21:49:09 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 15:49:09 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Your message of "Fri, 21 Apr 2000 12:52:06 PDT." References: Message-ID: <200004211949.PAA16911@eric.cnri.reston.va.us> > It is more than this. In my last shot at this, pystone ran about half as > fast. There are a few things that will be different this time around, but > it certainly won't in the "few percent" range. Interesting thought: according to patches recently posted to patches at python.org (but not yet vetted), "turning on" threads on Win32 in regular Python also slows down Pystone considerably. Maybe it's not so bad? Maybe those patches contain a hint of what we could do? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Apr 21 22:02:23 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:02:23 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches at python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I think that my tests were threaded vs. free-threaded. It has been so long ago, though... :-) Yes, we'll get those patches reviewed and installed. That will at least help the standard threading case. With more discrete locks (e.g. one per object or one per code section), then we will reduce lock contention. Working on improving the lock mechanism itself and the INCREF/DECREF system will help, too. But this initial thread was to seek people to assist with some coding to get stuff into 1.6. The heavy lifting will certainly be after 1.6, but we can get some good stuff in *today*. We'll examine performance later on, then start improving it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 21 22:21:55 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:21:55 -0700 (PDT) Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness In-Reply-To: Message-ID: On Fri, 21 Apr 2000, Brent Fulgham wrote: >... > The problem is that having to grab the global interpreter lock > every time I want to manipulate Python objects from C seems wasteful. > This is perhaps more of a "interpreter" issue, rather than a > thread issue perhaps, but it does seem that if each thread (and > therefore interpreter state from my perspective) kept internal > track of itself, there would be much less lock contention as one > interpreter drops out of Python into the C code for a moment, then > releases the lock and returns, etc. > > So I think it's possible that free-threading changes might provide > some benefit even on uniprocessor systems. This is true. Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS become null macros. Your C extensions operate within their thread of execution, but have no central lock to worry about releasing before they block on something. And from an embedding standpoint, the same is true. You do not need to acquire any locks to start manipulating Python objects. Each object maintains its own integrity. Note: embedding/extending *can* destroy integrity. For example, tuples have no integrity locking -- Python programs cannot change them, so you cannot have two Python threads breaking things. 
C code can certainly destroy things with something this simple: Py_DECREF(PyTuple_GET_ITEM(tuple, 3)); PyTuple_SET_ITEM(tuple, 3, ob); Exercise for the reader on why the above code is a disaster waiting to happen :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Fri Apr 21 22:29:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 22:29:06 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: <3900BA12.DFE0A6EB@tismer.com> Guido van Rossum wrote: > > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches at python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I had a rough look at the patches but didn't understand enough yet. But I tried the sample scriptlet on python 1.5.2 and Stackless Python - see here: D:\python>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.96765 This machine benchmarks at 5082.2 pystones/second D:\python>python spc/threadstone.py Pystone(1.1) time for 10000 passes = 5.57609 This machine benchmarks at 1793.37 pystones/second This is even worse than Markovitch's observation. Now, let's try with Stackless Python: D:\python>cd spc D:\python\spc>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.843 This machine benchmarks at 5425.94 pystones/second D:\python\spc>python threadstone.py Pystone(1.1) time for 10000 passes = 3.27625 This machine benchmarks at 3052.27 pystones/second Isn't that remarkable? Stackless performs nearly 1.8 as good under threads. Why? I've optimized the ticker code away for all those "fast" opcodes which never can cause another interpreter incarnation. Standard Python does a bit too much here, dealing the same way with extremely fast opcodes like POP_TOP, as with a function call. Responsiveness is still very good. Markovitch's example also tells us this story: Even with his patches, the threading stuff still costs 10 percent. This is the lock that we touch every ten opcodes. In other words: touching a lock costs about as much as an opcode costs on average. ciao - chris threadstone.py: import thread # Start empty thread to initialise thread mechanics (and global lock!) # This thread will finish immediately thus won't make much influence on # test results by itself, only by that fact that it initialises global lock thread.start_new_thread(lambda : 1, ()) import test.pystone test.pystone.main() -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gstein at lyra.org Sat Apr 22 01:19:03 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 16:19:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: <200004212126.RAA18041@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: >... > * Base address for all extension modules updated. PC\dllbase_nt.txt > also updated. Erroneous "libpath" directory removed for all > projects. Rather than specifying the base address in each DSP, the Apache project has used a text file for this stuff. Here is the text file used: --snip-- -- Begin New BaseAddr.ref -- ; os/win32/BaseAddr.ref contains the central repository ; of all module base addresses ; to avoid relocation ; WARNING: Update this file by reviewing the image size ; of the debug-generated dll files; release images ; should fit in the larger debug-sized space. ; module name base-address max-size aprlib 0x6FFA0000 0x00060000 ApacheCore 0x6FF00000 0x000A0000 mod_auth_anon 0x6FEF0000 0x00010000 mod_cern_meta 0x6FEE0000 0x00010000 mod_auth_digest 0x6FED0000 0x00010000 mod_expires 0x6FEC0000 0x00010000 mod_headers 0x6FEB0000 0x00010000 mod_info 0x6FEA0000 0x00010000 mod_rewrite 0x6FE80000 0x00020000 mod_speling 0x6FE70000 0x00010000 mod_status 0x6FE60000 0x00010000 mod_usertrack 0x6FE50000 0x00010000 mod_proxy 0x6FE30000 0x00020000 --snip-- And here is what one of the link lines looks like: # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" This mechanism could be quite helpful for Python. The .ref file replaces the dllbase_nt.txt file, centralizes the management, and directly integrates with the tools. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Sat Apr 22 02:44:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 22 Apr 2000 10:44:31 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1_socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11python16.dsp In-Reply-To: Message-ID: [Greg writes] > Rather than specifying the base address in each DSP, the > Apache project > has used a text file for this stuff. Here is the text file used: Yes - I saw this in the docs for the linker when I was last playing here. I didnt bother with this, as it still seems to me the best longer term approach is to use the "rebind" tool. This would allow the tool to select the addresses (less chance of getting them wrong), but also would allow us to generate "debug info" for the release builds of Python... But I guess that in the meantime, having the linker process this file is an improvement... I will wait until Guido has got to my other build patches and look into this... Mark. From gstein at lyra.org Sat Apr 22 02:56:22 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 17:56:22 -0700 (PDT) Subject: [Python-Dev] base addresses (was: [Python-checkins] CVS: ...) 
In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Mark Hammond wrote: > [Greg writes] > > Rather than specifying the base address in each DSP, the > > Apache project > > has used a text file for this stuff. Here is the text file used: > > Yes - I saw this in the docs for the linker when I was last playing > here. > > I didnt bother with this, as it still seems to me the best longer > term approach is to use the "rebind" tool. This would allow the > tool to select the addresses (less chance of getting them wrong), > but also would allow us to generate "debug info" for the release > builds of Python... Yes, although having specific addresses also means that every Python executable/DLL has the same set of addresses. You can glean information from the addresses without having symbols handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Apr 22 05:53:39 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 06:53:39 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html Yeah, loads of cool stuff we should steal... And loads of stuff that we shouldn't steal, no matter how cool it looks (lvaluable subroutines, anyone?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Apr 22 06:42:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 07:42:47 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html OK, here's my summary of the good things we should copy: (In that order:) -- Weak references (as weak dictionaries? would "w{}" to signify a weak dictionary is alright parser-wise?) -- Binary numbers -- way way cool (and doesn't seem to hard -- need to patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything I've missed?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Apr 22 09:07:09 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 22 Apr 2000 00:07:09 -0700 (PDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Moshe Zadka wrote: > On Fri, 21 Apr 2000, Guido van Rossum wrote: > > http://www.perl.com/pub/2000/04/whatsnew.html > > OK, here's my summary of the good things we should copy: > > (In that order:) > > -- Weak references (as weak dictionaries? would "w{}" to signify a weak > dictionary is alright parser-wise?) > -- Binary numbers -- way way cool (and doesn't seem to hard -- need to > patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything > I've missed?) Yet another numeric format? eek. If anything, we should be dropping octal, rather than adding binary. You want binary? Just use int("10010", 2). No need for more syntax. I'd go for weak objects (proxies) rather than weak dictionaries. Duplicating the dict type just to deal with weak refs seems a bit much. But I'm not a big brain on this stuff -- I tend to skip all the discussions people have had on this stuff. I just avoid the need for circular refs and weak refs :-) Most of the need for weak refs would disappear with some simple form of GC installed. And it seems we'll have that by 1.7. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Apr 22 11:46:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 12:46:29 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Greg Stein wrote: > Yet another numeric format? eek. If anything, we should be dropping octal, > rather than adding binary. > > You want binary? Just use int("10010", 2). No need for more syntax. Damn, but you're right. > Most of the need for weak refs would disappear with some simple form of GC > installed. And it seems we'll have that by 1.7. Disagree. Think "destructors": with weak references, there's no problems: the referant dies first, and if later, the referer needs the referant to die, well, he'll get a "DeletionError: this object does not exist anymore" in his face, which is alright, because a weak referant should not trust the reference to live. 90%-of-the-cyclic-__del__-full-trash-problem-would-go-away-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Sat Apr 22 13:53:57 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 22 Apr 2000 13:53:57 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <390192D5.57443E99@tismer.com> Greg, Greg Stein wrote: Presuming you can keep your lock contention low, then your overall > performances *goes up* once you have a multiprocessor machine. Sure, each > processor runs Python (say) 10% slower, but you have *two* of them going. > That is 180% compared to a central-lock Python on an MP machine. Why didn't I think of this. MP is a very very good point. Makes now all much sense to me. sorry for being dumb - happy easter - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From amk1 at erols.com Sat Apr 22 21:51:47 2000 From: amk1 at erols.com (A.M. Kuchling) Date: Sat, 22 Apr 2000 15:51:47 -0400 Subject: [Python-Dev] 1.6 speed Message-ID: <200004221951.PAA09193@mira.erols.com> Python 1.6a2 is around 10% slower than 1.5 on pystone. Any idea why? [amk at mira Python-1.6a2]$ ./python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.59 This machine benchmarks at 2785.52 pystones/second [amk at mira Python-1.6a2]$ python1.5 Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.19 This machine benchmarks at 3134.8 pystones/second --amk From tismer at tismer.com Sun Apr 23 04:21:47 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 04:21:47 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39025E3B.35639080@tismer.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? 
> > [amk at mira Python-1.6a2]$ ./python Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.59 > This machine benchmarks at 2785.52 pystones/second > > [amk at mira Python-1.6a2]$ python1.5 Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.19 > This machine benchmarks at 3134.8 pystones/second Hee hee :-) D:\python>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.92135 This machine benchmarks at 5204.66 pystones/second D:\python>cd \python16 D:\Python16>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.06234 This machine benchmarks at 4848.86 pystones/second D:\Python16>cd \python\spc D:\python\spc>python Lib/test/pystone.py python: can't open file 'Lib/test/pystone.py' D:\python\spc>python ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.81034 This machine benchmarks at 5523.82 pystones/second More hee hee :-) Python has been at a critical size with its main loop. The recently added extra code exceeds this size. I had the same effect with Stackless Python, and I worked around it already. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mhammond at skippinet.com.au Sun Apr 23 04:21:01 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 23 Apr 2000 12:21:01 +1000 Subject: [Python-Dev] 1.6 speed In-Reply-To: <39025E3B.35639080@tismer.com> Message-ID: > Python has been at a critical size with its main loop. > The recently added extra code exceeds this size. > I had the same effect with Stackless Python, and > I worked around it already. OK - so let us in on your secret! :-) Were your work-arounds specific to the stackless work, or could they be applied here? Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Mark. From tismer at tismer.com Sun Apr 23 16:43:10 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 16:43:10 +0200 Subject: [Python-Dev] 1.6 speed References: Message-ID: <39030BFE.1675EE20@tismer.com> Mark Hammond wrote: > > > Python has been at a critical size with its main loop. > > The recently added extra code exceeds this size. > > I had the same effect with Stackless Python, and > > I worked around it already. > > OK - so let us in on your secret! :-) > > Were your work-arounds specific to the stackless work, or could they be applied here? My work-arounds originated from code from last January where I was on a speed trip, but with the (usual) low interest from Guido. Then, with Stackless I saw a minor speed loss and finally came to the conclusion that I would be good to apply my patches to my Python version. That was nothing special so far, and Stackless was still a bit slow. I though this came from the different way to call functions for quite a long time, until I finally found out this February: The central loop of the Python interpreter is at a critical size for caching. Speed depends very much on which code gets near which other code, and how big the whole interpreter loop is. What I did: - Un-inlined several code pieces again, back into functions in order to make the big switch smaller. 
- simplified error handling, especially I ensured that all local error variables have very short lifetime and are optimized away - simplified the big switch, tuned the why_code handling into special opcodes, therefore the whole things gets much simpler. This reduces code size and therefore the probability that we are in the cache, and due to short variable lifetime and a simpler loop structure, the compiler seems to do a better job of code ordering. > Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Yup, and until then I will not apply my patches to Python, this is part of my license: Use it but only *with* Stackless. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at trixie.triqs.com Mon Apr 24 00:16:38 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:16:38 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39037646.DEF8A139@trixie.triqs.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? I submitted a comparison with Stackless Python. Now I actually applied the Stackless Python patches to the current CVS version. My version does again show up as faster than standard Python, with the same relative measures, but I too have this effect: Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. Claim: This is not related to ceval.c . Something else must have introduced a significant speed loss. Stackless Python, upon the pre-unicode tag version of CVS: D:\python\spc>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.80724 This machine benchmarks at 5533.29 pystones/second Stackless Python, upon the recent version of CVS: D:\python\spc\Python-cvs\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.94941 This machine benchmarks at 5129.75 pystones/second Less than 10 percent, but bad enough. I guess we have to use MAL's test suite and measure everything alone. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at trixie.triqs.com Mon Apr 24 00:45:12 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:45:12 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> Message-ID: <39037CF8.24E1D1BD@trixie.triqs.com> Ack, sorry. Please drop the last message. This one was done with the correct dictionaries. :-() Christian Tismer wrote: > > "A.M. Kuchling" wrote: > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > Any idea why? > > I submitted a comparison with Stackless Python. > Now I actually applied the Stackless Python patches > to the current CVS version. > > My version does again show up as faster than standard Python, > with the same relative measures, but I too have this effect: > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > Claim: > This is not related to ceval.c . 
> Something else must have introduced a significant speed loss. > > Stackless Python, upon the pre-unicode tag version of CVS: > > D:\python\spc>python ../lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.80724 > This machine benchmarks at 5533.29 pystones/second > > Stackless Python, upon the recent version of CVS: > this one corrected: D:\python\spc\Python-slp\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.98433 This machine benchmarks at 5039.49 pystones/second > Less than 10 percent, but bad enough. It is 10 percent, and bad enough. > > I guess we have to use MAL's test suite and measure everything > alone. > > ciao - chris > > -- > Christian Tismer :^) > Applied Biometrics GmbH : Have a break! Take a ride on Python's > Kaunstr. 26 : *Starship* http://starship.python.net > 14163 Berlin : PGP key -> http://wwwkeys.pgp.net > PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF > where do you want to jump today? http://www.stackless.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Mon Apr 24 15:03:56 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 09:03:56 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 14:49:11 +0200." <390442C7.F30179D9@trixie.triqs.com> References: <390442C7.F30179D9@trixie.triqs.com> Message-ID: <200004241303.JAA19894@eric.cnri.reston.va.us> [Moving this to python-dev because it's a musing > > The main point is to avoid string.*. > > Agreed. Also replacing map by a loop might not even be slower. > What remains as open question: Several modules need access > to string constants, and they therefore still have to import > string. > Is there an elegant solution to this? import string > That's why i asked for some way to access "".__class__ or > whatever, to get into some common namespace with the constants. I dunno. However, I've noticed that in many situations where map() could be used with a string.* function (*if* you care about the speed-up and you don't care about the readability issue), there's no equivalent that uses the new string methods. This stems from the fact that map() wants a function, not a method. Python 3000 solves this partly, assuming types and classes are unified there. Where in 1.5 we wrote map(string.strip, L) in Python 3K we will be able to write map("".__class__.strip, L) However, this is *still* not as powerful as map(lambda s: s.strip(), L) because the former requires that all items in L are in fact strings, while the latter works for anything with a strip() method (in particular Unicode objects and UserString instances). Maybe Python 3000 should recognize map(lambda) and generate more efficient code for it... --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at trixie.triqs.com Mon Apr 24 16:01:26 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 16:01:26 +0200 Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> Message-ID: <390453B6.745E852B@trixie.triqs.com> > Christian Tismer wrote: > > > > "A.M. Kuchling" wrote: > > > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > > Any idea why? ... > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > > > Claim: > > This is not related to ceval.c . > > Something else must have introduced a significant speed loss. I guess I can explain now what's happening, at least for the Windows platform. Python 1.5.2's .dll was nearly about 512K, something more. I think to remember that 512K is a common size of the secondary cache. Now, linking with the MS linker does not give you any particularly useful order of modules. When I look into the map file, the modules appear sorted by name. This is for sure not providing optimum performance. As I read the docs, explicit ordering of the linkage would only make sense for C++ and wouldn't work out for C, since we could order the exported functions, but not the private ones, giving even more distance between releated code. My solution to see if I might be right was this: I ripped out almost all builtin extension modules and compiled/linked without them. This shrunk the dll size down from 647K to 557K, very close to the 1.5.2 size. Now I get the following figures: Python 1.6, with stackless patches: D:\python\spc\Python-slp\PCbuild>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.95468 This machine benchmarks at 5115.92 pystones/second Python 1.6, from the dist: D:\Python16>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.09214 This machine benchmarks at 4779.8 pystones/second That means my optimizations are in charge again, after the overall code size went below about 512K. I think these 10 percent are quite valuable. These options come to my mind: a) try to do optimum code ordering in the too large .dll . This seems to be hard to achieve. b) Split the dll into two dll's in a way that all the necessary internal stuff sits closely in one of them. c) try to split the library like above, but use a static library layout for one of them, and link the static library into the final dll. This would hopefully keep related things together. I don't know if c) is possible, but it might be tried. Any thoughts? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Mon Apr 24 17:11:14 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 11:11:14 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: Your message of "Fri, 21 Apr 2000 16:19:03 PDT." 
References: Message-ID: <200004241511.LAA28854@eric.cnri.reston.va.us> > And here is what one of the link lines looks like: > > # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo > /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug > /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" > > This mechanism could be quite helpful for Python. The .ref file replaces > the dllbase_nt.txt file, centralizes the management, and directly > integrates with the tools. I agree. Just send me patches -- I'm *really* overwhelmed with patch management at the moment, I don't feel like coming up with new code right now... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at trixie.triqs.com Mon Apr 24 17:19:41 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 17:19:41 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> Message-ID: <3904660D.6F22F798@trixie.triqs.com> Sorry, it was not really found... Christian Tismer wrote: [thought he had found the speed leak] After re-inserting all the builtin modules, I got nearly the same result after a complete re-build, just marginally slower. There must something else be happening that I cannot understand. Stackless Python upon 1.5.2+ is still nearly 10 percent faster, regardless what I do to Python 1.6. Testing whether Unicode has some effect? I changed PyUnicode_Check to always return 0. This should optimize most related stuff away. Result: No change at all! Which changes were done after the pre-unicode tag, which might really count for performance? I'm quite desperate, any ideas? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one at email.msn.com Tue Apr 25 02:56:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 20:56:18 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004241303.JAA19894@eric.cnri.reston.va.us> Message-ID: <000101bfae51$10f467a0$3ea0143f@tim> [Guido] > ... > However, this is *still* not as powerful as > > map(lambda s: s.strip(), L) > > because the former requires that all items in L are in fact strings, > while the latter works for anything with a strip() method (in > particular Unicode objects and UserString instances). > > Maybe Python 3000 should recognize map(lambda) and generate more > efficient code for it... [s.strip() for s in L] That is, list comprehensions solved the speed, generality and clarity problems here before they were discovered . From guido at python.org Tue Apr 25 03:21:42 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 21:21:42 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 20:56:18 EDT." <000101bfae51$10f467a0$3ea0143f@tim> References: <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <200004250121.VAA00320@eric.cnri.reston.va.us> > > Maybe Python 3000 should recognize map(lambda) and generate more > > efficient code for it... 
> > [s.strip() for s in L] > > That is, list comprehensions solved the speed, generality and clarity > problems here before they were discovered . Ah! I knew there had to be a solution without lambda! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Apr 25 05:19:35 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 24 Apr 2000 22:19:35 -0500 (CDT) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <000101bfae51$10f467a0$3ea0143f@tim> References: <200004241303.JAA19894@eric.cnri.reston.va.us> <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <14597.3783.737317.226791@beluga.mojam.com> Tim> [s.strip() for s in L] Tim> That is, list comprehensions solved the speed, generality and Tim> clarity problems here before they were discovered . What is the status of list comprehensions in Python? I remember some work being done several months ago. They definitely don't appear to be in the 1.6a2. Was there some reason to defer them until later? Skip From tim_one at email.msn.com Tue Apr 25 05:26:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 23:26:24 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <14597.3783.737317.226791@beluga.mojam.com> Message-ID: <000801bfae66$09191840$e72d153f@tim> [Skip Montanaro] > What is the status of list comprehensions in Python? I remember some work > being done several months ago. They definitely don't appear to be in the > 1.6a2. Was there some reason to defer them until later? Greg Ewing posted a patch to c.l.py that implemented a good start on the proposal. But nobody has pushed it. I had hoped to, but ran out of time; not sure Guido even knows about Greg's patch. Perhaps the 1.6 source distribution could contain a new "intriguing experimental patches" directory? Greg's list-comp and Christian's Stackless have enough fans that this would probably be appreciated. Perhaps some other things too, if we all run out of time (thinking mostly of Vladimir's malloc cleanup and NeilS's gc). From guido at python.org Tue Apr 25 06:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 00:13:51 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 23:26:24 EDT." <000801bfae66$09191840$e72d153f@tim> References: <000801bfae66$09191840$e72d153f@tim> Message-ID: <200004250413.AAA00577@eric.cnri.reston.va.us> > Greg Ewing posted a patch to c.l.py that implemented a good start on the > proposal. But nobody has pushed it. I had hoped to, but ran out of time; > not sure Guido even knows about Greg's patch. I vaguely remember, but not really. We did use his f(*args, **kwargs) patches as a starting point for a 1.6 feature though -- if the list comprehensions are in a similar state, they'd be great to start but definitely need work. > Perhaps the 1.6 source distribution could contain a new "intriguing > experimental patches" directory? Greg's list-comp and Christian's Stackless > have enough fans that this would probably be appreciated. Perhaps some > other things too, if we all run out of time (thinking mostly of Vladimir's > malloc cleanup and NeilS's gc). Perhaps a webpage would make more sense? There's no point in loading every download with this. And e.g. stackless evolves at a much faster pace than core Python.
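For readers skimming this thread, the three spellings being compared do the same job on a list of plain strings; a minimal sketch (the sample list L is invented for illustration, and the list-comprehension line assumes Greg Ewing's patch or a later Python that has the feature):

    import string
    L = ["  spam ", " eggs", "ham  "]     # hypothetical sample data
    a = map(string.strip, L)              # classic 1.5 spelling via the string module
    b = map(lambda s: s.strip(), L)       # works for anything with a strip() method
    c = [s.strip() for s in L]            # list comprehension: same result, no lambda
    assert a == b == c

As noted above, the lambda and list-comprehension forms also accept Unicode objects and UserString instances, since they only require a strip() method.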
I definitely want Vladimir's patches in -- I feel very guilty for not having reviewed his latest proposal yet. I expect that it's right on the mark, but I understand if Vladimir wants to wait with preparing yet another set of patches until I'm happy with the design... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Apr 25 06:37:42 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 24 Apr 2000 23:37:42 -0500 (CDT) Subject: [Python-Dev] list comprehensions patch - updated for current CVS version Message-ID: <14597.8470.495090.799119@beluga.mojam.com> For those folks that might want to fiddle with list comprehensions I tweaked Greg Ewing's list comprehensions patch to work with the current CVS tree. The attached gzip'd patch contains diffs for Grammar/Grammar Include/graminit.h Lib/test/test_grammar.py Lib/test/output/test_grammar Python/compile.c Python/graminit.c I would have updated the corresponding section of the language reference, but the BNF there didn't match the contents of Grammar/Grammar, so I was a bit unclear what needed doing. If it gets that far perhaps someone else can contribute the necessary verbiage or at least point me in the right direction. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: listcomp.patch.gz Type: application/octet-stream Size: 4814 bytes Desc: list comprehensions patch for Python URL: From Vladimir.Marangozov at inrialpes.fr Tue Apr 25 08:13:32 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 25 Apr 2000 08:13:32 +0200 (CEST) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250413.AAA00577@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 25, 2000 12:13:51 AM Message-ID: <200004250613.IAA10174@python.inrialpes.fr> Hi, I'm back on-line. > [Tim] > > Perhaps the 1.6 source distribution could contain a new "intriguing > > experimental patches" directory? Greg's list-comp and Christian's > > Stackless have enough fans that this would probably be appreciated. > > Perhaps some other things too, if we all run out of time (thinking > > mostly of Vladimir's malloc cleanup and NeilS's gc). I'd be in favor of including gc as an optional (experimental) feature. I'm quite confident that it will evolve into a standard feature, in its current or in an improved state. The overall strategy looks good, but there are some black spots w.r.t its cost, both in speed and space. Neil reported in private mail something like 5-10% mem increase, but I doubt that the picture is so optimistic. My understanding is that these numbers reflect the behavior of the Linux VMM in terms of effectively used pages. In terms of absolute, peak requested virtual memory, things are probably worse than that. We're still unclear on this... For 1.6, the gc option would be a handy tool for detecting cyclic trash. It will answer some expectations, and I believe we're ready to give some good feedback on its functioning, its purpose, its limitations, etc. By the time 1.6 is finalized, I expect that we'll know roughly its cost in terms of mem overhead. Overall, it would be nice to have it in the distrib as an experimental feature -- it would both bootstrap some useful feedback, and would encourage enthousiasts to look more closely at DSA/GC (DSA - dynamic storage allocation). 
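For anyone following the gc discussion from the sidelines, the leak in question is easy to reproduce; a minimal sketch (class and names invented for illustration) of the cyclic garbage that plain reference counting never reclaims and that the proposed collector targets:

    class Node:
        def __init__(self, name):
            self.name = name
            self.next = None

    def make_cycle():
        a = Node("a")
        b = Node("b")
        a.next = b
        b.next = a      # a -> b -> a: a reference cycle
        # on return both nodes are unreachable, but their reference
        # counts never drop to zero, so they are never freed

    for i in range(1000):
        make_cycle()    # without a cycle collector, memory only grows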
By 1.7 (with Py3K on the horizon), we would have a good understanding on what to do with gc and how to do it. If I go one step further, what I expect is that the garbage collector would be enabled together with a Python-specific memory allocator which will compensate the cost introduced by the collector. There will some some stable state again (in terms of speed and size) similar to what we have now, but with a bonus pack of additional memory services. > I definitely want Vladimir's patches in -- I feel very guilty for not > having reviewed his latest proposal yet. I expect that it's right on > the mark, but I understand if Vladimir wants to wait with preparing > yet another set of patches until I'm happy with the design... Yes, I'd prefer to wait and get it right. There's some basis, but it needs careful rethinking again. I'm willing to fit in the 1.6 timeline but I understand very well that it's a matter of time :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Tue Apr 25 08:25:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:36 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000701bfae7f$1174a540$152d153f@tim> [Greg Stein] > ... > Many people have asked for free-threading, and the number of inquiries > that I receive have grown over time. (nobody asked in 1996 when I first > published my patches; I get a query every couple months now) Huh! That means people ask me about it more often than they ask you . I'll add, though, that you have to dig into the inquiry: almost everyone who asks me is running on a uniprocessor machine, and are really after one of two other things: 1. They expect threaded stuff to run faster if free-threaded. "Why?" is a question I can't answer <0.5 wink>. 2. Dealing with the global lock drives them insane, especially when trying to call back into Python from a "foreign" C thread. #2 may be fixable via less radical means (like a streamlined procedure enabled by some relatively minor core interpreter changes, and clearer docs). I'm still a fan of free-threading! It's just one of those things that may yield a "well, ya, that's what I asked for, but turns out it's not what I *wanted*" outcome as often as not. enthusiastically y'rs - tim From tim_one at email.msn.com Tue Apr 25 08:25:38 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:38 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000801bfae7f$12c456c0$152d153f@tim> [Greg Wilson, on Linda and JavaSpaces] > ... > Personal opinion: I've felt for 15 years that something like Linda could > be to threads and mutexes what structured loops and conditionals are to > the "goto" statement. Were it not for the "Huh" effect, I'd recommend > hanging "Danger!" signs over threads and mutexes, and making tuple spaces > the "standard" concurrency mechanism in Python. There's no question about tuple spaces being easier to learn and to use, but Python slams into a conundrum here akin to the "floating-point versus *anything* sane " one: Python's major real-life use is as a glue language, and threaded apps (ditto IEEE-754 floating-point apps) are overwhelmingly what it needs to glue *to*. So Python has to have a good thread story. Free-threading would be a fine enhancement of it, Tuple spaces (spelled "PyBrenda" or otherwise) would be a fine alternative to it, but Python can't live without threads too. 
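For readers who have not met the Linda model mentioned here, the core idea fits in a few lines; a toy sketch only (names invented; a real tuple-space system adds at least pattern-matched retrieval and works across processes), built on the very threading primitives it is meant to hide:

    import threading

    class TupleSpace:
        # toy shared space: producers put tuples, consumers take them
        def __init__(self):
            self.tuples = []
            self.cond = threading.Condition()
        def put(self, tup):             # Linda "out"
            self.cond.acquire()
            self.tuples.append(tup)
            self.cond.notify()
            self.cond.release()
        def take(self):                 # Linda "in": blocks until a tuple arrives
            self.cond.acquire()
            while not self.tuples:
                self.cond.wait()
            tup = self.tuples.pop(0)
            self.cond.release()
            return tup

    space = TupleSpace()
    threading.Thread(target=lambda: space.put(("job", 1))).start()
    print space.take()                  # prints ('job', 1)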
And, yes, everyone who goes down Hoare's CSP road gets lost <0.7 wink>. From tim_one at email.msn.com Tue Apr 25 08:40:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:40:26 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250613.IAA10174@python.inrialpes.fr> Message-ID: <000901bfae81$240b5580$152d153f@tim> [Vladimir Marangozov, on NeilS's gc patch] > ... > The overall strategy looks good, but there are some black spots > w.r.t its cost, both in speed and space. Neil reported in private > mail something like 5-10% mem increase, but I doubt that the picture > is so optimistic. My understanding is that these numbers reflect > the behavior of the Linux VMM in terms of effectively used pages. In > terms of absolute, peak requested virtual memory, things are probably > worse than that. We're still unclear on this... Luckily, that's what Open Source is all about: if we have to wait for you (or Neil, or Guido, or anyone else) to do a formal study of the issue, the patch will never go in. Put the code out there and let people try it, and 50 motivated users will run the only 50 tests that really matter: i.e., does their real code suffer or not? If so, a few of them may even figure out why. less-thought-more-eyeballs-ly y'rs - tim From mal at lemburg.com Tue Apr 25 11:43:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 11:43:46 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code Message-ID: <390568D2.2CC50766@lemburg.com> After the discussion about #pragmas two weeks ago and some interesting ideas in the direction of source code encodings and ways to implement them, I would like to restart the talk about encodings in source code and runtime auto-conversions. Fredrik recently posted patches to the patches list which loosen the currently hard-coded default encoding used throughout the Unicode design and add a layer of abstraction which would make it easily possible to change the default encoding at some later point. While making things more abstract is certainly a wise thing to do, I am not sure whether this particular case fits into the design decisions made a few months ago. Here's a short summary of what was discussed recently: 1. Fredrik posted the idea of changing the default encoding from UTF-8 to Latin-1 (he calls this 8-bit Unicode which points to the motivation behind this: 8-bit strings should behave like 8-bit Unicode). His recent patches work into this direction. 2. Fredrik also posted an interesting idea which enables writing Python source code in any supported encoding by having the Python tokenizer read Py_UNICODE data instead of char data. A preprocessor would take care of converting the input to Py_UNICODE; the parser would assure that 8-bit string data gets converted back to char data (using e.g. UTF-8 or Latin-1 for the encoding) 3. Regarding the addition of pragmas to allow specifying the used source code encoding several possibilities were mentioned: - addition of a keyword "pragma" to define pragma dictionaries - usage of a "global" as basis for this - adding a new keyword "decl" which also allows defining other things such as type information - XML like syntax embedded into Python comments Some comments: Ad 1. UTF-8 is used as basis in many other languages such as TCL or Perl. It is not an intuitive way of writing strings and causes problems due to one character spanning 1-6 bytes. 
Still, the world seems to be moving into this direction, so going the same way can't be all wrong... Note that stream IO can be recoded in a way which allows Python to print and read e.g. Latin-1 (see below). The general idea behind the fixed default encoding design was to give all the power to the user, since she eventually knows best which encoding to use or expect. Ad 2. I like this idea because it enables writing Unicode- aware programs *in* Unicode... the only problem which remains is again the encoding to use for the classic 8-bit strings. Ad 3. For 2. to work, the encoding would have to appear close to the top of the file. The preprocessor would have to be BOM-mark aware to tell whether UTF-16 or some ASCII extension is used by the file. Guido asked me for some code which demonstrates Latin-1 recoding using the existing mechanisms. I've attached a simple script to this mail. It is not much tested yet, so please give it a try. You can also change it to use any other encoding you like. Together with the Japanese codecs provided by Tamito Kajiyama (http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/tmp/japanese-codecs.tar.gz) you should be able to type Shift-JIS at the raw_input() or interactive prompt, have it stored as UTF-8 and then printed back as Shift-JIS, provided you put add a recoder similar to the attached one for Latin-1 to your PYTHONSTARTUP or site.py script. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- A non-text attachment was scrubbed... Name: latin1io.py Type: text/python Size: 1740 bytes Desc: not available URL: From effbot at telia.com Tue Apr 25 17:16:25 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 25 Apr 2000 17:16:25 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> Message-ID: <00a401bfaec9$3aaae100$34aab5d4@hagrid> I'll follow up with a longer reply later; just one correction: M.-A. Lemburg wrote: > Ad 1. UTF-8 is used as basis in many other languages such > as TCL or Perl. It is not an intuitive way of > writing strings and causes problems due to one character > spanning 1-6 bytes. Still, the world seems to be moving > into this direction, so going the same way can't be all > wrong... the problem here is the current Python implementation doesn't use UTF-8 in the same way as Perl and Tcl. Perl and Tcl only exposes one string type, and that type be- haves exactly like it should: "The Tcl string functions properly handle multi- byte UTF-8 characters as single characters." "By default, Perl now thinks in terms of Unicode characters instead of simple bytes. /.../ All the relevant built-in functions (length, reverse, and so on) now work on a character-by-character basis instead of byte-by-byte, and strings are represented internally in Unicode." or in other words, both languages guarantee that given a string s: - s is a sequence of characters (not bytes) - len(s) is the number of characters in the string - s[i] is the i'th character - len(s[i]) is 1 and as I've pointed out a zillion times, Python 1.6a2 doesn't. this should be solved, and I see (at least) four ways to do that: -- the Tcl 8.1 way: make 8-bit strings UTF-8 aware. operations like len and getitem usually searches from the start of the string. to handle binary data, introduce a special ByteArray type. 
when mixing ByteArrays and strings, treat each byte in the array as an 8-bit unicode character (conversions from strings to byte arrays are lossy). [imho: lots of code, and seriously affects performance, even when unicode characters are never used. this approach was abandoned in Tcl 8.2] -- the Tcl 8.2 way: use a unified string type, which stores data as UTF-8 and/or 16-bit unicode: struct { char* bytes; /* 8-bit representation (utf-8) */ Tcl_UniChar* unicode; /* 16-bit representation */ } if one of the strings are modified, the other is regenerated on demand. operations like len, slice and getitem always convert to 16-bit first. still need a ByteArray type, similar to the one described above. [imho: faster than before, but still not as good as a pure 8-bit string type. and the need for a separate byte array type would break alot of existing Python code] -- the Perl 5.6 way? (haven't looked at the implementation, but I'm pretty sure someone told me it was done this way). essentially same as Tcl 8.2, but with an extra encoding field (to avoid con- versions if data is just passed through). struct { int encoding; char* bytes; /* 8-bit representation */ Tcl_UniChar* unicode; /* 16-bit representation */ } [imho: see Tcl 8.2] -- my proposal: expose both types, but let them contain characters from the same character set -- at least when used as strings. as before, 8-bit strings can be used to store binary data, so we don't need a separate ByteArray type. in an 8-bit string, there's always one character per byte. [imho: small changes to the existing code base, about as efficient as can be, no attempt to second-guess the user, fully backwards com- patible, fully compliant with the definition of strings in the language reference, patches are available, etc...] From jeremy at cnri.reston.va.us Tue Apr 25 19:20:44 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 25 Apr 2000 13:20:44 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) In-Reply-To: <3904660D.6F22F798@trixie.triqs.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> Message-ID: <14597.54252.185633.504968@goon.cnri.reston.va.us> The performance difference I see on my Sparc is smaller. The machine is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. The median pystone time taken from 10 measurements are: 1.5.2 4.87 1.6a2 5.035 For comparison, the numbers I see on my Linux box (dual PII 266) are: 1.5.2 3.18 1.6a2 3.53 That's about 10% faster under 1.5.2. I'm not sure how important this change is. Three percent isn't enough for me to worry about, but it's a minority platform. I suppose 10 percent is right on the cusp. If the performance difference is the cost of the many improvements of 1.6, I think it's worth the price. Jeremy From tismer at tismer.com Tue Apr 25 20:12:39 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:12:39 +0200 Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> Message-ID: <3905E017.1565757C@tismer.com> Jeremy Hylton wrote: > > The performance difference I see on my Sparc is smaller. The machine > is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with > GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. > > The median pystone time taken from 10 measurements are: > 1.5.2 4.87 > 1.6a2 5.035 > > For comparison, the numbers I see on my Linux box (dual PII 266) are: > > 1.5.2 3.18 > 1.6a2 3.53 > > That's about 10% faster under 1.5.2. Which GCC was it on the Linux box, and how much RAM does it have? > I'm not sure how important this change is. Three percent isn't enough > for me to worry about, but it's a minority platform. I suppose 10 > percent is right on the cusp. If the performance difference is the > cost of the many improvements of 1.6, I think it's worth the price. Yes, and I'm happy to pay the price if I can see where I pay. That's the problem, the changes between the pre-unicode tag and the current CVS are not enough to justify that speed loss. There must be something substantial. I also don't grasp why my optimizations are so much more powerful on 1.5.2+ as on 1.6 . Mark Hammond pointed me to the int/long unification. Was this done *after* the unicode patches? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Tue Apr 25 20:27:20 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:27:20 +0200 Subject: [Python-Dev] Off-topic Message-ID: <3905E388.2C1911C1@tismer.com> Hey, don't blame me for posting a joke :-) Please read from the beginning, don't look at the end first. No, this is no offense... -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com -------------- next part -------------- An embedded message was scrubbed... From: "A.Bergmann bei BRAHMS" Subject: Moin..... Date: Tue, 25 Apr 2000 09:07:49 +0200 Size: 2723 URL: From mal at lemburg.com Tue Apr 25 22:13:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 22:13:39 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <3905FC73.7D7D6B1D@lemburg.com> Fredrik Lundh wrote: > > I'll follow up with a longer reply later; just one correction: > > M.-A. Lemburg wrote: > > Ad 1. UTF-8 is used as basis in many other languages such > > as TCL or Perl. It is not an intuitive way of > > writing strings and causes problems due to one character > > spanning 1-6 bytes. Still, the world seems to be moving > > into this direction, so going the same way can't be all > > wrong... > > the problem here is the current Python implementation > doesn't use UTF-8 in the same way as Perl and Tcl. 
Perl > and Tcl only exposes one string type, and that type be- > haves exactly like it should: > > "The Tcl string functions properly handle multi- > byte UTF-8 characters as single characters." > > "By default, Perl now thinks in terms of Unicode > characters instead of simple bytes. /.../ All the > relevant built-in functions (length, reverse, and > so on) now work on a character-by-character > basis instead of byte-by-byte, and strings are > represented internally in Unicode." > > or in other words, both languages guarantee that given a > string s: > > - s is a sequence of characters (not bytes) > - len(s) is the number of characters in the string > - s[i] is the i'th character > - len(s[i]) is 1 > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. Just a side note: we never discussed turning the native 8-bit strings into any encoding aware type. > this > should be solved, and I see (at least) four ways to do that: > > ... > -- the Perl 5.6 way? (haven't looked at the implementation, but I'm > pretty sure someone told me it was done this way). essentially > same as Tcl 8.2, but with an extra encoding field (to avoid con- > versions if data is just passed through). > > struct { > int encoding; > char* bytes; /* 8-bit representation */ > Tcl_UniChar* unicode; /* 16-bit representation */ > } > > [imho: see Tcl 8.2] > > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. > > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Why not name the beast ?! In your proposal, the old 8-bit strings simply use Latin-1 as native encoding. The current version doesn't make any encoding assumption as long as the 8-bit strings do not get auto-converted. In that case they are interpreted as UTF-8 -- which will (usually) fail for Latin-1 encoded strings using the 8th bit, but hey, at least you get an error message telling you what is going wrong. The key to these problems is using explicit conversions where 8-bit strings meet Unicode objects. Some more ideas along the convenience path: Perhaps changing just the way 8-bit strings are coerced to Unicode would help: strings would then be interpreted as Latin-1. str(Unicode) and "t" would still return UTF-8 to assure loss-less conversion. Another way to tackle this would be to first try UTF-8 conversion during auto-conversion and then fallback to Latin-1 in case it fails. Has anyone tried this ? Guido mentioned that TCL does something along these lines... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Tue Apr 25 22:54:11 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 25 Apr 2000 16:54:11 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) In-Reply-To: <3905E017.1565757C@tismer.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> <3905E017.1565757C@tismer.com> Message-ID: <14598.1523.533352.759437@amarok.cnri.reston.va.us> Christian Tismer writes: >Mark Hammond pointed me to the int/long unification. >Was this done *after* the unicode patches? Before. It seems unlikely they're the cause (they just add a 'if (PyLong_Check(key)' branch to the slicing functions in abstract.c. OTOH, if pystone really exercises sequence multiplication, maybe they're related (but 10% worth?). -- A.M. Kuchling http://starship.python.net/crew/amk/ I know flattery when I hear it; but I do not often hear it. -- Robertson Davies, _Fifth Business_ From effbot at telia.com Tue Apr 25 23:51:45 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 25 Apr 2000 23:51:45 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105 References: <200004252134.RAA02207@eric.cnri.reston.va.us> Message-ID: <002601bfaf00$74d462c0$34aab5d4@hagrid> > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); better make that > + insint(d, "MSG_DONTWAIT", MSG_DONTWAIT); right? From effbot at telia.com Wed Apr 26 00:05:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 00:05:54 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <3905FC73.7D7D6B1D@lemburg.com> Message-ID: <002701bfaf02$734a2e60$34aab5d4@hagrid> M.-A. Lemburg wrote: > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. > > Just a side note: we never discussed turning the native > 8-bit strings into any encoding aware type. hey, you just argued that we should use UTF-8 because Tcl and Perl use it, didn't you? my point is that they don't use it the way Python 1.6a2 uses it, and that their design is correct, while our design is slightly broken. so let's fix it ! > Why not name the beast ?! In your proposal, the old 8-bit > strings simply use Latin-1 as native encoding. in my proposal, there's an important distinction between character sets and character encodings. unicode is a character set. latin 1 is one of many possible encodings of (portions of) that set. maybe it's easier to grok if we get rid of the term "character set"? http://www.hut.fi/u/jkorpela/chars.html suggests the following replacements: character repertoire A set of distinct characters. character code A mapping, often presented in tabular form, which defines one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. character encoding A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. now, in my proposal, the *repertoire* contains all characters described by the unicode standard. the *codes* are defined by the same standard. but strings are sequences of characters, not sequences of octets: strings have *no* encoding. (the encoding used for the internal string storage is an implementation detail). (but sure, given the current implementation, the internal storage for an 8-bit string happens use Latin-1. just as the internal storage for a 16-bit string happens to use UCS-2 stored in native byte order. 
but from the outside, they're just character sequences). > The current version doesn't make any encoding assumption as > long as the 8-bit strings do not get auto-converted. In that case > they are interpreted as UTF-8 -- which will (usually) fail > for Latin-1 encoded strings using the 8th bit, but hey, at least > you get an error message telling you what is going wrong. sure, but I don't think you get the right message, or that you get it at the right time. consider this: if you're going from 8-bit strings to unicode using implicit con- version, the current design can give you: "UnicodeError: UTF-8 decoding error: unexpected code byte" if you go from unicode to 8-bit strings, you'll never get an error. however, the result is not always a string -- if the unicode string happened to contain any characters larger than 127, the result is a binary buffer containing encoded data. you cannot use string methods on it, you cannot use regular expressions on it. indexing and slicing won't work. unlike earlier versions of Python, and unlike unicode-aware versions of Tcl and Perl, the fundamental assumption that a string is a sequence of characters no longer holds. in my proposal, going from 8-bit strings to unicode always works. a character is a character, no matter what string type you're using. however, going from unicode to an 8-bit string may given you an OverflowError, say: "OverflowError: unicode character too large to fit in a byte" the important thing here is that if you don't get an exception, the result is *always* a string. string methods always work. etc. [8. Special cases aren't special enough to break the rules.] > The key to these problems is using explicit conversions where > 8-bit strings meet Unicode objects. yeah, but the flaw in the current design is the implicit conversions, not the explicit ones. [2. Explicit is better than implicit.] (of course, the 8-bit string type also needs an "encode" method under my proposal, but that's just a detail ;-) > Some more ideas along the convenience path: > > Perhaps changing just the way 8-bit strings are coerced > to Unicode would help: strings would then be interpreted > as Latin-1. ok. > str(Unicode) and "t" would still return UTF-8 to assure loss- > less conversion. maybe. or maybe str(Unicode) should return a unicode string? think about it! (after all, I'm pretty sure that ord() and chr() should do the right thing, also for character codes above 127) > Another way to tackle this would be to first try UTF-8 > conversion during auto-conversion and then fallback to > Latin-1 in case it fails. Has anyone tried this ? Guido > mentioned that TCL does something along these lines... haven't found any traces of that in the source code. hmm, you're right -- it looks like it attempts to "fix" invalid UTF-8 data (on a character by character basis), instead of choking on it. scary. [12. In the face of ambiguity, refuse the temptation to guess.] more tomorrow. From guido at python.org Wed Apr 26 00:35:30 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 18:35:30 -0400 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code In-Reply-To: Your message of "Tue, 25 Apr 2000 17:16:25 +0200." <00a401bfaec9$3aaae100$34aab5d4@hagrid> References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <200004252235.SAA02554@eric.cnri.reston.va.us> [Fredrik] > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. 
> > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Sorry, all this proposal does is change the default encoding on conversions from UTF-8 to Latin-1. That's very western-culture-centric. You already have control over the encoding: use unicode(s, "latin-1"). If there are places where you don't have enough control (e.g. file I/O), let's add control there. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 26 01:08:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 19:08:39 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) Message-ID: <200004252308.TAA05717@eric.cnri.reston.va.us> The email below is a serious bug report. A quick analysis shows that UserString.count() calls the count() method on a string object, which calls PyArg_ParseTuple() with the format string "O|ii". The 'i' format code truncates integers. It probably should raise an overflow exception instead. But that would still cause the test to fail -- just in a different way (more explicit). Then the string methods should be fixed to use long ints instead -- and then something else would probably break... --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Mon, 24 Apr 2000 19:26:27 -0400 From: mark.favas at per.dem.csiro.au To: python-bugs-list at python.org cc: bugs-py at python.org Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg stringobject (PR#306) Full_Name: Mark Favas Version: 1.6a2 CVS of 25 April OS: DEC Alpha, Tru64 Unix 4.0F Submission from: wa107.dialup.csiro.au (130.116.4.107) There seems to be issues (and perhaps lurking cans of worms) on 64-bit platforms where sizeof(long) != sizeof(int). For example, the CVS version of 1.6a2 of 25 April fails the UserString regression test. The tests fail as follows (verbose set to 1): abcabcabc.count(('abc',)) no 'abcabcabc' 3 <> 2 abcabcabc.count(('abc', 1)) no 'abcabcabc' 2 <> 1 abcdefghiabc.find(('abc', 1)) no 'abcdefghiabc' 9 < > - -1 abcdefghiabc.rfind(('abc',)) no 'abcdefghiabc' 9 <> 0 abcabcabc.rindex(('abc',)) no 'abcabcabc' 6 <> 3 abcabcabc.rindex(('abc', 1)) no 'abcabcabc' 6 <> 3 These tests are failing because the calls from the UserString methods to the underlying string methods are setting the default value of the end-of-string parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), whereas the string methods in stringobject.c are using ints and expecting them to be no larger than INT_MAX (2147483647). Thus the end-of-string parameter becomes -1 in the default case. The size of an int on my platform is 4, and the size of a long is 8, so the "natural size of a Python integer" should be 8, by my understanding. The obvious fix is to change stringobject.c to use longs, rather than ints, but the problem might be more widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, stropmodule.c, and ceval.c as well as stringobject.c. 
Some of these look as though LONG_MAX should have been used (variables compared to INT_MAX are longs, but I am not confident enough to submit patches for them... Mark _______________________________________________ Python-bugs-list maillist - Python-bugs-list at python.org http://www.python.org/mailman/listinfo/python-bugs-list ------- End of Forwarded Message From pf at artcom-gmbh.de Wed Apr 26 09:34:09 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 26 Apr 2000 09:34:09 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105 In-Reply-To: <200004252134.RAA02207@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 25, 2000 5:34:56 pm" Message-ID: Guido van Rossum: > Modified Files: > socketmodule.c [...] > *** 2526,2529 **** > --- 2526,2532 ---- > #ifdef MSG_DONTROUTE > insint(d, "MSG_DONTROUTE", MSG_DONTROUTE); > + #endif > + #ifdef MSG_DONTWAIT > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); -------------------------^^? Shouldn't this read "MSG_DONTWAIT"? ----------------------------^! Nitpicking, Peter From fredrik at pythonware.com Wed Apr 26 11:00:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 11:00:03 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> Message-ID: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> > Sorry, all this proposal does is change the default encoding on > conversions from UTF-8 to Latin-1. That's very > western-culture-centric. That decision was made by ISO and the Unicode consortium, not me. I don't know why, and I don't really care -- I'm arguing that strings should contain characters, just like the language reference says, and that all characters should be from the same character repertoire and use the same character codes. From just at letterror.com Wed Apr 26 14:04:08 2000 From: just at letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 13:04:08 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <8e6bsl$f1a$1@nnrp1.deja.com> References: <1256565470-46720619@hypernet.com> <6M1J4.662$rc9.209708544@newsb.telia.net> <8daop0$8fk$1@slb6.atl.mindspring.net> Message-ID: Fredrik Lundh replied to himself in c.l.py: >> as far as I can tell, it's supposed to be a feature. >> >> if you mix 8-bit strings with unicode strings, python 1.6a2 >> attempts to interpret the 8-bit string as an utf-8 encoded >> unicode string. >> >> but yes, I also think it's a bug. but this far, my attempts >> to get someone else to fix it has failed. might have to do >> it myself... ;-) > >postscript: the powers-that-be has decided that this is not >a bug. if you thought that strings were just sequences of >characters, just as in Perl and Tcl, you're in for one big >surprise in Python 1.6... I just read the last few posts of the powers-that-be-list on this subject (Thanks to Christian for pointing out the archives in c.l.py ;-), and I must say I completely agree with Fredrik. The current situation sucks. A string should always be a sequence of characters. A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". An 8-bit string should never be assumed to be utf-8 because of that distinction. (The default encoding for the builtin unicode() function may be another story.) Just From mal at lemburg.com Wed Apr 26 14:03:36 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 26 Apr 2000 14:03:36 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: <200004252308.TAA05717@eric.cnri.reston.va.us> Message-ID: <3906DB18.CB76EEC0@lemburg.com> Guido van Rossum wrote: > > The email below is a serious bug report. A quick analysis shows that > UserString.count() calls the count() method on a string object, which > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > format code truncates integers. It probably should raise an overflow > exception instead. But that would still cause the test to fail -- > just in a different way (more explicit). Then the string methods > should be fixed to use long ints instead -- and then something else > would probably break... All uses in stringobject.c and unicodeobject.c use INT_MAX together with integers, so there's no problem on that side of the fence ;-) Since strings and Unicode objects use integers to describe the length of the object (as well as most if not all other builtin sequence types), the correct default value should thus be something like sys.maxlen which then gets set to INT_MAX. I'd suggest adding sys.maxlen and the modifying UserString.py, re.py and sre_parse.py accordingly. > --Guido van Rossum (home page: http://www.python.org/~guido/) > > ------- Forwarded Message > > Date: Mon, 24 Apr 2000 19:26:27 -0400 > From: mark.favas at per.dem.csiro.au > To: python-bugs-list at python.org > cc: bugs-py at python.org > Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg > stringobject (PR#306) > > Full_Name: Mark Favas > Version: 1.6a2 CVS of 25 April > OS: DEC Alpha, Tru64 Unix 4.0F > Submission from: wa107.dialup.csiro.au (130.116.4.107) > > There seems to be issues (and perhaps lurking cans of worms) on 64-bit > platforms > where sizeof(long) != sizeof(int). > > For example, the CVS version of 1.6a2 of 25 April fails the UserString > regression test. The tests fail as follows (verbose set to 1): > > abcabcabc.count(('abc',)) no > 'abcabcabc' 3 <> > 2 > abcabcabc.count(('abc', 1)) no > 'abcabcabc' 2 <> > 1 > abcdefghiabc.find(('abc', 1)) no > 'abcdefghiabc' 9 < > > > - -1 > abcdefghiabc.rfind(('abc',)) no > 'abcdefghiabc' 9 > <> 0 > abcabcabc.rindex(('abc',)) no > 'abcabcabc' 6 <> > 3 > abcabcabc.rindex(('abc', 1)) no > 'abcabcabc' 6 <> > 3 > > These tests are failing because the calls from the UserString methods to the > underlying string methods are setting the default value of the end-of-string > parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), > whereas the string methods in stringobject.c are using ints and expecting them > to be no larger than INT_MAX (2147483647). > Thus the end-of-string parameter becomes -1 in the default case. The size of an > int on my platform is 4, and the size of a long is 8, so the "natural size of > a Python integer" should be 8, by my understanding. The obvious fix is to > change > stringobject.c to use longs, rather than ints, but the problem might be more > widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, > stropmodule.c, and ceval.c as well as stringobject.c. Some of these look as > though LONG_MAX should have been used (variables compared to INT_MAX are longs, > but I am not confident enough to submit patches for them... 
> > Mark > > _______________________________________________ > Python-bugs-list maillist - Python-bugs-list at python.org > http://www.python.org/mailman/listinfo/python-bugs-list > > ------- End of Forwarded Message > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Wed Apr 26 15:00:21 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 26 Apr 2000 06:00:21 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <000701bfae7f$1174a540$152d153f@tim> Message-ID: On Tue, 25 Apr 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Many people have asked for free-threading, and the number of inquiries > > that I receive have grown over time. (nobody asked in 1996 when I first > > published my patches; I get a query every couple months now) > > Huh! That means people ask me about it more often than they ask you . > > I'll add, though, that you have to dig into the inquiry: almost everyone > who asks me is running on a uniprocessor machine, and are really after one > of two other things: > > 1. They expect threaded stuff to run faster if free-threaded. "Why?" is > a question I can't answer <0.5 wink>. Heh. Yes, I definitely see this one. But there are some clueful people out there, too, so I'm not totally discouraged :-) > 2. Dealing with the global lock drives them insane, especially when trying > to call back into Python from a "foreign" C thread. > > #2 may be fixable via less radical means (like a streamlined procedure > enabled by some relatively minor core interpreter changes, and clearer > docs). No doubt. I was rather upset with Guido's "Swap" API for the thread state. Grr. I sent him a very nice (IMO) API that I used for my patches. The Swap was simply a poor choice on his part. It implies that you are swapping a thread state for another (specifically: the "current" thread state). Of course, that is wholly inappropriate in a free-threading environment. All those calls to _Swap() will be overhead in an FT world. I liked my "PyThreadState *PyThreadState_Ensure()" function. It would create the sucker if it didn't exist, then return *this* thread's state to you. Handy as hell. No monkeying around with "Get. oops. didn't exist. let's create one now." > I'm still a fan of free-threading! It's just one of those things that may > yield a "well, ya, that's what I asked for, but turns out it's not what I > *wanted*" outcome as often as not. hehe. Damn straight. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From just at letterror.com Wed Apr 26 16:13:13 2000 From: just at letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 15:13:13 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: I wrote: >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". Another way of putting this is: - utf-8 in an 8-bit string is to a unicode string what a pickle is to an object. - defaulting to utf-8 upon coercing is like implicitly trying to unpickle an 8-bit string when comparing it to an instance. Bad idea. Defaulting to Latin-1 is the only logical choice, no matter how western-culture-centric this may seem. Just From mal at lemburg.com Wed Apr 26 20:01:48 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 26 Apr 2000 20:01:48 +0200 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) References: Message-ID: <39072F0C.5214E339@lemburg.com> Just van Rossum wrote: > > I wrote: > >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". > > Another way of putting this is: > - utf-8 in an 8-bit string is to a unicode string what a pickle is to an > object. > - defaulting to utf-8 upon coercing is like implicitly trying to unpickle > an 8-bit string when comparing it to an instance. Bad idea. > > Defaulting to Latin-1 is the only logical choice, no matter how > western-culture-centric this may seem. Please note that the support for mixing strings and Unicode objects is really only there to aid porting applications to Unicode. New code should use Unicode directly and apply all needed conversions explicitly using one of the many ways to encode or decode Unicode data. The auto-conversions are only there to help out and provide some convenience. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Apr 26 20:51:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Apr 2000 14:51:56 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: Your message of "Wed, 26 Apr 2000 14:03:36 +0200." <3906DB18.CB76EEC0@lemburg.com> References: <200004252308.TAA05717@eric.cnri.reston.va.us> <3906DB18.CB76EEC0@lemburg.com> Message-ID: <200004261851.OAA06794@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > > > The email below is a serious bug report. A quick analysis shows that > > UserString.count() calls the count() method on a string object, which > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > format code truncates integers. It probably should raise an overflow > > exception instead. But that would still cause the test to fail -- > > just in a different way (more explicit). Then the string methods > > should be fixed to use long ints instead -- and then something else > > would probably break... > > All uses in stringobject.c and unicodeobject.c use INT_MAX > together with integers, so there's no problem on that side > of the fence ;-) > > Since strings and Unicode objects use integers to describe the > length of the object (as well as most if not all other > builtin sequence types), the correct default value should > thus be something like sys.maxlen which then gets set to > INT_MAX. > > I'd suggest adding sys.maxlen and the modifying UserString.py, > re.py and sre_parse.py accordingly. Hm, I'm not so sure. It would be much better if passing sys.maxint would just WORK... Since that's what people have been doing so far. --Guido van Rossum (home page: http://www.python.org/~guido/) From nascheme at enme.ucalgary.ca Wed Apr 26 21:06:51 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: Wed, 26 Apr 2000 13:06:51 -0600 Subject: [Python-Dev] L1 data cache profile for Python 1.5.2 and 1.6 Message-ID: <20000426130651.C23227@acs.ucalgary.ca> Using this tool: http://www.cacheprof.org/ I got this output: http://www.enme.ucalgary.ca/~nascheme/python/cache.out http://www.enme.ucalgary.ca/~nascheme/python/cache-152.out The cache miss rate for eval_code2 is about two times larger in 1.6. The overall miss rate is about the same. Is this significant? 
I suspect that the instruction cache is more important for eval_code2. Unfortunately cacheprof can only profile the L1 data cache. Perhaps someone will find this data useful or interesting. Neil From tismer at tismer.com Wed Apr 26 23:24:39 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 26 Apr 2000 23:24:39 +0200 Subject: [Fwd: [Python-Dev] Where the speed is lost! (was: 1.6 speed)] Message-ID: <39075E97.23DBDD63@tismer.com> I forgot to cc python-dev. This file is closed for me. the sun is shining again, life is so wonderful and now for something completely different - chris -------------- next part -------------- An embedded message was scrubbed... From: Christian Tismer Subject: Re: [Python-Dev] Where the speed is lost! (was: 1.6 speed) Date: Wed, 26 Apr 2000 23:19:20 +0200 Size: 3299 URL: From effbot at telia.com Wed Apr 26 23:29:10 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 23:29:10 +0200 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) References: <39072F0C.5214E339@lemburg.com> Message-ID: <002f01bfafc6$779804a0$34aab5d4@hagrid> (forwarded from c.l.py, on request) > New code should use Unicode directly and apply all needed > conversions explicitly using one of the many ways to > encode or decode Unicode data. The auto-conversions are > only there to help out and provide some convenience. does this mean that the 8-bit string type is deprecated ??? From effbot at telia.com Wed Apr 26 23:45:40 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 23:45:40 +0200 Subject: [Python-Dev] fun with unicode, part 1 Message-ID: <004501bfafc8$c51d1240$34aab5d4@hagrid> >>> filename = u"gr?t" >>> file = open(filename, "w") >>> file.close() >>> import glob >>> print glob.glob("gr*") ['gr\303\266t'] >>> print glob.glob(u"gr*") [u'gr\366t'] >>> import os >>> os.system("dir gr*") ... GR??T 0 01-02-03 12.34 gr??t 1 fil(es) 0 byte 0 dir 12 345 678 byte free hmm. From mhammond at skippinet.com.au Thu Apr 27 02:08:23 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 27 Apr 2000 10:08:23 +1000 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <002f01bfafc6$779804a0$34aab5d4@hagrid> Message-ID: It is necessary for us to also have this scrag-fight in public? Most of the thread on c.l.py is filled in by people who are also py-dev members! [MAL writes] > Please note that the support for mixing strings and Unicode > objects is really only there to aid porting applications > to Unicode. > > New code should use Unicode directly and apply all needed > conversions explicitly using one of the many ways to > encode or decode Unicode data. This will _never_ happen. The Python programmer should never need to be aware they have a Unicode string versus a standard string - just a "string"! The fact there are 2 string types should be considered an implementation detail, and not a conceptual model for people to work within. I think we will be mixing Unicode and strings for ever! The only way to avoid it would be a unified type - possibly Py3k. Until then, people will still generally use strings as literals in their code, and should not even be aware they are mixing. Im never going to prefix my ascii-only strings with u"" just to avoid the possibility of mixing! Listening to the arguments, Ive got to say Im coming down squarely on the side of Fredrik and Just. strings must be sequences of characters, whose length is the number of characters. 
A string holding an encoding should be considered logically a byte array, and conversions should be explicit. > The auto-conversions are only there to help out and provide some convenience. Doesn't sound like it is working :-( Mark. From akuchlin at mems-exchange.org Thu Apr 27 03:45:37 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 26 Apr 2000 21:45:37 -0400 (EDT) Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug In-Reply-To: References: <002f01bfafc6$779804a0$34aab5d4@hagrid> Message-ID: <14599.39873.159386.778558@newcnri.cnri.reston.va.us> Mark Hammond writes: >It is necessary for us to also have this scrag-fight in public? >Most of the thread on c.l.py is filled in by people who are also >py-dev members! Attempting to walk a delicate line here, my reading of the situation is that Fredrik's frustration level is increaing as he points out problems, but nothing much is done about them. Marc-Andre will usually respond, but there's been no indication from Guido about what to do. But GvR might be waiting to hear from more users about their experience with Unicode; so far I don't know if anyone has much experience with the new code. But why not have it in public? The python-dev archives are publicly available anyway, so it's not like this discussion was going on behind closed doors. The problem with discussing this on c.l.py is that not everyone reads c.l.py any more due to volume. --amk From paul at prescod.net Thu Apr 27 03:47:41 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 26 Apr 2000 20:47:41 -0500 Subject: [Python-Dev] Python Unicode References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> Message-ID: <39079C3D.4000C74C@prescod.net> Fredrik Lundh wrote: > > ... > > But alright, I give up. I've wasted way too much time on this, my > patches were rejected, and nobody seems to care. Not exactly > inspiring. I can understand how frustrating this is. Sometimes something seems just so clean and mathematically obvious that you can't see why others don't see it that way. A character is the "smallest unit of text." Strings are lists of characters. Characters in character sets have numbers. Python users should never know or care whether a string object is an 8-bit string or a Unicode string. There should be no distinction. u"" should be a syntactic shortcut. The primary reason I have not been involved is that I have not had a chance to look at the implementation and figure out if there is an overriding implementation-based reason to ignore the obvious right thing (e.g the right thing will break too much code or be too slow or...). "Unicode objects" should be an implementation detail (if they exist at all). Strings are strings are strings. The Python programmer shouldn't care about whether one string was read from a Unicode file and another from an ASCII file and one typed in with "u" and one without. It's all the same thing! If the programmer wants to do an explicit UTF-8 decode on a string (whether it is Unicode or 8-bit string...no difference) then that decode should proceed by looking at each character, deriving an integer and then treating that integer as an octet according to the UTF-8 specification. Char -> Integer -> Byte -> Char The end result (and hopefully the performance) would be the same but the model is much, much cleaner if there is only one kind of string. 
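(A rough sketch of that Char -> Integer -> Byte -> Char model in terms of the current 1.6 API; the helper name is made up and this only illustrates the idea, it is not a proposal:)

def explicit_utf8_decode(s):
    # Char -> Integer: take the ordinal of each element of the string
    codes = map(ord, s)
    # Integer -> Byte: treat each ordinal as a raw byte value
    raw = "".join(map(chr, codes))
    # Byte -> Char: interpret the byte sequence as UTF-8
    return unicode(raw, "utf-8")

# explicit_utf8_decode('gr\303\266t') would yield a four-character
# Unicode string with an o-umlaut in the middle.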
We should not ignore the example set by every other language (and yes, I'm including XML here :) ). I'm as desperate (if not as vocal) as Fredrick is here. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From gmcm at hypernet.com Thu Apr 27 04:13:00 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 26 Apr 2000 22:13:00 -0400 Subject: [Python-Dev] Python Unicode In-Reply-To: <39079C3D.4000C74C@prescod.net> Message-ID: <1255320912-64004084@hypernet.com> I haven't weighed in on this one, mainly because I don't even need ISO-1, let alone Unicode, (and damned proud of it, too!). But Fredrik's glob example was horrifying. I do know that I am always concious of whether a particular string is a sequence of characters, or a sequence of bytes. Seems to me the Py3K answer is to make those separate types. Until then, I guess I'll just remain completely xenophobic (and damned proud of it, too!). - Gordon From tim_one at email.msn.com Thu Apr 27 04:27:47 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 26 Apr 2000 22:27:47 -0400 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: Message-ID: <000101bfaff0$2d5f5ee0$272d153f@tim> [Just van Rossum] > ... > Defaulting to Latin-1 is the only logical choice, no matter how > western-culture-centric this may seem. Indeed, if someone from an inferior culture wants to chime in, let them find Python-Dev with their own beady little eyes . western-culture-is-better-than-none-&-at-least-*we*-understand-it-ly y'rs - tim From tim_one at email.msn.com Thu Apr 27 06:39:21 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 27 Apr 2000 00:39:21 -0400 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code In-Reply-To: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> Message-ID: <000001bfb002$8f720260$0d2d153f@tim> [/F] > ... > But alright, I give up. I've wasted way too much time on this, my > patches were rejected, and nobody seems to care. Not exactly > inspiring. I lost track of this stuff months ago, and since I use only 7-bit ASCII in my own source code and file names and etc etc, UTF-8 and Latin-1 are identical to me <0.5 wink>. [Guido] > Sorry, all this proposal does is change the default encoding on > conversions from UTF-8 to Latin-1. That's very > western-culture-centric. Well, if you talk with an Asian, they'll probably tell you that Unicode itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for non-Latin-1 Unicode characters). Most everyone likes their own national gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is that it annoys everyone. I do expect that the vase bulk of users would be less surprised if Latin-1 *were* the default encoding. Then the default would be usable as-is for many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). The non-Euros are in for a world of pain no matter what. 
just-because-some-groups-can't-win-doesn't-mean-everyone-must- lose-ly y'rs - tim From just at letterror.com Thu Apr 27 07:42:43 2000 From: just at letterror.com (Just van Rossum) Date: Thu, 27 Apr 2000 06:42:43 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <000101bfaff0$2d5f5ee0$272d153f@tim> References: Message-ID: At 10:27 PM -0400 26-04-2000, Tim Peters wrote: >Indeed, if someone from an inferior culture wants to chime in, let them find >Python-Dev with their own beady little eyes . All irony aside, I think you've nailed one of the problems spot on: - most core Python developers seem to be too busy to read *anything* at all in c.l.py - most people that care about the issues are not on python-dev Just From tim_one at email.msn.com Thu Apr 27 07:08:11 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 27 Apr 2000 01:08:11 -0400 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparingstrings and ints) In-Reply-To: Message-ID: <000101bfb006$95962280$0d2d153f@tim> [Just van Rossum] > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read > *anything* at all in c.l.py > - most people that care about the issues are not on python-dev But they're not on c.l.py either, are they? I still read everything there, although that's gotten so time-consuming I rarely reply anymore. In any case, I've seen almost nothing useful about Unicode issues on c.l.py that wasn't also on Python-Dev; perhaps I missed something. ask-10-more-people-&-you'll-get-20-more-opinions-ly y'rs - tim From alisa at robanal.demon.co.uk Thu Apr 27 12:29:54 2000 From: alisa at robanal.demon.co.uk (Alisa Pasic Robinson) Date: Thu, 27 Apr 2000 10:29:54 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39080ddd.9837445@post.demon.co.uk> >I wrote: >>A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". > >Another way of putting this is: >- utf-8 in an 8-bit string is to a unicode string what a pickle is to an >object. >- defaulting to utf-8 upon coercing is like implicitly trying to unpickle >an 8-bit string when comparing it to an instance. Bad idea. > >Defaulting to Latin-1 is the only logical choice, no matter how >western-culture-centric this may seem. > >Just The Van Rossum Common Sense gene strikes again! You guys owe it to the world to have lots of children. I agree 100%. Let me also add that if you want to do encoding work that goes beyond what the library gives you, you absolutely need a 'byte array' type which makes no assumptions and does nothing magic to its content. I have always thought of 8-bit strings as 'byte arrays' and not 'characer arrays', and doing anything magic to them in literals or standard input is going to cause lots of trouble. I think our proposal is BETTER than Java, Tcl, Visual Basic etc for the following reasons: - you can work with old fashioned strings, which are understood by everyone to be arrays of bytes, and there is no magic conversion going on. The bytes in literal strings in your script file are the bytes that end up in the program. 
- you can work with Unicode strings if you want - you are in explicit control of conversions between them - both types have similar methods so there isn't much to learn or remember The 'no magic' thing is very important with Japanese, where very often you need to roll your own codecs and look at the raw bytes; any auto-conversion might not go through the filter you want and you've already lost information before you started. Especially If your job is to repair possibly corrupt data. Any company with a few extra custom characters in the user-defined Shift-JIS range is going to suddenly find their Perl scripts are failing or trashing all their data as a result of the UTF-8 decision. I'm also convinced that the majority of Python scripts won't need to work in Unicode. Even working with exotic languages, there is always a native 8-bit encoding. I have only used Unicode when (a) working with data that is in several languages (b) doing conversions, which requires a 'central point' (b) wanting to do per-character operations safely on multi-byte data I still haven't sorted out in my head whether the default encoding thing is a big red herring or is important; I already have a safe way to construct Unicode literals in my source files if I want to using unicode('rawdata','myencoding'). But if there has to be one I'd say the following: - strict ASCII is an option - Latin-1 is the more generous option that is right for the most people, and has a 'special status' among 8-bit encodings - UTF-8 is not one byte per character and will confuse people Just my 2p worth, Andy From mal at lemburg.com Thu Apr 27 13:23:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 13:23:23 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <000001bfb002$8f720260$0d2d153f@tim> Message-ID: <3908232B.C2668122@lemburg.com> Tim Peters wrote: > > [Guido about going Latin-1] > > Sorry, all this proposal does is change the default encoding on > > conversions from UTF-8 to Latin-1. That's very > > western-culture-centric. > > Well, if you talk with an Asian, they'll probably tell you that Unicode > itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for > non-Latin-1 Unicode characters). Most everyone likes their own national > gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is > that it annoys everyone. > > I do expect that the vase bulk of users would be less surprised if Latin-1 > *were* the default encoding. Then the default would be usable as-is for > many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). > The non-Euros are in for a world of pain no matter what. > > just-because-some-groups-can't-win-doesn't-mean-everyone-must- > lose-ly y'rs - tim People tend to forget that UTF-8 is a loss-less Unicode encoding while Latin-1 reduces Unicode to its lower 8 bits: conversion from non-Latin-1 Unicode to strings would simply not work, conversion from non-Latin-1 strings to Unicode would only be possible via unicode(). Thus mixing Unicode and strings would then run perfectly in all western countries using Latin-1 while the rest of the world would need to convert all their strings to Unicode... 
giving them an advantage over the western world we couldn't possibly accept ;-) FYI, here's a summary of which conversions take place (going Latin-1 would disable most of the Unicode integration in favour of conversion errors):

Python:
-------
string + unicode:           unicode(string,'utf-8') + unicode
string.method(unicode):     unicode(string,'utf-8').method(unicode)
print unicode:              print unicode.encode('utf-8'); with stdout redirection this can be changed to any other encoding
str(unicode):               unicode.encode('utf-8')
repr(unicode):              repr(unicode.encode('unicode-escape'))

C (PyArg_ParseTuple):
---------------------
"s" + unicode:              same as "s" + unicode.encode('utf-8')
"s#" + unicode:             same as "s#" + unicode.encode('unicode-internal')
"t" + unicode:              same as "t" + unicode.encode('utf-8')
"t#" + unicode:             same as "t#" + unicode.encode('utf-8')

This affects all C modules and builtins. In case a C module wants to receive a certain predefined encoding, it can use the new "es" and "es#" parser markers.

Ways to enter Unicode:
----------------------
u'' + string                same as unicode(string,'utf-8')
unicode(string,encname)     any supported encoding
u'...unicode-escape...'     unicode-escape currently accepts Latin-1 chars as single-char input; using escape sequences any Unicode char can be entered (*)
codecs.open(filename,mode,encname)
                            opens an encoded file for reading and writing Unicode directly
raw_input() + stdin redirection (see one of my earlier posts for code)
                            returns UTF-8 strings based on the input encoding

Hmm, perhaps a codecs.raw_input(encname) which returns Unicode directly wouldn't be a bad idea either ?!

(*) This should probably be changed to be source code encoding dependent, so that u"...data..." matches "...data..." in appearance in the Python source code (see below).

IO:
---
open(file,'w').write(unicode)
                            same as open(file,'w').write(unicode.encode('utf-8'))
open(file,'wb').write(unicode)
                            same as open(file,'wb').write(unicode.encode('unicode-internal'))
codecs.open(file,'wb',encname).write(unicode)
                            same as open(file,'wb').write(unicode.encode(encname))
codecs.open(file,'rb',encname).read()
                            same as unicode(open(file,'rb').read(),encname)

stdin + stdout can be redirected using StreamRecoders to handle any of the supported encodings

The Python parser should probably also be extended to read encoded Python source code using some hint at the start of the source file (perhaps only allowing a small subset of the supported encodings, e.g. ASCII, Latin-1, UTF-8 and UTF-16). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Apr 27 12:27:18 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 12:27:18 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <39081606.2932F5FD@lemburg.com> Fredrik Lundh wrote:
>
> >>> filename = u"gröt"
>
> >>> file = open(filename, "w")
> >>> file.close()
>
> >>> import glob
> >>> print glob.glob("gr*")
> ['gr\303\266t']
>
> >>> print glob.glob(u"gr*")
> [u'gr\366t']
>
> >>> import os
> >>> os.system("dir gr*")
> ...
> GR??T 0 01-02-03 12.34 gr??t
> 1 fil(es) 0 byte
> 0 dir 12 345 678 byte free
>
> hmm.

Where is the problem ? If you pass the output of glob() to open() you'll get the same file in both cases...
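(Something along these lines, untested against 1.6a2, is presumably what is meant:)

import glob
for fn in glob.glob("gr*") + glob.glob(u"gr*"):
    # one name is a UTF-8 byte string, the other a Unicode string,
    # but both are claimed to reach the same file
    f = open(fn)
    f.close()
    print repr(fn), "opened OK"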
even better, you can now even use Chinese in your filenames without the OS having to support Unicode filenames :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Thu Apr 27 13:49:07 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 27 Apr 2000 13:49:07 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> Message-ID: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> > Fredrik Lundh wrote: > > > > >>> filename = u"gr?t" > > > > >>> file = open(filename, "w") > > >>> file.close() > > > > >>> import glob > > >>> print glob.glob("gr*") > > ['gr\303\266t'] > > > > >>> print glob.glob(u"gr*") > > [u'gr\366t'] > > > > >>> import os > > >>> os.system("dir gr*") > > ... > > GR??T 0 01-02-03 12.34 gr??t > > 1 fil(es) 0 byte > > 0 dir 12 345 678 byte free > > > > hmm. > > Where is the problem ? I'm speechless. From akuchlin at mems-exchange.org Thu Apr 27 14:00:18 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 27 Apr 2000 08:00:18 -0400 (EDT) Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> Message-ID: <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Fredrik Lundh writes: >M.A. Lemburg wrote: >> Where is the problem ? >I'm speechless. Ummm... since I'm not sure how open() currently reacts to being passed a Unicode file or if there's something special in open() for Windows, and don't know how you think it should react (an exception? fold to UTF-8? fold to Latin1?), I don't see what the particular problem is either. For the sake of people who haven't followed this debate closely, or who were busy during the earlier lengthy threads and simply deleted most of the messages, please try to be explicit. Ilya Zakharevich on the perl5-porters mailing list often employs the "This code is buggy and if you're too clueless to see how it's broken *I* certainly won't go explaining it to you" strategy, to devastatingly divisive effect, and with little effectiveness in getting the bugs fixed. Let's not go down that road. --amk From guido at python.org Thu Apr 27 17:01:48 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:01:48 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 06:42:43 BST." References: Message-ID: <200004271501.LAA13535@eric.cnri.reston.va.us> I'd like to reset this discussion. I don't think we need to involve c.l.py yet -- I haven't seen anyone with Asian language experience chime in there, and that's where this matters most. I am directing this to the Python i18n-sig mailing list, because that's where the debate belongs, and there interested parties can join the discussion without having to be vetted as "fit for python-dev" first. I apologize for having been less than responsive in the matter; unfortunately there's lots of other stuff on my mind right now that has recently had a tendency to distract me with higher priority crises. I've heard a few people claim that strings should always be considered to contain "characters" and that there should be one character per string element. I've also heard a clamoring that there should only be one string type. 
You folks have never used Asian encodings. In countries like Japan, China and Korea, encodings are a fact of life, and the most popular encodings are ASCII supersets that use a variable number of bytes per character, just like UTF-8. Each country or language uses different encodings, even though their characters look mostly the same to western eyes. UTF-8 and Unicode is having a hard time getting adopted in these countries because most software that people use deals only with the local encodings. (Sounds familiar?) These encodings are much less "pure" than UTF-8, because they only encode the local characters (and ASCII), and because of various problems with slicing: if you look "in the middle" of an encoded string or file, you may not know how to interpret the bytes you see. There are overlaps (in most of these encodings anyway) between the codes used for single-byte and double-byte encodings, and you may have to look back one or more characters to know what to make of the particular byte you see. To get an idea of the nightmares that non-UTF-8 multibyte encodings give C/C++ programmers, see the Multibyte Character Set (MBCS) Survival Guide (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). See also the home page of the i18n-sig for more background information on encoding (and other i18n) issues (http://www.python.org/sigs/i18n-sig/). UTF-8 attempts to solve some of these problems: the multi-byte encodings are chosen such that you can tell by the high bits of each byte whether it is (1) a single-byte (ASCII) character (top bit off), (2) the start of a multi-byte character (at least two top bits on; how many indicates the total number of bytes comprising the character), or (3) a continuation byte in a multi-byte character (top bit on, next bit off). Many of the problems with non-UTF-8 multibyte encodings are the same as for UTF-8 though: #bytes != #characters, a byte may not be a valid character, regular expression patterns using "." may give the wrong results, and so on. The truth of the matter is: the encoding of string objects is in the mind of the programmer. When I read a GIF file into a string object, the encoding is "binary goop". When I read a line of Japanese text from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to be an assumption built-in to my program, or perhaps information supplied separately (there's no easy way to guess based on the actual data). When I type a string literal using Latin-1 characters, the encoding is Latin-1. When I use octal escapes in a string literal, e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). When I type a 7-bit string literal, the encoding is ASCII. The moral of all this? 8-bit strings are not going away. They are not encoded in UTF-8 henceforth. Like before, and like 8-bit text files, they are encoded in whatever encoding you want. All you get is an extra mechanism to convert them to Unicode, and the Unicode conversion defaults to UTF-8 because it is the only conversion that is reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing Tim's paraphrase), UTF-8 annoys everyone equally. Where does the current approach require work? - We need a way to indicate the encoding of Python source code. (Probably a "magic comment".) - We need a way to indicate the encoding of input and output data files, and we need shortcuts to set the encoding of stdin, stdout and stderr (and maybe all files opened without an explicit encoding). 
Marc-Andre showed some sample code, but I believe it is still cumbersome. (I have to play with it more to see how it could be improved.) - We need to discuss whether there should be a way to change the default conversion between Unicode and 8-bit strings (currently hardcoded to UTF-8), in order to make life easier for people who want to continue to use their favorite 8-bit encoding (e.g. Latin-1, or shift-JIS) but who also want to make use of the new Unicode datatype. We're still in alpha, so we can still fix things. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Thu Apr 27 17:01:00 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 10:01:00 -0500 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Message-ID: <3908562C.C2A2E1BC@prescod.net> You're asking the file system to "find you a filename". Depending on how you ask, you get two different file names for the same file. They are "==" equal (I think) but are of different length. I agree with /F that it's a little strange. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From guido at python.org Thu Apr 27 17:23:50 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:23:50 -0400 Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: Your message of "Wed, 26 Apr 2000 23:45:40 +0200." <004501bfafc8$c51d1240$34aab5d4@hagrid> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <200004271523.LAA13614@eric.cnri.reston.va.us> > >>> filename = u"gr?t" > > >>> file = open(filename, "w") > >>> file.close() > > >>> import glob > >>> print glob.glob("gr*") > ['gr\303\266t'] > > >>> print glob.glob(u"gr*") > [u'gr\366t'] > > >>> import os > >>> os.system("dir gr*") > ... > GR??T 0 01-02-03 12.34 gr??t > 1 fil(es) 0 byte > 0 dir 12 345 678 byte free > > hmm. I presume that Fredrik's gripe is that the filename has been converted to UTF-8, while the encoding used by Windows to display his directory listing is Latin-1. (Not Microsoft's own 8-bit character set???) I'd like to solve this problem, but I have some questions: what *IS* the encoding used for filenames on Windows? This may differ per Windows version; perhaps it can differ drive letter? Or per application or per thread? On Windows NT, filenames are supposed to be Unicode. (I suppose also on Windowns 2000?) How do I open a file with a given Unicode string for its name, in a C program? I suppose there's a Win32 API call for that which has a Unicode variant. On Windows 95/98, the Unicode variants of the Win32 API calls don't exist. So what is the poor Python runtime to do there? Can Japanese people use Japanese characters in filenames on Windows 95/98? Let's assume they can. Since the filesystem isn't Unicode aware, the filenames must be encoded. Which encoding is used? Let's assume they use Microsoft's multibyte encoding. If they put such a file on a floppy and ship it to Link?ping, what will Fredrik see as the filename? (I.e., is the encoding fixed by the disk volume, or by the operating system?) Once we have a few answers here, we can solve the problem. 
Note that sometimes we'll have to refuse a Unicode filename because there's no mapping for some of the characters it contains in the filename encoding used. Question: how does Fredrik create a file with a Euro character (u'\u20ac') in its name? --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Thu Apr 27 18:21:20 2000 From: bckfnn at worldonline.dk (Finn Bock) Date: Thu, 27 Apr 2000 16:21:20 GMT Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <200004271523.LAA13614@eric.cnri.reston.va.us> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <200004271523.LAA13614@eric.cnri.reston.va.us> Message-ID: <3908679a.16700013@smtp.worldonline.dk> On Thu, 27 Apr 2000 11:23:50 -0400, you wrote: >> >>> filename = u"gr?t" >> >> >>> file = open(filename, "w") >> >>> file.close() >> >> >>> import glob >> >>> print glob.glob("gr*") >> ['gr\303\266t'] >> >> >>> print glob.glob(u"gr*") >> [u'gr\366t'] >> >> >>> import os >> >>> os.system("dir gr*") >> ... >> GR??T 0 01-02-03 12.34 gr??t >> 1 fil(es) 0 byte >> 0 dir 12 345 678 byte free >> >> hmm. > >I presume that Fredrik's gripe is that the filename has been converted >to UTF-8, while the encoding used by Windows to display his directory >listing is Latin-1. (Not Microsoft's own 8-bit character set???) > >I'd like to solve this problem, but I have some questions: what *IS* >the encoding used for filenames on Windows? [This is just for inspiration] JDK "solves" this by running the filename through a CharToByteConverter (a codec) which is setup as the default encoding used for the platform. On my danish w2k this is encoding happens to be called 'Cp1252'. The codec name is chosen based on the users language and region with fall back to Cp1252. The mapping table is: "ar", "Cp1256", "be", "Cp1251", "bg", "Cp1251", "cs", "Cp1250", "el", "Cp1253", "et", "Cp1257", "iw", "Cp1255", "hu", "Cp1250", "ja", "MS932", "ko", "MS949", "lt", "Cp1257", "lv", "Cp1257", "mk", "Cp1251", "pl", "Cp1250", "ro", "Cp1250", "ru", "Cp1251", "sh", "Cp1250", "sk", "Cp1250", "sl", "Cp1250", "sq", "Cp1250", "sr", "Cp1251", "th", "MS874", "tr", "Cp1254", "uk", "Cp1251", "zh", "GBK", "zh_TW", "MS950", >This may differ per >Windows version; perhaps it can differ drive letter? Or per >application or per thread? On Windows NT, filenames are supposed to >be Unicode. (I suppose also on Windowns 2000?) JDK only uses GetThreadLocale() for the starting thread. It does not appears to check for windows versions at all. >How do I open a file >with a given Unicode string for its name, in a C program? I suppose >there's a Win32 API call for that which has a Unicode variant. The JDK does not make use the unicode API is it exists on the platform. >On Windows 95/98, the Unicode variants of the Win32 API calls don't >exist. So what is the poor Python runtime to do there? > >Can Japanese people use Japanese characters in filenames on Windows >95/98? Let's assume they can. Since the filesystem isn't Unicode >aware, the filenames must be encoded. Which encoding is used? Let's >assume they use Microsoft's multibyte encoding. If they put such a >file on a floppy and ship it to Link?ping, what will Fredrik see as >the filename? (I.e., is the encoding fixed by the disk volume, or by >the operating system?) > >Once we have a few answers here, we can solve the problem. Note that >sometimes we'll have to refuse a Unicode filename because there's no >mapping for some of the characters it contains in the filename >encoding used. 
JDK silently replaced the offending character with a '?' which cause an exception when attempting to open the file. The filename, directory name, or volume label syntax is incorrect >Question: how does Fredrik create a file with a Euro >character (u'\u20ac') in its name? import java.io.*; public class x { public static void main(String[] args) throws Exception { String filename = "An eurosign \u20ac"; System.out.println(filename); new FileOutputStream(filename).close(); } } The resulting file contains an euro sign when shown in FileExplorer. The output of the program also contains an euro sign when shown with notepad. But the filename/program output does *not* contain an euro when dir'ed/type'd in my DOS box. regards, finn From gresham at mediavisual.com Thu Apr 27 18:41:04 2000 From: gresham at mediavisual.com (Paul Gresham) Date: Fri, 28 Apr 2000 00:41:04 +0800 Subject: [Python-Dev] Re: [I18n-sig] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <010f01bfb067$64e43260$9a2b440a@miv01> Hi, I'm not sure how much value I can add, as I know little about the charsets etc. and a bit more about Python. As a user of these, and running a consultancy firm in Hong Kong, I can at least pass on some points and perhaps help you with testing later on. My first touch on international PCs was fixing a Japanese 8086 back in 1989, it didn't even have colour ! Hong Kong is quite an experience as there are two formats in common use, plus occasionally another gets thrown in. In HK they use the Traditional Chinese, whereas the mainland uses Simplified, as Guido says, there are a number of different types of these. Occasionally we see the Taiwanese charsets used. It seems to me that having each individual string variable encoded might just be too atomic, perhaps creating a cumbersome overhead in the system. For most applications I can settle for the entire app to be using a single charset, however from experience there are exceptions. We are normally working with prior knowledge of the charset being used, rather than having to deal with any charset which may come along (at an application level), and therefore generally work in a context, just as a European programmer would be working in say English or German. As you know, storage/retrieval is not a problem, but manipulation and comparison is. A nice way to handle this would be like operator overloading such that string operations would be perfomed in the context of the current charset, I could then change context as needed, removing the need for metadata surrounding the actual data. This should speed things up as each overloaded library could be optimised given the different quirks, and new ones could be added easily. My code could be easily re-used on different charsets by simply changing context externally to the code, rather than passing in lots of stuff and expecting Python to deal with it. Also I'd like very much to compile/load in only the International charsets that I need. I wouldn't want to see Java type bloat occurring to Python, and adding internationalisation for everything, is huge. I think what I am suggesting is a different approach which obviously places more onus on the programmer rather than Python. Perhaps this is not acceptable, I don't know as I've never developed a programming language. I hope this is a helpful point of view to get you thinking further, otherwise ... 
please ignore me and I'll keep quiet : ) Regards Paul ----- Original Message ----- From: "Guido van Rossum" To: ; Cc: "Just van Rossum" Sent: Thursday, April 27, 2000 11:01 PM Subject: [I18n-sig] Unicode debate > I'd like to reset this discussion. I don't think we need to involve > c.l.py yet -- I haven't seen anyone with Asian language experience > chime in there, and that's where this matters most. I am directing > this to the Python i18n-sig mailing list, because that's where the > debate belongs, and there interested parties can join the discussion > without having to be vetted as "fit for python-dev" first. > > I apologize for having been less than responsive in the matter; > unfortunately there's lots of other stuff on my mind right now that > has recently had a tendency to distract me with higher priority > crises. > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) > > These encodings are much less "pure" than UTF-8, because they only > encode the local characters (and ASCII), and because of various > problems with slicing: if you look "in the middle" of an encoded > string or file, you may not know how to interpret the bytes you see. > There are overlaps (in most of these encodings anyway) between the > codes used for single-byte and double-byte encodings, and you may have > to look back one or more characters to know what to make of the > particular byte you see. To get an idea of the nightmares that > non-UTF-8 multibyte encodings give C/C++ programmers, see the > Multibyte Character Set (MBCS) Survival Guide > (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). > See also the home page of the i18n-sig for more background information > on encoding (and other i18n) issues > (http://www.python.org/sigs/i18n-sig/). > > UTF-8 attempts to solve some of these problems: the multi-byte > encodings are chosen such that you can tell by the high bits of each > byte whether it is (1) a single-byte (ASCII) character (top bit off), > (2) the start of a multi-byte character (at least two top bits on; how > many indicates the total number of bytes comprising the character), or > (3) a continuation byte in a multi-byte character (top bit on, next > bit off). > > Many of the problems with non-UTF-8 multibyte encodings are the same > as for UTF-8 though: #bytes != #characters, a byte may not be a valid > character, regular expression patterns using "." may give the wrong > results, and so on. > > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". 
When I read a line of Japanese text > from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to > be an assumption built-in to my program, or perhaps information > supplied separately (there's no easy way to guess based on the actual > data). When I type a string literal using Latin-1 characters, the > encoding is Latin-1. When I use octal escapes in a string literal, > e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). > When I type a 7-bit string literal, the encoding is ASCII. > > The moral of all this? 8-bit strings are not going away. They are > not encoded in UTF-8 henceforth. Like before, and like 8-bit text > files, they are encoded in whatever encoding you want. All you get is > an extra mechanism to convert them to Unicode, and the Unicode > conversion defaults to UTF-8 because it is the only conversion that is > reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing > Tim's paraphrase), UTF-8 annoys everyone equally. > > Where does the current approach require work? > > - We need a way to indicate the encoding of Python source code. > (Probably a "magic comment".) > > - We need a way to indicate the encoding of input and output data > files, and we need shortcuts to set the encoding of stdin, stdout and > stderr (and maybe all files opened without an explicit encoding). > Marc-Andre showed some sample code, but I believe it is still > cumbersome. (I have to play with it more to see how it could be > improved.) > > - We need to discuss whether there should be a way to change the > default conversion between Unicode and 8-bit strings (currently > hardcoded to UTF-8), in order to make life easier for people who want > to continue to use their favorite 8-bit encoding (e.g. Latin-1, or > shift-JIS) but who also want to make use of the new Unicode datatype. > > We're still in alpha, so we can still fix things. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > From petrilli at amber.org Thu Apr 27 18:48:16 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 27 Apr 2000 12:48:16 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 27, 2000 at 11:01:48AM -0400 References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <20000427124816.C1723@trump.amber.org> Guido van Rossum [guido at python.org] wrote: > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) Actually a bigger concern that we hear from our customers in Japan is that Unicode has *serious* problems in asian languages. Theey took the "unification" of Chinese and Japanese, rather than both, and therefore can not represent los of phrases quite right. 
I can have someone write up a better dscription, but I was told by several Japanese people that they wouldn't use Unicode come hell or high water, basically. Basically it's JJIS, Shift-JIS or nothing for most Japanese companies. This was my experience working with Konica a few years ago as well. Chris -- | Christopher Petrilli | petrilli at amber.org From andy at reportlab.python.org Thu Apr 27 18:50:28 2000 From: andy at reportlab.python.org (Andy Robinson) Date: Thu, 27 Apr 2000 16:50:28 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39086e6a.34554266@post.demon.co.uk> >Alisa Pasic Robinson Drat! my wife's been hacking my email headers! Sorry... - Andy Robinson From jeremy at cnri.reston.va.us Fri Apr 28 00:12:15 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 27 Apr 2000 18:12:15 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) In-Reply-To: <39075D58.C549938E@tismer.com> References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> Message-ID: <14600.47935.704157.565225@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Summary: We had two effects here. Effect 1: Wasting time with CT> extra errors in instance creation. Effect 2: Loss of locality CT> due to code size increase. CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a CT> little renaming of the one or the other module, in order to get CT> the default link order to support locality better. CT> Now everything is clear to me. My first attempts with reordering CT> could not reveal the loss with the instance stuff. CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to CT> get related code ordered better. I reach a different conclusion. The performance difference 1.5.2 and 1.6, measured with pystone and pybench, is so small that effects like the order in which the compiler assembles the code make a difference. I don't think we should make any non-trivial effort to improve performance based on this kind of voodoo. I also question the claim that the two effects here explain the performance difference between 1.5.2 and 1.6. Rather, they explain the performance difference of pystone and pybench running on different versions of the interpreter. Saying that pystone is the same speed is a far cry from saying that python is the same speed! Remember that performance on a benchmark is just that. (It's like the old joke about a person's IQ: It is a very good indicator of how well they did on the IQ test.) I think we could use better benchmarks of two sorts. The pybench microbenchmarks are quite helpful individually, though the overall number isn't particularly meaningful. However, these benchmarks are sometimes a little too big to be useful. For example, the instance creation effect was tracked down by running this code: class Foo: pass for i in range(big_num): Foo() The pybench test "CreateInstance" does all sorts of other stuff. It tests creation with and without an __init__ method. It tests instance deallocation (because all the created objected need to be dealloced, too). It also tests attribute assignment, since many of the __init__ methods make assignments. 
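(A bare-bones timing loop that isolates just the instance-creation effect might look like this -- only a sketch, and the iteration count is arbitrary:)

import time

class Foo:
    pass

def time_instance_creation(n):
    r = range(n)               # build the range up front so only Foo() is timed
    t0 = time.clock()
    for i in r:
        Foo()
    return time.clock() - t0

print "instance creation: %.3f seconds" % time_instance_creation(100000)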
What would be better (and I'm not sure what priority should be placed on doing it) is a set of nano-benchmarks that try to limit themselves to a single feature or small set of features. Guido suggested having a hierarchy so that there are multiple nano-benchmarks for instance creation, each identifying a particular effect, and a micro-benchmark that is the aggregate of all these nano-benchmarks. We could also use some better large benchmarks. Using pystone is pretty crude, because it doesn't necessarily measure the performance of things we care about. It would be better to have a collection of 5-10 apps that each do something we care about -- munging text files or XML data, creating lots of objects, etc. For example, I used the compiler package (in nondist/src/Compiler) to compile itself. Based on that benchmark, an interpreter built from the current CVS tree is still 9-11% slower than 1.5. Jeremy From tismer at tismer.com Fri Apr 28 02:48:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 28 Apr 2000 02:48:34 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> <14600.47935.704157.565225@goon.cnri.reston.va.us> Message-ID: <3908DFE1.F43A62EB@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Summary: We had two effects here. Effect 1: Wasting time with > CT> extra errors in instance creation. Effect 2: Loss of locality > CT> due to code size increase. > > CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a > CT> little renaming of the one or the other module, in order to get > CT> the default link order to support locality better. > > CT> Now everything is clear to me. My first attempts with reordering > CT> could not reveal the loss with the instance stuff. from here... > CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to > CT> get related code ordered better. ...to here I was not clear. The rest of it is at least 100% correct. > I reach a different conclusion. The performance difference 1.5.2 and > 1.6, measured with pystone and pybench, is so small that effects like > the order in which the compiler assembles the code make a difference. Sorry, it is 10 percent. Please do not shift the topic. I agree that there must be better measurements to be able to do my thoughtless claim ...from here to here..., but the question was raised in the py-dev thread "Python 1.6 speed" by Andrew, who was exactly asking why pystone gets 10 percent slower. I have been hunting that for a week now, and with your help, it is solved. > I don't think we should make any non-trivial effort to improve > performance based on this kind of voodoo. Thanks. I've already built it in - it was trivial, but I'll keep it for my version. > I also question the claim that the two effects here explain the > performance difference between 1.5.2 and 1.6. Rather, they explain > the performance difference of pystone and pybench running on different > versions of the interpreter. Exactly. I didn't want to claim anything else, it was all in the context of the inital thread. ciao - chris Oops, p.s: interesting: ... > For example, I used the compiler package (in nondist/src/Compiler) to > compile itself. 
Based on that benchmark, an interpreter built from > the current CVS tree is still 9-11% slower than 1.5. Did you adjust the string methods? I don't believe these are still fast. -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul at prescod.net Fri Apr 28 04:20:22 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:20:22 -0500 Subject: [Python-Dev] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <3908F566.8E5747C@prescod.net> Guido van Rossum wrote: > > ... > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) I think that maybe an important point is getting lost here. I could be wrong, but it seems that all of this emphasis on encodings is misplaced. The physical and logical makeup of character strings are entirely separate issues. Unicode is a character set. It works in the logical domain. Dozens of different physical encodings can be used for Unicode characters. There are XML users who work with XML (and thus Unicode) every day and never see UTF-8, UTF-16 or any other Unicode-consortium "sponsored" encoding. If you invent an encoding tomorrow, it can still be XML-compatible. There are many encodings older than Unicode that are XML (and Unicode) compatible. I have not heard complaints about the XML way of looking at the world and in fact it was explicitly endorsed by many of the world's leading experts on internationalization. I haven't followed the Java situation as closely but I have also not heard screams about its support for il8n. > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". IMHO, it's a mistake of history that you would even think it makes sense to read a GIF file into a "string" object and we should be trying to erase that mistake, as quickly as possible (which is admittedly not very quickly) not building more and more infrastructure around it. How can we make the transition to a "binary goops are not strings" world easiest? > The moral of all this? 8-bit strings are not going away. If that is a statement of your long term vision, then I think that it is very unfortunate. Treating string literals as if they were isomorphic with byte arrays was probably the right thing in 1991 but it won't be in 2005. It doesn't meet the definition of string used in the Unicode spec., nor in XML, nor in Java, nor at the W3C nor in most other up and coming specifications. 
From paul at prescod.net Fri Apr 28 04:21:44 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:21:44 -0500 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> Message-ID: <3908F5B8.9F8D8A9A@prescod.net> Andy Robinson wrote: > > - you can work with old fashioned strings, which are understood > by everyone to be arrays of bytes, and there is no magic > conversion going on. The bytes in literal strings in your script file > are the bytes that end up in the program. Who is "everyone"? Are you saying that CP4E hordes are going to understand that the syntax "abcde" is constructing a *byte array*? It seems like you think that Python users are going to be more sophisticated in their understanding of these issues than Java programmers. In most other things, Python is simpler. > ... > > I'm also convinced that the majority of Python scripts won't need > to work in Unicode. Anything working with XML will need to be Unicode. Anything working with the Win32 API (especially COM) will want to do Unicode. Over time the entire Web infrastructure will move to Unicode. Anything written in JPython pretty much MOST use Unicode (doesn't it?). > Even working with exotic languages, there is always a native > 8-bit encoding. Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use 8-bit encodings of Unicode if you want. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From petrilli at amber.org Fri Apr 28 06:12:29 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Fri, 28 Apr 2000 00:12:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <3908F5B8.9F8D8A9A@prescod.net>; from paul@prescod.net on Thu, Apr 27, 2000 at 09:21:44PM -0500 References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> Message-ID: <20000428001229.A4790@trump.amber.org> Paul Prescod [paul at prescod.net] wrote: > > I'm also convinced that the majority of Python scripts won't need > > to work in Unicode. > > Anything working with XML will need to be Unicode. Anything working with > the Win32 API (especially COM) will want to do Unicode. Over time the > entire Web infrastructure will move to Unicode. Anything written in > JPython pretty much MOST use Unicode (doesn't it?). I disagree with this. Unicode has been a very long time, and it's not been adopted by a lot of people for a LOT of very valid reasons. > > Even working with exotic languages, there is always a native > > 8-bit encoding. > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > 8-bit encodings of Unicode if you want. Um, if you go: JIS -> Unicode -> JIS you don't get the same thing out that you put in (at least this is what I've been told by a lot of Japanese developers), and therefore it's not terribly popular because of the nature of the Japanese (and Chinese) langauge. My experience with Unicode is that a lot of Western people think it's the answer to every problem asked, while most asian language people disagree vehemently. This says the problem isn't solved yet, even if people wish to deny it. 
Chris -- | Christopher Petrilli | petrilli at amber.org From just at letterror.com Fri Apr 28 10:33:16 2000 From: just at letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 09:33:16 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us> References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: At 11:01 AM -0400 27-04-2000, Guido van Rossum wrote: >Where does the current approach require work? > >- We need a way to indicate the encoding of Python source code. >(Probably a "magic comment".) How will other parts of a program know which encoding was used for non-unicode string literals? It seems to me that an encoding attribute for 8-bit strings solves this nicely. The attribute should only be set automatically if the encoding of the source file was specified or when the string has been encoded from a unicode string. The attribute should *only* be used when converting to unicode. (Hm, it could even be used when calling unicode() without the encoding argument.) It should *not* be used when comparing (or adding, etc.) 8-bit strings to each other, since they still may contain binary goop, even in a source file with a specified encoding! >- We need a way to indicate the encoding of input and output data >files, and we need shortcuts to set the encoding of stdin, stdout and >stderr (and maybe all files opened without an explicit encoding). Can you open a file *with* an explicit encoding? Just From mal at lemburg.com Fri Apr 28 11:39:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 11:39:37 +0200 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> Message-ID: <39095C59.A5916EEB@lemburg.com> [Note: These discussion should all move to 18n-sig... CCing there] Christopher Petrilli wrote: > > Paul Prescod [paul at prescod.net] wrote: > > > Even working with exotic languages, there is always a native > > > 8-bit encoding. > > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > 8-bit encodings of Unicode if you want. > > Um, if you go: > > JIS -> Unicode -> JIS > > you don't get the same thing out that you put in (at least this is > what I've been told by a lot of Japanese developers), and therefore > it's not terribly popular because of the nature of the Japanese (and > Chinese) langauge. > > My experience with Unicode is that a lot of Western people think it's > the answer to every problem asked, while most asian language people > disagree vehemently. This says the problem isn't solved yet, even if > people wish to deny it. Isn't this a problem of the translation rather than Unicode itself (Andy mentioned several times that you can use the private BMP areas to implement 1-1 round-trips) ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tree at cymru.basistech.com Fri Apr 28 12:44:00 2000 From: tree at cymru.basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:44:00 -0400 (EDT) Subject: [Python-Dev] [I18n-sig] Re: Unicode debate In-Reply-To: References: Message-ID: <14601.27504.337569.201251@cymru.basistech.com> Just van Rossum writes: > How will other parts of a program know which encoding was used for > non-unicode string literals? 
This is the exact reason that Unicode should be used for all string literals: from a language design perspective I don't understand the rationale for providing "traditional" and "unicode" string. > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! In Dylan there is an explicit split between 'characters' (which are always Unicode) and 'bytes'. What are the compelling reasons to not use UTF-8 as the (source) document encoding? In the past the usual response is, "the tools are't there for authoring UTF-8 documents". This argument becomes more specious as more OS's move towards Unicode. I firmly believe this can be done without Java's bloat. One off-the-cuff solution is this: All character strings are Unicode (utf-8 encoding). Language terminals and operators are restricted to US-ASCII, which are identical to UTF8. The contents of comments are not interpreted in any way. > >- We need a way to indicate the encoding of input and output data > >files, and we need shortcuts to set the encoding of stdin, stdout and > >stderr (and maybe all files opened without an explicit encoding). > > Can you open a file *with* an explicit encoding? If you cannot, you lose. You absolutely must be able to specify the encoding of a file when opening it, so that the runtime can transcode into the native encoding as you read it. This should be otherwise transparent the user. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From tree at cymru.basistech.com Fri Apr 28 12:56:50 2000 From: tree at cymru.basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:56:50 -0400 (EDT) Subject: [I18n-sig] Re: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <14601.28274.667733.660938@cymru.basistech.com> M.-A. Lemburg writes: > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > > 8-bit encodings of Unicode if you want. This is meaningless: legacy encodings of national character sets such Shift-JIS, Big Five, GB2312, or TIS620 are not "encodings" of Unicode. TIS620 is a single-byte, 8-bit encoding: each character is represented by a single byte. The Japanese and Chinese encodings are multibyte, 8-bit, encodings. ISO-2022 is a multi-byte, 7-bit encoding for multiple character sets. Unicode has several possible encodings: UTF-8, UCS-2, UCS-4, UTF-16... You can view all of these as 8-bit encodings, if you like. Some are multibyte (such as UTF-8, where each character in Unicode is represented in 1 to 3 bytes) while others are fixed length, two or four bytes per character. 
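A quick illustration of the character set / encoding split, using the new Unicode type from 1.6a2 and assuming the utf-8 and utf-16 codecs that come with it:

    a_grave = u"\u00E0"        # LATIN SMALL LETTER A WITH GRAVE
    hiragana_a = u"\u3042"     # HIRAGANA LETTER A

    print len(a_grave), len(hiragana_a)     # 1 1 -- one character each

    # UTF-8 is variable width: the same characters need different
    # numbers of bytes
    print len(a_grave.encode("utf-8"))      # 2
    print len(hiragana_a.encode("utf-8"))   # 3

    # UTF-16 uses two bytes per BMP character (plus a byte order mark up
    # front), so the byte length on disk changes yet again
    print len(hiragana_a.encode("utf-16"))

Same text, one character set, three different byte sequences.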
> > Um, if you go: > > > > JIS -> Unicode -> JIS > > > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. This is simply not true any more. The ability to round trip between Unicode and legacy encodings is dependent on the software: being able to use code points in the PUA for this is acceptable and commonly done. The big advantage is in using Unicode as a pivot when transcoding between different CJK encodings. It is very difficult to map between, say, Shift JIS and GB2312, directly. However, Unicode provides a good go-between. It isn't a panacea: transcoding between legacy encodings like GB2312 and Big Five is still difficult: Unicode or not. > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. This is a shame: it is an indication that they don't understand the technology. Unicode is a tool: nothing more. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From gstein at lyra.org Fri Apr 28 14:41:11 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 28 Apr 2000 05:41:11 -0700 (PDT) Subject: [Python-Dev] c.l.py readership datapoint (was: Python 1.6a2 Unicode bug) In-Reply-To: Message-ID: On Thu, 27 Apr 2000, Just van Rossum wrote: > At 10:27 PM -0400 26-04-2000, Tim Peters wrote: > >Indeed, if someone from an inferior culture wants to chime in, let them find > >Python-Dev with their own beady little eyes . > > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read *anything* at all > in c.l.py Datapoint: I stopped reading c.l.py almost two years ago. For a while, I would pop up a newsreader every month or so and skim what kinds of things were happening. That stopped at least a year or so ago. I get a couple hundred messages a day. Another 100+ from c.l.py would be way too much. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Apr 28 15:24:29 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 09:24:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: Your message of "Fri, 28 Apr 2000 11:39:37 +0200." <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <200004281324.JAA15642@eric.cnri.reston.va.us> > [Note: These discussion should all move to 18n-sig... CCing there] > > Christopher Petrilli wrote: > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. > > > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. 
[Marc-Andre Lenburg] > Isn't this a problem of the translation rather than Unicode > itself (Andy mentioned several times that you can use the private > BMP areas to implement 1-1 round-trips) ? Maybe, but apparently such high-quality translations are rare (note that Andy said "can"). Anyway, a word of caution here. Years ago I attended a number of IETF meetings on internationalization, in a time when Unicode wasn't as accepted as it is now. The one thing I took away from those meetings was that this is a *highly* emotional and controversial issue. As the Python community, I feel we have no need to discuss "why Unicode." Therein lies madness, controversy, and no progress. We know there's a clear demand for Unicode, and we've committed to support it. The question now at hand is "how Unicode." Let's please focus on that, e.g. in the other thread ("Unicode debate") in i18n-sig and python-dev. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:10:27 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:10:27 -0400 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 09:33:16 BST." References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281410.KAA16104@eric.cnri.reston.va.us> [GvR] > >- We need a way to indicate the encoding of Python source code. > >(Probably a "magic comment".) [JvR] > How will other parts of a program know which encoding was used for > non-unicode string literals? > > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! Marc-Andre took this idea a bit further, but I think it's not practical given the current implementation: there are too many places where the C code would have to be changed in order to propagate the string encoding information, and there are too many sources of strings with unknown encodings to make it very useful. Plus, it would slow down 8-bit string ops. I have a better idea: rather than carrying around 8-bit strings with an encoding, use Unicode literals in your source code. If the source encoding is known, these will be converted using the appropriate codec. If you object to having to write u"..." all the time, we could say that "..." is a Unicode literal if it contains any characters with the top bit on (of course the source file encoding would be used just like for u"..."). But I think this should be enabled by a separate pragma -- people who want to write Unicode-unaware code manipulating 8-bit strings in their favorite encoding (e.g. shift-JIS or Latin-1) should not silently get Unicode strings. (I thought about an option to make *all strings* (not just literals) Unicode, but the current implementation would require too much hacking. This is what JPython does, and maybe it should be what Python 3000 does; I don't see it as a realistic option for the 1.x series.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Apr 28 16:27:18 2000 From: fdrake at acm.org (Fred L. 
Drake, Jr.) Date: Fri, 28 Apr 2000 10:27:18 -0400 (EDT) Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue Message-ID: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Brian Hooper submitted a patch to add U and U# to the format strings for Py_BuildValue(), and there were comments that indicated u and u# would be better. He's submitted a documentation update for this as well the implementation. If there are no objections, I'll incorporate these changes. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Apr 28 16:32:28 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:32:28 -0400 Subject: [Python-Dev] Re: [I18n-sig] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 06:44:00 EDT." <14601.27504.337569.201251@cymru.basistech.com> References: <14601.27504.337569.201251@cymru.basistech.com> Message-ID: <200004281432.KAA16418@eric.cnri.reston.va.us> > This is the exact reason that Unicode should be used for all string > literals: from a language design perspective I don't understand the > rationale for providing "traditional" and "unicode" string. In Python 3000, you would have a point. In current Python, there simply are too many programs and extensions written in other languages that manipulating 8-bit strings to ignore their existence. We're trying to add Unicode support to Python 1.6 without breaking code that used to run under Python 1.5.x; practicalities just make it impossible to go with Unicode for everything. I think that if Python didn't have so many extension modules (many maintained by 3rd party modules) it would be a lot easier to switch to Unicode for all strings (I think JavaScript has done this). In Python 3000, we'll have to seriously consider having separate character string and byte array objects, along the lines of Java's model. Note that I say "seriously consider." We'll first have to see how well the current solution works *in practice*. There's time before we fix Py3k in stone. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:33:24 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:33:24 -0400 Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue In-Reply-To: Your message of "Fri, 28 Apr 2000 10:27:18 EDT." <14601.40902.531340.684389@seahag.cnri.reston.va.us> References: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Message-ID: <200004281433.KAA16446@eric.cnri.reston.va.us> > Brian Hooper submitted a patch to add U and U# to the format strings > for Py_BuildValue(), and there were comments that indicated u and u# > would be better. He's submitted a documentation update for this as > well the implementation. > If there are no objections, I'll incorporate these changes. Please go ahead, changing U/U# to u/u#. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:50:05 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:50:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 21:20:22 CDT." <3908F566.8E5747C@prescod.net> References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> Message-ID: <200004281450.KAA16493@eric.cnri.reston.va.us> [Paul Prescod] > I think that maybe an important point is getting lost here. I could be > wrong, but it seems that all of this emphasis on encodings is misplaced. 
In practical applications that manipulate text, encodings creep up all the time. I remember a talk or message by Andy Robinson about the messiness of producing printed reports in Japanese for a large investment firm. Most off the issues that took his time had to do with encodings, if I recall correctly. (Andy, do you remember what I'm talking about? Do you have a URL?) > > The truth of the matter is: the encoding of string objects is in the > > mind of the programmer. When I read a GIF file into a string object, > > the encoding is "binary goop". > > IMHO, it's a mistake of history that you would even think it makes sense > to read a GIF file into a "string" object and we should be trying to > erase that mistake, as quickly as possible (which is admittedly not very > quickly) not building more and more infrastructure around it. How can we > make the transition to a "binary goops are not strings" world easiest? I'm afraid that's a bigger issue than we can solve for Python 1.6. We're committed to by and large backwards compatibility while supporting Unicode -- the backwards compatibility with tons of extension module (many 3rd party) requires that we deal with 8-bit strings in basically the same way as we did before. > > The moral of all this? 8-bit strings are not going away. > > If that is a statement of your long term vision, then I think that it is > very unfortunate. Treating string literals as if they were isomorphic > with byte arrays was probably the right thing in 1991 but it won't be in > 2005. I think you're a tad too optimistic about the evolution speed of software (Windows 2000 *still* has to support DOS programs), but I see your point. As I stated in another message, in Python 3000 we'll have to consider a more Java-esque solution: *character* strings are Unicode, and for bytes we have (mutable!) byte arras. Certainly 8-bit bytes as the smallest storage unit aren't going away. > It doesn't meet the definition of string used in the Unicode spec., nor > in XML, nor in Java, nor at the W3C nor in most other up and coming > specifications. OK, so that's a good indication of where you're coming from. Maybe you should spend a little more time in the trenches and a little less in standards bodies. Standards are good, but sometimes disconnected from reality (remember ISO networking? :-). > From the W3C site: > > ""While ISO-2022-JP is not sufficient for every ISO10646 document, it is > the case that ISO10646 is a sufficient document character set for any > entity encoded with ISO-2022-JP."" And this is exactly why encodings will remain important: entities encoded in ISO-2022-JP have no compelling reason to be recoded permanently into ISO10646, and there are lots of forces that make it convenient to keep it encoded in ISO-2022-JP (like existing tools). > http://www.w3.org/MarkUp/html-spec/charset-harmful.html I know that document well. --Guido van Rossum (home page: http://www.python.org/~guido/) From just at letterror.com Fri Apr 28 19:51:03 2000 From: just at letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 18:51:03 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004281410.KAA16104@eric.cnri.reston.va.us> References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." 
Message-ID: [GvR, on string.encoding ] >Marc-Andre took this idea a bit further, but I think it's not >practical given the current implementation: there are too many places >where the C code would have to be changed in order to propagate the >string encoding information, I may miss something, but the encoding attr just travels with the string object, no? Like I said in my reply to MAL, I think it's undesirable to do *anything* with the encoding attr if not in combination with a unicode string. >and there are too many sources of strings >with unknown encodings to make it very useful. That's why the default encoding must be settable as well, as Fredrik suggested. >Plus, it would slow down 8-bit string ops. Not if you ignore it most of the time, and just pass it along when concatenating. >I have a better idea: rather than carrying around 8-bit strings with >an encoding, use Unicode literals in your source code. Explain that to newbies... I guess is that they will want simple 8 bit strings in their native encoding. Dunno. >If the source >encoding is known, these will be converted using the appropriate >codec. > >If you object to having to write u"..." all the time, we could say >that "..." is a Unicode literal if it contains any characters with the >top bit on (of course the source file encoding would be used just like >for u"..."). Only if "\377" would still yield an 8-bit string, for binary goop... Just From guido at python.org Fri Apr 28 20:31:19 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 14:31:19 -0400 Subject: [I18n-sig] Re: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 18:51:03 BST." References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281831.OAA17406@eric.cnri.reston.va.us> > [GvR, on string.encoding ] > >Marc-Andre took this idea a bit further, but I think it's not > >practical given the current implementation: there are too many places > >where the C code would have to be changed in order to propagate the > >string encoding information, [JvR] > I may miss something, but the encoding attr just travels with the string > object, no? Like I said in my reply to MAL, I think it's undesirable to do > *anything* with the encoding attr if not in combination with a unicode > string. But just propagating affects every string op -- s+s, s*n, s[i], s[:], s.strip(), s.split(), s.lower(), ... > >and there are too many sources of strings > >with unknown encodings to make it very useful. > > That's why the default encoding must be settable as well, as Fredrik > suggested. I'm open for debate about this. There's just something about a changeable global default encoding that worries me -- like any global property, it requires conventions and defensive programming to make things work in larger programs. For example, a module that deals with Latin-1 strings can't just set the default encoding to Latin-1: it might be imported by a program that needs it to be UTF-8. This model is currently used by the locale in C, where all locale properties are global, and it doesn't work well. For example, Python needs to go through a lot of hoops so that Python numeric literals use "." for the decimal indicator even if the user's locale specifies "," -- we can't change Python to swap the meaning of "." and "," in all contexts. So I think that a changeable default encoding is of limited value. 
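To make the locale analogy concrete, here is a small sketch; it assumes a locale with a comma decimal separator (e.g. de_DE) is installed:

    import locale

    x = 3.14    # the tokenizer always reads "." here, whatever the locale

    # Some module, somewhere, flips the process-wide numeric locale:
    locale.setlocale(locale.LC_NUMERIC, "de_DE")

    # From this point on, C library routines such as printf() and atof()
    # expect and produce "3,14", so any code that leans on them changes
    # behaviour -- even in modules that never asked for a German locale.
    # A settable process-wide default string encoding would fail the
    # same way: the last module to set it wins.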
That's different from being able to set the *source file* encoding -- this only affects Unicode string literals. > >Plus, it would slow down 8-bit string ops. > > Not if you ignore it most of the time, and just pass it along when > concatenating. And slicing, and indexing, and... > >I have a better idea: rather than carrying around 8-bit strings with > >an encoding, use Unicode literals in your source code. > > Explain that to newbies... I guess is that they will want simple 8 bit > strings in their native encoding. Dunno. If they are hap-py with their native 8-bit encoding, there's no need for them to ever use Unicode objects in their program, so they should be fine. 8-bit strings aren't ever interpreted or encoded except when mixed with Unicode objects. > >If the source > >encoding is known, these will be converted using the appropriate > >codec. > > > >If you object to having to write u"..." all the time, we could say > >that "..." is a Unicode literal if it contains any characters with the > >top bit on (of course the source file encoding would be used just like > >for u"..."). > > Only if "\377" would still yield an 8-bit string, for binary goop... Correct. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Apr 28 20:57:18 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 20:57:18 +0200 Subject: [Python-Dev] Changing PYC-Magic Message-ID: <3909DF0E.1D886485@lemburg.com> I have just looked at the Python/import.c file and the hard coded PYC magic number... /* Magic word to reject .pyc files generated by other Python versions */ /* Change for each incompatible change */ /* The value of CR and LF is incorporated so if you ever read or write a .pyc file in text mode the magic number will be wrong; also, the Apple MPW compiler swaps their values, botching string constants */ /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ #define MAGIC (20121 | ((long)'\r'<<16) | ((long)'\n'<<24)) A bit outdated, I'd say. With the addition of Unicode, the PYC files will contain marshalled Unicode objects which are not readable by older versions. I'd suggest bumping the magic number to 50428 ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From trentm at ActiveState.com Sat Apr 29 00:08:57 2000 From: trentm at ActiveState.com (Trent Mick) Date: Fri, 28 Apr 2000 15:08:57 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <200004261851.OAA06794@eric.cnri.reston.va.us> Message-ID: > > Guido van Rossum wrote: > > > > > > The email below is a serious bug report. A quick analysis shows that > > > UserString.count() calls the count() method on a string object, which > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > format code truncates integers. It probably should raise an overflow > > > exception instead. But that would still cause the test to fail -- > > > just in a different way (more explicit). Then the string methods > > > should be fixed to use long ints instead -- and then something else > > > would probably break... 
> > MAL wrote: > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > together with integers, so there's no problem on that side > > of the fence ;-) > > > > Since strings and Unicode objects use integers to describe the > > length of the object (as well as most if not all other > > builtin sequence types), the correct default value should > > thus be something like sys.maxlen which then gets set to > > INT_MAX. > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > re.py and sre_parse.py accordingly. > Guido wrote: > Hm, I'm not so sure. It would be much better if passing sys.maxint > would just WORK... Since that's what people have been doing so far. > Possible solutions (I give 4 of them): 1. The 'i' format code could raise an overflow exception and the PyArg_ParseTuple() call in string_count() could catch it and truncate to INT_MAX (reasoning that any overflow of the end position of a string can be bound to INT_MAX because that is the limit for any string in Python). Pros: - This "would just WORK" for usage of sys.maxint. Cons: - This overflow exception catching should then reasonably be propagated to other similar functions (like string.endswith(), etc). - We have to assume that the exception raised in the PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is for the second integer (i.e. 'last'). This is subtle and ugly. Pro or Con: - Do we want to start raising overflow exceptions for other conversion formats (i.e. 'b' and 'h' and 'l', the latter *can* overflow on Win64 where sizeof(long) < size(void*))? I think this is a good idea in principle but may break code (even if it *does* identify bugs in that code). 2. Just change the definitions of the UserString methods to pass a variable length argument list instead of default value parameters. For example change UserString.count() from: def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) to: def count(self, *args)): return self.data.count(*args) The result is that the default value for 'end' is now set by string_count() rather than by the UserString implementation: >>> from UserString import UserString >>> s= 'abcabcabc' >>> u = UserString('abcabcabc') >>> s.count('abc') 3 >>> u.count('abc') 3 Pros: - Easy change. - Fixes the immediate bug. - This is a safer way to copy the string behaviour in UserString anyway (is it not?). Cons: - Does not fix the general problem of the (common?) usage of sys.maxint to mean INT_MAX rather than the actual LONG_MAX (this matters on 64-bit Unices). - The UserString code is no longer really self-documenting. 3. As MAL suggested: add something like sys.maxlen (set to INT_MAX) with breaks the logical difference with sys.maxint (set to LONG_MAX): - sys.maxint == "the largest value a Python integer can hold" - sys.maxlen == "the largest value for the length of an object in Python (e.g. length of a string, length of an array)" Pros: - More explicit in that it separates two distinct meanings for sys.maxint (which now makes a difference on 64-bit Unices). - The code changes should be fairly straightforward. Cons: - Places in the code that still use sys.maxint where they should use sys.maxlen will unknowingly be overflowing ints and bringing about this bug. - Something else for coders to know about. 4. Add something like sys.maxlen, but set it to SIZET_MAX (c.f. ANSI size_t type). It is probably not a biggie, but Python currently makes the assumption that string never exceed INT_MAX in length. 
While this assumption is not likely to be proven false it technically could be on 64-bit systems. As well, when you start compiling on Win64 (where sizeof(int) == sizeof(long) < sizeof(size_t)) then you are going to be annoyed by hundreds of warnings about implicit casts from size_t (64-bits) to int (32-bits) for every strlen, str*, fwrite, and sizeof call that you make. Pros: - IMHO logically more correct. - Might clean up some subtle bugs. - Cleans up annoying and disconcerting warnings. - Will probably mean less pain down the road as 64-bit systems (esp. Win64) become more prevalent. Cons: - Lot of coding changes. - As Guido said: "and then something else would probably break". (Though, on currently 32-bits system, there should be no effective change). Only 64-bit systems should be affected and, I would hope, the effect would be a clean up. I apologize for not being succinct. Note that I am volunteering here. Opinions and guidance please. Trent From moshez at math.huji.ac.il Sat Apr 29 04:08:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 29 Apr 2000 05:08:48 +0300 (IDT) Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: <200004281450.KAA16493@eric.cnri.reston.va.us> Message-ID: I agree with most of what you say, but... On Fri, 28 Apr 2000, Guido van Rossum wrote: > As I stated in another message, in Python 3000 we'll have > to consider a more Java-esque solution: *character* strings are > Unicode, and for bytes we have (mutable!) byte arras. I would prefer a different distinction: mutable immutable chars string string_buffer bytes bytes bytes_buffer Why not allow me the freedom to index a dictionary with goop? (Here's a sample application: UNIX "file" command) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sat Apr 29 14:50:07 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 29 Apr 2000 14:50:07 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: Message-ID: <390ADA7F.2C01C6C3@lemburg.com> Trent Mick wrote: > > > > Guido van Rossum wrote: > > > > > > > > The email below is a serious bug report. A quick analysis shows that > > > > UserString.count() calls the count() method on a string object, which > > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > > format code truncates integers. It probably should raise an overflow > > > > exception instead. But that would still cause the test to fail -- > > > > just in a different way (more explicit). Then the string methods > > > > should be fixed to use long ints instead -- and then something else > > > > would probably break... > > > > MAL wrote: > > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > > together with integers, so there's no problem on that side > > > of the fence ;-) > > > > > > Since strings and Unicode objects use integers to describe the > > > length of the object (as well as most if not all other > > > builtin sequence types), the correct default value should > > > thus be something like sys.maxlen which then gets set to > > > INT_MAX. > > > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > > re.py and sre_parse.py accordingly. > > > Guido wrote: > > Hm, I'm not so sure. It would be much better if passing sys.maxint > > would just WORK... Since that's what people have been doing so far. > > > > Possible solutions (I give 4 of them): > [...] Here is another one... 
I don't really like it because I think that silent truncations are a bad idea, but to make things "just work it would help: * Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1) to INT_MAX and the same for negative numbers when passing a Python integer to a "i" marked variable. This would map range(INT_MAX+1, LONG_MAX+1) to INT_MAX and thus sys.maxint would turn out as INT_MAX in all those cases where "i" is used as parser marker. Dito for negative values. With this truncation passing sys.maxint as default argument for length parameters would "just work" :-). The more radical alternative would be changing the Python object length fields to long -- I don't think this is practical though (and probably not really needed unless you intend to work with 3GB strings ;). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paul at prescod.net Sat Apr 29 16:18:05 2000 From: paul at prescod.net (Paul Prescod) Date: Sat, 29 Apr 2000 09:18:05 -0500 Subject: [I18n-sig] Re: [Python-Dev] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> <200004281450.KAA16493@eric.cnri.reston.va.us> Message-ID: <390AEF1D.253B93EF@prescod.net> Guido van Rossum wrote: > > [Paul Prescod] > > I think that maybe an important point is getting lost here. I could be > > wrong, but it seems that all of this emphasis on encodings is misplaced. > > In practical applications that manipulate text, encodings creep up all > the time. I'm not saying that encodings are unimportant. I'm saying that that they are *different* than what Fredrik was talking about. He was talking about a coherent logical model for characters and character strings based on the conventions of more modern languages and systems than C and Python. > > How can we > > make the transition to a "binary goops are not strings" world easiest? > > I'm afraid that's a bigger issue than we can solve for Python 1.6. I understand that we can't fix the problem now. I just think that we shouldn't go out of our ways to make it worst. If we make byte-array strings "magically" cast themselves into character-strings, people will expect that behavior forever. > > It doesn't meet the definition of string used in the Unicode spec., nor > > in XML, nor in Java, nor at the W3C nor in most other up and coming > > specifications. > > OK, so that's a good indication of where you're coming from. Maybe > you should spend a little more time in the trenches and a little less > in standards bodies. Standards are good, but sometimes disconnected > from reality (remember ISO networking? :-). As far as I know, XML and Java are used a fair bit in the real world...even somewhat in Asia. In fact, there is a book titled "XML and Java" written by three Japanese men. > And this is exactly why encodings will remain important: entities > encoded in ISO-2022-JP have no compelling reason to be recoded > permanently into ISO10646, and there are lots of forces that make it > convenient to keep it encoded in ISO-2022-JP (like existing tools). You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is a character *set* and not an encoding. ISO-2022-JP says how you should represent characters in terms of bits and bytes. ISO10646 defines a mapping from integers to characters. They are both important, but separate. I think that this automagical re-encoding conflates them. 
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From moshez at math.huji.ac.il Sat Apr 29 20:09:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 29 Apr 2000 21:09:40 +0300 (IDT) Subject: [Python-Dev] At the interactive port Message-ID: Continuing the recent debate about what is appropriate to the interactive prompt printing, and the wide agreement that whatever we decide, users might think otherwise, I've written up a patch to have the user control via a function in __builtin__ the way things are printed at the prompt. This is not patches at python level stuff for two reasons: 1. I'm not sure what to call this function. Currently, I call it __print_expr__, but I'm not sure it's a good name 2. I haven't yet supplied a default in __builtin__, so the user *must* override this. This is unacceptable, of course. I'd just like people to tell me if they think this is worth while, and if there is anything I missed. *** ../python/dist/src/Python/ceval.c Fri Mar 31 04:42:47 2000 --- Python/ceval.c Sat Apr 29 03:55:36 2000 *************** *** 1014,1047 **** case PRINT_EXPR: v = POP(); ! /* Print value except if None */ ! /* After printing, also assign to '_' */ ! /* Before, set '_' to None to avoid recursion */ ! if (v != Py_None && ! (err = PyDict_SetItemString( ! f->f_builtins, "_", Py_None)) == 0) { ! err = Py_FlushLine(); ! if (err == 0) { ! x = PySys_GetObject("stdout"); ! if (x == NULL) { ! PyErr_SetString( ! PyExc_RuntimeError, ! "lost sys.stdout"); ! err = -1; ! } ! } ! if (err == 0) ! err = PyFile_WriteObject(v, x, 0); ! if (err == 0) { ! PyFile_SoftSpace(x, 1); ! err = Py_FlushLine(); ! } ! if (err == 0) { ! err = PyDict_SetItemString( ! f->f_builtins, "_", v); ! } } ! Py_DECREF(v); break; case PRINT_ITEM: --- 1014,1035 ---- case PRINT_EXPR: v = POP(); ! x = PyDict_GetItemString(f->f_builtins, ! "__print_expr__"); ! if (x == NULL) { ! PyErr_SetString(PyExc_SystemError, ! "__print_expr__ not found"); ! Py_DECREF(v); ! break; ! } ! t = PyTuple_New(1); ! if (t != NULL) { ! PyTuple_SET_ITEM(t, 0, v); ! w = PyEval_CallObject(x, t); ! Py_XDECREF(w); } ! /*Py_DECREF(x);*/ ! Py_XDECREF(t); break; case PRINT_ITEM: -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From trentm at activestate.com Sat Apr 29 20:12:07 2000 From: trentm at activestate.com (Trent Mick) Date: Sat, 29 Apr 2000 11:12:07 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <390ADA7F.2C01C6C3@lemburg.com> References: <390ADA7F.2C01C6C3@lemburg.com> Message-ID: <20000429111207.A16414@activestate.com> On Sat, Apr 29, 2000 at 02:50:07PM +0200, M.-A. Lemburg wrote: > Here is another one... I don't really like it because I think that > silent truncations are a bad idea, but to make things "just work > it would help: > > * Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1) > to INT_MAX and the same for negative numbers when passing a > Python integer to a "i" marked variable. This would map > range(INT_MAX+1, LONG_MAX+1) to INT_MAX and thus sys.maxint > would turn out as INT_MAX in all those cases where "i" is > used as parser marker. Dito for negative values. > > With this truncation passing sys.maxint as default argument > for length parameters would "just work" :-). 
> > The more radical alternative would be changing the Python > object length fields to long -- I don't think this is If we *do* make this change however, say "size_t" please, rather than long because on Win64 sizeof(long) < sizeof(size_t) == sizeof(void*). > practical though (and probably not really needed unless > you intend to work with 3GB strings ;). I know that 3GB+ strings are not likely to come along but if the length fields were size_t it would clean up implicit downcasts that you currently get from size_t to int on calls to strlen and the like on 64-bit systems. Trent From bjorn at roguewave.com Sat Apr 1 00:02:07 2000 From: bjorn at roguewave.com (Bjorn Pettersen) Date: Fri, 31 Mar 2000 15:02:07 -0700 Subject: [Python-Dev] Re: Python 1.6 alpha 1 released References: <200003312130.QAA04361@eric.cnri.reston.va.us> Message-ID: <38E5205F.DE811F61@roguewave.com> Guido van Rossum wrote: > > I've just released a source tarball and a Windows installer for Python > 1.6 alpha 1 to the Python website: > > http://www.python.org/1.6/ > > Probably the biggest news (if you hadn't heard the rumors) is Unicode > support. More news on the above webpage. > > Note: this is an alpha release. Some of the code is very rough! > Please give it a try with your favorite Python application, but don't > trust it for production use yet. I plan to release several more alpha > and beta releases over the next two months, culminating in an 1.6 > final release around June first. > > We need your help to make the final 1.6 release as robust as possible > -- please test this alpha release!!! > > --Guido van Rossum (home page: http://www.python.org/~guido/) Just read the announcement page, and found that socket.connect() no longer takes two arguments as was previously documented. If this change is staying I'm assuming the examples in the manual that uses a two argument socket.connect() will be changed? A quick look shows that this breaks all the network scripts I have installed (at least the ones that I found, undoubtedly there are many more). Because of this I will put any upgrade plans on hold. -- bjorn From tim_one at email.msn.com Sat Apr 1 02:55:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 31 Mar 2000 19:55:54 -0500 Subject: [Python-Dev] A surprising case of cyclic trash Message-ID: <000d01bf9b75$08bf58e0$1aa2143f@tim> This comes (indirectly) from a user of my doctest.py, who noticed that sometimes tempfiles created by his docstring tests got cleaned up (via __del__), but other times not. Here's a hard-won self-contained program illustrating the true cause: class Critical: count = 0 def __init__(self): Critical.count = Critical.count + 1 self.id = Critical.count print "acquiring Critical", self.id def __del__(self): print "releasing Critical", self.id good = "temp = Critical()\n" bad = "def f(): pass\n" + good basedict = {"Critical": Critical} for test in good, bad, good: print "\nStarting test case:" print test exec compile(test, "", "exec") in basedict.copy() And here's output: D:\Python>python misc\doccyc.py Starting test case: temp = Critical() acquiring Critical 1 releasing Critical 1 Starting test case: def f(): pass temp = Critical() acquiring Critical 2 Starting test case: temp = Critical() acquiring Critical 3 releasing Critical 3 D:\Python> That is, in the "bad" case, which differs from the "good" case merely in defining an unreferenced function, temp.__del__ not only doesn't get executed "when expected", it never gets executed at all. 
This appears to be due to a cycle between the function object and the anonymous dict passed to exec, causing the entire dict to become immortal, thus making "temp" immortal too. I can fiddle the doctest framework to manually nuke the temp dict it creates for execution context; the same kind of leak likely occurs in any exec'ed string that contains a function defn. For future reference, note that the finalizer in question belongs to an object not itself in a cycle, it's an object reachable only from a dead cycle. the-users-don't-stand-a-chance-ly y'rs - tim From tismer at tismer.com Sat Apr 1 16:55:50 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 01 Apr 2000 16:55:50 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Misc ACKS,1.51,1.52 References: Message-ID: <38E60DF6.9C4C9443@tismer.com> Moshe Zadka wrote: > > On Fri, 31 Mar 2000, Guido van Rossum wrote: > > > + Christian Tismer > > + Christian Tismer > > Ummmmm....I smell something fishy here. Are there two Christian Tismers? Yes! From time to time I'm re-doing my cloning experiments. This isn't so hard as it seems. The hard thing is to keep them from killing each other. BTW: I'm the second copy from the last experiment (the surviver). > That would explain how Christian has so much time to work on Stackless. > > Well, between the both of them, Guido will have no chance but to put > Stackless in the standard distribution. Guido is stronger, even between three of me :-) ciao - chris-and-the-undead-heresy -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From guido at python.org Sat Apr 1 19:00:00 2000 From: guido at python.org (Guido van Rossum) Date: Sat, 1 Apr 2000 12:00:00 -0500 EST Subject: [Python-Dev] New Features in Python 1.6 Message-ID: <200004011740.MAA04675@eric.cnri.reston.va.us> New Features in Python 1.6 ========================== With the recent release of Python 1.6 alpha 1, a lot of people have been wondering what's new. This short note aims to explain the major changes in Python 1.6. Core Language ------------- 1. Unicode strings Python strings can now be stored as Unicode strings. To make it easier to type Unicode strings, the single-quote character defaults to creating a Unicode string, while the double-quote character defaults to ASCII strings. If you need to create a Unicode string with double quotes, just preface it with the letter "u"; likewise, an ASCII string can be created by prefacing single quotes with the letter "a". For example: foo = 'hello' # Unicode foo = "hello" # ASCII foo = a'hello' # ASCII foo = u"hello" # Unicode You can still use the "r" character to quote strings in a manner convenient for the regular expression engine, but with subtle changes in semantics: see "New regular expression engine" below for more information. Also, for compatibility with most editors and operating systems, Python source code is still 7-bit ASCII. Thus, for portability it's best to write Unicode strings using one of two new escapes: \u and \N. \u lets you specify a Unicode character as a 16-bit hexadecimal number, and \N lets you specify it by name: message = 'Bienvenue \N{LATIN SMALL LETTER A WITH GRAVE} ' \ + 'Python fran\N{LATIN SMALL LETTER C WITH CEDILLA}ais!' message = 'Bienvenue \u00E0 Python fran\u00E7ais!' 2. 
string methods Python strings have grown methods, just like lists and dictionaries. For instance, to split a string on spaces, you can now say this: tokens = "foo bar baz".split(" ") Or, equivalently, this: tokens = " ".split("foo bar baz") (Python figures out which string is the delimiter and which is the string to split by examining both strings to see which one occurs more frequently inside the other.) Be careful not to mix Unicode and ASCII strings when doing this, though. Other examples: foo = "The quick red fox jumped over the lazy brown dog." foo.find("dog") foo.strip() foo.lower() Note that use of any string method on a particular string renders it mutable. This is for consistency with lists, which are mutable and have methods like 'append()' and 'sort()' that modify the list. Thus, "foo.strip()" modifies the string 'foo' in-place. "strip(foo)" retains its old behavior of returning a modified copy of 'foo'. 3. extended call syntax The variable argument list and keyword argument syntax introduced in Python 1.3 has been extended. Previously, it only worked in function/method signatures; calling other functions with the same arguments required the use of 'apply()' def spam(arg1,arg2,*more_args,**keyword_args): # ... apply(foo,(arg1,arg2) + more_args,keyword_args) Now it works for calling functions too. For consistency with C and C++, asterisks in the function signature become ampersands in the function body: foo(arg1,arg2,&more_args,&&keyword_args) 4. assignment to None now works In previous version of Python, values assigned to None were lost. For example, this code: (username,None,None,None,realname,homedir,None) = getpwuid(uid) would only preserve the user name, real name, and home directory fields from a password file entry -- everything else of interest was lost. In Python 1.6, you can meaningfully assign to None. In the above example, None would be replaced by a tuple containing the four values of interest. You can also use the variable argument list syntax here, for example: (username,password,uid,uid,*None) = getpwuid(uid) would set None to a tuple containing the last three elements of the tuple returned by getpwuid. Library ------- 1. Distutils In the past, lots of people have complained about the lack of a standard mechanism for distributing and installing Python modules. This has been fixed by the Distutils, or Distribution Utilities. We took the approach of leveraging past efforts in this area rather than reinventing a number of perfectly good wheels. Thus, the Distutils take advantage of a number of "best-of-breed" tools for distributing, configuring, building, and installing software. The core of the system is a set of m4 macros that augment the standard macros supplied by GNU Autoconf. Where the Autoconf macros generate shell code that becomes a configure script, the Distutils macros generate Python code that creates a Makefile. (This is a similar idea to Perl's MakeMaker system, but of course this Makefile builds Python modules and extensions!) Using the Distutils is easy: you write a script called "setup.in" which contains both Autoconf and Distutils m4 macros; the Autoconf macros are used to create a "configure" script which examines the target system to find out how to build your extensions there, and the Distutils macros create a "setup.py" script, which generates a Makefile that knows how to build your particular collection of modules. You process "setup.in" before distributing your modules, and bundle the resulting "configure" and "setup.py" with your modules. 
Then, the user just has to run "configure", "setup.py", and "make" to build everything. For example, here's a small, simple "setup.in" for a hypothetical module distribution that uses Autoconf to check for a C library "frob" and builds a Python extension called "_frob" and a pure Python module "frob": AC_INIT(frobmodule.c) AC_CHECK_HEADER(frob.h) AC_HAVE_LIBRARY(frob) AC_OUTPUT() DU_INIT(Frob,1.0) DU_EXTENSION(_frob,frobmodule.c,-lfrob) DU_MODULE(frob,frob.py) DU_OUTPUT(setup.py) First, you run this setup.in using the "prepare_dist" script; this creates "configure" and "setup.py": % prepare_dist Next, you configure the package and create a makefile: % ./configure % ./setup.py Finally, to create a source distribution, use the "sdist" target of the generated Makefile: % make sdist This creates Frob-1.0.tar.gz, which you can then share with the world. A user who wishes to install your extension would download Frob-1.0.tar.gz and create local, custom versions of the "configure" and "setup.py" scripts: % gunzip -c Frob-1.0.tar.gz | tar xf - % cd Frob-1.0 % ./configure % ./setup.py Then, she can build and install your modules: % make % make install Hopefully this will foster even more code sharing in the Python community, and prevent unneeded duplication of effort by module developers. Note that the Python installer for Windows now installs GNU m4, the bash shell, and Autoconf, so that Windows users will be able to use the Distutils just like on Unix. 2. Imputils Complementary to the Distutils are the Imputils, or Import Utilities. Python's import mechanism has been reworked to make it easy for Python programmers to put "hooks" into the code that finds and loads modules. The default import mechanism now includes hooks, written in Python, to load modules via HTTP from a known URL. This has allowed us to drop most of the standard library from the distribution. Now, for example, when you import a less-commonly-needed module from the standard library, Python fetches the code for you. For example, if you say import tokenize then Python -- via the Imputils -- will fetch http://modules.python.org/lib/tokenize.py for you and install it on your system for future use. (This is why the Python interpreter is now installed as a setuid binary under Unix -- if you turn off this bit, you will be unable to load modules from the standard library!) If you try to import a module that's not part of the standard library, then the Imputils will find out -- again from modules.python.org -- where it can find this module. It then downloads the entire relevant module distribution, and uses the Distutils to build and install it on your system. It then loads the module you requested. Simplicity itself! 3. New regular expression engine Python 1.6 includes a new regular expression engine, accessed through the "sre" module, to support Unicode strings. Be sure to use the *old* engine for ASCII strings, though: import re, sre # ... re.match(r"(\d+)", "The number is 42.") # ASCII sre.match(r'(\d+)', 'The number is \N{SUPERSCRIPT TWO}') # Unicode If you're not sure whether a string is ASCII or Unicode, you can always determine this at runtime: from types import * # ... 
if type(s) is StringType: m = re.match(r"...", s) elif type(s) is UnicodeType: m = sre.match(r'...', s) From gvwilson at nevex.com Sat Apr 1 20:01:13 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 1 Apr 2000 13:01:13 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: > On Sat, 1 Apr 2000, Guido van Rossum wrote: > New Features in Python 1.6 > ========================== > [lots 'n' lots] > tokens = "foo bar baz".split(" ") > tokens = " ".split("foo bar baz") Has anyone started working up a style guide that'll recommend when to use these new methods, when to use the string module's calls, etc.? Ditto for the other changes --- where there are now two or more ways of doing something, how do I (or my students) tell which one is preferred? Greg p.s. "There's More Than One Way To Do It" == "No Matter How Much Of This Language You Learn, Other People's Code Will Always Look Strange" From gvwilson at nevex.com Sat Apr 1 20:45:16 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Sat, 1 Apr 2000 13:45:16 -0500 (EST) Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <000101bf9c09$71011bc0$182d153f@tim> Message-ID: > >> On Sat, 1 Apr 2000, Guido van Rossum wrote: > >> New Features in Python 1.6 > >> ========================== > >> [lots 'n' lots] > >> tokens = "foo bar baz".split(" ") > >> tokens = " ".split("foo bar baz") > >> [and Python guesses which to split on by studying the contents] > > > Has anyone started working up a style guide that'll recommend when > > to use these new methods, when to use the string module's calls, > > etc.? Ditto for the other changes --- where there are now two or > > more ways of doing something, how do I (or my students) tell which > > one is preferred? > > Greg, you should pay real close attention to the date on Guido's msg. > It's quite a comment on the state of programming languages in general > that this all reads sooooooo plausibly! Well, you have to remember, I'm the guy who asked for "<" to be a legal Python token :-). Greg From est at hyperreal.org Sun Apr 2 00:00:54 2000 From: est at hyperreal.org (est at hyperreal.org) Date: Sat, 1 Apr 2000 14:00:54 -0800 (PST) Subject: [Python-Dev] linuxaudiodev minimal test Message-ID: <20000401220054.13820.qmail@hyperreal.org> The appended script works for me. I think the module should be called something like OSS (since it uses the Open Sound System API) with a -I entry in Setup.in to indicate that this will probably need to be specified to find (e.g., -I/usr/include/linux for Linux, -I/usr/include/machine for FreeBSD...). I'm sure I'll have other suggestions for the module, but they'll have to wait until I finish moving to California. :) Best, Eric #!/usr/bin/python import linuxaudiodev import math, struct, fcntl, FCNTL a = linuxaudiodev.open('w') a.setparameters(44100, 16, 1, linuxaudiodev.AFMT_S16_LE) N = 500 data = apply(struct.pack, ['<%dh' % N] + map(lambda n: 32767 * math.sin((2 * math.pi * n) / N), range(N))) fd = a.fileno() fcntl.fcntl(fd, FCNTL.F_SETFL, ~FCNTL.O_NONBLOCK & fcntl.fcntl(fd, FCNTL.F_GETFL)) for i in xrange(200): a.write(data) From Vladimir.Marangozov at inrialpes.fr Sun Apr 2 01:30:46 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 2 Apr 2000 01:30:46 +0200 (CEST) Subject: [Python-Dev] python -t gets confused? 
Message-ID: <200004012330.BAA10022@python.inrialpes.fr> The tab/space checking code in the tokenizer seems to get confused by the recently checked in test_pyexpat.py With python -t or -tt, there are inconsistency reports at places where there doesn't seem to be one. (tabnanny seems to be confused too, btw :) ./python -tt Lib/test/test_pyexpat.py File "Lib/test/test_pyexpat.py", line 13 print 'Start element:\n\t', name, attrs ^ SyntaxError: inconsistent use of tabs and spaces in indentation Thus, "make test" reports a failure on test_pyexpat due to a syntax error, instead of a missing optional feature (expat not compiled in). I'm not an expert of the tokenizer code, so someone might want to look at it and tell us what's going on. Without -t or -tt, the code runs fine. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Sun Apr 2 01:53:50 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 2 Apr 2000 09:53:50 +1000 Subject: [Python-Dev] string.ato? and Unicode Message-ID: Is this an over-sight, or by design? >>> string.atoi(u"1") ... TypeError: argument 1: expected string, unicode found It appears easy to support Unicode - there is already an explicit StringType check in these functions, and it simply delegates to int(), which already _does_ work for Unicode A patch would leave the following behaviour: >>> string.atio(u"1") 1 >>> string.atio(u"1", 16) ... TypeError: can't convert non-string with explicit base IMO, this is better than what we have now. I'll put together a patch if one is wanted... Mark. From tim_one at email.msn.com Sun Apr 2 06:14:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 1 Apr 2000 23:14:23 -0500 Subject: [Python-Dev] python -t gets confused? In-Reply-To: <200004012330.BAA10022@python.inrialpes.fr> Message-ID: <000601bf9c59$ef8a3da0$752d153f@tim> [Vladimir Marangozov] > The tab/space checking code in the tokenizer seems to get confused > by the recently checked in test_pyexpat.py > > With python -t or -tt, there are inconsistency reports at places where > there doesn't seem to be one. (tabnanny seems to be confused too, btw :) They're not confused, they're simply reporting that the indentation is screwed up in this file -- which it is. It mixes tabs and spaces in ambiguous ways. > ... > I'm not an expert of the tokenizer code, so someone might want to look > at it and tell us what's going on. Without -t or -tt, the code runs fine. If you set your editor to believe that tab chars are 4 columns (as my Windows editor does), the problem (well, problems -- many lines are flawed) will be obvious. It runs anyway because tab=8 is hardcoded in the Python parser. Quickest fix is for someone at CNRI to just run this thru one of the Unix detabifier programs. From tim_one at email.msn.com Sun Apr 2 08:18:28 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:28 -0500 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <200003311547.KAA15538@eric.cnri.reston.va.us> Message-ID: <000c01bf9c6b$428af560$752d153f@tim> > The Windows installer is always hard to get just right. ... > ... > I'd love to hear that it also installs cleanly on Windows 95. Please > test IDLE from the start menu! All worked without incident for me under Win95. Nice! 
Would still prefer that it install to D:\Python-1.6\ by default, though (instead of burying it under "Program Files" -- if you're not on the Help list, you can't believe how hard it is to explain how to deal with embedded spaces in paths). So far I've seen one system crash in TK83.DLL upon closing an IDLE window, but haven't been able to reproduce. OK, I can, it's easy: Open IDLE. Ctrl+O, then navigate to e.g. Tools\idle\config.txt and open it. Click the "close window" button. Boom -- invalid page fault in TK83.DLL. No time to dig further now. From tim_one at email.msn.com Sun Apr 2 08:18:31 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 2 Apr 2000 01:18:31 -0500 Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: Message-ID: <000d01bf9c6b$447957e0$752d153f@tim> [Peter Funk] > -1 for C reformatting. The 4 space intendation seesm reasonable for > Python sources, but I disaggree for C code. C is not Python. Code is code. The project I work on professionally is a half million lines of C++, and 4-space indents are rigidly enforced -- works great. It makes just as much sense for C as for Python, and for all the same reasons. The one formal study I've seen on this showed that comprehension levels peaked at indent levels of 3 and 4, dropping off on both sides. However, tabs in C is one of Guido's endearing inconsistencies, and we don't want to lose the only two of those he has (his other is trying to avoid curly braces whenever possible in C, perhaps out of the same perverse sense of pride I used to take in avoiding redundant semicolons in Pascal <;{} wink>. From pf at artcom-gmbh.de Sun Apr 2 10:03:29 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 10:03:29 +0200 (MEST) Subject: Indentation of Python interpreter C source (was Re: [Python-Dev] Re: [Python-chec....) In-Reply-To: <000d01bf9c6b$447957e0$752d153f@tim> from Tim Peters at "Apr 2, 2000 1:18:31 am" Message-ID: Hi! > [Peter Funk] > > -1 for C reformatting. The 4 space intendation seesm reasonable for > > Python sources, but I disaggree for C code. C is not Python. Tim Peters: > Code is code. The project I work on professionally is a half million lines > of C++, and 4-space indents are rigidly enforced -- works great. It makes > just as much sense for C as for Python, and for all the same reasons. The > one formal study I've seen on this showed that comprehension levels peaked > at indent levels of 3 and 4, dropping off on both sides. Sigh... Well, if the Python-Interpreter C sources were indented with 4 spaces from the very beginning, I would have kept my mouth shut! But as we can't get the whole world to aggree on how to indent C-Sources, we should at least try to avoid the loss off energy and time, the debate on this topic will cause. So what's my point? IMO reformatting the C-sources wouldn't do us any favor. There will always be people, who like another indentation style more. The GNU software and the Linux kernel have set some standards within the open source community. These projects represent a reasonable fraction of programmers that may be potential contributors to other open source projects. So the only effect a reformatting from 8 to 4 space indents would be to disturb the "8-spacers" and causing endless discussions like this one. Period. 
> However, tabs in C is one of Guido's endearing inconsistencies, and we don't > want to lose the only two of those he has (his other is trying to > avoid curly braces whenever possible in C, perhaps out of the same perverse > sense of pride I used to take in avoiding redundant semicolons in Pascal > <;{} wink>. Aggreed. Best reagrds, Peter From effbot at telia.com Sun Apr 2 10:37:11 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sun, 2 Apr 2000 10:37:11 +0200 Subject: [Python-Dev] SRE: regex.set_syntax Message-ID: <004701bf9c7e$a5045480$34aab5d4@hagrid> one of my side projects for SRE is to create a regex-compatible frontend. since both engines have NFA semantics, this mostly involves writing an alternate parser. however, when I started playing with that, I completely forgot about the regex.set_syntax() function. supporting one extra syntax isn't that much work, but a whole bunch of them? so what should we do? 1. completely get rid of regex (bjorn would love that, don't you think?) 2. remove regex.set_syntax(), and tell people who've used it that they're SOL. 3. add all the necessary flags to the new parser... 4. keep regex around as before, and live with the extra code bloat. comments? From pf at artcom-gmbh.de Sun Apr 2 14:49:26 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Sun, 2 Apr 2000 14:49:26 +0200 (MEST) Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 1, 2000 12: 0: 0 pm" Message-ID: Hi! Guido van Rossum on april 1st: [...] > With the recent release of Python 1.6 alpha 1, a lot of people have > been wondering what's new. This short note aims to explain the major > changes in Python 1.6. [...] > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating -------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > a Unicode string, while the double-quote character defaults to ASCII ----^^^^^^^^^^^^^^ > strings. As I read this my first thoughts were: "Huh? Is that really true? To me this sounds like a april fools joke. But to be careful I checked first before I read on: pf at artcom0:ttyp4 ~/archiv/freeware/python/CVS_01_04_00/dist/src 41> ./python Python 1.6a1 (#2, Apr 1 2000, 19:19:18) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 'a' 'a' >>> '?' '\344' >>> u'?' u'\344' Since www.python.org happens to be down at that moment, I was unable to check, whether my CVS tarball I downloaded from Davids starship account was recent enough and whether this single-quote-defaults-to-unicode has been discussed earlier before I got subscribed to python-dev. Better I should have read on first, before starting to wonder... [...] > tokens = "foo bar baz".split(" ") > Or, equivalently, this: > tokens = " ".split("foo bar baz") > > (Python figures out which string is the delimiter and which is the > string to split by examining both strings to see which one occurs more > frequently inside the other.) Now it becomes clearer that this *must* be an april fools joke! ;-) : >>> tokens = "foo bar baz".split(" ") >>> print tokens ['foo', 'bar', 'baz'] >>> tokens = " ".split("foo bar baz") >>> print tokens [' '] [...] > Note that use of any string method on a particular string renders it > mutable. [...] 
> For consistency with C and C++, > asterisks in the function signature become ampersands in the function > body: [...] > load modules via HTTP from a known URL. [...] > This has allowed us to drop most of the standard library from the > distribution... [...] Pheeew... Oh Well. And pigs can fly. Sigh! ;-) That was a well prepared April fools joke! Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From tismer at tismer.com Sun Apr 2 15:53:12 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 02 Apr 2000 15:53:12 +0200 Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) References: Message-ID: <38E750C8.A559DF19@tismer.com> Peter Funk wrote: > > Hi! > > Guido van Rossum on april 1st: [turns into a Perli for a moment - well done! ] ... > Since www.python.org happens to be down at that moment, I was unable to check, > whether my CVS tarball I downloaded from Davids starship account > was recent enough and whether this single-quote-defaults-to-unicode > has been discussed earlier before I got subscribed to python-dev. Better > I should have read on first, before starting to wonder... You should not give up when python.org is down. As a fallback, I used to use www.cwi.nl which appears to be quite up-to-date. You can find the files and the *true* change list at http://www.cwi.nl/www.python.org/1.6/ Note that today is April 2, so you may believe me at-least-not-less-than-usually - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF we're tired of banana software - shipped green, ripens at home From fdrake at acm.org Sun Apr 2 22:34:39 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Sun, 2 Apr 2000 16:34:39 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <14567.44767.357265.167396@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1. completely get rid of regex (bjorn would love that, > don't you think?) The regex module has been documented as obsolete for a while now. Just leave the module alone and will disappear in time. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Mon Apr 3 00:11:02 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 00:11:02 +0200 Subject: [Python-Dev] string.ato? and Unicode References: Message-ID: <38E7C576.5D3530E4@lemburg.com> Mark Hammond wrote: > > Is this an over-sight, or by design? > > >>> string.atoi(u"1") > ... > TypeError: argument 1: expected string, unicode found Probably an oversight... and it may well not be the only one: there are many explicit string checks in the code which might need to be fixed for Unicode support. As for string.ato? I'm not sure: these functions are obsoleted by int(), float() and long(). > It appears easy to support Unicode - there is already an explicit > StringType check in these functions, and it simply delegates to > int(), which already _does_ work for Unicode Right. I fixed the above three APIs to support Unicode. > A patch would leave the following behaviour: > >>> string.atio(u"1") > 1 > >>> string.atio(u"1", 16) > ... 
> TypeError: can't convert non-string with explicit base > > IMO, this is better than what we have now. I'll put together a > patch if one is wanted... BTW, the code in string.py for atoi() et al. looks really complicated: """ def atoi(*args): """atoi(s [,base]) -> int Return the integer represented by the string s in the given base, which defaults to 10. The string s must consist of one or more digits, possibly preceded by a sign. If base is 0, it is chosen from the leading characters of s, 0 for octal, 0x or 0X for hexadecimal. If base is 16, a preceding 0x or 0X is accepted. """ try: s = args[0] except IndexError: raise TypeError('function requires at least 1 argument: %d given' % len(args)) # Don't catch type error resulting from too many arguments to int(). The # error message isn't compatible but the error type is, and this function # is complicated enough already. if type(s) == _StringType: return _apply(_int, args) else: raise TypeError('argument 1: expected string, %s found' % type(s).__name__) """ Why not simply... def atoi(s, base=10): return int(s, base) dito for atol() and atof()... ?! This would not only give us better performance, but also Unicode support for free. (I'll fix int() and long() to accept Unicode when using an explicit base too.) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 3 11:44:52 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 02:44:52 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <000c01bf9c6b$428af560$752d153f@tim> Message-ID: On Sun, 2 Apr 2000, Tim Peters wrote: > > The Windows installer is always hard to get just right. ... > > ... > > I'd love to hear that it also installs cleanly on Windows 95. Please > > test IDLE from the start menu! > > All worked without incident for me under Win95. Nice! Would still prefer > that it install to D:\Python-1.6\ by default, though (instead of burying it > under "Program Files" -- if you're not on the Help list, you can't believe > how hard it is to explain how to deal with embedded spaces in paths). Ack! No way... Keep my top-level clean! :-) This is Windows. Apps go into Program Files. That is Just The Way It Is. When was the last time you saw /python on a Unix box? Never? Always in .../bin/? Thought so. Cheers, -g -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 11:55:53 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 11:55:53 +0200 Subject: [Python-Dev] Windows installer pre-prelease References: Message-ID: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Greg Stein wrote: > > All worked without incident for me under Win95. Nice! Would still prefer > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > under "Program Files" -- if you're not on the Help list, you can't believe > > how hard it is to explain how to deal with embedded spaces in paths). > > Ack! No way... Keep my top-level clean! :-) > > This is Windows. Apps go into Program Files. That is Just The Way It Is. if you're on a US windows box, sure. but "Program Files" isn't exactly an international standard... we install our python distribution under the \py, and we get lot of positive responses. as far as I remember, nobody has ever reported problems setting up the path... > When was the last time you saw /python on a Unix box? Never? Always in > .../bin/? Thought so. 
if the Unix designers had come up with the bright idea of translating "bin" to "whatever might seem to make sense in this language", I think you'd see many more non-std in- stallations under Unix... especially if they'd made the root directory writable to everyone :-) From gstein at lyra.org Mon Apr 3 12:08:54 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:08:54 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <004f01bf9d52$ce40de20$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > All worked without incident for me under Win95. Nice! Would still prefer > > > that it install to D:\Python-1.6\ by default, though (instead of burying it > > > under "Program Files" -- if you're not on the Help list, you can't believe > > > how hard it is to explain how to deal with embedded spaces in paths). > > > > Ack! No way... Keep my top-level clean! :-) > > > > This is Windows. Apps go into Program Files. That is Just The Way It Is. > > if you're on a US windows box, sure. but "Program Files" > isn't exactly an international standard... Yes it is... if you use the appropriate Windows APIs (or registry... forget where). Windows specifies a way to get the localized name for Program Files. > we install our python distribution under the \py, > and we get lot of positive responses. as far as I remember, > nobody has ever reported problems setting up the path... *shrug* This doesn't dispute the standard Windows recommendation to install software into Program Files. > > When was the last time you saw /python on a Unix box? Never? Always in > > .../bin/? Thought so. > > if the Unix designers had come up with the bright idea of > translating "bin" to "whatever might seem to make sense > in this language", I think you'd see many more non-std in- > stallations under Unix... especially if they'd made the root > directory writable to everyone :-) heh :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 3 12:18:30 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:18:30 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: Message-ID: I don't recall the termination of the discussion, but I don't know that consensus was ever reached. Personally, I find this of little value over the similar (not exact) code: def supplement(dict, extra): d = extra.copy() d.update(dict) return d If the dictionary needs to be modified in place, then the loop from your UserDict.supplement would be used. Another view: why keep adding methods to service all possible needs? Cheers, -g On Mon, 3 Apr 2000, Peter Funk wrote: > Dear Python patcher! > > Please consider to apply the patch appended below and commit into the CVS tree. > It applies to: Python 1.6a1 as released on april 1st. > --=-- argument: --=--=--=--=--=--=--=--=--=--=-->8--=- > This patch adds a new method to dictionary and UserDict objects: > '.supplement()' is a "sibling" of '.update()', but it add only > those items that are not already there instead of replacing them. > > This idea has been discussed on python-dev last month. > --=-- obligatory disclaimer: -=--=--=--=--=--=-->8--=- > I confirm that, to the best of my knowledge and belief, this > contribution is free of any claims of third parties under > copyright, patent or other rights or interests ("claims"). 
To > the extent that I have any such claims, I hereby grant to CNRI a > nonexclusive, irrevocable, royalty-free, worldwide license to > reproduce, distribute, perform and/or display publicly, prepare > derivative versions, and otherwise use this contribution as part > of the Python software and its related documentation, or any > derivative versions thereof, at no cost to CNRI or its licensed > users, and to authorize others to do so. > > I acknowledge that CNRI may, at its sole discretion, decide > whether or not to incorporate this contribution in the Python > software and its related documentation. I further grant CNRI > permission to use my name and other identifying information > provided to CNRI by me for use in connection with the Python > software and its related documentation. > --=-- dry signature: =--=--=--=--=--=--=--=--=-->8--=- > Regards, Peter > -- > Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 > office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) > --=-- patch: --=--=--=--=--=--=--=--=--=--=--=-->8--=- > *** ../../cvs_01_04_00_orig/dist/src/Objects/dictobject.c Fri Mar 31 11:45:02 2000 > --- src/Objects/dictobject.c Mon Apr 3 10:30:11 2000 > *************** > *** 734,739 **** > --- 734,781 ---- > } > > static PyObject * > + dict_supplement(mp, args) > + register dictobject *mp; > + PyObject *args; > + { > + register int i; > + dictobject *other; > + dictentry *entry, *oldentry; > + if (!PyArg_Parse(args, "O!", &PyDict_Type, &other)) > + return NULL; > + if (other == mp) > + goto done; /* a.supplement(a); nothing to do */ > + /* Do one big resize at the start, rather than incrementally > + resizing as we insert new items. Expect that there will be > + no (or few) overlapping keys. */ > + if ((mp->ma_fill + other->ma_used)*3 >= mp->ma_size*2) { > + if (dictresize(mp, (mp->ma_used + other->ma_used)*3/2) != 0) > + return NULL; > + } > + for (i = 0; i < other->ma_size; i++) { > + entry = &other->ma_table[i]; > + if (entry->me_value != NULL) { > + oldentry = lookdict(mp, entry->me_key, entry->me_hash); > + if (oldentry->me_value == NULL) { > + /* TODO: optimize: > + 'insertdict' does another call to 'lookdict'. > + But for sake of readability and symmetry with > + 'dict_update' I didn't tried to avoid this. > + At least not now as we go into 1.6 alpha. 
*/ > + Py_INCREF(entry->me_key); > + Py_INCREF(entry->me_value); > + insertdict(mp, entry->me_key, entry->me_hash, > + entry->me_value); > + } > + } > + } > + done: > + Py_INCREF(Py_None); > + return Py_None; > + } > + > + > + static PyObject * > dict_copy(mp, args) > register dictobject *mp; > PyObject *args; > *************** > *** 1045,1050 **** > --- 1087,1093 ---- > {"clear", (PyCFunction)dict_clear}, > {"copy", (PyCFunction)dict_copy}, > {"get", (PyCFunction)dict_get, METH_VARARGS}, > + {"supplement", (PyCFunction)dict_supplement}, > {NULL, NULL} /* sentinel */ > }; > > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_types.py Wed Feb 23 23:23:17 2000 > --- src/Lib/test/test_types.py Mon Apr 3 10:41:53 2000 > *************** > *** 242,247 **** > --- 242,250 ---- > d.update({2:20}) > d.update({1:1, 2:2, 3:3}) > if d != {1:1, 2:2, 3:3}: raise TestFailed, 'dict update' > + d.supplement({1:"not", 2:"neither", 4:4}) > + if d != {1:1, 2:2, 3:3, 4:4}: raise TestFailed, 'dict supplement' > + del d[4] > if d.copy() != {1:1, 2:2, 3:3}: raise TestFailed, 'dict copy' > if {}.copy() != {}: raise TestFailed, 'empty dict copy' > # dict.get() > *** ../../cvs_01_04_00_orig/dist/src/Lib/UserDict.py Wed Feb 2 16:10:14 2000 > --- src/Lib/UserDict.py Mon Apr 3 10:45:17 2000 > *************** > *** 32,36 **** > --- 32,45 ---- > else: > for k, v in dict.items(): > self.data[k] = v > + def supplement(self, dict): > + if isinstance(dict, UserDict): > + self.data.supplement(dict.data) > + elif isinstance(dict, type(self.data)): > + self.data.supplement(dict) > + else: > + for k, v in dict.items(): > + if not self.data.has_key(k): > + self.data[k] = v > def get(self, key, failobj=None): > return self.data.get(key, failobj) > *** ../../cvs_01_04_00_orig/dist/src/Lib/test/test_userdict.py Fri Mar 26 16:32:02 1999 > --- src/Lib/test/test_userdict.py Mon Apr 3 10:50:29 2000 > *************** > *** 93,101 **** > --- 93,109 ---- > t.update(u2) > assert t == u2 > > + # Test supplement > + > + t = UserDict(d1) > + t.supplement(u2) > + assert t == u2 > + > # Test get > > for i in u2.keys(): > assert u2.get(i) == u2[i] > assert u1.get(i) == d1.get(i) > assert u0.get(i) == d0.get(i) > + > + # TODO: Add a test using dir({}) to test for unimplemented methods > > _______________________________________________ > Patches mailing list > Patches at python.org > http://www.python.org/mailman/listinfo/patches > -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 12:25:05 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 12:25:05 +0200 Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' References: Message-ID: <008b01bf9d57$0555fc20$34aab5d4@hagrid> Greg Stein wrote: > I don't recall the termination of the discussion, but I don't know that > consensus was ever reached. iirc, Ping liked it, but I'm not sure anybody else contributed much to that thread... (and to neutralize Ping, just let me say that I don't like it :-) > Personally, I find this of little value over the similar (not exact) code: > > def supplement(dict, extra): > d = extra.copy() > d.update(dict) > return d has anyone benchmarked this? for some reason, I doubt that the difference between copy/update and supplement is that large... > Another view: why keep adding methods to service all possible needs? exactly. 
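For anyone following along, here is a rough pure-Python sketch of the two approaches being weighed in this thread. It is not the code from Peter's patch; the function names below are made up for illustration, and the in-place version is just the obvious Python spelling of what the proposed C method does.

def merged_with_defaults(d, defaults):
    # Greg's idiom: build a new dictionary; keys already in d win.
    result = defaults.copy()
    result.update(d)
    return result

def supplement_in_place(d, defaults):
    # Pure-Python equivalent of the proposed d.supplement(defaults):
    # add only those items whose keys are not already present in d.
    for k, v in defaults.items():
        if not d.has_key(k):
            d[k] = v

config = {"font": "Times"}
print merged_with_defaults(config, {"font": "Courier", "size": 12})
supplement_in_place(config, {"font": "Courier", "size": 12})
print config    # "font" stays "Times"; "size" is added

The only real difference is that the first returns a fresh dictionary while the second modifies d in place, which is what the benchmarking question above is really about.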
From effbot at telia.com Mon Apr 3 12:31:42 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 12:31:42 +0200 Subject: [Python-Dev] Windows installer pre-prelease References: Message-ID: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Greg Stein wrote: > > we install our python distribution under the \py, > > and we get lot of positive responses. as far as I remember, > > nobody has ever reported problems setting up the path... > > *shrug* This doesn't dispute the standard Windows recommendation to > install software into Program Files. no, but Tim's and my experiences from doing user support show that the standard Windows recommendation doesn't work for command line applications. we don't care about Microsoft, we care about Python's users. to quote a Linus Torvalds, "bad standards _should_ be broken" (after all, Microsoft doesn't put their own command line applications down there -- there's no "\Program Files" [sub]directory in the default PATH, at least not on any of my boxes. maybe they've changed that in Windows 2000?) From gstein at lyra.org Mon Apr 3 12:49:27 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 03:49:27 -0700 (PDT) Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: On Mon, 3 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. Valid point. But there are other solutions, too. VC distributes a thing named "VCVARS.BAT" to set up paths and other environ vars. Python could certainly do the same thing (to overcome the embedded-space issue). > to quote a Linus Torvalds, "bad standards _should_ be broken" Depends on the audience of that standard. Programmers: yah. Consumers? They just want the damn thing to work like they expect it to. That expectation is usually "I can find my programs in Program Files." > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Incorrect. Site Server had command-line tools down there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ajung at sz-sb.de Mon Apr 3 13:17:20 2000 From: ajung at sz-sb.de (Andreas Jung) Date: Mon, 3 Apr 2000 13:17:20 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <200004011740.MAA04675@eric.cnri.reston.va.us>; from guido@python.org on Sat, Apr 01, 2000 at 12:00:00PM -0500 References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <20000403131720.A10313@sz-sb.de> On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: > > Python strings can now be stored as Unicode strings. To make it easier > to type Unicode strings, the single-quote character defaults to creating > a Unicode string, while the double-quote character defaults to ASCII > strings. 
If you need to create a Unicode string with double quotes, > just preface it with the letter "u"; likewise, an ASCII string can be > created by prefacing single quotes with the letter "a". For example: > > foo = 'hello' # Unicode > foo = "hello" # ASCII Is single-quoting for creating unicode clever ? I think there might be a problem with old code when the operations on unicode strings are not 100% compatible to the standard string operations. I don't know if this is a real problem - it's just a point for discussion. Cheers, Andreas From pf at artcom-gmbh.de Mon Apr 3 13:12:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 13:12:25 +0200 (MEST) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: <008b01bf9d57$0555fc20$34aab5d4@hagrid> from Fredrik Lundh at "Apr 3, 2000 12:25: 5 pm" Message-ID: Hi! > Greg Stein wrote: > > I don't recall the termination of the discussion, but I don't know that > > consensus was ever reached. > Fredrik Lundh: > iirc, Ping liked it, but I'm not sure anybody else contributed > much to that thread... That was my impression: It is hard to guess what you guys think from mere silence. ;-) > (and to neutralize Ping, just let me say that I don't like it :-) > > > Personally, I find this of little value over the similar (not exact) code: > > > > def supplement(dict, extra): [...] > > Another view: why keep adding methods to service all possible needs? > > exactly. A agree that we should avoid adding new methods all over the place. But IMO this is an exception: I proposed it for the sake of symmetry with 'update'. From my POV 'supplement' relates to 'update' as '+' relates to '-'. YMMV and I will not be angry, if this idea will be finally rejected. But it would have saved me an hour or two of coding and testing time if you had expressed your opinions a little bit earlier. ;-) But I know: you are all busy. To get an impression of possible uses for supplement, I sketch some code here: class MysticMegaWidget(MyMegaWidget): _config = { horizontal_elasticity = 1000, vertical_elasticity = 10, mentalplex_fg_color = "#FF0000", mentalplex_bg_color = "#0000FF", font = "Times", } def __init__(self, *args, **kw): if kw: self._config = kw self._config.supplement(self.__class__._config) .... Of course this can also be implemented using 'copy' and 'update'. It's only slightly more complicated. But you can also emulate any boolean operation using only NAND. Nevertheless any serious programming language contains at least OR, AND, NOT and possibly XOR. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal at lemburg.com Mon Apr 3 13:48:05 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 13:48:05 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de> Message-ID: <38E884F5.8F2FB271@lemburg.com> Andreas Jung wrote: > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: The above line has all the answers ;-) ... > > Python strings can now be stored as Unicode strings. To make it easier > > to type Unicode strings, the single-quote character defaults to creating > > a Unicode string, while the double-quote character defaults to ASCII > > strings. 
If you need to create a Unicode string with double quotes, > > just preface it with the letter "u"; likewise, an ASCII string can be > > created by prefacing single quotes with the letter "a". For example: > > > > foo = 'hello' # Unicode > > foo = "hello" # ASCII > > Is single-quoting for creating unicode clever ? I think there might be a problem > with old code when the operations on unicode strings are not 100% compatible to > the standard string operations. I don't know if this is a real problem - it's > just a point for discussion. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Mon Apr 3 14:22:17 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 3 Apr 2000 22:22:17 +1000 Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <38E884F5.8F2FB271@lemburg.com> Message-ID: > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > Rossum wrote: > > The above line has all the answers ;-) ... That was pretty sneaky tho! Had the added twist of being half-true... Mark. From mal at lemburg.com Mon Apr 3 14:59:21 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 14:59:21 +0200 Subject: [Python-Dev] Unicode and numerics Message-ID: <38E895A9.94504851@lemburg.com> I've just posted a new patch set to the patches list which contains better support for Unicode in the int(), long(), float() and complex() builtins. There are some new APIs now which can be used by extension writer to convert from Unicode to integers, floats and longs. These APIs are fully Unicode aware, meaning that you can also pass them any Unicode characters with decimal mappings, not only the standard ASCII '0'-'9' ones. One thing I noticed, which needs some discussion: There are two separate APIs which convert long string literals to long objects: PyNumber_Long() and PyLong_FromString(). The first applies the same error checking as does the PyInt_FromString() API, while the latter does not apply this check... Question is: shouldn't the check for truncated data ("9.5" -> 9L) be moved into PyLong_FromString() ? BTW, should I also post patches to string.py which use the simplified versions for string.ato?() I posted a few days ago ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Apr 3 15:12:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 15:12:58 +0200 Subject: [Python-Dev] Re: New Features in Python 1.6 References: Message-ID: <38E898DA.B69D7ED6@lemburg.com> Mark Hammond wrote: > > > > > > On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van > > Rossum wrote: > > > > The above line has all the answers ;-) ... > > That was pretty sneaky tho! Had the added twist of being > half-true... ... 
and on time like a CRON-job ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Mon Apr 3 16:11:55 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:11:55 +0200 (CEST) Subject: [Python-Dev] Suggested PyMem & PyObject_NEW includes (fwd) Message-ID: <200004031411.QAA12486@python.inrialpes.fr> Vladimir Marangozov wrote: From Vladimir.Marangozov at inrialpes.fr Mon Apr 3 16:07:43 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 3 Apr 2000 16:07:43 +0200 (CEST) Subject: Suggested PyMem & PyObject_NEW includes Message-ID: Sorry for the delay -- I simply could't progress on this as I wanted to. Here's the includes I suggest for PyMem and PyObject_New cs. PyMem is okay. Some questions arise with PyObject_NEW. 1) I'm willing to unify the implementation on Windows and Unix (so I'm retaining the Windows variant of _PyObject_New reincarnated by PyObject_FromType -- see the comments in pyobjimpl.h). 2) For the user, there's the principle to use functions if binary compatibility is desired, and macros if she needs to trade compatibility for speed. But there's the issue of allocating the user objects with PyMem, or, allocate the objects with a custom allocator. After scratching my head on how to preserve bin compatibility with old libraries and offer the freedom to the user, I ended with the following (subject to discussion): - Use the functions for bin compat (but have also an exception _PyObject_Del(), with a leading underscore, for the core...) Objects in this case are allocated with PyMem. - Use the macros for allocating the objects with the potentially custom allocator (through malloc, realloc, free -- see below) What do you think? -----------------------------[ mymalloc.h ]--------------------------- ... /* * Core memory allocator * ===================== */ /* To make sure the interpreter is user-malloc friendly, all memory and object APIs are implemented on top of this one. The PyCore_* macros can be changed to make the interpreter use a custom allocator. Note that they are for internal use only. Both the core and extension modules should use the PyMem_* API. */ #define PyCore_MALLOC_FUNC malloc #define PyCore_REALLOC_FUNC realloc #define PyCore_FREE_FUNC free #define PyCore_MALLOC_PROTO Py_PROTO((size_t)) #define PyCore_REALLOC_PROTO Py_PROTO((ANY *, size_t)) #define PyCore_FREE_PROTO Py_PROTO((ANY *)) #define PyCore_MALLOC(n) PyCore_MALLOC_FUNC(n) #define PyCore_REALLOC(p, n) PyCore_REALLOC_FUNC((p), (n)) #define PyCore_FREE(p) PyCore_FREE_FUNC(p) /* The following should never be necessary */ #ifdef NEED_TO_DECLARE_MALLOC_AND_FRIEND extern ANY *PyCore_MALLOC_FUNC PyCore_MALLOC_PROTO; extern ANY *PyCore_REALLOC_FUNC PyCore_REALLOC_PROTO; extern void PyCore_FREE_FUNC PyCore_FREE_PROTO; #endif /* BEWARE: Each interface exports both functions and macros. Extension modules should normally use the functions for ensuring binary compatibility of the user's code across Python versions. Subsequently, if Python switches to its own malloc (different from standard malloc), no recompilation is required for the extensions. The macro versions trade compatibility for speed. They can be used whenever there is a performance problem, but their use implies recompilation of the code for each new Python release. 
The Python core uses the macros because it *is* compiled on every upgrade. This might not be the case with 3rd party extensions in a custom setup (for example, a customer does not always have access to the source of 3rd party deliverables). You have been warned! */ /* * Raw memory interface * ==================== */ /* Functions */ /* Two sets of function wrappers around malloc and friends; useful if you need to be sure that you are using the same memory allocator as Python. Note that the wrappers make sure that allocating 0 bytes returns a non-NULL pointer, even if the underlying malloc doesn't. */ /* These wrappers around malloc call PyErr_NoMemory() on failure */ extern DL_IMPORT(ANY *) Py_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) Py_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) Py_Free Py_PROTO((ANY *)); /* These wrappers around malloc *don't* call anything on failure */ extern DL_IMPORT(ANY *) PyMem_Malloc Py_PROTO((size_t)); extern DL_IMPORT(ANY *) PyMem_Realloc Py_PROTO((ANY *, size_t)); extern DL_IMPORT(void) PyMem_Free Py_PROTO((ANY *)); /* Macros */ #define PyMem_MALLOC(n) PyCore_MALLOC(n) #define PyMem_REALLOC(p, n) PyCore_REALLOC((ANY *)(p), (n)) #define PyMem_FREE(p) PyCore_FREE((ANY *)(p)) /* * Type-oriented memory interface * ============================== */ /* Functions */ #define PyMem_New(type, n) \ ( (type *) PyMem_Malloc((n) * sizeof(type)) ) #define PyMem_Resize(p, type, n) \ ( (p) = (type *) PyMem_Realloc((n) * sizeof(type)) ) #define PyMem_Del(p) PyMem_Free(p) /* Macros */ #define PyMem_NEW(type, n) \ ( (type *) PyMem_MALLOC(_PyMem_EXTRA + (n) * sizeof(type)) ) #define PyMem_RESIZE(p, type, n) \ if ((p) == NULL) \ (p) = (type *) PyMem_MALLOC( \ _PyMem_EXTRA + (n) * sizeof(type)); \ else \ (p) = (type *) PyMem_REALLOC((p), \ _PyMem_EXTRA + (n) * sizeof(type)) #define PyMem_DEL(p) PyMem_FREE(p) /* PyMem_XDEL is deprecated. To avoid the call when p is NULL, it's recommended to write the test explicitely in the code. Note that according to ANSI C, free(NULL) has no effect. */ #define PyMem_XDEL(p) if ((p) == NULL) ; else PyMem_DEL(p) ... -----------------------------[ mymalloc.h ]--------------------------- ... /* Functions and macros for modules that implement new object types. You must first include "object.h". PyObject_New(type, typeobj) allocates memory for a new object of the given type; here 'type' must be the C structure type used to represent the object and 'typeobj' the address of the corresponding type object. Reference count and type pointer are filled in; the rest of the bytes of the object are *undefined*! The resulting expression type is 'type *'. The size of the object is actually determined by the tp_basicsize field of the type object. PyObject_NewVar(type, typeobj, n) is similar but allocates a variable-size object with n extra items. The size is computed as tp_basicsize plus n * tp_itemsize. This fills in the ob_size field as well. PyObject_Del(op) releases the memory allocated for an object. Two versions of the object constructors/destructors are provided: 1) PyObject_{New, NewVar, Del} delegate the allocation of the objects to the Python allocator which places them within the bounds of the Python heap. This way, Python keeps control on the user's objects regarding their memory management; for instance, they may be subject to automatic garbage collection, once their reference count drops to zero. Binary compatibility is preserved and there's no need to recompile the extension every time a new Python release comes out. 
2) PyObject_{NEW, NEW_VAR, DEL} use the allocator of the extension module which *may* differ from the one used by the Python library. Typically, in a C++ module one may wish to redefine the default allocation strategy by overloading the operators new and del. In this case, however, the extension does not cooperate with the Python memory manager. The latter has no control on the user's objects as they won't be allocated within the Python heap. Therefore, automatic garbage collection may not be performed, binary compatibility is not guaranteed and recompilation is required on every new Python release. Unless a specific memory management is needed, it's recommended to use 1). */ /* In pre-Python-1.6 times, only the PyObject_{NEW, NEW_VAR} macros were defined in terms of internal functions _PyObject_{New, NewVar}, the implementation of which used to differ for Windows and non-Windows platforms (see object.c -- these functions are left for backwards compatibility with old libraries). Starting from 1.6, an unified interface was introduced for both 1) & 2) */ extern DL_IMPORT(PyObject *) PyObject_FromType Py_PROTO((PyTypeObject *, PyObject *)); extern DL_IMPORT(PyVarObject *) PyObject_VarFromType Py_PROTO((PyTypeObject *, int, PyVarObject *)); extern DL_IMPORT(void) PyObject_Del Py_PROTO((PyObject *)); /* Functions */ #define PyObject_New(type, typeobj) \ ((type *) PyObject_FromType(typeobj, NULL)) #define PyObject_NewVar(type, typeobj, n) \ ((type *) PyObject_VarFromType((typeobj), (n), NULL)) #define PyObject_Del(op) PyObject_Del((PyObject *)(op)) /* XXX This trades binary compatibility for speed. */ #include "mymalloc.h" #define _PyObject_Del(op) PyMem_FREE((PyObject *)(op)) /* Macros */ #define PyObject_NEW(type, typeobj) \ ((type *) PyObject_FromType(typeobj, \ (PyObject *) malloc((typeobj)->tp_basicsize))) #define PyObject_NEW_VAR(type, typeobj, n) \ ((type *) PyObject_VarFromType(typeobj, \ (PyVarObject *) malloc((typeobj)->tp_basicsize + \ n * (typeobj)->tp_itemsize))) #define PyObject_DEL(op) free(op) ---------------------------------------------------------------------- So with this, I'm planning to "give the example" by renaming everywhere in the distrib PyObject_NEW with PyObject_New, but use for the core _PyObject_Del instead of PyObject_Del. I'll use PyObject_Del for the objects defined in extension modules. The point is that I don't want to define PyObject_Del in terms of PyMem_FREE (or define PyObject_New in terms of PyMem_MALLOC) as this would break the principle of binary compatibility when the Python allocator is changed to a custom malloc from one build to another. OTOH, I don't like the underscore... Do you have a better suggestion? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Mon Apr 3 16:50:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 16:50:25 +0200 Subject: [Python-Dev] Re: Unicode and numerics References: <38E895A9.94504851@lemburg.com> Message-ID: <38E8AFB1.9798186E@lemburg.com> "M.-A. Lemburg" wrote: > > BTW, should I also post patches to string.py which use the > simplified versions for string.ato?() I posted a few days ago ? I've just added these to the patch set... they no longer use the same error string, but the error type still is the same when e.g. string.atoi() is called with a non-string. 
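As a rough sketch (not the exact code in the posted patch set), the simplified string.py wrappers would boil down to something like the following, on the assumption that int(), long() and float() themselves already accept Unicode strings as per the patches:

def atoi(s, base=10):
    # Delegate straight to the builtin; error checking, including the
    # "can't convert non-string with explicit base" case, is left to
    # int() itself.
    return int(s, base)

def atol(s, base=10):
    return long(s, base)

def atof(s):
    return float(s)

print atoi("42"), atoi(u"42"), atol(u"10", 16), atof(u"1.5")

With this spelling, Unicode support simply falls out of whatever the builtins accept, which is the point of dropping the explicit StringType check.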
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 3 18:04:02 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 12:04:02 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: Your message of "Sat, 01 Apr 2000 12:00:00 EST." <200004011740.MAA04675@eric.cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> Message-ID: <200004031604.MAA05283@eric.cnri.reston.va.us> Not only was it an April fool's joke, but it wasn't mine! It was forged by an insider. I know by who, but won't tell, because it was so good. It shows that I can trust to delegate way more to the Python community than I think I can! :-) BTW, the biggest give-away that it wasn't mine was the absence of my standard sign-off line: --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at cnri.reston.va.us Mon Apr 3 18:36:24 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 3 Apr 2000 12:36:24 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' In-Reply-To: References: Message-ID: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> I agree with Greg. Jeremy From bwarsaw at cnri.reston.va.us Mon Apr 3 19:20:19 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Mon, 3 Apr 2000 13:20:19 -0400 (EDT) Subject: [Python-Dev] Re: [Patches] [1.6] dictionary objects: new method 'supplement' References: <14568.51336.811523.937351@bitdiddle.cnri.reston.va.us> Message-ID: <14568.53971.777162.624760@anthem.cnri.reston.va.us> -0 on dict.supplement(), not the least because I'll always missspell it :) -Barry From pf at artcom-gmbh.de Mon Apr 3 20:01:50 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 3 Apr 2000 20:01:50 +0200 (MEST) Subject: [Python-Dev] {}.supplement() -- poll results so far Message-ID: Look's like I should better forget my proposal to add a new method '.supplement()' to dictionaries, which should do the opposite of the already available method '.update()'. I summarize in cronological order: Ka-Ping Yee: +1 Fred Drake: +0 Greg Stein: -1 Fredrik Lundh: -1 Jeremy Hylton: -1 Barry Warsaw: -0 Are there other opinions which may change the picture? <0.1 wink> Regards, Peter From gstein at lyra.org Mon Apr 3 20:31:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 11:31:33 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: On Mon, 3 Apr 2000, Peter Funk wrote: > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. > I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> Guido's :-) -- Greg Stein, http://www.lyra.org/ From effbot at telia.com Mon Apr 3 21:40:00 2000 From: effbot at telia.com (Fredrik Lundh) Date: Mon, 3 Apr 2000 21:40:00 +0200 Subject: [Python-Dev] unicode: strange exception Message-ID: <020701bf9da4$670d8580$34aab5d4@hagrid> >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (innermost last): File "", line 1, in ? 
TypeError: expected a character buffer object From guido at python.org Mon Apr 3 21:48:25 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 15:48:25 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Your message of "Mon, 03 Apr 2000 11:31:33 PDT." References: Message-ID: <200004031948.PAA05532@eric.cnri.reston.va.us> > On Mon, 3 Apr 2000, Peter Funk wrote: > > Look's like I should better forget my proposal to add a new method > > '.supplement()' to dictionaries, which should do the opposite of > > the already available method '.update()'. > > I summarize in cronological order: > > > > Ka-Ping Yee: +1 > > Fred Drake: +0 > > Greg Stein: -1 > > Fredrik Lundh: -1 > > Jeremy Hylton: -1 > > Barry Warsaw: -0 > > > > Are there other opinions which may change the picture? <0.1 wink> > > Guido's :-) If I have to, it's a -1. I personally wouldn't be able to remember which one was update() and which one was supplement(). --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 3 21:57:26 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 12:57:26 -0700 (PDT) Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: <200004031948.PAA05532@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: > > On Mon, 3 Apr 2000, Peter Funk wrote: > > > Look's like I should better forget my proposal to add a new method > > > '.supplement()' to dictionaries, which should do the opposite of > > > the already available method '.update()'. > > > I summarize in cronological order: > > > > > > Ka-Ping Yee: +1 > > > Fred Drake: +0 > > > Greg Stein: -1 > > > Fredrik Lundh: -1 > > > Jeremy Hylton: -1 > > > Barry Warsaw: -0 > > > > > > Are there other opinions which may change the picture? <0.1 wink> > > > > Guido's :-) > > If I have to, it's a -1. You don't have to, but yours *is* the only one that counts. Ours are "merely advisory" ;-) hehe... Cheers, -g -- Greg Stein, http://www.lyra.org/ From gward at cnri.reston.va.us Mon Apr 3 22:56:21 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Mon, 3 Apr 2000 16:56:21 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <200004031604.MAA05283@eric.cnri.reston.va.us>; from guido@python.org on Mon, Apr 03, 2000 at 12:04:02PM -0400 References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> Message-ID: <20000403165621.A9955@cnri.reston.va.us> On 03 April 2000, Guido van Rossum said: > Not only was it an April fool's joke, but it wasn't mine! It was > forged by an insider. I know by who, but won't tell, because it was > so good. It shows that I can trust to delegate way more to the Python > community than I think I can! :-) > > BTW, the biggest give-away that it wasn't mine was the absence of my > standard sign-off line: > > --Guido van Rossum (home page: http://www.python.org/~guido/) D'ohhh!!! Hasn't anyone noticed that the largest amount of text in the joke feature list was devoted to the Distutils? I thought *that* would give it away "fer shure". You people are *so* gullible! ;-) And for my next trick... *poof*! Greg From mal at lemburg.com Mon Apr 3 23:45:20 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 03 Apr 2000 23:45:20 +0200 Subject: [Python-Dev] unicode: strange exception References: <020701bf9da4$670d8580$34aab5d4@hagrid> Message-ID: <38E910F0.5EB00566@lemburg.com> Fredrik Lundh wrote: > > >>> "!" in ("a", None) > 0 > >>> u"!" 
in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object Good catch. The same happens when you try to compare Unicode and a different non-string type: >>> '1' == None 0 >>> u'1' == None Traceback (most recent call last): File "", line 1, in ? TypeError: expected a character buffer object The reason is the same in both cases: failing auto-coercion. I will send a patch for this tomorrow. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Tue Apr 4 01:11:13 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 4 Apr 2000 09:11:13 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. Message-ID: The 1.6a1 installer on Windows copies Python16.dll into the Python directory, rather than the system32 directory like 1.5.x. We discussed too long ago on this list not why this was probably not going to work. I guess Guido decided to "suck it and see" - which is fine. But guess what - it doesnt work :-( I couldnt get past the installer! The win32all installer executes some Python code at the end of the install (to generate the .pyc files and install the COM objects). This Python code is executed directly to the installation .EXE, by loading and executing a "shim" DLL I wrote for the purpose. Problem is, try as I might, my shim DLL could not load Python16.dll. The shim DLL _was_ in the same directory as Python16.dll. The only way I could have solved it was to insist the WISE installation .EXE be run from the main Python directory - obviously not an option. And the problem is quite obviously going to exist with COM objects. The problem would appear to go away if the universe switched over the LoadLibraryEx() - but we dont have that control in most cases (eg, COM, WISE etc dictate this to us). So, my solution was to copy Python16.dll to the system directory during win32all installation. This results in duplicate copies of this DLL, so to my mind, it is preferable that Python itself go back to using the System32 directory. The problem this will lead to is that Python 1.6.0 and 1.6.1 will not be able to be installed concurrently. Putting entries on the PATH doesnt solve the underlying problem - you will only be able to have one Python 1.6 directory on your path, else you end up with the same coflicts for the DLL. I dont see any better answer than System32 :-( Thoughts? Mark. From gstein at lyra.org Tue Apr 4 02:32:12 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 17:32:12 -0700 (PDT) Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: On Tue, 4 Apr 2000, Mark Hammond wrote: >... > The problem this will lead to is that Python 1.6.0 and 1.6.1 will > not be able to be installed concurrently. Same thing happened with Python 1.5, so we're no worse off. If we do want this behavior, then we need to add another version digit... > Putting entries on the > PATH doesnt solve the underlying problem - you will only be able to > have one Python 1.6 directory on your path, else you end up with the > same coflicts for the DLL. > > I dont see any better answer than System32 :-( Thoughts? I don't have a better answer, as you and I explained on several occasions. Dunno why Guido decided to skip our recommendations, but hey... it happens :-). IMO, put the DLL back into System32. 
If somebody can *demonstrate* (not hypothesize) a mechanism that works, then it can be switched. The underlying issue is this: Python16.dll in the app directory works for Python as an executable. However, it completely disables any possibility for *embedding* Python. On Windows, embedding is practically required because of the COM stuff (sure... a person could avoid COM but...). Cheers, -g -- Greg Stein, http://www.lyra.org/ From nascheme at enme.ucalgary.ca Tue Apr 4 03:38:41 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: 4 Apr 2000 01:38:41 -0000 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> References: <200004011740.MAA04675@eric.cnri.reston.va.us> <200004031604.MAA05283@eric.cnri.reston.va.us> <20000403165621.A9955@cnri.reston.va.us> Message-ID: <20000404013841.15629.qmail@cranky.arctrix.com> In comp.lang.python, you wrote: >You people are *so* gullible! ;-) Well done. You had me going for a while. You had just enough truth in there. Guido releasing the alpha at that time helped your cause as well. Neil -- Tact is the ability to tell a man he has an open mind when he has a hole in his head. From guido at python.org Tue Apr 4 04:52:52 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 03 Apr 2000 22:52:52 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." References: Message-ID: <200004040252.WAA06637@eric.cnri.reston.va.us> > > The problem this will lead to is that Python 1.6.0 and 1.6.1 will > > not be able to be installed concurrently. > > Same thing happened with Python 1.5, so we're no worse off. If we do want > this behavior, then we need to add another version digit... Actually, I don't plan on releasing a 1.6.1. The next one will be 1.7. Of course, alpha and beta versions for 1.6 won't be able to live along, but I can live with that. > > Putting entries on the > > PATH doesnt solve the underlying problem - you will only be able to > > have one Python 1.6 directory on your path, else you end up with the > > same coflicts for the DLL. > > > > I dont see any better answer than System32 :-( Thoughts? > > I don't have a better answer, as you and I explained on several occasions. > Dunno why Guido decided to skip our recommendations, but hey... it > happens :-). Actually, I just wanted to get the discussion started. It worked. :-) I'm waiting for Tim Peters' response in this thread -- if I recall he was the one who said that python1x.dll should not go into the system directory. Note that I've made it easy to switch: the WISE script defines a separate variable DLLDEST which is currently set to MAINDIR, but which I could easily change to SYS32 to get the semantics you prefer. Hey, we could even give the user a choice here! <0.4 wink> > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > hypothesize) a mechanism that works, then it can be switched. > > The underlying issue is this: Python16.dll in the app directory works for > Python as an executable. However, it completely disables any possibility > for *embedding* Python. On Windows, embedding is practically required > because of the COM stuff (sure... a person could avoid COM but...). Yes, I know this. I'm just not happy with it, and I've definitely heard people complain that it is evil to install directories in the system directory. Seems there are different schools of thought... 
Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into the system directory. I will now be distributing with the VC++ 6.0 servicepack 1 versions of these files. Won't this be a problem for installations that already have an older version? (Now that I think of it, this is another reason why I decided that at least the alpha release should install everything in MAINDIR -- to limit the damage. Any informed opinions?) David Ascher: if you're listening, could you forward this to someone at ActiveState who might understand the issues here? They should have the same problems with ActivePerl, right? Or don't they have COM support? (Personally, I think that it wouldn't be so bad if we made it so that if you install just Python, the DLLs go into MAINDIR -- if you install the COM support, it can move/copy them to the system directory. But you may find this inelegant...) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Tue Apr 4 05:11:33 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 3 Apr 2000 20:11:33 -0700 (PDT) Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: On Mon, 3 Apr 2000, Guido van Rossum wrote: >... > Actually, I just wanted to get the discussion started. It worked. :-) hehe. True :-) > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. What's his physical address again? I have this nice little package to send him... >... > > IMO, put the DLL back into System32. If somebody can *demonstrate* (not > > hypothesize) a mechanism that works, then it can be switched. > > > > The underlying issue is this: Python16.dll in the app directory works for > > Python as an executable. However, it completely disables any possibility > > for *embedding* Python. On Windows, embedding is practically required > > because of the COM stuff (sure... a person could avoid COM but...). > > Yes, I know this. I'm just not happy with it, and I've definitely > heard people complain that it is evil to install directories in the > system directory. Seems there are different schools of thought... It is evil, but it is also unavoidable. The alternative is to munge the PATH variable, but that is a Higher Evil than just dropping DLLs into the system directory. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? Not at all. In fact, Microsoft explicitly recommends including those in the distribution and installing them over the top of *previous* versions. They should never be downgraded (i.e. always check their version stamp!), but they should *always* be upgraded. Microsoft takes phenomenal pains to ensure that OLD applications are compatible with NEW runtimes. It is certainly possible that you could have a new app was built against a new runtime, and breaks when used against an old runtime. But that is why you always upgrade :-) And note that I do mean phenomenal pains. It is one of their ship requirements that you can always drop in a new RT without breaking old apps. So: regardless of where you decide to put python16.dll, you really should be upgrading the RT DLLs. 
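The "always check their version stamp" rule can be spelled out in a few lines with the win32all extensions; this is a sketch only (the function names are made up here, and a real installer such as WISE performs this check itself):

    import win32api

    def dll_version(path):
        # Read the four-part file version stamp, e.g. (6, 0, 8397, 0).
        info = win32api.GetFileVersionInfo(path, "\\")
        ms, ls = info["FileVersionMS"], info["FileVersionLS"]
        return (ms >> 16, ms & 0xFFFF, ls >> 16, ls & 0xFFFF)

    def ok_to_overwrite(existing, new):
        # Never downgrade: only replace when the candidate is strictly newer.
        return dll_version(new) > dll_version(existing)

Tuple comparison gives the usual major/minor/build ordering for free.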
> David Ascher: if you're listening, could you forward this to someone > at ActiveState who might understand the issues here? They should have > the same problems with ActivePerl, right? Or don't they have COM > support? ActivePerl does COM, but I dunno much more than that. > (Personally, I think that it wouldn't be so bad if we made it so that > if you install just Python, the DLLs go into MAINDIR -- if you install > the COM support, it can move/copy them to the system directory. But > you may find this inelegant...) Eek. Now you're talking about one guy reaching into another installation and munging it around. Especially for a move (boy, would that throw off the uninstall!). If you copied, then it is possible to have *two* copies of the DLL loaded into a process. The primary key is the pathname. I've had two pythoncom DLLs loaded in a process, and boy does that suck! The bugs are quite interesting, to say the least :-) And a total bear to track down until you have seen the double-load several times and can start to recognize the effects. In other words, moving is bad for elegance/uninstall reasons, and copy is bad for (potential) runtime reasons. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tim_one at email.msn.com Tue Apr 4 06:28:54 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:54 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: <000201bf9dee$49638760$162d153f@tim> [Mark Hammond] > The 1.6a1 installer on Windows copies Python16.dll into the Python > directory, rather than the system32 directory like 1.5.x. We > discussed too long ago on this list not why this was probably not > going to work. I guess Guido decided to "suck it and see" - which > is fine. > > But guess what - it doesnt work :-( > ... > I dont see any better answer than System32 :-( Thoughts? Same as yours! Guido went off and innovated here -- always a bad sign . OTOH, I've got no use for "Program Files" -- make the cmdline version easy to use too. From tim_one at email.msn.com Tue Apr 4 06:28:59 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:59 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Message-ID: <000401bf9dee$4bf9c2a0$162d153f@tim> [/F] > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. [Greg Stein] > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). And put the .bat file where, exactly? In the Python root, somewhere under "Program Files"? Begs the question. MS doesn't want you to put stuff in System32 either, but it's the only rational place to put the DLL. Likewise the only rational place to put the cmdline EXE is in an easy-to-get-at directory. If C:\Quickenw\ is good enough for the best-selling non-MS Windows app, C:\Python-1.6\ is good enough for Python . Besides, it's a *default*. If you love MS guidelines and are savvy enough to know what the heck they are, you're savvy enough to install it under "Program Files" yourself. The people we're trying to help here have scant idea what they're doing, and dealing with the embedded space drives them nuts at the very start of their experience. Other languages understand this. 
For example, here are pieces of the PATH on my machine: C:\PERL5\BIN D:\JDK1.1.5\BIN C:\WINICON\BIN E:\OCAML\BIN From tim_one at email.msn.com Tue Apr 4 06:28:56 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:56 -0400 Subject: [Python-Dev] {}.supplement() -- poll results so far In-Reply-To: Message-ID: <000301bf9dee$4acba2e0$162d153f@tim> [Peter Funk] > Look's like I should better forget my proposal to add a new method > '.supplement()' to dictionaries, which should do the opposite of > the already available method '.update()'. > I summarize in cronological order: > > Ka-Ping Yee: +1 > Fred Drake: +0 > Greg Stein: -1 > Fredrik Lundh: -1 > Jeremy Hylton: -1 > Barry Warsaw: -0 > > Are there other opinions which may change the picture? <0.1 wink> -1 on dict.supplement(), -0 on an optional arg to dict.update(), dict.update(otherdict, overwrite=1) From guido at python.org Tue Apr 4 07:25:26 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 01:25:26 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Mon, 03 Apr 2000 20:11:33 PDT." References: Message-ID: <200004040525.BAA11585@eric.cnri.reston.va.us> > What's his physical address again? I have this nice little package to send > him... Now, now, you don't want to sound like Ted Kazinsky, do you? :-) > It is evil, but it is also unavoidable. The alternative is to munge the > PATH variable, but that is a Higher Evil than just dropping DLLs into the > system directory. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? > > Not at all. In fact, Microsoft explicitly recommends including those in > the distribution and installing them over the top of *previous* versions. > They should never be downgraded (i.e. always check their version stamp!), > but they should *always* be upgraded. > > Microsoft takes phenomenal pains to ensure that OLD applications are > compatible with NEW runtimes. It is certainly possible that you could have > a new app was built against a new runtime, and breaks when used against an > old runtime. But that is why you always upgrade :-) > > And note that I do mean phenomenal pains. It is one of their ship > requirements that you can always drop in a new RT without breaking old > apps. > > So: regardless of where you decide to put python16.dll, you really should > be upgrading the RT DLLs. OK. That means I need two separate variables: where to install the MS DLLs and where to install the Py DLLs. > > David Ascher: if you're listening, could you forward this to someone > > at ActiveState who might understand the issues here? They should have > > the same problems with ActivePerl, right? Or don't they have COM > > support? > > ActivePerl does COM, but I dunno much more than that. I just downloaded and installed it. I've never seen an installer like this -- they definitely put a lot of effort in it. Annoying nit: they tell you to install "MS Windows Installer" first, and of course, being a MS tool, it requires a reboot. :-( Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So there. It also didn't change PATH for me, even though the docs mention that it does -- maybe only on NT? (PATH on Win9x is still a mystery to me. Is it really true that in order to change PATH an installer has to edit autoexec.bat? 
Or is there a better way? Anything that claims to change PATH for me doesn't seem to do so. Could I have screwed something up?) > > (Personally, I think that it wouldn't be so bad if we made it so that > > if you install just Python, the DLLs go into MAINDIR -- if you install > > the COM support, it can move/copy them to the system directory. But > > you may find this inelegant...) > > Eek. Now you're talking about one guy reaching into another installation > and munging it around. Especially for a move (boy, would that throw off > the uninstall!). If you copied, then it is possible to have *two* copies > of the DLL loaded into a process. The primary key is the pathname. I've > had two pythoncom DLLs loaded in a process, and boy does that suck! The > bugs are quite interesting, to say the least :-) And a total bear to track > down until you have seen the double-load several times and can start to > recognize the effects. > > In other words, moving is bad for elegance/uninstall reasons, and copy is > bad for (potential) runtime reasons. OK, got it. But I'm still hoping that there's something we can do differently. Didn't someone tell me that at least on Windows 2000 installing app-specific files (as opposed to MS-provided files) in the system directory is a no-no? What's the alternative there? Is the same mechanism supported on NT or Win98? --Guido van Rossum (home page: http://www.python.org/~guido/) From tim_one at email.msn.com Tue Apr 4 06:28:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 00:28:48 -0400 Subject: [Python-Dev] New Features in Python 1.6 In-Reply-To: <20000403165621.A9955@cnri.reston.va.us> Message-ID: <000001bf9dee$45f318c0$162d153f@tim> [Greg Ward, fesses up] > Hasn't anyone noticed that the largest amount of text in the joke > feature list was devoted to the Distutils? I thought *that* would > give it away "fer shure". You people are *so* gullible! ;-) Me too! My first suspect was me, but for the life of me, me couldn't remember writing that. You were only second on me list (it had to be one of us, as nobody else could have described legitimate Python features as if they had been implemented in Perl <0.9 wink>). > And for my next trick... *poof*! Nice try. You're not only not invisible, I've posted your credit card info to a hacker list. crushing-guido's-enemies-cuz-he's-too-much-of-a-wuss-ly y'rs - tim From tim_one at email.msn.com Tue Apr 4 07:00:55 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:00:55 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> Message-ID: <000901bf9df2$c224c7a0$162d153f@tim> [Guido] > ... > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. Not that I don't say a lot of dumb-ass things , but I strongly doubt I would have said this one. In my brief career as a Windows app provider, I learned four things, the first three loudly gotten across by seriously unhappy users: 1. Contra MS guidelines, dump the core DLLs in the system directory. 2. Contra MS guidelines, install the app by default in C:\name_of_app\. 3. Contra MS guidelines, put all the config options you can in a text file C:\name_of_app\name_of_app.ini instead of the registry. 4. 
This one was due to my boss: Contra MS guidelines, put a copy of every MS system DLL you rely on under C:\name_of_app\, so you don't get screwed when MS introduces an incompatible DLL upgrade. In the end, the last one is the only one I disagreed with (in recent years I believe MS DLL upgrades have gotten much more likely to fix bugs than to introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on my home machine that use their own copy of msvcrt.dll -- /F, if you're reading, how come the Pythonworks beta does this?). > ... > I've definitely heard people complain that it is evil to install > directories in the system directory. Seems there are different > schools of thought... Well, mucking with the system directories is horrid! Nobody likes doing it. AFAIK, though, there's really no realistic alternative. It's the only place you *know* will be on the PATH, and if an app embedding Python can't rely on PATH, it will have to hardcode the Python DLL path itself. > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > the system directory. I will now be distributing with the VC++ 6.0 > servicepack 1 versions of these files. Won't this be a problem for > installations that already have an older version? (Now that I think > of it, this is another reason why I decided that at least the alpha > release should install everything in MAINDIR -- to limit the damage. > Any informed opinions?) You're using a std installer, and MS has rigid rules for these DLLs that the installer will follow by magic. Small comfort if things break, but this one is (IMO) worth playing along with. From tim_one at email.msn.com Tue Apr 4 07:42:55 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 4 Apr 2000 01:42:55 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200003242103.QAA03288@eric.cnri.reston.va.us> Message-ID: <000201bf9df8$a066b8c0$6d2d153f@tim> [Guido, on changing socket.connect() to require a single arg] > ... > Similar to append(), I may revert the change if it is shown to cause > too much pain during beta testing... I think this one already caused too much pain: it appears virtually everyone uses the two-argument form routinely, and the reason for getting rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, Constructing a spurious "address" object (which has no behavior, and exists only to be torn apart inside the implementation) seems a foolish consistency, beyond doubt. So offer to back off on this one, in return for making 1/2 yield 0.5 . From guido at python.org Tue Apr 4 09:03:58 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 03:03:58 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 01:42:55 EDT." <000201bf9df8$a066b8c0$6d2d153f@tim> References: <000201bf9df8$a066b8c0$6d2d153f@tim> Message-ID: <200004040703.DAA11944@eric.cnri.reston.va.us> > I think this one already caused too much pain: it appears virtually > everyone uses the two-argument form routinely, and the reason for getting > rid of that seems pretty weak. As Tres Seaver just wrote on c.l.py, > > Constructing a spurious "address" object (which has no behavior, and > exists only to be torn apart inside the implementation) seems a > foolish consistency, beyond doubt. No more foolish than passing a point as an (x, y) tuple instead of separate x and y arguments. 
There are good reasons for passing it as a tuple, such as being able to store and recall it as a single entity. > So offer to back off on this one, in return for making 1/2 yield 0.5 . Unfortunately, I think I will have to. And it will have to be documented. The problem is that I can't document it as connect(host, port) -- there are Unix domain sockets that only take a single string argument (a filename). Also, sendto() takes a (host, port) tuple only. It has other arguments so that's the only form. Maybe I'll have to document it as connect(address) with a backwards compatible syntax connect(a, b) being equivalent to connect((a, b)). At least that sets the record straight without breaking old code. Still torn, --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Apr 4 10:59:02 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 4 Apr 2000 18:59:02 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: > 2. Contra MS guidelines, install the app by default in > C:\name_of_app\. Ive got to agree here. While I also see Greg's point, the savvy user can place it where they want, while the "average user" is better of with a more reasonable default. However, I would tend to go for "\name_of_app" rooted from the Windows drive. It is likely that this will be the default drive when a command prompt is open, so a simple "cd \python1.6" will work. This is also generally the same drive the default "Program Files" is on too. > You're using a std installer, and MS has rigid rules for > these DLLs that the > installer will follow by magic. Small comfort if things > break, but this one > is (IMO) worth playing along with. I checked the installer, and these MSVC dlls are indeed set to install only if the existing version is the "same or older". Annoyingly, it doesnt have an option for only "if older"! They are also set to correctly reference count in the registry. I believe that by installing a single custom DLL into the system directory, plus correctly installing some MS system DLLs into the system directory we are being perfect citizens. [Interestingly, Windows 2000 has a system process that continually monitors the system directory. If it detects that a "protected file" has been changed, it promptly copies the original back over the top! I believe the MSVC*.dlls are in the protected list, so can only be changed with a service pack release anyway. Everything _looks_ like it updates - Windows just copies it back!] Mark. From mal at lemburg.com Tue Apr 4 11:26:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 11:26:53 +0200 Subject: [Python-Dev] Unicode and comparisons Message-ID: <38E9B55D.F2B6409C@lemburg.com> Fredrik bug report made me dive a little deeper into compares and contains tests. Here is a snapshot of what my current version does: >>> '1' == None 0 >>> u'1' == None 0 >>> '1' == 'a???' 0 >>> u'1' == 'a???' Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data >>> '1' in ('a', None, 1) 0 >>> u'1' in ('a', None, 1) 0 >>> '1' in (u'a???', None, 1) 0 >>> u'1' in ('a???', None, 1) Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data The decoding errors occur because 'a???' is not a valid UTF-8 string (Unicode comparisons coerce both arguments to Unicode by interpreting normal strings as UTF-8 encodings of Unicode). 
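Code that runs into this can of course mask the failure itself; a minimal workaround sketch (the helper name is made up for illustration) that treats an undecodable or uncoercible operand as simply unequal:

    def lenient_eq(a, b):
        # Treat a failed coercion (bad UTF-8 data, unsupported type) as
        # "not equal" rather than letting the comparison raise.
        try:
            return a == b
        except (TypeError, UnicodeError):
            return 0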
Question: is this behaviour acceptable or should I go even further and mask decoding errors during compares and contains tests too ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/
From joachim at medien.tecmath.de Tue Apr 4 11:28:37 2000 From: joachim at medien.tecmath.de (Joachim Koenig-Baltes) Date: Tue, 4 Apr 2000 11:28:37 +0200 (MEST) Subject: [Python-Dev] Re: New Features in Python 1.6 In-Reply-To: <20000403131720.A10313@sz-sb.de> References: <200004011740.MAA04675@eric.cnri.reston.va.us> <20000403131720.A10313@sz-sb.de> Message-ID: <20000404092837.944E889@tmpc200.medien.tecmath.de> In comp.lang.python, you wrote: >On Sat, Apr 01, 2000 at 12:00:00PM -0500, Guido van Rossum wrote: >> >> Python strings can now be stored as Unicode strings. To make it easier >> to type Unicode strings, the single-quote character defaults to creating >> a Unicode string, while the double-quote character defaults to ASCII >> strings. If you need to create a Unicode string with double quotes, >> just preface it with the letter "u"; likewise, an ASCII string can be >> created by prefacing single quotes with the letter "a". For example: >> >> foo = 'hello' # Unicode >> foo = "hello" # ASCII > >Is single-quoting for creating unicode clever ? I think there might be a problem >with old code when the operations on unicode strings are not 100% compatible to >the standard string operations. I don't know if this is a real problem - it's >just a point for discussion. > >Cheers, >Andreas > Hello Andreas, have you looked at the date of Guido's posting? A really good April Fools' joke, since he mixes the jokes very well with reality. Best regards, also to the others, Joachim
From guido at python.org Tue Apr 4 13:51:42 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 07:51:42 -0400 Subject: [Python-Dev] Unicode and comparisons In-Reply-To: Your message of "Tue, 04 Apr 2000 11:26:53 +0200." <38E9B55D.F2B6409C@lemburg.com> References: <38E9B55D.F2B6409C@lemburg.com> Message-ID: <200004041151.HAA12035@eric.cnri.reston.va.us> > Fredrik bug report made me dive a little deeper into compares > and contains tests. > > Here is a snapshot of what my current version does: > > >>> '1' == None > 0 > >>> u'1' == None > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > >>> '1' in ('a', None, 1) > 0 > >>> u'1' in ('a', None, 1) > 0 > >>> '1' in (u'a???', None, 1) > 0 > >>> u'1' in ('a???', None, 1) > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > The decoding errors occur because 'a???' is not a valid > UTF-8 string (Unicode comparisons coerce both arguments > to Unicode by interpreting normal strings as UTF-8 > encodings of Unicode). > > Question: is this behaviour acceptable or should I go > even further and mask decoding errors during compares > and contains tests too ? I think this is right -- I expect it will catch more errors than it will cause. This made me go out and see what happens if you compare a numeric class instance (one that defines __int__) to another int -- it doesn't even call the __int__ method! This should be fixed in 1.7 when we do the smart comparisons and rich coercions (or was it the other way around? :-).
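A quick reproduction of that observation (the class is of course just an illustration):

    class Four:
        def __int__(self):
            return 4

    print Four() == 4       # prints 0 -- __int__ is never consulted
    print int(Four()) == 4  # prints 1 -- only an explicit conversion uses it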
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 15:24:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 09:24:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 01:00:55 EDT." <000901bf9df2$c224c7a0$162d153f@tim> References: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: <200004041324.JAA12173@eric.cnri.reston.va.us> > [Guido] > > ... > > I'm waiting for Tim Peters' response in this thread -- if I recall he > > was the one who said that python1x.dll should not go into the system > > directory. [Tim] > Not that I don't say a lot of dumb-ass things , but I strongly doubt I > would have said this one. OK, it must be my overworked tired brain that is playing games with me. It might have been Jim Ahlstrom then, our resident Windows 3.1 supporter. :-) > In my brief career as a Windows app provider, I > learned four things, the first three loudly gotten across by seriously > unhappy users: > > 1. Contra MS guidelines, dump the core DLLs in the system directory. > 2. Contra MS guidelines, install the app by default in C:\name_of_app\. It's already been said that the drive letter could be chosen more carefully. I wonder if the pathname should also be an 8+3 (max) name, so that it can be relyably typed into a DOS window. > 3. Contra MS guidelines, put all the config options you can in a text file > C:\name_of_app\name_of_app.ini > instead of the registry. > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. > > In the end, the last one is the only one I disagreed with (in recent years I > believe MS DLL upgrades have gotten much more likely to fix bugs than to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). Probably because Pythonworks doesn't care about COM or embedding. Anyway, I now agree with you on 1-2 and on not following 4. As for 3, I think that for Mark's COM support to work, the app won't necessarily be able to guess what \name_of_app\ is, so that's where the registry comes in handy. PATH info is really about all that Python puts in the registry, so I think we're okay here. (Also if you read PC\getpathp.c in 1.6, you'll see that it now ignores most of the registry when it finds the installation through a search based on argv[0].) > > ... > > I've definitely heard people complain that it is evil to install > > directories in the system directory. Seems there are different > > schools of thought... > > Well, mucking with the system directories is horrid! Nobody likes doing it. > AFAIK, though, there's really no realistic alternative. It's the only place > you *know* will be on the PATH, and if an app embedding Python can't rely on > PATH, it will have to hardcode the Python DLL path itself. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > the system directory. I will now be distributing with the VC++ 6.0 > > servicepack 1 versions of these files. Won't this be a problem for > > installations that already have an older version? (Now that I think > > of it, this is another reason why I decided that at least the alpha > > release should install everything in MAINDIR -- to limit the damage. > > Any informed opinions?) 
> > You're using a std installer, and MS has rigid rules for these DLLs that the > installer will follow by magic. Small comfort if things break, but this one > is (IMO) worth playing along with. One more thing that I just realized. There are a few Python extension modules (_tkinter and the new pyexpat) that rely on external DLLs: _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs xmlparse.dll and xmltok.dll. If I understand correctly how the path rules work, these have to be on PATH too (although the pyd files don't have to be). This worries me -- these aren't official MS DLLs and neither are the our own, so we could easily stomp on some other app's version of the same... (The tcl folks don't change their filename when the 3rd version digit changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) Is there a better solution? --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 4 16:20:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 10:20:19 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <200004040703.DAA11944@eric.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> Message-ID: <14569.64035.285070.760022@seahag.cnri.reston.va.us> Guido van Rossum writes: > Maybe I'll have to document it as connect(address) with a backwards > compatible syntax connect(a, b) being equivalent to connect((a, b)). > At least that sets the record straight without breaking old code. If you *must* support the two-arg flavor (which I've never actually seen outside this discussion), I'd suggest not documenting it as a backward compatibility, only that it will disappear in 1.7. This can be done fairly easily and cleanly in the library reference. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Tue Apr 4 16:45:36 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 16:45:36 +0200 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> Message-ID: <005101bf9e44$71bade60$34aab5d4@hagrid> Tim Peters wrote: > 4. This one was due to my boss: Contra MS guidelines, put a copy of > every MS system DLL you rely on under C:\name_of_app\, so you don't > get screwed when MS introduces an incompatible DLL upgrade. > > In the end, the last one is the only one I disagreed with (in recent years I > believe MS DLL upgrades have gotten much more likely to fix bugs than to > introduce incompatibilities; OTOH, from Tcl to Macsyma Pro I see 6 apps on > my home machine that use their own copy of msvcrt.dll -- /F, if you're > reading, how come the Pythonworks beta does this?). we've been lazy... in the pre-IE days, some machines came without any msvcrt.dll at all. so since we have to ship it, I guess it was easier to ship it along with all the other components, rather than implementing the "install in system directory only if newer" stuff... (I think it's on the 2.0 todo list ;-) From guido at python.org Tue Apr 4 16:52:30 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:52:30 -0400 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: Your message of "Tue, 04 Apr 2000 10:20:19 EDT." 
<14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <200004041452.KAA12455@eric.cnri.reston.va.us> > If you *must* support the two-arg flavor (which I've never actually > seen outside this discussion), I'd suggest not documenting it as a > backward compatibility, only that it will disappear in 1.7. This can > be done fairly easily and cleanly in the library reference. Yes, I must. Can you fix up the docs? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 16:52:08 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 16:52:08 +0200 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <006301bf9e45$5b168dc0$34aab5d4@hagrid> Guido van Rossum wrote: > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. "\py" is reserved ;-) From guido at python.org Tue Apr 4 16:56:17 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 10:56:17 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:52:08 +0200." <006301bf9e45$5b168dc0$34aab5d4@hagrid> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <006301bf9e45$5b168dc0$34aab5d4@hagrid> Message-ID: <200004041456.KAA12509@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > I wonder if the pathname should also be an 8+3 (max) name, so that it > > can be relyably typed into a DOS window. > > "\py" is reserved ;-) OK, it'll be \python16 then. --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 17:04:40 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 17:04:40 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.99,1.100 References: <200004041410.KAA12405@eric.cnri.reston.va.us> Message-ID: <009701bf9e47$1cf13660$34aab5d4@hagrid> > Socket methods: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for backwards > + compatibility.) how about threatening to remove this in 1.7? IOW: > + (NB: an argument list of the form (sockaddr...) means that multiple > + arguments are treated the same as a single tuple argument, for backwards > + compatibility. This is deprecated, and will be removed in future versions.) From skip at mojam.com Tue Apr 4 16:23:44 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 4 Apr 2000 09:23:44 -0500 (CDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64035.285070.760022@seahag.cnri.reston.va.us> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> Message-ID: <14569.64240.80221.587062@beluga.mojam.com> Fred> If you *must* support the two-arg flavor (which I've never Fred> actually seen outside this discussion), I'd suggest not Fred> documenting it as a backward compatibility, only that it will Fred> disappear in 1.7. 
Having surprisingly little opportunity to call socket.connect directly in my work (considering the bulk of my programming is for the web), I'll note for the record that the direct calls I've made to socket.connect all have two arguments: host and port. It never occurred to me that there would even be a one-argument version. After all, why look at the docs for help if what you're doing already works? Skip From gvwilson at nevex.com Tue Apr 4 17:34:38 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 11:34:38 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <38E9B55D.F2B6409C@lemburg.com> Message-ID: Random thought (hopefully more sensible than my last one): Would it make sense in P3K to keep using '/' for CS-style division (int/int -> rounded-down-int), and to introduce '?' for math-style division (int?int -> float-when-necessary)? Greg From gmcm at hypernet.com Tue Apr 4 17:39:52 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 11:39:52 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040252.WAA06637@eric.cnri.reston.va.us> References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." Message-ID: <1257259699-4963377@hypernet.com> [Guido] > I'm waiting for Tim Peters' response in this thread -- if I recall he > was the one who said that python1x.dll should not go into the system > directory. Some time ago Tim and I said that the place for a DLL that is intimately tied to an EXE is in the EXE's directory. The search path: 1) the EXE's directory 2) the current directory (useless) 3) the system directory 4) the Windows directory 5) the PATH For a general purpose DLL, that makes the system directory the only sane choice (if modifying PATH was sane, then PATH would be saner, but a SpecTCL will just screw you up). Things that go in the system directory should maintain backwards compatibility. For a DLL, that means all the old entry points are still there, in the same order with new ones at the end. For Python, there's no crying need to conform for now, but if (when?) embedding Python becomes ubiquitous, this (or some other scheme) may need to be considered. - Gordon From guido at python.org Tue Apr 4 17:45:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 11:45:39 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com> References: Your message of "Mon, 03 Apr 2000 17:32:12 PDT." <1257259699-4963377@hypernet.com> Message-ID: <200004041545.LAA12635@eric.cnri.reston.va.us> > Some time ago Tim and I said that the place for a DLL that is > intimately tied to an EXE is in the EXE's directory. But the conclusion seems to be that python1x.dll is not closely tied to python.exe -- it may be invoked via COM. > The search path: > 1) the EXE's directory > 2) the current directory (useless) > 3) the system directory > 4) the Windows directory > 5) the PATH > > For a general purpose DLL, that makes the system directory > the only sane choice (if modifying PATH was sane, then > PATH would be saner, but a SpecTCL will just screw you up). > > Things that go in the system directory should maintain > backwards compatibility. For a DLL, that means all the old > entry points are still there, in the same order with new ones at > the end. For Python, there's no crying need to conform for > now, but if (when?) 
embedding Python becomes ubiquitous, > this (or some other scheme) may need to be considered. Where should I put tk83.dll etc.? In the Python\DLLs directory, where _tkinter.pyd also lives? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 17:43:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 11:43:49 -0400 Subject: [Python-Dev] re: division In-Reply-To: Your message of "Tue, 04 Apr 2000 11:34:38 EDT." References: Message-ID: <200004041543.LAA12616@eric.cnri.reston.va.us> > Random thought (hopefully more sensible than my last one): > > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce '?' for math-style > division (int?int -> float-when-necessary)? Careful with your character sets there... The symbol you typed looks like a lowercase o with dieresis to me. :-( Assuming you're proposing something like this: . --- . I'm not so sure that choosing a non-ASCII symbol is going to work. For starters, it's on very few keyboards, and that won't change soon! In the past we've talked about using // for integer division and / for regular (int/int->float) division. This would mean that we have to introduce // now as an alias for /, and encourage people to use it for int division (only); then in 1.7 using / between ints will issue a compatibility warning, and in Py3K int/int will yield a float. It's still going to be painful, though. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Apr 4 17:52:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 17:52:52 +0200 Subject: [Python-Dev] Unicode and comparisons References: <38E9B55D.F2B6409C@lemburg.com> <200004041151.HAA12035@eric.cnri.reston.va.us> Message-ID: <38EA0FD4.DB0D96BF@lemburg.com> Guido van Rossum wrote: > > > Fredrik bug report made me dive a little deeper into compares > > and contains tests. > > > > Here is a snapshot of what my current version does: > > > > >>> '1' == None > > 0 > > >>> u'1' == None > > 0 > > >>> '1' == 'a???' > > 0 > > >>> u'1' == 'a???' > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > >>> '1' in ('a', None, 1) > > 0 > > >>> u'1' in ('a', None, 1) > > 0 > > >>> '1' in (u'a???', None, 1) > > 0 > > >>> u'1' in ('a???', None, 1) > > Traceback (most recent call last): > > File "", line 1, in ? > > UnicodeError: UTF-8 decoding error: invalid data > > > > The decoding errors occur because 'a???' is not a valid > > UTF-8 string (Unicode comparisons coerce both arguments > > to Unicode by interpreting normal strings as UTF-8 > > encodings of Unicode). > > > > Question: is this behaviour acceptable or should I go > > even further and mask decoding errors during compares > > and contains tests too ? > > I think this is right -- I expect it will catch more errors than it > will cause. Ok, I'll only mask the TypeErrors then. (UnicodeErrors are subclasses of ValueErrors and thus do not get masked.) > This made me go out and see what happens if you compare a numeric > class instance (one that defines __int__) to another int -- it doesn't > even call the __int__ method! This should be fixed in 1.7 when we do > the smart comparisons and rich coercions (or was it the other way > around? :-). Not sure ;-) I think both go hand in hand. 
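The subclass detail mentioned above -- UnicodeError deriving from ValueError -- can be checked directly; a tiny illustration, relying on the comparison behaviour shown earlier in this thread:

    print issubclass(UnicodeError, ValueError)   # prints 1
    try:
        u'1' == 'a\xe4'        # 8-bit data that is not valid UTF-8
    except ValueError:          # a plain ValueError handler catches it too
        print "UnicodeError caught as ValueError"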
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Tue Apr 4 17:53:20 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 17:53:20 +0200 Subject: [Python-Dev] re: division References: Message-ID: <010901bf9e4d$eb097840$34aab5d4@hagrid> gvwilson at nevex.com wrote: > Random thought (hopefully more sensible than my last one): > > Would it make sense in P3K to keep using '/' for CS-style division > (int/int -> rounded-down-int), and to introduce '?' for math-style > division (int?int -> float-when-necessary)? where's the ? key? (oh, look, my PC keyboard has one. but if I press it, I get a /. hmm...) From martin at loewis.home.cs.tu-berlin.de Tue Apr 4 17:44:17 2000 From: martin at loewis.home.cs.tu-berlin.de (Martin v. Loewis) Date: Tue, 4 Apr 2000 17:44:17 +0200 Subject: [Python-Dev] Re: Unicode and comparisons Message-ID: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> > Question: is this behaviour acceptable or should I go even further > and mask decoding errors during compares and contains tests too ? I always thought it is a core property of cmp that it works between all objects. Because of that, >>> x=[u'1','a???'] >>> x.sort() Traceback (most recent call last): File "", line 1, in ? UnicodeError: UTF-8 decoding error: invalid data fails. As always in cmp, I'd expect to get a consistent outcome here (ie. cmp should give a total order on objects). OTOH, I'm not so sure why cmp between plain and unicode strings needs to perform UTF-8 conversion? IOW, why is it desirable that >>> 'a' == u'a' 1 Anyway, I'm not objecting to that outcome - I only think that, to get cmp consistent, it may be necessary to drop this result. If it is not necessary, the better. Regards, Martin From jim at interet.com Tue Apr 4 18:06:27 2000 From: jim at interet.com (James C. Ahlstrom) Date: Tue, 04 Apr 2000 12:06:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <38EA1303.B393D7F8@interet.com> Guido van Rossum wrote: > OK, it must be my overworked tired brain that is playing games with > me. It might have been Jim Ahlstrom then, our resident Windows 3.1 > supporter. :-) I think I've been insulted. What's wrong with Windows 3.1?? :-) > > 1. Contra MS guidelines, dump the core DLLs in the system directory. The Python DLL must really go in the Windows system directory. I don't see any other choice. This is in accordance with Microsoft guidelines AFAIK, or anyway, that's the only way it Just Works. The Python16.dll is a system file if you are using COM, and it supports an embedded scripting language, so it goes into the system dir. QED. > > 3. Contra MS guidelines, put all the config options you can in a text file > > C:\name_of_app\name_of_app.ini > > instead of the registry. This is an excellent practice, and there should be a standard module to deal with .ini files. But, as you say, the registry is sometimes needed. > > 4. This one was due to my boss: Contra MS guidelines, put a copy of > > every MS system DLL you rely on under C:\name_of_app\, so you don't > > get screwed when MS introduces an incompatible DLL upgrade. Yuk. More trouble than it's worth. > > > I've definitely heard people complain that it is evil to install > > > directories in the system directory. Seems there are different > > > schools of thought... 
It is very illegal to install directories as opposed to DLL's. Do you really mean directories? If so, don't do that. > > > Another issue: MSVCRT.DLL and its friend MSVCIRT.DLL will also go into > > > the system directory. I will now be distributing with the VC++ 6.0 If you distribute these, you must check version numbers and only replace old versions. Wise and other installers do this easily. Doing otherwise is evil and unacceptable. Checking file dates is not good enough either. > > > servicepack 1 versions of these files. Won't this be a problem for > > > installations that already have an older version? Probably not, thanks to Microsoft's valiant testing efforts. > > > (Now that I think > > > of it, this is another reason why I decided that at least the alpha > > > release should install everything in MAINDIR -- to limit the damage. > > > Any informed opinions?) Distribute these files with a valid Wise install script which checks VERSIONS. > One more thing that I just realized. There are a few Python extension > modules (_tkinter and the new pyexpat) that rely on external DLLs: > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > xmlparse.dll and xmltok.dll. Welcome to the club. > If I understand correctly how the path rules work, these have to be on > PATH too (although the pyd files don't have to be). This worries me > -- these aren't official MS DLLs and neither are the our own, so we > could easily stomp on some other app's version of the same... > (The tcl folks don't change their filename when the 3rd version digit > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > Is there a better solution? This is a daily annoyance and risk in the Windows world. If you require Tk, then you need to completely understand how to produce a valid Tk distribution. Same with PIL (which requires Tk). Often you won't know that some pyd requires some other obscure DLL. To really do this you need something high level. Like rpm's on linux. On Windows, people either write complex install programs with Wise et al, or run third party installers provided with (for example) Tk from simpler install scripts. It is then up to the Tk people to know how to install it, and how to deal with version upgrades. JimA From gmcm at hypernet.com Tue Apr 4 18:10:38 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 12:10:38 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041545.LAA12635@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 11:39:52 EDT." <1257259699-4963377@hypernet.com> Message-ID: <1257257855-5074057@hypernet.com> [Gordon] > > Some time ago Tim and I said that the place for a DLL that is > > intimately tied to an EXE is in the EXE's directory. [Guido] > But the conclusion seems to be that python1x.dll is not closely tied > to python.exe -- it may be invoked via COM. Right. > Where should I put tk83.dll etc.? In the Python\DLLs directory, where > _tkinter.pyd also lives? Won't work (unless there are some tricks in MSVC 6 I don't know about). Assuming no one is crazy enough to use Tk in a COM server, (or rather, that their insanity need not be catered to), then I'd vote for the directory where python.exe and pythonw.exe live. 
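If they do end up next to python.exe, a startup script can still make sure the loader sees that directory first; a rough sketch of the idea (whether environment updates made this way are honoured by a given host's later LoadLibrary calls is an assumption worth testing, not a guarantee):

    import os, sys, string

    def ensure_exe_dir_on_path():
        # Prepend the directory holding python.exe/pythonw.exe to PATH so
        # that tcl83.dll, tk83.dll and friends can be located by the loader.
        exe_dir = os.path.dirname(sys.executable)
        path = os.environ.get("PATH", "")
        if string.find(string.lower(path), string.lower(exe_dir)) < 0:
            os.environ["PATH"] = exe_dir + ";" + path

    ensure_exe_dir_on_path()
    import _tkinter    # its dependent DLLs should now resolve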
- Gordon From gvwilson at nevex.com Tue Apr 4 18:20:22 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 12:20:22 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: > Assuming you're proposing something like this: > > . > --- > . > > I'm not so sure that choosing a non-ASCII symbol is going to work. For > starters, it's on very few keyboards, and that won't change soon! I realize that, but neither are many of the accented characters used in non-English names (said the Canadian). If we assume 18-24 months until P3K, will it be safe to assume support for non-7-bit characters, or will we continue to be constrained by what was available on PDP-11's in 1975? (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', but harder to track down, since you'll have to scrutinize values very carefully to spot the difference. Haven't done any field tests, though...) Greg From bwarsaw at cnri.reston.va.us Tue Apr 4 19:56:23 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 4 Apr 2000 13:56:23 -0400 (EDT) Subject: [Python-Dev] re: division References: <200004041543.LAA12616@eric.cnri.reston.va.us> Message-ID: <14570.11463.83210.17189@anthem.cnri.reston.va.us> >>>>> "gvwilson" == writes: gvwilson> If we assume 18-24 months until P3K, will it be safe to gvwilson> assume support for non-7-bit characters, or will we gvwilson> continue to be constrained by what was available on gvwilson> PDP-11's in 1975? Undoubtedly. From gvwilson at nevex.com Tue Apr 4 20:08:36 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:08:36 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case Message-ID: Here's a longer, and hopefully more coherent, argument for using the divided-by sign in P3K: 1. If P3K source is allowed to be Unicode, then all Python programming systems (custom-made or pre-existing) are going to have to be able to handle more than just 1970s-vintage 7-bit ASCII. If that support has to be there, it seems a shame not to make use of it in the language itself where that would be helpful. [1,2] 2. As I understand it, support for (int,int)->float division is being added to help people who think that arithmetic on computers ought to behave like arithmetic did in grade 4. I have no data to support this, but I expect that such people will understand the divided-by sign more readily than a forward slash. [3] 3. I also expect, again without data, that '//' vs. '/' will lead to as high a proportion of errors as '==' vs. '='. These errors may even prove harder to track down, since the result is a slightly wrong answer instead of a state change leading (often) to early loop termination or something equally noticeable. Greg [1] I'm aware that there are encoding issues (the replies to my first post mentioned at least two different ways for "my" divided-by sign to display), but this is an issue that will have to be tackled in general in order to support Unicode anyway. [2] I'd be grateful if everyone posting objections along the lines of, "But what about emacs/vi/some other favored bit of legacy technology?" could also indicate whether they use lynx(1) as their web browser, and/or are sure that 100% of the web pages they have built are accessible to people who don't have bit-mapped graphics. 
I am *not* trying to be inflammatory, I just think that if a technology is taken for granted as part of one tool, then it is legitimate to ask that it be taken for granted in another. [3] Please note that I am not asking for a multiplication sign, a square root sign, or any of APL's mystic runes. From fdrake at acm.org Tue Apr 4 20:27:08 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 14:27:08 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.13308.147675.434718@seahag.cnri.reston.va.us> gvwilson at nevex.com writes: > 1. If P3K source is allowed to be Unicode, then all Python programming > systems (custom-made or pre-existing) are going to have to be able > to handle more than just 1970s-vintage 7-bit ASCII. If that support > has to be there, it seems a shame not to make use of it in the language > itself where that would be helpful. [1,2] I don't recall any requirement that the host be able to deal with Unicode specially (meaning "other than as binary data"). Perhaps I missed that? > 2. As I understand it, support for (int,int)->float division is being > added to help people who think that arithmetic on computers ought to > behave like arithmetic did in grade 4. I have no data to support this, > but I expect that such people will understand the divided-by sign more > readily than a forward slash. [3] I don't think the division sign itself is a problem. Re-training experianced programmers might be; I don't think there's any intention of alienating that audience. > 3. I also expect, again without data, that '//' vs. '/' will lead to as > high a proportion of errors as '==' vs. '='. These errors may even > prove harder to track down, since the result is a slightly wrong answer > instead of a state change leading (often) to early loop termination or > something equally noticeable. A agree. > [3] Please note that I am not asking for a multiplication sign, a square > root sign, or any of APL's mystic runes. As I indicated above, I don't think the specific runes are the problem (outside of programmer alienation). The *biggest* problem (IMO) is that the runes are not on our keyboards. This has nothing to do with the appropriateness of the runes to the semantic meanings bound to them in the language definition, this has to do convenience for typing without any regard to cultured habits in the current programmer population. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson at nevex.com Tue Apr 4 20:38:29 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:38:29 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: Hi, Fred; thanks for your mail. > gvwilson at nevex.com writes: > > 1. If P3K source is allowed to be Unicode > I don't recall any requirement that the host be able to deal with > Unicode specially (meaning "other than as binary data"). Perhaps I > missed that? I'm sorry, I didn't mean to imply that this decision had been taken --- hence the "if". However, allowing Unicode in source doesn't seem to have slowed down adoption of Java... :-) > I don't think the division sign itself is a problem. Re-training > experianced programmers might be; I don't think there's any intention > of alienating that audience. I think this comes down to spin. 
If this is presented as, "We're adding a symbol that isn't on your keyboard in order to help newbies," it'll be flamed. If it's presented as, "Python is the first scripting language to fully embrace internationalization, so get with the twenty-first century!" (or something like that), I could see it getting a much more positive response. I also think that, despite their grumbling, experienced programmers are pretty adaptable. After all, I switch from Emacs Lisp to Python to C++ half-a-dozen times a day... :-) > The *biggest* problem (IMO) is that the runes are not on our > keyboards. Agreed. Perhaps non-native English speakers could pitch in and describe how easy/difficult it is for them to (for example) put properly-accented Spanish comments in code? Thanks, Greg From klm at digicool.com Tue Apr 4 20:48:52 2000 From: klm at digicool.com (Ken Manheimer) Date: Tue, 4 Apr 2000 14:48:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14570.13308.147675.434718@seahag.cnri.reston.va.us> Message-ID: On Tue, 4 Apr 2000, Fred L. Drake, Jr. wrote: > gvwilson at nevex.com writes: > > 1. If P3K source is allowed to be Unicode, then all Python programming > > systems (custom-made or pre-existing) are going to have to be able > > to handle more than just 1970s-vintage 7-bit ASCII. If that support > > has to be there, it seems a shame not to make use of it in the language > > itself where that would be helpful. [1,2] > [...] > As I indicated above, I don't think the specific runes are the > problem (outside of programmer alienation). The *biggest* problem > (IMO) is that the runes are not on our keyboards. This has nothing to > do with the appropriateness of the runes to the semantic meanings > bound to them in the language definition, this has to do convenience > for typing without any regard to cultured habits in the current > programmer population. In general, it seems that there are some places where a programming language implementation should not be on the leading edge, and this is one. I think we'd have to be very confident that this new division sign (or whatever) is going to be in ubiquitous use, on everyone's keyboard, etc, before we could even consider making it a necessary part of the standard language. Do you have that confidence? Ken Manheimer klm at digicool.com From gvwilson at nevex.com Tue Apr 4 20:53:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Tue, 4 Apr 2000 14:53:52 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: HI, Ken; thanks for your mail. > In general, it seems that there are some places where a programming > language implementation should not be on the leading edge, and this is > one. I think we'd have to be very confident that this new division > sign (or whatever) is going to be in ubiquitous use, on everyone's > keyboard, etc, before we could even consider making it a necessary > part of the standard language. Do you have that confidence? I wouldn't expect the division sign to be on keyboards. On the other hand, I would expect that having to type a two-stroke sequence every once in a while would help native English speakers appreciate what people in other countries sometimes have to go through in order to spell their names correctly... 
:-) Greg From pf at artcom-gmbh.de Tue Apr 4 20:48:11 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 20:48:11 +0200 (MEST) Subject: .ini files - was Re: [Python-Dev] DLL in the system dir In-Reply-To: <38EA1303.B393D7F8@interet.com> from "James C. Ahlstrom" at "Apr 4, 2000 12: 6:27 pm" Message-ID: Hi! [...] > > > 3. Contra MS guidelines, put all the config options you can in a text file > > > C:\name_of_app\name_of_app.ini > > > instead of the registry. James C. Ahlstrom: > This is an excellent practice, and there should be a standard module to > deal > with .ini files. [...] One half of it is already there in the standard library: 'ConfigParser'. From my limited knowledge about windows (shrug) this can at least read .ini files. Writing this info again out to a file shouldn't be too hard. Regards, Peter From effbot at telia.com Tue Apr 4 20:57:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 20:57:17 +0200 Subject: [Python-Dev] a slightly more coherent case References: Message-ID: <024401bf9e67$9a1a53e0$34aab5d4@hagrid> gvwilson at nevex.com wrote: > > The *biggest* problem (IMO) is that the runes are not on our > > keyboards. > > Agreed. Perhaps non-native English speakers could pitch in and describe > how easy/difficult it is for them to (for example) put properly-accented > Spanish comments in code? you know, people who do use foreign languages a lot tend to use keyboards designed for their language. I have keys for all swedish characters on my keyboard -- att skriva korrekt svenska p? mitt tangentbord ?r hur enkelt som helst... to type less common latin 1 characters, I ?s??ll? o?l? have to use tw? keys -- one "d??d ke?" for the ?ccent, f?llow?d by th? c?rre- sp?nding ch?r?ct?r. (visst, ? och ? anv?nds ibland i svensk text, och fanns f?rr ofta som separata tangenter -- i alla fall innan pc'n kom och f?rst?rde allting). besides, the use of indentation causes enough problems when doing trivial things like mailing, posting, and typesetting Python code. adding odd characters to the mix won't exactly help... From gmcm at hypernet.com Tue Apr 4 21:46:34 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 15:46:34 -0400 Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <1257244899-5853377@hypernet.com> Greg Wilson wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Certain stuffy (and now deceased) members of my family, despite emigrating to the Americas during the Industrial Revolution, insisted that the proper spelling of McMillan involved elevating the "c". Wonder if there's a unicode character for that, so I can get righteously indignant whenever people fail to use it. Personally, I'm delighted when people don't add extra letters to my name, and even that's pretty silly, since all the variations on M*M*ll*n come down to how some government clerk chose to spell it. - Gordon From guido at python.org Tue Apr 4 21:49:32 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:49:32 -0400 Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: Your message of "Tue, 04 Apr 2000 17:44:17 +0200." 
<200004041544.RAA01023@loewis.home.cs.tu-berlin.de> References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <200004041949.PAA13102@eric.cnri.reston.va.us> > I always thought it is a core property of cmp that it works between > all objects. Not any more. Comparisons can raise exceptions -- this has been so since release 1.5. This is rarely used between standard objects, but not unheard of; and class instances can certainly do anything they want in their __cmp__. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 4 21:51:14 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 4 Apr 2000 15:51:14 -0400 (EDT) Subject: [Python-Dev] Heads up: socket.connect() breakage ahead In-Reply-To: <14569.64240.80221.587062@beluga.mojam.com> References: <000201bf9df8$a066b8c0$6d2d153f@tim> <200004040703.DAA11944@eric.cnri.reston.va.us> <14569.64035.285070.760022@seahag.cnri.reston.va.us> <14569.64240.80221.587062@beluga.mojam.com> Message-ID: <14570.18354.151349.452329@seahag.cnri.reston.va.us> Skip Montanaro writes: > arguments: host and port. It never occurred to me that there would even be > a one-argument version. After all, why look at the docs for help if what > you're doing already works? And it never occurred to me that there would be two args; I vaguely recall the C API having one argument (a structure). Ah, well. I've patched up the documents to warn those who expect intuitive APIs. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Tue Apr 4 21:57:47 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 15:57:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Sun, 02 Apr 2000 10:37:11 +0200." <004701bf9c7e$a5045480$34aab5d4@hagrid> References: <004701bf9c7e$a5045480$34aab5d4@hagrid> Message-ID: <200004041957.PAA13168@eric.cnri.reston.va.us> > one of my side projects for SRE is to create a regex-compatible > frontend. since both engines have NFA semantics, this mostly > involves writing an alternate parser. > > however, when I started playing with that, I completely forgot > about the regex.set_syntax() function. supporting one extra > syntax isn't that much work, but a whole bunch of them? > > so what should we do? > > 1. completely get rid of regex (bjorn would love that, > don't you think?) (Who's bjorn?) > 2. remove regex.set_syntax(), and tell people who've > used it that they're SOL. > > 3. add all the necessary flags to the new parser... > > 4. keep regex around as before, and live with the > extra code bloat. > > comments? I'm for 4, then deprecating it, and eventually switching to 1. This saves you effort debugging compatibility with an obsolete module. If it ain't broken, don't "fix" it. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 4 22:10:07 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:10:07 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> Message-ID: <200004042010.QAA13180@eric.cnri.reston.va.us> [me] > > One more thing that I just realized. 
There are a few Python extension > > modules (_tkinter and the new pyexpat) that rely on external DLLs: > > _tkinter.pyd needs tcl83.dll and tk83.dll, and pyexpat.pyd needs > > xmlparse.dll and xmltok.dll. [Jim A] > Welcome to the club. I'm not sure what you mean by this? > > If I understand correctly how the path rules work, these have to be on > > PATH too (although the pyd files don't have to be). This worries me > > -- these aren't official MS DLLs and neither are the our own, so we > > could easily stomp on some other app's version of the same... > > (The tcl folks don't change their filename when the 3rd version digit > > changes, e.g. 8.3.0 -> 8.3.1, and expat has no versions at all.) > > > > Is there a better solution? > > This is a daily annoyance and risk in the Windows world. If you require > Tk, then you need to completely understand how to produce a valid Tk > distribution. Same with PIL (which requires Tk). Often you won't > know that some pyd requires some other obscure DLL. To really do this > you need something high level. Like rpm's on linux. On Windows, people > either write complex install programs with Wise et al, or run third > party installers provided with (for example) Tk from simpler install > scripts. It is then up to the Tk people to know how to install it, and > how to deal with version upgrades. Calculating the set of required DLLs isn't the problem. I have a tool (Dependency Viewer) that shows me exactly the dependencies (it recurses down any DLLs it finds and shows their dependencies too, using the nice MFC tree widget). The problem is where should I install these extra DLLs. In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer from Scriptics). The feedback over the past year showed that this was a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk installations stomped over it, people chose to install Tcl/Tk on a different volume than Python, etc. In 1.6, I am copying the necessary files from the Tcl/Tk installation into the Python directory. This actually installs fewer files than the full Tcl/Tk installation (but you don't get the Tcl/Tk docs). It gives me complete control over which Tcl/Tk version I use without affecting other Tcl/Tk installations that might exist. This is how professional software installations deal with inclusions. However the COM DLL issue might cause problems: if the Python directory is not in the search path because we're invoked via COM, there are only two places where the Tcl/Tk DLLs can be put so they will be found: in the system directory or somewhere along PATH. Assuming it is still evil to modify PATH, we would end up with Tcl/Tk in the system directory, where it could once again interfere with (or be interfered by) other Tcl/Tk installations! Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk DLLs can live in the Python tree. I'm not so sure -- I can at least *imagine* that someone would use Tcl/Tk to give their COM object a bit of a GUI. Moreover, this argument doesn't work for pyexpat -- COM apps are definitely going to expect to be able to use pyexpat! It's annoying. I have noticed, however, that you can use os.putenv() (or assignment to os.environ[...]) to change the PATH environment variable. The FixTk.py script in Python 1.5.2 used this -- it looked in a few places for signs of a Tcl/Tk installation, and then adjusted PATH to include the proper directory before trying to import _tkinter. Maybe there's a solution here? 
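To make the FixTk.py-style workaround concrete, it boils down to a few lines like the sketch below; the directory name is a placeholder for illustration, not the real installer layout:

import os, sys

# Hypothetical location of the bundled Tcl/Tk DLLs -- the real directory
# would come from the registry or the installer, not from here.
dlldir = os.path.join(sys.prefix, "DLLs")

# Prepend it to PATH for this process only, so the Windows loader can
# resolve the Tcl/Tk DLLs when _tkinter is imported.  ";" separates PATH
# entries on Windows.
os.environ["PATH"] = dlldir + ";" + os.environ["PATH"]

import _tkinter    # its dependent DLLs should now be found

Assignment to os.environ[...] (equivalent to os.putenv(), as noted above) only affects the current process, so nothing in the user's configuration is touched.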
The Python DLL could be the only thing in the system directory, and from the registry it could know where the Python directory was. It could then prepend this directory to PATH. This is not so evil as mucking with PATH at install time, I think, since it is only done when Python16.dll is actually loaded. Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run out of environment space? Wouldn't it break other COM apps? Is the PATH truly separate per process? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 4 22:11:02 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 22:11:02 +0200 Subject: [Python-Dev] Heads up: socket.connect() breakage ahead References: <000201bf9df8$a066b8c0$6d2d153f@tim><200004040703.DAA11944@eric.cnri.reston.va.us><14569.64035.285070.760022@seahag.cnri.reston.va.us><14569.64240.80221.587062@beluga.mojam.com> <14570.18354.151349.452329@seahag.cnri.reston.va.us> Message-ID: <02d501bf9e71$e85f0e60$34aab5d4@hagrid> Fred L. Drake wrote: > Skip Montanaro writes: > > arguments: host and port. It never occurred to me that there would even be > > a one-argument version. After all, why look at the docs for help if what > > you're doing already works? > > And it never occurred to me that there would be two args; I vaguely > recall the C API having one argument (a structure). Ah, well. I've > patched up the documents to warn those who expect intuitive APIs. ;) while you're at it, and when you find the time, could you perhaps grep for "pair" and change places which use "pair" to mean a tuple with two elements to actually say "tuple" or "2-tuple"... after all, numerous people have claimed that stuff like "a pair (host, port)" isn't enough to make them understand that "pair" actually means "tuple". unless pair refers to a return value, of course. and only if the function doesn't use the optional argument syntax, of course. etc. (I suspect they're making it up as they go, but that's another story...) From skip at mojam.com Tue Apr 4 21:15:26 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 4 Apr 2000 14:15:26 -0500 (CDT) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: References: Message-ID: <14570.16206.210676.756348@beluga.mojam.com> Greg> On the other hand, I would expect that having to type a two-stroke Greg> sequence every once in a while would help native English speakers Greg> appreciate what people in other countries sometimes have to go Greg> through in order to spell their names correctly... I'm sure this is a practical problem, but aren't there country-specific keyboards available to Finnish, Spanish, Russian and non-English-speaking users to avoid precisely these problems? I grumble every time I have to enter some accented characters, but that's just because I do it rarely and use a US ASCII keyboard. I suspect Fran?ois Pinard has a keyboard with a "?" key. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From effbot at telia.com Tue Apr 4 22:23:06 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 4 Apr 2000 22:23:06 +0200 Subject: [Python-Dev] a slightly more coherent case References: <14570.16206.210676.756348@beluga.mojam.com> Message-ID: <02eb01bf9e73$9d4c1560$34aab5d4@hagrid> Skip wrote: > I'm sure this is a practical problem, but aren't there country-specific > keyboards available to Finnish, Spanish, Russian and non-English-speaking > users to avoid precisely these problems? 
fwiw, my windows box supports about 80 different language-related keyboard layouts. that's western european and american keyboard layouts only, of course (mostly latin-1). haven't installed all the others... From gmcm at hypernet.com Tue Apr 4 22:35:23 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Tue, 4 Apr 2000 16:35:23 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004042010.QAA13180@eric.cnri.reston.va.us> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> Message-ID: <1257241967-6030557@hypernet.com> [Guido] > Someone suggested that COM should not use Tcl/Tk, and then the Tcl/Tk > DLLs can live in the Python tree. I'm not so sure -- I can at least > *imagine* that someone would use Tcl/Tk to give their COM object a bit > of a GUI. Moreover, this argument doesn't work for pyexpat -- COM > apps are definitely going to expect to be able to use pyexpat! Me. Would you have any sympathy for someone who wanted to make a GUI an integral part of a web server? Or would you tell them to get a brain and write a GUI that talks to the web server? Same issue. (Though not, I guess, for pyexpat). > It's annoying. > > I have noticed, however, that you can use os.putenv() (or assignment > to os.environ[...]) to change the PATH environment variable. The > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > for signs of a Tcl/Tk installation, and then adjusted PATH to include > the proper directory before trying to import _tkinter. Maybe there's > a solution here? The Python DLL could be the only thing in the system > directory, and from the registry it could know where the Python > directory was. It could then prepend this directory to PATH. This is > not so evil as mucking with PATH at install time, I think, since it is > only done when Python16.dll is actually loaded. The drawback of relying on PATH is that then some other jerk (eg you, last year ) will stick something of the same name in the system directory and break your installation. > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > out of environment space? Wouldn't it break other COM apps? Is the > PATH truly separate per process? Are there any exceptions to this: - dynamically load a .pyd - .pyd implicitly loads the .dll ? If that's always the case, then you can temporarily cd to the right directory before the dynamic load, and the implicit load should work. As for the others: probably not; can't see how; yes. - Gordon From guido at python.org Tue Apr 4 22:45:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:45:12 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:35:23 EDT." <1257241967-6030557@hypernet.com> References: Your message of "Tue, 04 Apr 2000 12:06:27 EDT." <38EA1303.B393D7F8@interet.com> <1257241967-6030557@hypernet.com> Message-ID: <200004042045.QAA13343@eric.cnri.reston.va.us> > Me. Would you have any sympathy for someone who wanted > to make a GUI an integral part of a web server? Or would you > tell them to get a brain and write a GUI that talks to the web > server? Same issue. (Though not, I guess, for pyexpat). Not all COM objects are used in web servers. Some are used in GUI contexts (aren't Word and Excel and even IE really mostly COM objects these days?). > > It's annoying. 
> > > > I have noticed, however, that you can use os.putenv() (or assignment > > to os.environ[...]) to change the PATH environment variable. The > > FixTk.py script in Python 1.5.2 used this -- it looked in a few places > > for signs of a Tcl/Tk installation, and then adjusted PATH to include > > the proper directory before trying to import _tkinter. Maybe there's > > a solution here? The Python DLL could be the only thing in the system > > directory, and from the registry it could know where the Python > > directory was. It could then prepend this directory to PATH. This is > > not so evil as mucking with PATH at install time, I think, since it is > > only done when Python16.dll is actually loaded. > > The drawback of relying on PATH is that then some other jerk > (eg you, last year ) will stick something of the same > name in the system directory and break your installation. Yes, that's a problem, especially since it appears that PATH is searched *last*. (I wonder if this could explain the hard-to-reproduce crashes that people report when quitting IDLE?) > > Would this always work? (Windows 95, 98, NT, 2000?) Wouldn't it run > > out of environment space? Wouldn't it break other COM apps? Is the > > PATH truly separate per process? > > Are there any exceptions to this: > - dynamically load a .pyd > - .pyd implicitly loads the .dll > ? I think this is always the pattern (except that some DLLs will implicitly load other DLLs, and so on). > If that's always the case, then you can temporarily cd to the > right directory before the dynamic load, and the implicit load > should work. Hm, I would think that the danger of temporarily changing the current directory is at least as big as that of changing PATH. (What about other threads? What if you run into an error and don't get a chance to cd back?) > As for the others: probably not; can't see how; yes. --Guido van Rossum (home page: http://www.python.org/~guido/) From jim at interet.com Tue Apr 4 22:53:20 2000 From: jim at interet.com (James C. Ahlstrom) Date: Tue, 04 Apr 2000 16:53:20 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> Message-ID: <38EA5640.D3FC112F@interet.com> Guido van Rossum wrote: > [Jim A] > > Welcome to the club. > > I'm not sure what you mean by this? It sounded like you were joining the Microsoft afflicted... > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > from Scriptics). The feedback over the past year showed that this was > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > installations stomped over it, people chose to install Tcl/Tk on a > different volume than Python, etc. My first thought was that this was the preferred solution. It is up to Scriptics to provide an installer for Tk and for Tk customers to use it. Any problems with installing Tk are Scriptics' problem. I don't know the reasons it stomped over other installs etc. But either Tk customers are widely using non-standard installs, or the Scriptics installer is broken, or there is no such thing as a standard Tk install. This is fundamentally a Scriptics problem, but I understand it is a Python problem too. There may still be the problem that a standard Tk install might not be accessible to Python. This needs to be worked out with Scriptics. An environment variable could be set, the registry used etc. 
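For the registry option, the win32 extensions already provide what is needed; a minimal sketch, where the key path is invented for illustration (whatever the Tk installer actually writes would go there):

import win32api, win32con

# Hypothetical key -- the real location would have to be defined by Scriptics.
key = win32api.RegOpenKey(win32con.HKEY_LOCAL_MACHINE, r"SOFTWARE\Scriptics\Tcl")
try:
    tclroot = win32api.RegQueryValue(key, "8.2")   # default value of the 8.2 subkey
finally:
    win32api.RegCloseKey(key)

Python (or any other Tk client) could then add the bin directory under tclroot to PATH before loading _tkinter.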
Assuming there is a standard Tk install and a way for external apps to use Tk, then we can still use the (fixed) Scriptics installer. > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > in the system directory, where it could once again interfere with (or > be interfered by) other Tcl/Tk installations! It seems to me that the correct Tk install script would put Tk DLL's in the system dir, and use the registry to find the libraries and other needed files. The exe's could go in a program directory somewhere. This is what I have come to expect from professional software for DLL's which are expected to be used from multiple apps, as opposed to DLL's which are peculiar to one app. If the Tk installer did this, Tk would Just Work, and it would Just Work with third party apps (Tk clients) like Python too. Sorry, I have to run to a class. To be continued tomorrow.... JimA
From guido at python.org Tue Apr 4 22:58:08 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 16:58:08 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Tue, 04 Apr 2000 16:53:20 EDT." <38EA5640.D3FC112F@interet.com> References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> Message-ID: <200004042058.QAA13437@eric.cnri.reston.va.us> > > [Jim A] > > > Welcome to the club. [me] > > I'm not sure what you mean by this? > > It sounded like you were joining the Microsoft afflicted... Indeed :-( > > In 1.5.2 I included a full Tcl/Tk installer (the unadorned installer > > from Scriptics). The feedback over the past year showed that this was > > a bad idea: it stomped over existing Tcl/Tk installations, new Tcl/Tk > > installations stomped over it, people chose to install Tcl/Tk on a > > different volume than Python, etc. > > My first thought was that this was the preferred solution. It is up > to Scriptics to provide an installer for Tk and for Tk customers > to use it. Any problems with installing Tk are Scriptics' problem. > I don't know the reasons it stomped over other installs etc. But > either Tk customers are widely using non-standard installs, or > the Scriptics installer is broken, or there is no such thing > as a standard Tk install. This is fundamentally a Scriptics > problem, but I understand it is a Python problem too. > > There may still be the problem that a standard Tk install might not > be accessible to Python. This needs to be worked out with Scriptics. > An environment variable could be set, the registry used etc. Assuming > there is a standard Tk install and a way for external apps to use Tk, > then we can still use the (fixed) Scriptics installer. The Tk installer has had these problems for a long time. I don't want to have to argue with them, I think it would be a waste of time. > > Assuming it is still evil to modify PATH, we would end up with Tcl/Tk > > in the system directory, where it could once again interfere with (or > > be interfered by) other Tcl/Tk installations! > > It seems to me that the correct Tk install script would put Tk > DLL's in the system dir, and use the registry to find the libraries > and other needed files. The exe's could go in a program directory > somewhere. This is what I have come to expect from professional > software for DLL's which are expected to be used from multiple > apps, as opposed to DLL's which are peculiar to one app.
If > the Tk installer did this, Tk would Just Work, and it would > Just Work with third party apps (Tk clients) like Python too. OK, you go argue with the Tcl folks. They create a vaguely unix-like structure under c:\Program Files\Tcl: subdirectories lib, bin, include, and then they dump their .exe and their .dll files in the bin directory. They also try to munge PATH to include their bin directory, but that often doesn't work (not on Windows 95/98 anyway). --Guido van Rossum (home page: http://www.python.org/~guido/)
From pf at artcom-gmbh.de Tue Apr 4 23:14:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 4 Apr 2000 23:14:59 +0200 (MEST) Subject: [Python-Dev] Re: Unicode and comparisons In-Reply-To: <200004041949.PAA13102@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 3:49:32 pm" Message-ID: Hi! Guido van Rossum: > > I always thought it is a core property of cmp that it works between > > all objects. > > Not any more. Comparisons can raise exceptions -- this has been so > since release 1.5. This is rarely used between standard objects, but > not unheard of; and class instances can certainly do anything they > want in their __cmp__. Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a = '1' >>> b = 2 >>> a < b 0 >>> a > b # Newbies are normally baffled here 1 >>> a = '?' >>> b = u'?' >>> a < b Traceback (most recent call last): File "<stdin>", line 1, in ? UnicodeError: UTF-8 decoding error: unexpected end of data IMO we will have a *very* hard time to explain *this* behaviour to newbies! Unicode objects are similar to normal string objects from the users POV. It is unintuitive that objects that are far less similar (like for example numbers and strings) compare the way they do now, while the attempt to compare a unicode string with a standard string object containing the same character raises an exception. Mit freundlichen Grüßen (Regards), Peter (BTW: using a 12-year-old US keyboard and a custom xmodmap all the time to write umlauts and lots of other interesting chars: ?? ? ?? ?? ? ? ?? ?? ?! ;-)
From mal at lemburg.com Tue Apr 4 18:47:51 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 18:47:51 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: <200004041544.RAA01023@loewis.home.cs.tu-berlin.de> Message-ID: <38EA1CB7.BBECA305@lemburg.com> "Martin v. Loewis" wrote: > > > Question: is this behaviour acceptable or should I go even further > > and mask decoding errors during compares and contains tests too ? > > I always thought it is a core property of cmp that it works between > all objects. It does, but not necessarily without exceptions. I could easily mask the decoding errors too and then have cmp() work exactly as for strings, but the outcome may be different to what the user had expected due to the failing conversion. Sorting order may then look quite unsorted... > Because of that, > > >>> x=[u'1','a???'] > >>> x.sort() > Traceback (most recent call last): > File "<stdin>", line 1, in ? > UnicodeError: UTF-8 decoding error: invalid data > > fails. As always in cmp, I'd expect to get a consistent outcome here > (ie. cmp should give a total order on objects). > > OTOH, I'm not so sure why cmp between plain and unicode strings needs > to perform UTF-8 conversion? IOW, why is it desirable that > > >>> 'a' == u'a' > 1 This is needed to enhance inter-operability between Unicode and normal strings.
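Peter's session is easier to reproduce with explicit escapes (his mail encoding got lost above); u'\xe4' is LATIN SMALL LETTER A WITH DIAERESIS, and under the current rules the plain string on the right-hand side is decoded as UTF-8 during coercion:

>>> u'a' == 'a'            # pure ASCII: coercion succeeds
1
>>> u'\xe4' == '\xe4'      # a lone Latin-1 byte is not valid UTF-8
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected end of data
>>> u'\xe4' == '\xc3\xa4'  # the UTF-8 encoding of the same character
1

So whether a comparison works depends on which bytes happen to be in the 8-bit string, which is exactly the surprise being discussed.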
Note that they also have the same hash value (provided both use the ASCII code range), making them interchangeable in dictionaries: >>> d={u'a':1} >>> d['a'] = 2 >>> d[u'a'] 2 >>> d['a'] 2 This is per design. > Anyway, I'm not objecting to that outcome - I only think that, to get > cmp consistent, it may be necessary to drop this result. If it is not > necessary, the better. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 4 23:47:16 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 04 Apr 2000 23:47:16 +0200 Subject: [Python-Dev] Re: Unicode and comparisons References: Message-ID: <38EA62E4.7E2B0E43@lemburg.com> Peter Funk wrote: > > Hi! > > Guido van Rossum: > > > I always thought it is a core property of cmp that it works between > > > all objects. > > > > Not any more. Comparisons can raise exceptions -- this has been so > > since release 1.5. This is rarely used between standard objects, but > > not unheard of; and class instances can certainly do anything they > > want in their __cmp__. > > Python 1.6a1 (#6, Apr 2 2000, 02:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> a = '1' > >>> b = 2 > >>> a < b > 0 > >>> a > b # Newbies are normally baffled here > 1 > >>> a = '?' > >>> b = u'?' > >>> a < b > Traceback (most recent call last): > File "", line 1, in ? > UnicodeError: UTF-8 decoding error: unexpected end of data > > IMO we will have a *very* hard to time to explain *this* behaviour > to newbiews! > > Unicode objects are similar to normal string objects from the users POV. > It is unintuitive that objects that are far less similar (like for > example numbers and strings) compare the way they do now, while the > attempt to compare an unicode string with a standard string object > containing the same character raises an exception. I don't think newbies will really want to get into the UTF-8 business right from the start... when they do, they probably know about the above problems already. Changing this behaviour to silently swallow the decoding error would cause more problems than do good, IMHO. Newbies sure would find (u'a' not in 'a???') == 1 just as sursprising... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Wed Apr 5 00:51:01 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:51:01 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: > I wonder if the pathname should also be an 8+3 (max) > name, so that it > can be relyably typed into a DOS window. To be honest, I can not see a good reason for this any more. The installation package only works on Win95/98/NT/2000 - all of these support long file names on all their supported file systems. So, any where that this installer will run, the "command prompt" on this system will correctly allow "cd \Python-1.6-and-any-thing-else-I-like-ly" :-) [OTOH, I tend to prefer "Python1.6" purely from an "easier to type" POV] Mark. 
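The x.sort() failure Marc-Andre quotes can be avoided in user code by converting explicitly before comparing, instead of relying on the implicit UTF-8 coercion; a small sketch, assuming the 8-bit strings really are Latin-1:

items = ['1', '\xe4bc', u'xyz']
fixed = []
for s in items:
    if type(s) is type(''):
        s = unicode(s, 'latin-1')   # explicit choice of encoding
    fixed.append(s)
fixed.sort()    # every element is Unicode now, so no UnicodeError

Which encoding to assume for the 8-bit strings is, of course, exactly the open question in this thread.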
From mhammond at skippinet.com.au Wed Apr 5 00:59:24 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 08:59:24 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257259699-4963377@hypernet.com> Message-ID: [Gordon writes] > Things that go in the system directory should maintain > backwards compatibility. For a DLL, that means all the old > entry points are still there, in the same order with new ones at > the end. Actually, the order is not important unless people link to you by ordinal (in which case you are likely to specify the ordinal in the .def file anyway). The Win32 loader is smart enough to be able to detect that all ordinals are the same as when it was linked, and use a fast-path. If ordinal name-to-number mappings have changed, the runtime loader takes a slower path that fixes up these differences. So what you suggest is ideal, but not really necessary. > For Python, there's no crying need to conform for > now, but if (when?) embedding Python becomes ubiquitous, > this (or some other scheme) may need to be considered. I believe Python will already do this, almost by accident, due to the conservative changes with each minor Python release. Eg, up until Python 1.6 was branded as 1.6, I was still linking my win32all extensions against the CVS version. When I remembered I would switch back to the 1.5.2 release ones, but when I forgot I never had a problem. People running a release version 1.5.2 could happily use my extensions linked with the latest 1.5.2+ binaries. We-could-even-blame-the-time-machine-at-a-strecth-ly, Mark. From mhammond at skippinet.com.au Wed Apr 5 01:08:58 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 5 Apr 2000 09:08:58 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <1257257855-5074057@hypernet.com> Message-ID: > > Where should I put tk83.dll etc.? In the Python\DLLs > directory, where > > _tkinter.pyd also lives? > > Won't work (unless there are some tricks in MSVC 6 I don't > know about). Assuming no one is crazy enough to use Tk in a > COM server, (or rather, that their insanity need not be catered > to), then I'd vote for the directory where python.exe and > pythonw.exe live. What we can do is have Python itself use LoadLibraryEx() to load the .pyd files. This _will_ allow any dependant DLLs to be found in the same directory as the .pyd. [And as I mentioned, if the whole world would use LoadLibraryEx(), our problem would go away] LoadLibraryEx() is documented as working on all Win9x and NT from 3.1. From guido at python.org Wed Apr 5 01:14:22 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 19:14:22 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Wed, 05 Apr 2000 09:08:58 +1000." References: Message-ID: <200004042314.TAA15407@eric.cnri.reston.va.us> > What we can do is have Python itself use LoadLibraryEx() to load the > .pyd files. This _will_ allow any dependant DLLs to be found in the > same directory as the .pyd. [And as I mentioned, if the whole world > would use LoadLibraryEx(), our problem would go away] Doh! [Sound of forehead being slapped violently] We already use LoadLibraryEx()! So we can drop all the dependent dlls in the DLLs directory which has the PYD files as well. Case closed. 
--Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Wed Apr 5 03:20:44 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:20:44 -0700 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: <008c01bf9d57$d1753be0$34aab5d4@hagrid> Message-ID: > Greg Stein wrote: > > > we install our python distribution under the \py, > > > and we get lot of positive responses. as far as I remember, > > > nobody has ever reported problems setting up the path... > > > > *shrug* This doesn't dispute the standard Windows recommendation to > > install software into Program Files. > > no, but Tim's and my experiences from doing user support show that > the standard Windows recommendation doesn't work for command line > applications. we don't care about Microsoft, we care about Python's > users. > > to quote a Linus Torvalds, "bad standards _should_ be broken" > > (after all, Microsoft doesn't put their own command line applications > down there -- there's no "\Program Files" [sub]directory in the default > PATH, at least not on any of my boxes. maybe they've changed that > in Windows 2000?) Sorry I'm late -- I've been out of town. Just two FYIs: 1) ActivePerl goes into /Perl5.6, and my guess is that it's based on user feedback. 2) I've switched to changing the default installation to C:/Python in all my installs, and am much happier since I made that switchover. --david From DavidA at ActiveState.com Wed Apr 5 03:24:57 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 4 Apr 2000 18:24:57 -0700 Subject: FW: [Python-Dev] Windows installer pre-prelease Message-ID: Forgot to cc: python-dev on my reply to Greg -----Original Message----- From: David Ascher [mailto:DavidA at ActiveState.com] Sent: Tuesday, April 04, 2000 6:23 PM To: Greg Stein Subject: RE: [Python-Dev] Windows installer pre-prelease > Valid point. But there are other solutions, too. VC distributes a thing > named "VCVARS.BAT" to set up paths and other environ vars. Python could > certainly do the same thing (to overcome the embedded-space issue). I hate VCVARS -- it doesn't work from my Cygnus shell, it has to be invoked by the user as opposed to automatically started by the installer, etc. > Depends on the audience of that standard. Programmers: yah. Consumers? > They just want the damn thing to work like they expect it to. That > expectation is usually "I can find my programs in Program Files." In my experience, the /Program Files location works fine for tools which have strictly GUI interfaces and which are launched by the Start menu or other GUI mechanisms. Anything which you might need to invoke at the command line lives best in a non-space-containing path, IMO of course. --david From guido at python.org Wed Apr 5 03:26:12 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 04 Apr 2000 21:26:12 -0400 Subject: [Python-Dev] Windows installer pre-prelease In-Reply-To: Your message of "Tue, 04 Apr 2000 18:20:44 PDT." References: Message-ID: <200004050126.VAA15836@eric.cnri.reston.va.us> I've pretty much made my mind up about this one. Mark's mention of LoadLibraryEx() solved the last puzzle. I'm making the changes to the installer and hope to release alpha 2 with these changes later this week. 
- Default install root is \Python1.6 on the same drive as the default Program Files - MSVC*RT.DLL and PYTHON16.DLL go into the system directory; the MSV*RT.DLL files are only replaced if we bring a newer or same version - I'm using Tcl/Tk 8.2.3 instead of 8.3.0; the latter often crashes when closing a window - The Tcl/Tk and expat DLLs go in the DLLs subdirectory of the install root Thanks a lot for your collective memory!!! --Guido van Rossum (home page: http://www.python.org/~guido/)
From ping at lfw.org Wed Apr 5 04:19:18 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 4 Apr 2000 19:19:18 -0700 Subject: [Python-Dev] re: division In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > but harder to track down, since you'll have to scrutinize values very > carefully to spot the difference. Haven't done any field tests, > though...) My favourite symbol for integer division is _/ (read it as "floor-divide"). It makes visually apparent what is going on. -- ?!ng "There's no point in being grown up if you can't be childish sometimes." -- Dr. Who
From ping at lfw.org Wed Apr 5 04:19:09 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 4 Apr 2000 19:19:09 -0700 Subject: Hard to believe (was Re: [Python-Dev] New Features in Python 1.6) In-Reply-To: Message-ID: On Sun, 2 Apr 2000, Peter Funk wrote: > > As I read this my first thoughts were: > "Huh? Is that really true? To me this sounds like a april fools joke. > But to be careful I checked first before I read on: My favourite part was the distutils section. The great thing about this announcement is that it would have been almost believable if we were talking about any language other than Python! -- ?!ng "To be human is to continually change. Your desire to remain as you are is what ultimately limits you." -- The Puppet Master, Ghost in the Shell
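For readers skimming the division thread, the behaviour being argued about is the current one: between two integers "/" already gives an integer result, and nothing in the spelling warns you:

>>> 1 / 2            # both operands are ints
0
>>> 1 / 2.0          # one float operand: ordinary division
0.5
>>> divmod(7, 2)     # explicit quotient and remainder
(3, 1)

Any new spelling ('//', 'div', '_/', or a non-ASCII sign) is about making the first case visibly different from the second.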
-- ?!ng "There's no point in being grown up if you can't be childish sometimes." -- Dr. Who --KAC01325.954869821/skuld.lfw.org-- --KAD01325.954869821/skuld.lfw.org-- From tim_one at email.msn.com Wed Apr 5 06:57:27 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: <000601bf9ebb$723f7ea0$3e2d153f@tim> [Guido] > ... > (PATH on Win9x is still a mystery to me. You're not alone. > Is it really true that in order to change PATH an installer has > to edit autoexec.bat? AFAIK, yes. A specific PATH setting can be associated with a specific exe via the registry, though. > ... > Anything that claims to change PATH for me doesn't seem to do so. Almost always the same here; suspect documentation rot. > Could I have screwed something up? Yes, but I doubt it. > ... > Didn't someone tell me that at least on Windows 2000 installing > app-specific files (as opposed to MS-provided files) in the system > directory is a no-no? MS was threatening to do this in (the then-named) NT5, but I believe they backed down. Don't have (the now-named) W2000 here to check on for sure, though. From tim_one at email.msn.com Wed Apr 5 06:57:33 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:33 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Message-ID: <000701bf9ebb$740c4f60$3e2d153f@tim> [Mark Hammond] > ... > However, I would tend to go for "\name_of_app" rooted from the > Windows drive. It is likely that this will be the default drive > when a command prompt is open, so a simple "cd \python1.6" will > work. This is also generally the same drive the default "Program > Files" is on too. Yes, "C:\" doesn't literally mean "C:\" any more than "Program Files" literally means "Program Files" <0.1 wink>. By "C:\" I meant "drive where the thing we conveniently but naively call 'Program Files' lives"; naming the registry key whose value is this thing is more accurate but less helpful; the installer will have some magic predefined name which would be most helpful to give, but without the installer docs here I can't guess what that is. > ... > [Interestingly, Windows 2000 has a system process that continually > monitors the system directory. If it detects that a "protected > file" has been changed, it promptly copies the original back over > the top! I believe the MSVC*.dlls are in the protected list, so can > only be changed with a service pack release anyway. Everything > _looks_ like it updates - Windows just copies it back!] Thanks for making my day . From tim_one at email.msn.com Wed Apr 5 06:57:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 00:57:36 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004041324.JAA12173@eric.cnri.reston.va.us> Message-ID: <000801bf9ebb$75c85740$3e2d153f@tim> [Guido] > ... > I wonder if the pathname should also be an 8+3 (max) name, so that it > can be relyably typed into a DOS window. Yes, but for a different reason: Many sites still use older Novell file servers that screw up on non-8.3 names in a variety of unpleasant ways. Just went thru this at Dragon again, where the language modeling group created a new file format with a 4-letter extension; they had to back off to 3 letters because half the company couldn't get at the new files. 
BTW, two years ago it was much worse, and one group started using Python instead of Java partly because .java files didn't work over the network at all! From nascheme at enme.ucalgary.ca Wed Apr 5 07:19:45 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: Tue, 4 Apr 2000 23:19:45 -0600 Subject: [Python-Dev] Re: A surprising case of cyclic trash Message-ID: <20000404231945.A16978@acs.ucalgary.ca> An even simpler example: >>> import sys >>> d = {} >>> print sys.getrefcount(d) 2 >>> exec("def f(): pass\n") in d >>> print sys.getrefcount(d) 3 >>> d.clear() >>> print sys.getrefcount(d) 2 exec adds the function to the dictionary. The function references the dictionary through globals. Neil -- "If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie From moshez at math.huji.ac.il Wed Apr 5 08:44:10 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:44:10 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > Agreed. Perhaps non-native English speakers could pitch in and describe > how easy/difficult it is for them to (for example) put properly-accented > Spanish comments in code? As the only (I think?) person here who is a native right-to-left language native speaker let me put my 2cents in. I program in two places: at work, and at home. At work, we have WinNT machines, with *no Hebrew support*. We right everything in English, including internal Word documents. We figured that if 1000 programmers working on NT couldn't produce a stable system, the hacks of half-a-dozen programmers thrown in could only make it worse. At home, I have a Linux machine with no Hebrew support either -- it just didn't seem to be worth the hassle, considering that most of what I write is sent out to the world, so it needs to be in English anyway. My previous machine had some Esperanto support, and I intend to put some on my new machine. True, not many people know Esperanto, but at least its easy enough to learn. It was easy enough to write comments in Esperanto in "vim", but since I was thinking in English anyway while programming (if, while, StringIO etc.), it was more natural to write the comments in English too. The only non-English comments I've seen in sources I had to read were in French, and I won't repeat what I've said about French people then . -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From artcom0!pf at artcom-gmbh.de Wed Apr 5 08:39:56 2000 From: artcom0!pf at artcom-gmbh.de (artcom0!pf at artcom-gmbh.de) Date: Wed, 5 Apr 2000 08:39:56 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004042332.TAA15480@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 4, 2000 7:32:23 pm" Message-ID: Hi! Guido van Rossum: > Modified Files: > FixTk.py > Log Message: > Work the Tcl version number in the path we search for. [...] > ! import sys, os, _tkinter > ! ver = str(_tkinter.TCL_VERSION) > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > if os.path.exists(os.path.join(v, "init.tcl")): > os.environ["TCL_LIBRARY"] = v [...] Just a wild idea: Does it make sense to have several incarnations of the shared object file _tkinter.so (or _tkinter.pyd on WinXX)? 
Something like _tkint83.so, _tkint82.so and so on, so that Tkinter.py can do something like the following to find a available Tcl/Tk version: for tkversion in range(83,79,-1): try: _tkinter = __import__("_tkint"+str(tkversion)) break except ImportError: pass else: raise Of course this does only make sense on platforms with shared object loading and if preparing Python binary distributions without including a particular Tcl/Tk package into the Python package. This idea might be interesting for Red Hat, SuSE Linux distribution users to allow partial system upgrades with a binary python-1.6.rpm Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From moshez at math.huji.ac.il Wed Apr 5 08:46:23 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:46:23 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: Message-ID: On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > I wouldn't expect the division sign to be on keyboards. On the other hand, > I would expect that having to type a two-stroke sequence every once in a > while would help native English speakers appreciate what people in other > countries sometimes have to go through in order to spell their names > correctly... Not to mention what we have to do to get Americans to pronounce our name correctly. (I've learned to settle for not calling me Moshi) i18n-sucks-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Wed Apr 5 08:55:16 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 08:55:16 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <1257244899-5853377@hypernet.com> Message-ID: On Tue, 4 Apr 2000, Gordon McMillan wrote: > despite emigrating to the Americas during the Industrial > Revolution, insisted that the proper spelling of McMillan > involved elevating the "c". Wonder if there's a unicode > character for that, so I can get righteously indignant whenever > people fail to use it. Hmmmm...I think the Python ACKS file should be moved to UTF-8, and write *my* name in Hebrew letters: mem, shin, hey, space, tsadi, aleph, dalet, kuf, hey. now-i-can-get-righteously-indignant-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From billtut at microsoft.com Wed Apr 5 08:18:49 2000 From: billtut at microsoft.com (Bill Tutt) Date: Tue, 4 Apr 2000 23:18:49 -0700 Subject: [Python-Dev] _PyUnicode_New/PyUnicode_Resize Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> should be exported as part of the unicode object API. Otherwise, external C codec developers have to jump through some useless and silly hoops in order to construct a PyUnicode object. Additionally, you mentioned to Andrew that the decoders don't have to return a tuple anymore. Thats currently incorrect with whats currently in CVS: Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer returned in the tuple. Should this be fixed, or must codecs return the integer as Misc\unicode.txt says? Thanks, Bill From mal at lemburg.com Wed Apr 5 11:40:56 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 05 Apr 2000 11:40:56 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> Message-ID: <38EB0A28.8E8F6397@lemburg.com> Bill Tutt wrote: > > should be exported as part of the unicode object API. > > Otherwise, external C codec developers have to jump through some useless and > silly hoops in order to construct a PyUnicode object. Hmm, resize would be useful, agreed. The reason I haven't made these public is that the internal allocation logic could be changed in some future version to more elaborate and faster techniques. Having the _PyUnicode_* API private makes these changes possible without breaking external C code. E.g. say Unicode gets interned someday, then resize will need to watch out not resizing a Unicode object which is already stored in the interning dict. Perhaps a wrapper with additional checks around _PyUnicode_Resize() would be useful. Note that you don't really need _PyUnicode_New(): call PyUnicode_FromUnicode() with NULL argument and then fill in the buffer using PyUnicode_AS_UNICODE()... works just like PyString_FromStringAndSize() with NULL argument. > Additionally, you mentioned to Andrew that the decoders don't have to return > a tuple anymore. > Thats currently incorrect with whats currently in CVS: > Python\codecs.c:PyCodec_Decode() current requires, but ignores the integer > returned in the tuple. > Should this be fixed, or must codecs return the integer as Misc\unicode.txt > says? That was a misunderstanding on my part: I was thinking of the .read()/.write() methods which are now in synch with the other file objects. .read() previously returned a tuple and .write() an integer. .encode() and .decode() must return a tuple. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Apr 5 12:42:37 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 12:42:37 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <38EB0BD5.66048804@lemburg.com> from "M.-A. Lemburg" at "Apr 5, 2000 11:48: 5 am" Message-ID: Hi! [me]: > > From my POV (using ISO Latin-1 all the time) it would be > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'???' in a > > Python source file so that (u'???' == '???') == 1. This is what I see > > on *my* screen, whether there is a 'u' in Front of the string or not. M.-A. Lemburg: > u"???" is being interpreted as Latin-1. The problem is the > string '???' to the right: during coercion this string is > being interpreted as UTF-8 and this causes the failure. > > You could say: ok, all my strings use Latin-1, but that would > introduce other problems... esp. when you take different > modules with different encoding assumptions and try to > integrate them into an application. Okay. This wouldn't occur here but we have deal with this possibility. > > In dist/src/Misc/unicode.txt you wrote: > > > > > Note that you should provide some hint to the encoding you used to > > > write your programs as pragma line in one the first few comment lines > > > of the source file (e.g. '# source file encoding: latin-1'). [me]: > > The upcoming 1.6 documentation should probably clarify whether > > the interpreter pays attention to "pragma"s or not. > > This is otherwise misleading. 
> > This "pragma" is nothing more than a hint for the source code > reader to switch his viewing encoding. The interpreter doesn't > treat the file differently. In fact, Python source code is > supposed to tbe 7-bit ASCII ! Sigh. In our company we use 'german' as our master language so we have string literals containing iso-8859-1 umlauts all over the place. Okay as long as we don't mix them with Unicode objects, this doesn't hurt anybody. What I would love to see, would be a well defined way to tell the interpreter to use 'latin-1' as default encoding instead of 'UTF-8' when dealing with string literals from our modules. The tokenizer in Python 1.6 already contains smart logic to get the size of TABs right (pasting from tokenizer.c): /* Skip comment, while looking for tab-setting magic */ if (c == '#') { static char *tabforms[] = { "tab-width:", /* Emacs */ ":tabstop=", /* vim, full form */ ":ts=", /* vim, abbreviated form */ "set tabsize=", /* will vi never die? */ /* more templates can be added here to support other editors */ }; .. It wouldn't be to hard to add something there to recognize other "pragma" comments like for example: #content-transfer-encoding: iso-8859-1 But what to do with it? May be adding a default encoding to every string object? Is this bloat? Just an idea. Regards, Peter From mal at lemburg.com Wed Apr 5 13:28:58 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 13:28:58 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: Message-ID: <38EB237A.5B16575B@lemburg.com> Peter Funk wrote: > > Hi! > > [me]: > > > From my POV (using ISO Latin-1 all the time) it would be > > > "intuitive"(TM) to assume ISO Latin-1 when interpreting u'???' in a > > > Python source file so that (u'???' == '???') == 1. This is what I see > > > on *my* screen, whether there is a 'u' in Front of the string or not. > > M.-A. Lemburg: > > u"???" is being interpreted as Latin-1. The problem is the > > string '???' to the right: during coercion this string is > > being interpreted as UTF-8 and this causes the failure. > > > > You could say: ok, all my strings use Latin-1, but that would > > introduce other problems... esp. when you take different > > modules with different encoding assumptions and try to > > integrate them into an application. > > Okay. This wouldn't occur here but we have deal with this possibility. > > > > In dist/src/Misc/unicode.txt you wrote: > > > > > > > Note that you should provide some hint to the encoding you used to > > > > write your programs as pragma line in one the first few comment lines > > > > of the source file (e.g. '# source file encoding: latin-1'). > > [me]: > > > The upcoming 1.6 documentation should probably clarify whether > > > the interpreter pays attention to "pragma"s or not. > > > This is otherwise misleading. > > > > This "pragma" is nothing more than a hint for the source code > > reader to switch his viewing encoding. The interpreter doesn't > > treat the file differently. In fact, Python source code is > > supposed to tbe 7-bit ASCII ! > > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. 
> > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. > > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. As I have already indicated above this would only solve the problem of string literals in Python source code. It would not however solve the problem with strings in general, since these can be built dynamically or from user input. The only way I can see for #pragma to work here is by auto- converting all static strings in the source code to Unicode and that would probably break more code than do good. Even worse, writing 'abc' in such a program would essentially mean the same thing as u'abc'. I'd suggest turning your Latin-1 strings into Unicode... this will hurt at first, but in the long rung, you win. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jim at interet.com Wed Apr 5 15:33:29 2000 From: jim at interet.com (James C. Ahlstrom) Date: Wed, 05 Apr 2000 09:33:29 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <000901bf9df2$c224c7a0$162d153f@tim> <200004041324.JAA12173@eric.cnri.reston.va.us> <38EA1303.B393D7F8@interet.com> <200004042010.QAA13180@eric.cnri.reston.va.us> <38EA5640.D3FC112F@interet.com> <200004042058.QAA13437@eric.cnri.reston.va.us> Message-ID: <38EB40A9.32A60EA2@interet.com> Guido van Rossum wrote: > OK, you go argue with the Tcl folks. They create a vaguely unix-like > structure under c:\Program Files\Tcl: subdirectories lib, bin, > include, and then they dump their .exe and their .dll files in the bin > directory. They also try to munge PATH to include their bin > directory, but that often doesn't work (not on Windows 95/98 anyway). That is even worse than I thought. Obviously they are incompetent in Windows. Mark's suggestion is a great one! JimA From bwarsaw at cnri.reston.va.us Wed Apr 5 15:34:39 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 5 Apr 2000 09:34:39 -0400 (EDT) Subject: [Python-Dev] a slightly more coherent case References: <1257244899-5853377@hypernet.com> Message-ID: <14571.16623.493822.231793@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, MZ> tsadi, aleph, dalet, kuf, hey. Shouldn't that be hey kuf dalet aleph tsadi space hey shin mem? :) lamed-alef-vav-mem-shin-ly y'rs, -Barry From moshez at math.huji.ac.il Wed Apr 5 15:44:15 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 15:44:15 +0200 (IST) Subject: [Python-Dev] a slightly more coherent case In-Reply-To: <14571.16623.493822.231793@anthem.cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Barry A. 
Warsaw wrote: > MZ> Hmmmm...I think the Python ACKS file should be moved to UTF-8, > MZ> and write *my* name in Hebrew letters: mem, shin, hey, space, > MZ> tsadi, aleph, dalet, kuf, hey. > > Shouldn't that be > > hey kuf dalet aleph tsadi space hey shin mem? No, just stick the unicode directional shifting characters around it. now-you-see-why-i18n-is-a-pain-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gward at cnri.reston.va.us Wed Apr 5 15:48:24 2000 From: gward at cnri.reston.va.us (Greg Ward) Date: Wed, 5 Apr 2000 09:48:24 -0400 Subject: [Python-Dev] re: division In-Reply-To: ; from ping@lfw.org on Tue, Apr 04, 2000 at 10:25:07AM -0700 References: Message-ID: <20000405094823.A11890@cnri.reston.va.us> On 04 April 2000, Ka-Ping Yee said: > On Tue, 4 Apr 2000 gvwilson at nevex.com wrote: > > (BTW, I think '/' vs. '//' is going to be as error-prone as '=' vs. '==', > > but harder to track down, since you'll have to scrutinize values very > > carefully to spot the difference. Haven't done any field tests, > > though...) > > My favourite symbol for integer division is _/ > (read it as "floor-divide"). It makes visually > apparent what is going on. Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 years ago: / is what you learned in grade school (1/2 = 0.5), div is what you learn in first-year undergrad CS (1/2 = 0). Either add a "div" operator or a "div()" builtin to Python and you take care of the spelling issue. (The fixing-old-code issue is another problem entirely.) I think that means I favour keeping operator.div and the __div__() method as-is, and adding operator.fdiv (?) and __fdiv__ for "floating-point" division. In other words: 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 (where I have used artistic license in applying __div__ to actual numbers -- you know what I mean). -1 on adding any non-7-bit-ASCII characters to the character set required to express Python; +0 on allowing any (alphanumeric) Unicode character in identifiers (all for Py3k). Not sure what "alphanumeric" means in Unicode, but I'm sure someone has worried about this. Greg From guido at python.org Wed Apr 5 16:04:53 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:04:53 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 08:39:56 +0200." References: Message-ID: <200004051404.KAA16039@eric.cnri.reston.va.us> > Guido van Rossum: > > Modified Files: > > FixTk.py > > Log Message: > > Work the Tcl version number in the path we search for. > [...] > > ! import sys, os, _tkinter > > ! ver = str(_tkinter.TCL_VERSION) > > ! v = os.path.join(sys.prefix, "tcl", "tcl"+ver) > > if os.path.exists(os.path.join(v, "init.tcl")): > > os.environ["TCL_LIBRARY"] = v > [...] Note that this is only used on Windows, where Python is distributed with a particular version of Tk. I decided I needed to back down from 8.3 to 8.2 (8.3 sometimes crashes on close) so I decided to make the FixTk module independent of the version. > Just a wild idea: > > Does it make sense to have several incarnations of the shared object file > _tkinter.so (or _tkinter.pyd on WinXX)? 
> > Something like _tkint83.so, _tkint82.so and so on, so that > Tkinter.py can do something like the following to find a > available Tcl/Tk version: > > for tkversion in range(83,79,-1): > try: > _tkinter = __import__("_tkint"+str(tkversion)) > break > except ImportError: > pass > else: > raise > > Of course this does only make sense on platforms with shared object loading > and if preparing Python binary distributions without including a > particular Tcl/Tk package into the Python package. This idea might be > interesting for Red Hat, SuSE Linux distribution users to allow partial > system upgrades with a binary python-1.6.rpm Can you tell me what problem you are trying to solve here? It makes no sense to me, but maybe I'm missing something. Typically Python is built to match the Tcl/Tk version you have installed, right? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:11:02 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:11:02 -0400 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize In-Reply-To: Your message of "Wed, 05 Apr 2000 11:40:56 +0200." <38EB0A28.8E8F6397@lemburg.com> References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> Message-ID: <200004051411.KAA16095@eric.cnri.reston.va.us> > E.g. say Unicode gets interned someday, then resize will > need to watch out not resizing a Unicode object which is > already stored in the interning dict. Note that string objects deal with this by requiring that the reference count is 1 when a string is resized. This effectively enforces that resizes are only used when the original creator is still working on the string. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:16:15 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:16:15 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 12:42:37 +0200." References: Message-ID: <200004051416.KAA16112@eric.cnri.reston.va.us> > Sigh. In our company we use 'german' as our master language so > we have string literals containing iso-8859-1 umlauts all over the place. > Okay as long as we don't mix them with Unicode objects, this doesn't > hurt anybody. > > What I would love to see, would be a well defined way to tell the > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > when dealing with string literals from our modules. It would be better if this was supported for u"..." literals, so that it was taken care of at the source code level completely. The running program shouldn't have to worry about what encoding its source code was! For 8-bit literals, this would mean that if you had source code using Latin-1, the literals would be translated from Latin-1 to UTF-8 by the code generator. This would mean that len('?') would return 2. I'm not sure this is a great idea -- but then I'm not sure that using Latin-1 in source code is a great idea either. > The tokenizer in Python 1.6 already contains smart logic to get the > size of TABs right (pasting from tokenizer.c): > > /* Skip comment, while looking for tab-setting magic */ > if (c == '#') { > static char *tabforms[] = { > "tab-width:", /* Emacs */ > ":tabstop=", /* vim, full form */ > ":ts=", /* vim, abbreviated form */ > "set tabsize=", /* will vi never die? */ > /* more templates can be added here to support other editors */ > }; > .. 
> > It wouldn't be to hard to add something there to recognize > other "pragma" comments like for example: > #content-transfer-encoding: iso-8859-1 > But what to do with it? May be adding a default encoding to every string > object? Is this bloat? Just an idea. Before we go any further we should design pragmas. The current approach is inefficient and only designed to accommodate editor-specific magical commands. I say it's a Python 1.7 issue. --Guido van Rossum (home page: http://www.python.org/~guido/) From moshez at math.huji.ac.il Wed Apr 5 16:08:53 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 5 Apr 2000 16:08:53 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: On Wed, 5 Apr 2000, Greg Ward wrote: > Gaackk! Why is this even an issue? As I recall, Pascal got it right 30 > years ago: / is what you learned in grade school (1/2 = 0.5) Greg, here's an easy way for you to make money: sue your grade school . I learned that 1/2 is 1/2. Rationals are a much more natural entities then decimals (just think 1/3). FWIW, I think Python should support Rationals, and have integer division return a rational. I'm still working on the details of my great Python numeric tower change. > Not sure what "alphanumeric" > means in Unicode, but I'm sure someone has worried about this. I think Unicode has a clear definition of a letter and a number. How do you feel about letting arbitrary Unicode whitespace into Python? (Other then the indentation of non-empty lines ) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Wed Apr 5 16:29:03 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 16:29:03 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? Message-ID: <38EB4DAE.2F538F9F@tismer.com> Hi, while fixing my design flaws after Just's Stackless Mac port, I was dealing with some overflow conditions and tracebacks. When there is a recursion depth overflow condition, we create a lot of new structure for the tracebacks. This usually happens in a situation where memory is quite exhausted. Even worse if we crash because of a memory error: The system will not have enough memory to build the traceback structure, to report the error. Puh :-) When I look into tracebacks, it turns out to be just a chain like the frame chain, but upward down. It holds references to the frames in a 1-to-1 manner, and it keeps copies of f->f_lasti and f->f_lineno. I don't see why this is needed. I'm thinking to replace the tracebacks by a single pointer in the frames for this purpose. It appears further to be possible to do that without any extra memory, since all the frames have extra temporary fields for exception info, and that isn't used in this context. Traceback objects exist each for one and only one frame, and they could be embedded into their frame. Does this make sense? Do I miss something? I'm considering this for Stackless and would like to know if I should prepare it for orthodox Python as well? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From guido at python.org Wed Apr 5 16:32:05 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:32:05 -0400 Subject: [Python-Dev] re: division In-Reply-To: Your message of "Wed, 05 Apr 2000 16:08:53 +0200." References: Message-ID: <200004051432.KAA16210@eric.cnri.reston.va.us> > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. Forget it. ABC did this, and the problem is that where you *think* you are doing something simple like calculating interest rates, you are actually manipulating rational numbers with 1000s of digits in their numerator and denumerator. If you want to change it, consider emulating what kids currently use in school: a decimal floating point calculator with N digits of precision. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 5 16:33:18 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 10:33:18 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Wed, 05 Apr 2000 16:29:03 +0200." <38EB4DAE.2F538F9F@tismer.com> References: <38EB4DAE.2F538F9F@tismer.com> Message-ID: <200004051433.KAA16229@eric.cnri.reston.va.us> > When I look into tracebacks, it turns out to be just a chain > like the frame chain, but upward down. It holds references > to the frames in a 1-to-1 manner, and it keeps copies of > f->f_lasti and f->f_lineno. I don't see why this is needed. > > I'm thinking to replace the tracebacks by a single pointer > in the frames for this purpose. It appears further to be > possible to do that without any extra memory, since all the > frames have extra temporary fields for exception info, and > that isn't used in this context. Traceback objects exist > each for one and only one frame, and they could be embedded > into their frame. > > Does this make sense? Do I miss something? Yes. It is quite possible to have multiple stack traces lingering around that all point to the same stack frames. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Apr 5 17:04:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 17:04:31 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> Message-ID: <38EB55FF.C900CF8A@lemburg.com> Guido van Rossum wrote: > > > Sigh. In our company we use 'german' as our master language so > > we have string literals containing iso-8859-1 umlauts all over the place. > > Okay as long as we don't mix them with Unicode objects, this doesn't > > hurt anybody. > > > > What I would love to see, would be a well defined way to tell the > > interpreter to use 'latin-1' as default encoding instead of 'UTF-8' > > when dealing with string literals from our modules. > > It would be better if this was supported for u"..." literals, so that > it was taken care of at the source code level completely. The running > program shouldn't have to worry about what encoding its source code > was! u"..." currently interprets the characters it finds as Latin-1 (this is by design, since the first 256 Unicode ordinals map to the Latin-1 characters). > For 8-bit literals, this would mean that if you had source code using > Latin-1, the literals would be translated from Latin-1 to UTF-8 by the > code generator. This would mean that len('?') would return 2. 
I'm > not sure this is a great idea -- but then I'm not sure that using > Latin-1 in source code is a great idea either. > > > The tokenizer in Python 1.6 already contains smart logic to get the > > size of TABs right (pasting from tokenizer.c): ... > > Before we go any further we should design pragmas. The current > approach is inefficient and only designed to accommodate > editor-specific magical commands. > > I say it's a Python 1.7 issue. Good idea :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Wed Apr 5 17:01:24 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 05 Apr 2000 17:01:24 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <38EB4DAE.2F538F9F@tismer.com> <200004051433.KAA16229@eric.cnri.reston.va.us> Message-ID: <38EB5544.5D428C01@tismer.com> Guido van Rossum wrote: [me, about embedding tracebacks into frames] > > Does this make sense? Do I miss something? > > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. Oh, I see. This is a Standard Python specific thing, which I was about to forget. In my version, this can happen, too, unless you are in a continuation-protected context already. There (and that was what I looked at while debugging), this situation can never happen, since an exception creates continuation-copies of all the frames while it crawls up. Since the traceback causes refcount increase, all the frames protect themselves. Thank you. I see it is a stackless feature. I can implement it if I put protection into the core, not just the co-extension. Frames can carry the tracebacks under the condition that they are protected (copied) if the traceback fields are occupied. Great, since this is a rare condition. Thanks again for the enlightment - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From pf at artcom-gmbh.de Wed Apr 5 17:08:35 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:08:35 +0200 (MEST) Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: <200004051404.KAA16039@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 10: 4:53 am" Message-ID: Hi! [me]: [...] > > particular Tcl/Tk package into the Python package. This idea might be > > interesting for Red Hat, SuSE Linux distribution users to allow partial > > system upgrades with a binary python-1.6.rpm > > Can you tell me what problem you are trying to solve here? It makes > no sense to me, but maybe I'm missing something. Typically Python is > built to match the Tcl/Tk version you have installed, right? If you build from source this is true. But the Linux world is now different: The two major Linux distributions (RedHat, SuSE) both use the RPM format to distribute precompiled binary packages. Tcl/Tk usually lives in a separate package. (BTW.: SuSE in their perverse mood has splitted Python 1.5.2 itself into more than half a dozen separate packages, but that's another story). If someone wants to prebuild a Python 1.6 binary RPM for installation on any RPM based Linux system it is unknown, which version of Tcl/Tk is installed on the destination system. 
So either you can build a monster RPM, which includes the Tcl/Tk shared libs or use the RPM Spec file to force the user to install a specific version of Tcl/Tk (for example 8.2.3) or implement something like I suggested above. Of course this places a lot of burden on the RPM builder: he has to install at least all the four major versions of Tcl/Tk (8.0 - 8.3) on his machine and has to build _tkinter four times against each particular shared library and header files... but this would be possible. Currently the situation with SuSE Python 1.5.2 RPMs is even more dangerous, since the SPEC files used by SuSE simply contains the following 'Requires'-definitions: %package -n pyth_tk Requires: python tk tix blt This makes RPM believe that *any* version of Tcl/Tk would fit. Luckily SuSE 6.4 (released last week) still ships with the old Tcl/Tk 8.0.5, so this will not break until SuSE decides to upgrade their Tcl/Tk. But I guess that Red Hat comes with a newer version of Tcl/Tk. Hopefully they have got their SPEC file right (they invented RPM in the first place) RPM can be a really powerful tool protecting people from breaking their system with binary updates --- if used the right way... :-( May be I should go ahead and write a RPM Python.SPEC file? Would that have a chance to get included into src/Misc? Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Wed Apr 5 17:25:38 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 11:25:38 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 17:04:31 +0200." <38EB55FF.C900CF8A@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> Message-ID: <200004051525.LAA16345@eric.cnri.reston.va.us> > u"..." currently interprets the characters it finds as Latin-1 > (this is by design, since the first 256 Unicode ordinals map to > the Latin-1 characters). Nice, except that now we seem to be ambiguous about the source character encoding: it's Latin-1 for Unicode strings and UTF-8 for 8-bit strings...! --Guido van Rossum (home page: http://www.python.org/~guido/) From pf at artcom-gmbh.de Wed Apr 5 17:54:12 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 5 Apr 2000 17:54:12 +0200 (MEST) Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: <200004051525.LAA16345@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 5, 2000 11:25:38 am" Message-ID: Guido van Rossum: > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! This is a little bit difficult to understand and will make the task to write the upcoming 1.6 documentation even more challenging. ;-) But I agree: Changing this should go into 1.7 BTW: Our umlaut strings are sooner or later passed through one central function. All modules usually contain something like this: try: import fintl _ = fintl.gettext execpt ImportError: def _(msg): return msg ... MenuEntry(_("?ffnen"), self.open), MenuEntry(_("Schlie?en"), self.close) .... you get the picture. 
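If the catalog strings really are Latin-1, a Unicode-coercing variant of that fallback could look like the following sketch (an assumption for illustration; it is not what fintl actually does, and unicode() here is the 1.6 builtin):

    try:
        import fintl
        def _(msg, _gettext=fintl.gettext):
            # look the message up as usual, then coerce it to Unicode,
            # assuming the catalog really is Latin-1
            return unicode(_gettext(msg), "latin-1")
    except ImportError:
        def _(msg):
            return unicode(msg, "latin-1")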
It would be easy to change the implementation of 'fintl.gettext' to coerce the resulting strings into Unicode or do whatever is required. But we currently use GNU gettext to produce the messages files that are translated into english, french and italian. AFAIK GNU gettext handles only 8 bit strings anyway. Our customers in far east currently live with the english version but this has merely financial than technical reasons. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From guido at python.org Wed Apr 5 20:01:29 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 14:01:29 -0400 Subject: [Python-Dev] _tkinter and Tcl/Tk versions In-Reply-To: Your message of "Wed, 05 Apr 2000 17:08:35 +0200." References: Message-ID: <200004051801.OAA16736@eric.cnri.reston.va.us> > RPM can be a really powerful tool protecting people from breaking their > system with binary updates --- if used the right way... :-( > > May be I should go ahead and write a RPM Python.SPEC file? > Would that have a chance to get included into src/Misc? I'd say yes! But check with Oliver Andrich first, who's maintaining Python RPMs already. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Wed Apr 5 20:32:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 05 Apr 2000 20:32:26 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> Message-ID: <38EB86BA.5225C381@lemburg.com> Guido van Rossum wrote: > > > u"..." currently interprets the characters it finds as Latin-1 > > (this is by design, since the first 256 Unicode ordinals map to > > the Latin-1 characters). > > Nice, except that now we seem to be ambiguous about the source > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > 8-bit strings...! Noo... there is no definition for non-ASCII 8-bit strings in Python source code using the ordinal range 127-255. If you were to define Latin-1 as source code encoding, then we would have to change auto-coercion to make a Latin-1 assumption instead, but... I see the picture: people are getting pretty confused about what is going on. If you write u"xyz" then the ordinals of those characters are taken and stored directly as Unicode characters. If you live in a Latin-1 world, then you happen to be lucky: the Unicode characters match your input. If not, some totally different characters are likely to show if the string were written to a file and displayed using a Unicode aware editor. The same will happen to your normal 8-bit string literals. Nothing unusual so far... if you use Latin-1 strings and write them to a file, you get Latin-1. If you happen to program on DOS, you'll get the DOS ANSI encoding for the German umlauts. Now the key point where all this started was that u'?' in '???' will raise an error due to '???' being *interpreted* as UTF-8 -- this doesn't mean that '???' will be interpreted as UTF-8 elsewhere in your application. The UTF-8 assumption had to be made in order to get the two worlds to interoperate. We could have just as well chosen Latin-1, but then people currently using say a Russian encoding would get upset for the same reason. One way or another somebody is not going to like whatever we choose, I'm afraid... 
the simplest solution is to use Unicode for all strings which contain non-ASCII characters and then call .encode() as necessary. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Wed Apr 5 23:39:49 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 5 Apr 2000 23:39:49 +0200 Subject: [Python-Dev] Re: unicode: strange exception Message-ID: <000f01bf9f47$7ea37840$34aab5d4@hagrid> >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer now that's an interesting error message. I think the old one was better ;-) From effbot at telia.com Wed Apr 5 23:38:10 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 5 Apr 2000 23:38:10 +0200 Subject: [Python-Dev] Re: unicode: strange exception (part 2) Message-ID: <000e01bf9f47$7e47eac0$34aab5d4@hagrid> I wrote: > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (innermost last): > File "", line 1, in ? > TypeError: expected a character buffer object with the latest version, I get: >>> "!" in ("a", None) 0 >>> u"!" in ("a", None) Traceback (most recent call last): File "", line 1, in ? TypeError: coercing to Unicode: need string or charbuffer is this really an improvement? looks like writing code that works with any kind of strings will be harder than I thought... From guido at python.org Wed Apr 5 23:46:47 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 17:46:47 -0400 Subject: [Python-Dev] Re: unicode: strange exception (part 2) In-Reply-To: Your message of "Wed, 05 Apr 2000 23:38:10 +0200." <000e01bf9f47$7e47eac0$34aab5d4@hagrid> References: <000e01bf9f47$7e47eac0$34aab5d4@hagrid> Message-ID: <200004052146.RAA22187@eric.cnri.reston.va.us> > with the latest version, I get: > > >>> "!" in ("a", None) > 0 > >>> u"!" in ("a", None) > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > is this really an improvement? > > looks like writing code that works with any kind of strings > will be harder than I thought... Are you totally up-to-date? I get >>> u"!" in ("a", None) 0 >>> --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Thu Apr 6 00:37:24 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 05 Apr 2000 18:37:24 -0400 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) In-Reply-To: Your message of "Wed, 05 Apr 2000 20:32:26 +0200." <38EB86BA.5225C381@lemburg.com> References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <200004052237.SAA22215@eric.cnri.reston.va.us> [MAL] > > > u"..." currently interprets the characters it finds as Latin-1 > > > (this is by design, since the first 256 Unicode ordinals map to > > > the Latin-1 characters). [GvR] > > Nice, except that now we seem to be ambiguous about the source > > character encoding: it's Latin-1 for Unicode strings and UTF-8 for > > 8-bit strings...! [MAL] > Noo... there is no definition for non-ASCII 8-bit strings in > Python source code using the ordinal range 127-255. If you were > to define Latin-1 as source code encoding, then we would have > to change auto-coercion to make a Latin-1 assumption instead, but... 
> I see the picture: people are getting pretty confused about what > is going on. > > If you write u"xyz" then the ordinals of those characters are > taken and stored directly as Unicode characters. If you live > in a Latin-1 world, then you happen to be lucky: the Unicode > characters match your input. If not, some totally different > characters are likely to show if the string were written > to a file and displayed using a Unicode aware editor. > > The same will happen to your normal 8-bit string literals. > Nothing unusual so far... if you use Latin-1 strings and > write them to a file, you get Latin-1. If you happen to > program on DOS, you'll get the DOS ANSI encoding for the > German umlauts. > > Now the key point where all this started was that > u'?' in '???' will raise an error due to '???' being > *interpreted* as UTF-8 -- this doesn't mean that '???' > will be interpreted as UTF-8 elsewhere in your application. > > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. > > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. I have a different view on this (except that I agree that it's pretty confusing :-). In my definition of a "source character encoding", string literals, whether Unicode or 8-bit strings, are translated from the source encoding to the corresponding run-time values. If I had a C compiler that read its source in EBCDIC but cross-compiled to a machine that used ASCII, I would expect that 'a' in the source would have the integer value 97 (ASCII 'a'), regardless of the EBCDIC value for 'a'. If I type a non-ASCII Latin-1 character in a Unicode literal, it generates the corresponding Unicode character. This means to me that the source character encoding is Latin-1. But when I type the same character in an 8-bit character literal, that literal is interpreted as UTF-8 (e.g. when converting to Unicode using the default conversions). Thus, even though you can do whatever you want with 8-bit literals in your program, the most defensible view is that they are UTF-8 encoded. I would be much happier if all source code was encoded in the same encoding, because otherwise there's no good way to view such code in a general Unicode-aware text viewer! My preference would be to always use UTF-8. This would mean no change for 8-bit literals, but a big change for Unicode literals... And a break with everyone who's currently typing Latin-1 source code and using strings as Latin-1. (Or Latin-7, or whatever.) My next preference would be a pragma to define the source encoding, but that's a 1.7 issue. Maybe the whole thing is... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Thu Apr 6 00:51:51 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 00:51:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> Message-ID: <38EBC387.FAB08D61@lemburg.com> Fredrik Lundh wrote: > > >>> None in "abc" > Traceback (most recent call last): > File "", line 1, in ? > TypeError: coercing to Unicode: need string or charbuffer > > now that's an interesting error message. 
I think the old one > was better ;-) How come you're always faster on this than I am with my patches ;-) The above is already fixed in my local version (together with some other minor stuff I found in the codec error handling) with the next patch set. It will then again produce this output: >>> None in "abc" Traceback (most recent call last): File "", line 1, in ? TypeError: string member test needs char left operand BTW, my little "don't use tabs use spaces" in C code extravaganza was a complete nightmare... diff just doesn't like it and the Python source code is full of places where tabs and spaces are mixed in many different ways... I'm back to tabs-indent-mode again :-/ -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mhammond at skippinet.com.au Thu Apr 6 02:19:30 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 10:19:30 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004040525.BAA11585@eric.cnri.reston.va.us> Message-ID: > I just downloaded and installed it. I've never seen an > installer like > this -- they definitely put a lot of effort in it. hehe - guess who "encouraged" that :-) > Annoying nit: they > tell you to install "MS Windows Installer" first that should be a good clue :-) > and of > course, being > a MS tool, it requires a reboot. :-( Actually, MSI is very cool. Now MSI is installed, most future MSI installs should proceed without reboot. In Win2k it is finally close to perfect. I dont think an installer has ever wanted to reboot my PC since Win2k. > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So > there. It also didn't change PATH for me, even though the docs > mention that it does -- maybe only on NT? In another mail you asked David to look into how Active State handle their DLLs. Well, Trent Mick started the ball rolling... The answer is that Perl extensions never import data from the core DLL. They always import functions. In many cases, they can hide this fact with the pre-processor. In the Python world, this qould be equivilent to never accessing Py_None directly - always via a "PyGetNone()" type function. As mentioned, this could possibly be hidden so that code still uses "Py_None". One advantage they mentioned a number of times is avoiding dependencies on differing Perl versions. By avoiding the import of data, they have far more possibilities, including the use of LoadLibrary(), and a new VC6 linker feature called "delay loading". To my mind, it would be quite difficult to make this work for Python. There are a a large number of data items we import, and adding a function call indirection to each one sounds a pain. [As a semi-related issue: This "delay loading" feature is very cool - basically, the EXE loader will not resolve external DLL references until actually used. This is the same trick mentioned on comp.lang.python, where they saw _huge_ startup increases (although the tool used there was a third-party tool). The thread in question on c.l.py resolved that, for some reason, the initialization of the Windows winsock library was taking many seconds on that particular PC. Guido - are you on VC6 yet? If so, I could look into this linker option, and see how it improves startup performance on Windows. Note - this feature only works if no data is imported - hence, we could use it in Python16.dll, as most of its imports are indeed functions. 
Python extension modules can not use it against Python16 itself as they import data.] Mark. From tim_one at email.msn.com Thu Apr 6 05:10:39 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:39 -0400 Subject: [Python-Dev] re: division In-Reply-To: <20000405094823.A11890@cnri.reston.va.us> Message-ID: <000401bf9f75$afb18520$ab2d153f@tim> [Greg Ward] > ... > In other words: > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > (where I have used artistic license in applying __div__ to actual > numbers -- you know what I mean). +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. From tim_one at email.msn.com Thu Apr 6 05:10:35 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 5 Apr 2000 23:10:35 -0400 Subject: [Python-Dev] re: division In-Reply-To: <200004051432.KAA16210@eric.cnri.reston.va.us> Message-ID: <000301bf9f75$ada190e0$ab2d153f@tim> [Moshe] > FWIW, I think Python should support Rationals, and have integer division > return a rational. I'm still working on the details of my great Python > numeric tower change. [Guido] > Forget it. ABC did this, and the problem is that where you *think* > you are doing something simple like calculating interest rates, you > are actually manipulating rational numbers with 1000s of digits in > their numerator and denumerator. Let's not be too hasty about this, cuz I doubt we'll get to change it twice . You (Guido) & I agreed that ABC's rationals didn't work out well way back when, but a) That has not been my experience in other languages -- ABC was unique. b) Presumably ABC did usability studies that concluded rationals were least surprising. c) TeachScheme! seems delighted with their use of rationals (indeed, one of TeachScheme!'s primary authors beat up on me in email for Python not doing this). d) I'd much rather saddle newbies with time & space surprises than correctness surprises. Last week I took some time to stare at the ABC manual again, & suspect I hit on the cause: ABC was *aggressively* rational. That is, ABC had no notation for floating point (ABC "approximate") literals; even 6.02e23 was taken to mean "exact rational". In my experience ABC was unique this way, and uniquely surprising for it: it's hard to be surprised by 2/3 returning a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Give it some thought. > If you want to change it, consider emulating what kids currently use > in school: a decimal floating point calculator with N digits of > precision. This is what REXX does, and is very powerful even for experts (assuming the user can, as in REXX, specify N; but that means writing a whole slew of arbitrary-precision math libraries too -- btw, that is doable! e.g., I worked w/ Dave Gillespie on some of the algorithms for his amazing Emacs calc). It will run at best 10x slower than native fp of comparable precision, though, so experts will hate it in the cases they don't love it <0.5 wink>. 
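A tiny sketch, in the 1.6-era Python of this thread and with made-up numbers, of the time-and-space surprise being traded off here: treat an interest rate as an exact fraction and the exact answer drags thousands of digits around.

    def gcd(a, b):
        while b:
            a, b = b, a % b
        return a

    num, den = 10001L, 10000L        # 1.0001 as an exact fraction
    n, d = 1L, 1L
    for i in range(1000):            # compound the rate 1000 times
        n = n * num
        d = d * den

    assert gcd(n, d) == 1            # already in lowest terms; no reduction helps
    # n and d now have about 4000 digits each, while 1.0001 ** 1000 is
    # just a float close to 1.105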
one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim From petrilli at amber.org Thu Apr 6 05:16:28 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Wed, 5 Apr 2000 23:16:28 -0400 Subject: [Python-Dev] re: division In-Reply-To: <000401bf9f75$afb18520$ab2d153f@tim>; from tim_one@email.msn.com on Wed, Apr 05, 2000 at 11:10:39PM -0400 References: <20000405094823.A11890@cnri.reston.va.us> <000401bf9f75$afb18520$ab2d153f@tim> Message-ID: <20000405231628.A24968@trump.amber.org> Tim Peters [tim_one at email.msn.com] wrote: > [Greg Ward] > > ... > > In other words: > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 > > > > (where I have used artistic license in applying __div__ to actual > > numbers -- you know what I mean). > > +1 from me provided you can sneak the new keyword past Guido <1/3 wink>. +1 from me as well. I spent a little time going through all my code, and looking through Zope as well, and I couldn't find any place I used 'div' as a variable, much less any place I depended on this behaviour, so I don't think my code would break in any odd ways. The only thing I can imagine is some printed text formatting issues. Chris -- | Christopher Petrilli | petrilli at amber.org From moshez at math.huji.ac.il Thu Apr 6 08:30:44 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 6 Apr 2000 08:30:44 +0200 (IST) Subject: [Python-Dev] re: division In-Reply-To: <000301bf9f75$ada190e0$ab2d153f@tim> Message-ID: On Wed, 5 Apr 2000, Tim Peters wrote: > Last week I took some time to stare at the ABC manual again, & suspect I hit > on the cause: ABC was *aggressively* rational. That is, ABC had no > notation for floating point (ABC "approximate") literals; even 6.02e23 was > taken to mean "exact rational". In my experience ABC was unique this way, > and uniquely surprising for it: it's hard to be surprised by 2/3 returning > a rational, but hard not to be surprised by 6.02e23/1.0001e-18 doing so. Ouch. There is definitely place for floats in the numeric tower. It's just that those shouldn't be reached accidentally <0.3 wink> > one-case-where-one-size-doesn't-fit-anyone-ly y'rs - tim but-in-this-case-two-sizes-do-seem-enough-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Thu Apr 6 10:50:47 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 06 Apr 2000 10:50:47 +0200 Subject: [Python-Dev] Re: _PyUnicode_New/PyUnicode_Resize References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCEEA@RED-MSG-50> <38EB0A28.8E8F6397@lemburg.com> <200004051411.KAA16095@eric.cnri.reston.va.us> Message-ID: <38EC4FE7.94F862D7@lemburg.com> Guido van Rossum wrote: > > > E.g. say Unicode gets interned someday, then resize will > > need to watch out not resizing a Unicode object which is > > already stored in the interning dict. > > Note that string objects deal with this by requiring that the > reference count is 1 when a string is resized. This effectively > enforces that resizes are only used when the original creator is still > working on the string. Nice trick ;-) The new PyUnicode_Resize() will have the same interface as _PyString_Resize() since this seems to be the most flexible way to implement it without giving away possibilities for future optimizations... 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gvwilson at nevex.com Thu Apr 6 13:31:26 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 6 Apr 2000 07:31:26 -0400 (EDT) Subject: [Python-Dev] re: division In-Reply-To: <20000405231628.A24968@trump.amber.org> Message-ID: > > [Greg Ward] > > > In other words: > > > > > > 5 div 3 = 5.__div__(3) = operator.div(5,3) = 1 > > > 5 / 3 = 5.__fdiv__(3) = operator.fdiv(5,3) = 1.6666667 +1. Should 'mod' be made a synonym for '%' for symmetry's sake? Greg From guido at python.org Thu Apr 6 15:33:51 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:33:51 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 10:19:30 +1000." References: Message-ID: <200004061333.JAA23880@eric.cnri.reston.va.us> > > Anyway, ActivePerl installs its DLLs (all 5) in c:\Perl\bin\. So > > there. It also didn't change PATH for me, even though the docs > > mention that it does -- maybe only on NT? > > In another mail you asked David to look into how Active State handle > their DLLs. Well, Trent Mick started the ball rolling... > > The answer is that Perl extensions never import data from the core > DLL. They always import functions. In many cases, they can hide > this fact with the pre-processor. This doesn't answer my question. My question is how they support COM without having a DLL in the system directory. Or at least I don't understand how not importing data makes a difference. > By avoiding the import of data, they have far more possibilities, > including the use of LoadLibrary(), For what do they use LoadLibrary()? What is it? We use LoadLibraryEx() -- isn't that just as good? > and a new VC6 linker feature called "delay loading". > To my mind, it would be quite difficult to make this work for > Python. There are a a large number of data items we import, and > adding a function call indirection to each one sounds a pain. Agreed. > [As a semi-related issue: This "delay loading" feature is very > cool - basically, the EXE loader will not resolve external DLL > references until actually used. This is the same trick mentioned on > comp.lang.python, where they saw _huge_ startup increases (although > the tool used there was a third-party tool). The thread in question > on c.l.py resolved that, for some reason, the initialization of the > Windows winsock library was taking many seconds on that particular > PC. > > Guido - are you on VC6 yet? Yes -- I promised myself I'd start using VC6 for the 1.6 release cycle, and I did. > If so, I could look into this linker > option, and see how it improves startup performance on Windows. > Note - this feature only works if no data is imported - hence, we > could use it in Python16.dll, as most of its imports are indeed > functions. Python extension modules can not use it against Python16 > itself as they import data.] But what DLLs does python16 use that could conceivably be delay-loaded? Note that I have a feeling that there are a few standard extensions that should become separate PYDs -- e.g. socket (for the above reason) and unicodedata. This would greatly reduce the size of python16.dll. Since this way we manage our own DLL loading anyway, what's the point of delay-loading? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Apr 6 15:43:00 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:43:00 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <200004041957.PAA13168@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 04, 2000 03:57:47 PM Message-ID: <200004061343.PAA20218@python.inrialpes.fr> [Guido] > > If it ain't broken, don't "fix" it. > This also explains why socket.connect() generated so much resistance... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Thu Apr 6 15:53:10 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 6 Apr 2000 23:53:10 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061333.JAA23880@eric.cnri.reston.va.us> Message-ID: > > The answer is that Perl extensions never import data > from the core > > DLL. They always import functions. In many cases, > they can hide > > this fact with the pre-processor. > > This doesn't answer my question. My question is how they > support COM > without having a DLL in the system directory. Or at least I don't > understand how not importing data makes a difference. By not using data, they can use either "delay load", or fully dynamic loading. Fully dynamic loading really just involves getting every API function via GetProcAddress() rather than having implicit linking via external references. GetProcAddress() can retrieve data items, but only their address, leaving us still in a position where "Py_None" doesnt work without magic. Delay Loading involves not loading the DLL until the first reference is used. This also lets you define code that locates the DLL to be used. This code is special in a "DllMain" kinda way, but does allow runtime binding to a statically linked DLL. However, it still has the "no data" limitation. > But what DLLs does python16 use that could conceivably be > delay-loaded? > > Note that I have a feeling that there are a few standard > extensions > that should become separate PYDs -- e.g. socket (for the > above reason) > and unicodedata. This would greatly reduce the size of > python16.dll. Agreed - these were my motivation. If these are moving to external modules then I am happy. I may have a quick look for other preloaded DLLs we can avoid - worth a look for the sake of a linker option :-) Mark. From guido at python.org Thu Apr 6 15:52:47 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:52:47 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Thu, 06 Apr 2000 15:43:00 +0200." <200004061343.PAA20218@python.inrialpes.fr> References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <200004061352.JAA24034@eric.cnri.reston.va.us> [GvR] > > If it ain't broken, don't "fix" it. [VM] > This also explains why socket.connect() generated so much resistance... Yes -- people are naturally conservative. I am too, myself, so I should have known... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Thu Apr 6 15:51:41 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 6 Apr 2000 15:51:41 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? 
In-Reply-To: <200004051433.KAA16229@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 05, 2000 10:33:18 AM Message-ID: <200004061351.PAA20261@python.inrialpes.fr> [Christian] > > When I look into tracebacks, it turns out to be just a chain > > like the frame chain, but upward down. It holds references > > to the frames in a 1-to-1 manner, and it keeps copies of > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > ... > > Does this make sense? Do I miss something? > [Guido] > Yes. It is quite possible to have multiple stack traces lingering > around that all point to the same stack frames. This reminds me that some time ago I made an experimental patch for removing SET_LINENO. There was the problem of generating callbacks for pdb (which I think I solved somehow but I don't remember the details). I do remember that I had to look at pdb again for some reason. Is there any interest in reviving this idea? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From guido at python.org Thu Apr 6 15:57:27 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 09:57:27 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:53:10 +1000." References: Message-ID: <200004061357.JAA24071@eric.cnri.reston.va.us> > > > The answer is that Perl extensions never import data from the core > > > DLL. They always import functions. In many cases, they can hide > > > this fact with the pre-processor. > > > > This doesn't answer my question. My question is how they support COM > > without having a DLL in the system directory. Or at least I don't > > understand how not importing data makes a difference. > > By not using data, they can use either "delay load", or fully > dynamic loading. > > Fully dynamic loading really just involves getting every API > function via GetProcAddress() rather than having implicit linking > via external references. GetProcAddress() can retrieve data items, > but only their address, leaving us still in a position where > "Py_None" doesnt work without magic. Actually, Py_None is just a macro that expands to the address of some data -- isn't that exactly what we need? > Delay Loading involves not loading the DLL until the first reference > is used. This also lets you define code that locates the DLL to be > used. This code is special in a "DllMain" kinda way, but does allow > runtime binding to a statically linked DLL. However, it still has > the "no data" limitation. > > > But what DLLs does python16 use that could conceivably be > > delay-loaded? > > > > Note that I have a feeling that there are a few standard > > extensions > > that should become separate PYDs -- e.g. socket (for the > > above reason) > > and unicodedata. This would greatly reduce the size of > > python16.dll. > > Agreed - these were my motivation. If these are moving to external > modules then I am happy. I may have a quick look for other > preloaded DLLs we can avoid - worth a look for the sake of a linker > option :-) OK, I'll look into moving socket and unicodedata out of python16.dll. But, I still don't understand why Perl/COM doesn't need a DLL in the system directory. Or is it just because they change PATH? (I don't know zit about COM, so that may be it. I understand that a COM object is registered (in the registry) as an entry point of a DLL. Couldn't that DLL be specified by absolute pathname??? Then no search path would be necessary.) 
--Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Thu Apr 6 16:07:38 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 00:07:38 +1000 Subject: [Python-Dev] DLL in the system directory on Windows. In-Reply-To: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: > But, I still don't understand why Perl/COM doesn't need a > DLL in the > system directory. Or is it just because they change PATH? > > (I don't know zit about COM, so that may be it. I > understand that a > COM object is registered (in the registry) as an entry point of a > DLL. Couldn't that DLL be specified by absolute > pathname??? Then no > search path would be necessary.) Yes - but it all gets back to the exact same problem that got us here in the first place: * COM object points to \Python1.6\PythonCOM16.dll * PythonCOM16.dll has link-time reference to Python16.dll * As COM just uses LoadLibrary(), the path of PythonCOM16.dll is not used to resolve its references - only the path of the host .EXE, the system path, etc. End result is Python16.dll is not found, even though it is in the same directory. So, if you have the opportunity to intercept the link-time reference to a DLL (or, obviously, use LoadLibrary()/GetProcAddress() to reference the DLL), you can avoid override the search path. Thus, if PythonCOM16.dll could intercept its references to Python16.dll, it could locate the correct Python16.dll with runtime code. However, as we import data from Python16.dll rather then purely addresses, we can't use any of these interception solutions. If we could hide all data references behind macros, then we could possibly arrange it. Perl _does_ use such techniques, so can arrange for the runtime type resolution. (Its not clear if Perl uses "dynamic loading" via GetProcAddress(), or delayed loading via the new VC6 feature - I believe the former, but the relevant point is that they definately hide data references behind magic...) Mark. From skip at mojam.com Thu Apr 6 15:08:14 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 6 Apr 2000 08:08:14 -0500 (CDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004061351.PAA20261@python.inrialpes.fr> References: <200004051433.KAA16229@eric.cnri.reston.va.us> <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <14572.35902.781258.448592@beluga.mojam.com> Vladimir> This reminds me that some time ago I made an experimental Vladimir> patch for removing SET_LINENO. There was the problem of Vladimir> generating callbacks for pdb (which I think I solved somehow Vladimir> but I don't remember the details). I do remember that I had to Vladimir> look at pdb again for some reason. Is there any interest in Vladimir> reviving this idea? I believe you can get line number information from a code object's co_lnotab attribute, though I don't know the format. I think this should be sufficient to allow SET_LINENO to be eliminated altogether. It's just that there are places in various modules that predate the appearance of co_lnotab. Whoops, wait a minute. I just tried >>> def foo(): pass ... >>> foo.func_code.co_lnotab with both "python" and "python -O". co_lnotab is empty for python -O. I thought it was supposed to always be generated? 
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From tismer at tismer.com Thu Apr 6 17:09:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:09:51 +0200 Subject: [Python-Dev] Re: unicode: strange exception References: <000f01bf9f47$7ea37840$34aab5d4@hagrid> <38EBC387.FAB08D61@lemburg.com> Message-ID: <38ECA8BF.5C47F700@tismer.com> "M.-A. Lemburg" wrote: > BTW, my little "don't use tabs use spaces" in C code extravaganza > was a complete nightmare... diff just doesn't like it and the > Python source code is full of places where tabs and spaces > are mixed in many different ways... I'm back to tabs-indent-mode > again :-/ Isn't this ignorable with the diff -b switch? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Thu Apr 6 17:12:11 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 6 Apr 2000 11:12:11 -0400 (EDT) Subject: [Python-Dev] Unicode documentation Message-ID: <14572.43339.472062.364098@seahag.cnri.reston.va.us> I've added Marc-Andre's documentation updates for Unicode to the Python CVS repository; I don't think I've done any damage. Marc-Andre, please review and let me know if I've missed anything! Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From tismer at tismer.com Thu Apr 6 17:16:16 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 17:16:16 +0200 Subject: [Python-Dev] Why do we need Traceback Objects? References: <200004061351.PAA20261@python.inrialpes.fr> Message-ID: <38ECAA40.456F9919@tismer.com> Vladimir Marangozov wrote: > > [Christian] > > > When I look into tracebacks, it turns out to be just a chain > > > like the frame chain, but upward down. It holds references > > > to the frames in a 1-to-1 manner, and it keeps copies of > > > f->f_lasti and f->f_lineno. I don't see why this is needed. > > > ... > > > Does this make sense? Do I miss something? > > > > [Guido] > > Yes. It is quite possible to have multiple stack traces lingering > > around that all point to the same stack frames. > > This reminds me that some time ago I made an experimental patch for > removing SET_LINENO. There was the problem of generating callbacks > for pdb (which I think I solved somehow but I don't remember the > details). I do remember that I had to look at pdb again for some > reason. Is there any interest in reviving this idea? This is a very cheap opcode (at least in my version). What does it buy? Can you drop the f_lineno field from frames, and calculate it for the frame's f_lineno attribute? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From thomas.heller at ion-tof.com Thu Apr 6 17:40:38 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Thu, 6 Apr 2000 17:40:38 +0200 Subject: [Python-Dev] DLL in the system directory on Windows Message-ID: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> > However, as we import data from Python16.dll rather then purely > addresses, we can't use any of these interception solutions. What's wrong with: #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) I have only looked at PythonCOM15.dll, and it seems that there are only references to a handfull of exported data items: some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, _PyZero_Struct. Thomas Heller From jim at interet.com Thu Apr 6 17:48:50 2000 From: jim at interet.com (James C. Ahlstrom) Date: Thu, 06 Apr 2000 11:48:50 -0400 Subject: [Python-Dev] DLL in the system directory on Windows. References: <200004061357.JAA24071@eric.cnri.reston.va.us> Message-ID: <38ECB1E2.AD1BAF5C@interet.com> Guido van Rossum wrote: > But, I still don't understand why Perl/COM doesn't need a DLL in the > system directory. Or is it just because they change PATH? Here is some generic info which may help, or perhaps you already know it. If you have a DLL head.dll or EXE head.exe which needs another DLL needed.dll, you can link needed.dll with head, and the system will find all data and module names automatically (well, almost). When head is loaded, needed.dll must be available, or head will fail to load. This can be confusing. For example, I once tried to port PIL to my new Python mini-GUI model, and my DLL failed. Only after some confusion did I realize that PIL is linked with Tk libs, and would fail to load if they were not present, even though I was not using them. I think what Mark is saying is that Microsoft now has an option to do delayed DLL loading. The load of needed.dll is delayed until a function in needed.dll is called. This would have meant that PIL would have worked provided I never called a Tk function. I think he is also saying that this feature can only trap function calls, not pointer access to data, so it won't work in the context of data access (maybe it would if a function call came first). Of course, if you access all data through a function call GetMyData(), it all works. As an alternative, head.[exe|dll] would not be linked with needed.dll, and so needed.dll need not be present. To access functions by name in needed.dll, you call LoadLibrary or LoadLibraryEx to open needed.dll, and then call GetProcAddress() to get a pointer to named functions. In the case of data items, the pointer is dereferenced twice, that is, data = **pt. Python uses this strategy to load PYD's, and accesses the sole function initmodule(). Then the rest of the data is available through Python mechanisms which effectively substitute for normal DLL access. The alternative search path available in LoadLibraryEx only affects head.dll, and causes the system to look in the directory of needed.dll instead of the directory of the ultimate executable for finding other needed DLL's. So on Windows, Python needs PYTHONPATH to find PYD's, and if the PYD's need further DLL's those DLL's can be in the directory of the PYD, or on the usual DLL search path provided the "alternate search path" is used. Probably you alread know this, but maybe it will help the Windozly-challenged follow along. 
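To put the last paragraph in Python terms, the LoadLibrary/GetProcAddress dance for extension modules is exposed through the imp module; a small sketch (the module name and path here are made up):

    import imp

    # imp.load_dynamic() opens the extension DLL/PYD by explicit path and
    # then calls its single entry point, init<module>(), to build the module.
    spam = imp.load_dynamic('spam', r'C:\Python\DLLs\spam.pyd')
    print spam.__name__

Everything else in the extension is then reached through the module object that the init function builds, which is why a single exported function name is enough.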
JimA From tismer at tismer.com Thu Apr 6 23:22:30 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:22:30 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: <38ED0016.E1C4A26C@tismer.com> Hi, asa side effect, I happened to observe the following rounding bug. It happens in Stackless Python, which is built against the pre-unicode CVS branch. Is this changed for 1.6, or might it be my bug? D:\python\spc>python Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 >>> ^Z D:\python>python Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1416 >>> ^Z ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Thu Apr 6 23:31:03 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 06 Apr 2000 23:31:03 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. Message-ID: <38ED0217.7C44A24F@tismer.com> Yikes! No, it is computatively commutative, just not in terms of computation time. :-)) The following factorial loops differ by a remarkable factor of 1.8, and we can gain this speed by changing long_mult to always put the lower multiplicand into the left. This was reported to me by Lenny Kneler, who thought he had found a Stackless bug, but he was actually testing long math. :-) This buddy... >>> def ifact3(n) : ... p = 1L ... for i in range(1,n+1) : ... p = i*p ... return p performs better by a factor of 1.8 than this one: >>> def ifact1(n) : ... p = 1L ... for i in range(1,n+1) : ... p = p*i ... return p The analysis of this behavior is quite simple if you look at the implementation of long_mult. If the left operand is big and the right is small, there are much more carry operations performed, together with more loop overhead. Swapping the multiplicands would be a 5 line patch. Should I submit it? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From jeremy at cnri.reston.va.us Thu Apr 6 23:29:13 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 6 Apr 2000 17:29:13 -0400 (EDT) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <38ECAA40.456F9919@tismer.com> References: <200004061351.PAA20261@python.inrialpes.fr> <38ECAA40.456F9919@tismer.com> Message-ID: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> >> Vladimir Marangozov wrote: >> This reminds me that some time ago I made an experimental patch >> for removing SET_LINENO. There was the problem of generating >> callbacks for pdb (which I think I solved somehow but I don't >> remember the details). I do remember that I had to look at pdb >> again for some reason. Is there any interest in reviving this >> idea? I think the details are important. The only thing the SET_LINENO opcode does is to call a trace function if one is installed. 
It's necessary to have some way to invoke the trace function when the line number changes (or it will be relatively difficult to execute code line-by-line in the debugger ). Off the top of my head, the only other way I see to invoke the trace function would be to add code at the head of the mainloop that computed the line number for each instruction (from lnotab) and called the trace function if the current line number is different than the previous time through the loop. That doesn't sound faster or simpler. Jeremy From guido at python.org Thu Apr 6 23:30:21 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:30:21 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Thu, 06 Apr 2000 23:22:30 +0200." <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <200004062130.RAA26273@eric.cnri.reston.va.us> > asa side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? > > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z > > D:\python>python > Python 1.5.2 (#0, Apr 13 1999, 10:51:12) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1416 > >>> ^Z This is because repr() now uses full precision for floating point numbers. round() does what it can, but 3.1416 just can't be represented exactly, and "%.17g" gives 3.1415999999999999. This is definitely the right thing to do for repr() -- ask Tim. However, it may be time to switch so that "immediate expression" values are printed as str() instead of as repr()... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Thu Apr 6 22:31:02 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 6 Apr 2000 15:31:02 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> References: <38ED0016.E1C4A26C@tismer.com> Message-ID: <14572.62470.804145.677372@beluga.mojam.com> Chris> I happened to observe the following rounding bug. It happens in Chris> Stackless Python, which is built against the pre-unicode CVS Chris> branch. Chris> Is this changed for 1.6, or might it be my bug? I doubt it's your problem. I see it too with 1.6a2 (no stackless): % ./python Python 1.6a2 (#2, Apr 6 2000, 15:27:22) [GCC pgcc-2.91.66 19990314 (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> round(3.1415926585, 4) 3.1415999999999999 Same behavior whether compiled with -O2 or -g. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From guido at python.org Thu Apr 6 23:32:36 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 17:32:36 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: Your message of "Thu, 06 Apr 2000 23:31:03 +0200." <38ED0217.7C44A24F@tismer.com> References: <38ED0217.7C44A24F@tismer.com> Message-ID: <200004062132.RAA26296@eric.cnri.reston.va.us> > This buddy... > > >>> def ifact3(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... p = i*p > ... return p > > performs better by a factor of 1.8 than this one: > > >>> def ifact1(n) : > ... p = 1L > ... for i in range(1,n+1) : > ... 
p = p*i > ... return p > > The analysis of this behavior is quite simple if you look at the > implementation of long_mult. If the left operand is big and the > right is small, there are much more carry operations performed, > together with more loop overhead. > Swapping the multiplicands would be a 5 line patch. > Should I submit it? Yes, go for it. I would appreciate a bunch of new test cases that exercise the new path through the code, too... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 00:43:16 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 00:43:16 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14573.425.369099.605774@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 06, 2000 05:29:13 PM Message-ID: <200004062243.AAA21491@python.inrialpes.fr> Jeremy Hylton wrote: > > >> Vladimir Marangozov wrote: > >> This reminds me that some time ago I made an experimental patch > >> for removing SET_LINENO. There was the problem of generating > >> callbacks for pdb (which I think I solved somehow but I don't > >> remember the details). I do remember that I had to look at pdb > >> again for some reason. Is there any interest in reviving this > >> idea? > > I think the details are important. The only thing the SET_LINENO > opcode does is to call a trace function if one is installed. It's > necessary to have some way to invoke the trace function when the line > number changes (or it will be relatively difficult to execute code > line-by-line in the debugger ). Looking back at the discussion and the patch I ended up with at that time, I think the callback issue was solved rather elegantly. I'm not positive that it does not have side effects, though... For an overview of the approach and the corresponding patch, go back to: http://www.python.org/pipermail/python-dev/1999-August/002252.html http://sirac.inrialpes.fr/~marangoz/python/lineno/ What happens is that in tracing mode, a copy of the original code stream is created, a new CALL_TRACE opcode is stored in it at the addresses corresponding to each source line number, then the instruction pointer is redirected to execute the modified code string. Whenever a CALL_TRACE opcode is reached, the callback is triggered. On a successful return, the original opcode at the current address is fetched from the original code string, then directly goto the dispatch code. This code string duplication & conditional break-point setting occurs only when a trace function is set; in the "normal case", the interpreter executes a code string without SET_LINENO. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mhammond at skippinet.com.au Fri Apr 7 02:47:06 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 10:47:06 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: <01ce01bf9fde$7601f8a0$4500a8c0@thomasnotebook> Message-ID: > > However, as we import data from Python16.dll rather then purely > > addresses, we can't use any of these interception solutions. > > What's wrong with: > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) My only objection is that this is a PITA. It becomes a maintenance nightmare for Guido as the code gets significantly larger and uglier. 
> I have only looked at PythonCOM15.dll, and it seems that > there are only references to a handfull of exported data items: > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > _PyZero_Struct. Yep - these structs, all the error objects and all the type objects. However, to do this properly, we must do _every_ exported data item, not just ones that satisfy COM (otherwise the next poor soul will have the exact same issue, and require patches to the core before they can work...) Im really not convinced it is worth it to save one, well-named DLL in the system directory. Mark. From guido at python.org Fri Apr 7 03:25:35 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 21:25:35 -0400 Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: Your message of "Fri, 07 Apr 2000 00:43:16 +0200." <200004062243.AAA21491@python.inrialpes.fr> References: <200004062243.AAA21491@python.inrialpes.fr> Message-ID: <200004070125.VAA26776@eric.cnri.reston.va.us> > What happens is that in tracing mode, a copy of the original code stream > is created, a new CALL_TRACE opcode is stored in it at the addresses > corresponding to each source line number, then the instruction pointer > is redirected to execute the modified code string. Whenever a CALL_TRACE > opcode is reached, the callback is triggered. On a successful return, > the original opcode at the current address is fetched from the original > code string, then directly goto the dispatch code. > > This code string duplication & conditional break-point setting occurs > only when a trace function is set; in the "normal case", the interpreter > executes a code string without SET_LINENO. Ai! This really sounds like a hack. It may be a standard trick in the repertoire of virtual machine implementers, but it is still a hack, and makes my heart cry. I really wonder if it makes enough of a difference to warrant all that code, and the risk that that code isn't quite correct. (Is it thread-safe?) --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Fri Apr 7 03:36:30 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 7 Apr 2000 11:36:30 +1000 Subject: [Python-Dev] RE: DLL in the system directory on Windows In-Reply-To: Message-ID: [I wrote] > My only objection is that this is a PITA. It becomes a ... > However, to do this properly, we must do _every_ exported ... > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. ie, lots of good reasons _not_ to do this. However, it is worth pointing out that there is one good - possibly compelling - reason to consider this. Not only would we drop the dependency from the system directory, we could also drop the dependency to the Python version. That is, any C extension compiled for 1.6 would be able to automatically and without recompilation work with Python 1.7, so long as we kept all the same public names. It is too late for Python 1.5, but it would be a nice feature if an upgrade to Python 1.7 did not require waiting for every extension author to catch up. OTOH, if Python 1.7 is really the final in the 1.x family, is it worth it for a single version? Just-musing-ly, Mark. From ping at lfw.org Fri Apr 7 03:47:36 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 6 Apr 2000 20:47:36 -0500 (CDT) Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful Message-ID: So, has anyone not seen Doctor Fun today yet? 
http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg :) :) -- ?!ng From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 04:02:22 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:02:22 +0200 (CEST) Subject: [Python-Dev] python -O weirdness Message-ID: <200004070202.EAA22307@python.inrialpes.fr> Strange. Can somebody confirm/refute, explain this behavior? -------------[ bug.py ]------------ def f(): pass def g(): a = 1 b = 2 def h(): pass def show(func): c = func.func_code print "(%d) %s: %d -> %s" % \ (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) show(f) show(g) show(h) ----------------------------------- ~> python bug.py (1) f: 2 -> '\003\001' (4) g: 4 -> '\003\001\011\001' (8) h: 2 -> '\003\000' ~> python -O bug.py (1) f: 2 -> '\000\001' (4) g: 4 -> '\000\001\006\001' (1) f: 2 -> '\000\001' <=== ??? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Fri Apr 7 04:19:02 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:02 -0400 Subject: [Python-Dev] Long Multiplication is not commutative. In-Reply-To: <200004062132.RAA26296@eric.cnri.reston.va.us> Message-ID: <000701bfa037$a4545960$6c2d153f@tim> > Yes, go for it. I would appreciate a bunch of new test cases that > exercise the new path through the code, too... FYI, a suitable test would be to add a line to function test_division_2 in test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and y are already generated by the framework. From tim_one at email.msn.com Fri Apr 7 04:19:00 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 6 Apr 2000 22:19:00 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38ED0016.E1C4A26C@tismer.com> Message-ID: <000601bfa037$a2c18460$6c2d153f@tim> [posted & mailed] [Christian Tismer] > as a side effect, I happened to observe the following rounding bug. > It happens in Stackless Python, which is built against the > pre-unicode CVS branch. > > Is this changed for 1.6, or might it be my bug? It's a 1.6 thing, and is not a bug. > D:\python\spc>python > Python 1.5.42+ (#0, Mar 29 2000, 20:23:26) [MSC 32 bit (Intel)] on win32 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> round(3.1415926585, 4) > 3.1415999999999999 > >>> ^Z The best possible IEEE-754 double approximation to 3.1416 is (exactly) 3.141599999999999948130380289512686431407928466796875 so the output you got is correctly rounded to 17 significant digits. IOW, it's a feature. 1.6 boosted the number of decimal digits repr(float) produces so that eval(repr(x)) == x for every finite float on every platform with an IEEE-754-conforming libc. It was actually rare for that equality to hold pre-1.6. repr() cannot produce fewer digits than this without allowing the equality to fail in some cases. The 1.6 str() still produces the *illusion* that the result is 3.1416 (as repr() also did pre-1.6). IMO it would be better if Python stopped using repr() (at least by default) for formatting expressions at the interactive prompt (for much more on this, see DejaNews). the-two-things-you-can-do-about-it-are-nothing-and-love-it-ly y'rs - tim From guido at python.org Fri Apr 7 04:23:11 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 06 Apr 2000 22:23:11 -0400 Subject: [Python-Dev] python -O weirdness In-Reply-To: Your message of "Fri, 07 Apr 2000 04:02:22 +0200." 
<200004070202.EAA22307@python.inrialpes.fr> References: <200004070202.EAA22307@python.inrialpes.fr> Message-ID: <200004070223.WAA26916@eric.cnri.reston.va.us> > Strange. Can somebody confirm/refute, explain this behavior? > > -------------[ bug.py ]------------ > def f(): > pass > > def g(): > a = 1 > b = 2 > > def h(): pass > > def show(func): > c = func.func_code > print "(%d) %s: %d -> %s" % \ > (c.co_firstlineno, c.co_name, len(c.co_lnotab), repr(c.co_lnotab)) > > show(f) > show(g) > show(h) > ----------------------------------- > > ~> python bug.py > (1) f: 2 -> '\003\001' > (4) g: 4 -> '\003\001\011\001' > (8) h: 2 -> '\003\000' > > ~> python -O bug.py > (1) f: 2 -> '\000\001' > (4) g: 4 -> '\000\001\006\001' > (1) f: 2 -> '\000\001' <=== ??? > > -- Yes. I can confirm and explain it. The functions f and h are sufficiently similar that their code objects actually compare equal. A little-known optimization is that two constants in a const array that compare equal (and have the same type!) are replaced by a single copy. This happens in the module's code object: f's and h's code are the same, so only one copy is kept. The function name is not taken into account for the comparison. Maybe it should? On the other hand, the name is a pretty inessential part of the function, and it's not going to change the semantics of the program... --Guido van Rossum (home page: http://www.python.org/~guido/) From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 04:47:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 04:47:15 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <200004070125.VAA26776@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 06, 2000 09:25:35 PM Message-ID: <200004070247.EAA22442@python.inrialpes.fr> Guido van Rossum wrote: > > > What happens is that in tracing mode, a copy of the original code stream > > is created, a new CALL_TRACE opcode is stored in it at the addresses > > corresponding to each source line number, then the instruction pointer > > is redirected to execute the modified code string. Whenever a CALL_TRACE > > opcode is reached, the callback is triggered. On a successful return, > > the original opcode at the current address is fetched from the original > > code string, then directly goto the dispatch code. > > > > This code string duplication & conditional break-point setting occurs > > only when a trace function is set; in the "normal case", the interpreter > > executes a code string without SET_LINENO. > > Ai! This really sounds like a hack. It may be a standard trick in > the repertoire of virtual machine implementers, but it is still a > hack, and makes my heart cry. The implementation sounds tricky, yes. But there's nothing hackish in the principle of setting breakpoints. The modified code string is in fact the stripped code stream (without LINENO), reverted back to a standard code stream with LINENO. However, to simplify things, the LINENO (aka CALL_TRACE) are not inserted between the instructions for every source line. They overwrite the original opcodes in the copy whenever a trace function is set (i.e. we set all conditional breakpoints (LINENO) at once). And since we overwrite for simplicity, at runtime, we read the ovewritten opcodes from the original stream, after the callback returns. All this magic occurs before the main loop, with finalization on exit of eval_code2. 
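Purely as an illustration of that description (the real patch lives at the C level in ceval.c; the opcode value and helper below are invented for the sketch), the per-line patching amounts to something like:

    import string

    CALL_TRACE = chr(76)    # hypothetical one-byte breakpoint opcode

    def make_traced_copy(code):
        # Copy the code stream, then overwrite the first opcode of every
        # source line with the breakpoint opcode.  The original co_code is
        # kept so the real opcode can be looked up after each callback.
        stream = list(code.co_code)
        stream[0] = CALL_TRACE              # the first line starts at offset 0
        table = code.co_lnotab
        addr = 0
        for i in range(0, len(table), 2):   # (addr_incr, line_incr) byte pairs
            addr = addr + ord(table[i])
            stream[addr] = CALL_TRACE
        return string.join(stream, '')

At run time, hitting CALL_TRACE would trigger the callback and then dispatch on the opcode saved at the same offset in the untouched original stream.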
A tricky implementation of the principle of having a set of conditional breakpoints for every source line (these cond. bp are currently the SET_LINENO opcodes, in a more redundant version). > I really wonder if it makes enough of a difference to warrant all > that code, and the risk that that code isn't quite correct. Well, all this business is internal to ceval.c and doesn't seem to affect the rest of the world. I can see only two benefits (if this idea doesn't hide other mysteries -- so anyone interested may want check it out): 1) Some tiny speedup -- we'll reach -O in a standard setup 2) The .pyc files become smaller. (Lib/*.pyc is reduced by ~80K for 1.5.2) No other benefits (hmmm, maybe the pdb code will be simplified wrt linenos) I originally developped this idea because of the redundant, consecutive SET_LINENO in a code object. > (Is it thread-safe?) I think so. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From thomas.heller at ion-tof.com Fri Apr 7 09:10:41 2000 From: thomas.heller at ion-tof.com (Thomas Heller) Date: Fri, 7 Apr 2000 09:10:41 +0200 Subject: [Python-Dev] Re: DLL in the system directory on Windows References: Message-ID: <03fe01bfa060$626a2010$4500a8c0@thomasnotebook> > > > However, as we import data from Python16.dll rather then purely > > > addresses, we can't use any of these interception solutions. > > > > What's wrong with: > > > > #define PyClass_Type *(GetProcAddress(hdll, "PyClass_Type")) > > My only objection is that this is a PITA. It becomes a maintenance > nightmare for Guido as the code gets significantly larger and > uglier. Why is it a nightmare for Guido? It can be done by the extension writer: You in the case for PythonCOM.dll. > > > I have only looked at PythonCOM15.dll, and it seems that > > there are only references to a handfull of exported data items: > > > > some Py*_Type, plus _PyNone_Struct, _PyTrue_Struct, > > _PyZero_Struct. > > Yep - these structs, all the error objects and all the type objects. > > However, to do this properly, we must do _every_ exported data item, > not just ones that satisfy COM (otherwise the next poor soul will > have the exact same issue, and require patches to the core before > they can work...) IMHO it is not a problem of exporting, but a question how *you* import these. > > Im really not convinced it is worth it to save one, well-named DLL > in the system directory. As long as no one else installs a modified version there (which *should* have a different name, but...) > > Mark. > Thomas Heller From fredrik at pythonware.com Fri Apr 7 10:47:37 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 10:47:37 +0200 Subject: [Python-Dev] SRE: regex.set_syntax References: <200004061343.PAA20218@python.inrialpes.fr> Message-ID: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > [Guido] > > If it ain't broken, don't "fix" it. > > This also explains why socket.connect() generated so much resistance... I'm not sure I see the connection -- the 'regex' module is already declared obsolete... 
so Guido probably meant "if it's not even in there, don't waste time on it" imo, the main reasons for supporting 'regex' are 1) that lots of people are still using it, often for performance reasons 2) while the import error should be easy to spot, actually changing from 'regex' to 're' requires some quite extensive core restructuring, especially com- pared to what it takes to fix a broken 'append' or 'connect' call, and 3) it's fairly easy to do, since the engines use the same semantics, and 'sre' supports pluggable front-ends. but alright, I think the consensus here is "(1) get rid of it completely". in 1.6a2, perhaps? From fredrik at pythonware.com Fri Apr 7 11:13:16 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:16 +0200 Subject: [Python-Dev] Pythons (Like Buses) Considered Harmful References: Message-ID: <00cd01bfa071$838c6cb0$0500a8c0@secret.pythonware.com> > So, has anyone not seen Doctor Fun today yet? > > http://metalab.unc.edu/Dave/Dr-Fun/latest.jpg > > :) :) the daily python-url features this link ages ago (in internet time, at least): http://hem.passagen.se/eff/url.htm (everyone should read the daily python URL ;-) From fredrik at pythonware.com Fri Apr 7 11:13:23 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 11:13:23 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> Message-ID: <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> M.-A. Lemburg wrote: > The UTF-8 assumption had to be made in order to get the two > worlds to interoperate. We could have just as well chosen > Latin-1, but then people currently using say a Russian > encoding would get upset for the same reason. > > One way or another somebody is not going to like whatever > we choose, I'm afraid... the simplest solution is to use > Unicode for all strings which contain non-ASCII characters > and then call .encode() as necessary. just a brief head's up: I've been playing with this a bit, and my current view is that the current unicode design is horridly broken when it comes to mixing 8-bit and 16-bit strings. basically, if you pass a uni- code string to a function slicing and dicing 8-bit strings, it will probably not work. and you will probably not under- stand why. I'm working on a proposal that I think will make things simpler and less magic, and far easier to understand. to appear on sunday. From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 11:53:19 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 11:53:19 +0200 (CEST) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> from "Fredrik Lundh" at Apr 07, 2000 10:47:37 AM Message-ID: <200004070953.LAA25788@python.inrialpes.fr> Fredrik Lundh wrote: > > Vladimir Marangozov wrote: > > [Guido] > > > If it ain't broken, don't "fix" it. > > > > This also explains why socket.connect() generated so much resistance... > > I'm not sure I see the connection -- the 'regex' module is > already declared obsolete... Don't look further -- there's no connection with the re/sre code. It was just a thought about the above citation vs. the connect change. 
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 7 12:55:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 12:55:30 +0200 Subject: [Python-Dev] default encoding for 8-bit string literals (was Unicode and comparisons) References: <200004051416.KAA16112@eric.cnri.reston.va.us> <38EB55FF.C900CF8A@lemburg.com> <200004051525.LAA16345@eric.cnri.reston.va.us> <38EB86BA.5225C381@lemburg.com> <00ce01bfa071$87fd5b60$0500a8c0@secret.pythonware.com> Message-ID: <38EDBEA2.8C843E49@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The UTF-8 assumption had to be made in order to get the two > > worlds to interoperate. We could have just as well chosen > > Latin-1, but then people currently using say a Russian > > encoding would get upset for the same reason. > > > > One way or another somebody is not going to like whatever > > we choose, I'm afraid... the simplest solution is to use > > Unicode for all strings which contain non-ASCII characters > > and then call .encode() as necessary. > > just a brief head's up: > > I've been playing with this a bit, and my current view is that > the current unicode design is horridly broken when it comes > to mixing 8-bit and 16-bit strings. Why "horribly" ? String and Unicode mix pretty well, IMHO. The magic auto-conversion of Unicode to UTF-8 in C APIs using "s" or "s#" does not always do what the user expects, but it's still better than not having Unicode objects work with these APIs at all. > basically, if you pass a uni- > code string to a function slicing and dicing 8-bit strings, it > will probably not work. and you will probably not under- > stand why. > > I'm working on a proposal that I think will make things simpler > and less magic, and far easier to understand. to appear on > sunday. Looking forward to it, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 13:47:07 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 13:47:07 +0200 (CEST) Subject: [Python-Dev] Why do we need Traceback Objects? In-Reply-To: <14572.35902.781258.448592@beluga.mojam.com> from "Skip Montanaro" at Apr 06, 2000 08:08:14 AM Message-ID: <200004071147.NAA26437@python.inrialpes.fr> Skip Montanaro wrote: > > Whoops, wait a minute. I just tried > > >>> def foo(): pass > ... > >>> foo.func_code.co_lnotab > > with both "python" and "python -O". co_lnotab is empty for python -O. I > thought it was supposed to always be generated? It is always generated, but since co_lnotab contains only lineno increments starting from co_firstlineno (i.e. only deltas) and your function is a 1-liner (no lineno increments starting from the first line of the function), the table is empty. Move 'pass' to the next line and the table will contain 1-entry (of 2 bytes: delta_addr, delta_line). Generally speaking, the problem really boils down to the callbacks from C to Python when a tracefunc is set. My approach is not that bad in this regard. A decent processor nowadays has (an IRQ pin) a flag for generating interrupts on every processor instruction (trace flag). In Python, we have the same problem - we need to interrupt the (virtual) processor, implemented in eval_code2() on regular intervals. 
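The callback in question is the one installed with sys.settrace(); a minimal, self-contained example of the per-line interrupts that SET_LINENO currently provides (this uses only the existing tracing hook, nothing new):

    import sys

    def tracer(frame, event, arg):
        # The global trace function sees 'call' events; returning itself
        # installs it as the local trace function, which then receives a
        # 'line' event for every new source line executed.
        if event == 'line':
            print 'line', frame.f_lineno, 'in', frame.f_code.co_name
        return tracer

    def demo():
        x = 1
        y = x + 1
        return y

    sys.settrace(tracer)
    demo()
    sys.settrace(None)

Keeping this hook (and pdb, which sits on top of it) working without SET_LINENO is exactly the problem being discussed.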
Actually, what we need (for pdb) is to interrupt the processor on every source line, but one could easily imagine a per instruction interrupt (with a callback installed with sys.settracei(). This is exactly what the patch does under the grounds. It interrupts the processor on every new source line (but interrupting it on every instruction would be a trivial extension -- all opcodes in the code stream would be set to CALL_TRACE!) And this is exactly what LINENO does (+ some processor state saving in the frame: f_lasti, f_lineno). Clearly, there are 2 differences with the existing code: a) The interrupting opcodes are installed dynamically, on demand, only when a trace function is set, for the current traced frame. Presently, these opcodes are SET_LINENO; I introduced a new one byte CALL_TRACE opcode which does the same thing (thus preserving backwards compatibility with old .pyc that contain SET_LINENO). b) f_lasti and f_lineno aren't updated when the frame is not traced :-( I wonder whether we really care about them, though. The other implementation details aren't so important. Yet, they look scary, but no more than the co_lnotab business. The problem with my patch is point b). I believe the approach is good, though -- if it weren't, I woudn't have taken the care to talk about it detail. :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 7 13:57:41 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 13:57:41 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings Message-ID: <38EDCD35.DDD5EB4B@lemburg.com> There has been a bug report about the treatment of Unicode objects together with 8-bit format strings. The current implementation converts the Unicode object to UTF-8 and then inserts this value in place of the %s.... I'm inclined to change this to have '...%s...' % u'abc' return u'...abc...' since this is just another case of coercing data to the "bigger" type to avoid information loss. Thoughts ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tismer at tismer.com Fri Apr 7 14:41:19 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 14:41:19 +0200 Subject: [Python-Dev] python -O weirdness References: <200004070202.EAA22307@python.inrialpes.fr> <200004070223.WAA26916@eric.cnri.reston.va.us> Message-ID: <38EDD76F.986D3C39@tismer.com> Guido van Rossum wrote: ... > The function name is not taken into account for the comparison. Maybe > it should? Absolutely, please! > On the other hand, the name is a pretty inessential part > of the function, and it's not going to change the semantics of the > program... If the name of the code object has any meaning, then it must be the name of the function that I meant, not just another function which happens to have the same body, IMHO. or the name should vanish completely. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gward at mems-exchange.org Fri Apr 7 14:49:15 2000 From: gward at mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 08:49:15 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 06, 2000 at 05:30:21PM -0400 References: <38ED0016.E1C4A26C@tismer.com> <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: <20000407084914.A13606@mems-exchange.org> On 06 April 2000, Guido van Rossum said: > This is because repr() now uses full precision for floating point > numbers. round() does what it can, but 3.1416 just can't be > represented exactly, and "%.17g" gives 3.1415999999999999. > > This is definitely the right thing to do for repr() -- ask Tim. > > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... +1 on this: it's easier to change "foo" to "`foo`" than to "str(foo)" or "print foo". It just makes more sense to use str(). Oh, joy! oh happiness! someday soon, I may be able to type "blah.__doc__" at the interactive prompt and get a readable result! Greg From mikael at isy.liu.se Fri Apr 7 14:57:38 2000 From: mikael at isy.liu.se (Mikael Olofsson) Date: Fri, 07 Apr 2000 14:57:38 +0200 (MET DST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <20000407084914.A13606@mems-exchange.org> Message-ID: On 07-Apr-00 Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Just i case... I hope you haven't missed "print blah.__doc__". /Mikael ----------------------------------------------------------------------- E-Mail: Mikael Olofsson WWW: http://www.dtr.isy.liu.se/dtr/staff/mikael Phone: +46 - (0)13 - 28 1343 Telefax: +46 - (0)13 - 28 1339 Date: 07-Apr-00 Time: 14:56:52 This message was sent by XF-Mail. ----------------------------------------------------------------------- From guido at python.org Fri Apr 7 15:01:45 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:01:45 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 13:57:41 +0200." <38EDCD35.DDD5EB4B@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> Message-ID: <200004071301.JAA27100@eric.cnri.reston.va.us> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Makes sense. But note that it's going to be difficult to catch all cases: you could have '...%d...%s...%s...' % (3, "abc", u"abc") and '...%(foo)s...' % {'foo': u'abc'} and even '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} (the latter should *not* convert to Unicode). --Guido van Rossum (home page: http://www.python.org/~guido/) From jack at oratrix.nl Fri Apr 7 15:06:51 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 15:06:51 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading Message-ID: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Something that just struck me: couldn't we use a couple of bits in the PYTHON_API_VERSION to check various other things that make dynamic modules break? 
WITH_THREAD is the one I just ran in to, but there's a few others such as the object refcounting statistics and platform-dependent things like the debug/nodebug compilation on Windows. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From guido at python.org Fri Apr 7 15:13:21 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:13:21 -0400 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Your message of "Fri, 07 Apr 2000 15:06:51 +0200." <20000407130652.4D002370CF2@snelboot.oratrix.nl> References: <20000407130652.4D002370CF2@snelboot.oratrix.nl> Message-ID: <200004071313.JAA27132@eric.cnri.reston.va.us> > Something that just struck me: couldn't we use a couple of bits in the > PYTHON_API_VERSION to check various other things that make dynamic modules > break? WITH_THREAD is the one I just ran in to, but there's a few others such > as the object refcounting statistics and platform-dependent things like the > debug/nodebug compilation on Windows. I'm curious what combination didn't work? The thread APIs are supposed to be designed so that all combinations work -- the APIs are always present, they just don't do anything in the unthreaded version. If an extension is compiled without threads, well, then it won't release the interpreter lock, of course, but otherwise there should be no bad effects. The debug issue on Windows is taken care of by a DLL naming convention: the debug versions are named spam_d.dll (or .pyd). --Guido van Rossum (home page: http://www.python.org/~guido/) From gward at mems-exchange.org Fri Apr 7 15:15:46 2000 From: gward at mems-exchange.org (Greg Ward) Date: Fri, 7 Apr 2000 09:15:46 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: ; from mikael@isy.liu.se on Fri, Apr 07, 2000 at 02:57:38PM +0200 References: <20000407084914.A13606@mems-exchange.org> Message-ID: <20000407091545.B13606@mems-exchange.org> On 07 April 2000, Mikael Olofsson said: > > On 07-Apr-00 Greg Ward wrote: > > Oh, joy! oh happiness! someday soon, I may be able to type > > "blah.__doc__" at the interactive prompt and get a readable result! > > Just i case... I hope you haven't missed "print blah.__doc__". Yeah, I know: my usual mode of operation is this: >>> blah.__doc__ ...repr of docstring... ...sound of me cursing... >>> print blah.__doc__ The real reason for using str() at the interactive prompt is not to save me keystrokes, but because it just seems like the sensible thing to do. People who understand the str/repr difference, and really want the repr version, can slap backquotes around whatever they're printing. Greg From guido at python.org Fri Apr 7 15:18:39 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 09:18:39 -0400 Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: Your message of "Fri, 07 Apr 2000 10:47:37 +0200." <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <200004071318.JAA27173@eric.cnri.reston.va.us> > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I don't think so... If people still use regex, why not keep it? It doesn't cost much to maintain... 
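For anyone weighing the migration cost mentioned earlier in this thread, the two modules do not even agree on what a successful match looks like; a rough sketch of the difference (illustrative only; see the library reference for the full APIs):

    import regex, re

    s = "abbbc"

    # old regex module: match() returns the number of matching
    # characters, or -1 on failure
    if regex.match("ab*c", s) >= 0:
        print "regex says it matches"

    # re module: match() returns a match object, or None on failure
    m = re.match("ab*c", s)
    if m:
        print "re says it matches", m.group(0)

So porting is rarely a mechanical search-and-replace, which is part of why the old module keeps hanging around.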
--Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Fri Apr 7 15:43:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 7 Apr 2000 15:43:03 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? References: <20000407084914.A13606@mems-exchange.org> <20000407091545.B13606@mems-exchange.org> Message-ID: <002801bfa097$33228770$0500a8c0@secret.pythonware.com> Greg wrote: > Yeah, I know: my usual mode of operation is this: > > >>> blah.__doc__ > ...repr of docstring... > ...sound of me cursing... > >>> print blah.__doc__ on the other hand, I tend to do this now and then: >>> blah = foo() # returns chunk of binary data >>> blah which, if you use str instead of repr, can reprogram your terminal window in many interesting ways... but I think I'm +1 on this anyway. or at least +0.90000000000000002 From skip at mojam.com Fri Apr 7 15:04:39 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 7 Apr 2000 08:04:39 -0500 (CDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.56551.939560.375409@beluga.mojam.com> Fredrik> 1) that lots of people are still using it, often for Fredrik> performance reasons Speaking of which, how do sre, re and regex compare to one another these days? -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From jack at oratrix.nl Fri Apr 7 16:19:36 2000 From: jack at oratrix.nl (Jack Jansen) Date: Fri, 07 Apr 2000 16:19:36 +0200 Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: Message by Guido van Rossum , Fri, 07 Apr 2000 09:13:21 -0400 , <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: <20000407141937.3FBDE370CF2@snelboot.oratrix.nl> > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. Oops, the problem was mine: not only was the extension module compiled without threading, but also with the previous version of the I/O library used on the mac. Silly me. -- Jack Jansen | ++++ stop the execution of Mumia Abu-Jamal ++++ Jack.Jansen at oratrix.com | ++++ if you agree copy these lines to your sig ++++ www.oratrix.nl/~jack | see http://www.xs4all.nl/~tank/spg-l/sigaction.htm From fdrake at acm.org Fri Apr 7 16:21:59 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:21:59 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> Message-ID: <14573.61191.486890.43591@seahag.cnri.reston.va.us> Fredrik Lundh writes: > 1) that lots of people are still using it, often for > performance reasons That's why I never converted Grail; the "re" layer around "pcre" was substantially more expensive to use, and the HTML parser was way too slow already. 
(Displaying the result was still the slowest part, but we were desparate for every little scrap!) > but alright, I think the consensus here is "(1) get rid > of it completely". in 1.6a2, perhaps? I seem to recall a determination to toss it for Py3K (or Python 2, as it was called at the time). Note that Grail breaks completely as soon as the module can't be imported. I'll propose a compromise: keep it in the set of modules that get built by default, but remove the documentation sections from the manual. This will more strongly encourage migration for actively maintained code. I would be surprised if Grail is the only large application which uses "regex" for performance reasons, and we don't really *want* to break everything. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Fri Apr 7 16:48:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 16:48:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> Message-ID: <38EDF53F.94071785@lemburg.com> Guido van Rossum wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Makes sense. But note that it's going to be difficult to catch all > cases: you could have > > '...%d...%s...%s...' % (3, "abc", u"abc") > > and > > '...%(foo)s...' % {'foo': u'abc'} > > and even > > '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'} > > (the latter should *not* convert to Unicode). No problem... :-) Its a simple fix: once %s in an 8-bit string sees a Unicode object it will stop processing the string and restart using the unicode formatting algorithm. This will cost performance, of course. Optimization is easy though: add a small "u" in front of the string ;-) A sample session: >>> '...%(foo)s...' % {'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':"abc"} '...abc...' >>> '...%(foo)s...' % {u'foo':u"abc"} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",'def':123} u'...abc...' >>> '...%(foo)s...' % {u'foo':u"abc",u'def':123} u'...abc...' -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fdrake at acm.org Fri Apr 7 16:53:43 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 7 Apr 2000 10:53:43 -0400 (EDT) Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <14573.63095.48171.721921@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. > > This will cost performance, of course. Optimization is easy though: > add a small "u" in front of the string ;-) Seems reasonable to me! -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 19:14:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 19:14:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000601bfa037$a2c18460$6c2d153f@tim> from "Tim Peters" at Apr 06, 2000 10:19:00 PM Message-ID: <200004071714.TAA27347@python.inrialpes.fr> Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 > > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. I'm very respectful when I see a number with so many digits in a row. :-) I'm not sure that this will be of any interest to you, number crunchers, but a research team in computer arithmetics here reported some major results lately: they claim that they "solved" the Table Maker's Dilemma for most common functions in IEEE-754 double precision arithmetic. (and no, don't ask me what this means ;-) For more information, see: http://www.ens-lyon.fr/~jmmuller/Intro-to-TMD.htm -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 20:03:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:03:15 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EDD76F.986D3C39@tismer.com> from "Christian Tismer" at Apr 07, 2000 02:41:19 PM Message-ID: <200004071803.UAA27485@python.inrialpes.fr> Christian Tismer wrote: > > Guido van Rossum wrote: > ... > > The function name is not taken into account for the comparison. Maybe > > it should? > > Absolutely, please! Honestly, no. -O is used for speed, so showing the wrong symbols is okay. It's the same in C. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Fri Apr 7 20:37:54 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:37:54 +0200 Subject: [Python-Dev] python -O weirdness References: <200004071803.UAA27485@python.inrialpes.fr> Message-ID: <38EE2B02.1E6F3CB8@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Guido van Rossum wrote: > > ... > > > The function name is not taken into account for the comparison. Maybe > > > it should? > > > > Absolutely, please! > > Honestly, no. -O is used for speed, so showing the wrong symbols is > okay. It's the same in C. Not ok, IMHO. If the name is not guaranteed to be valid, why should it be there at all? If I write code that relies on inspecting those things, then I'm hosed. I'm the last one who argues against optimization. But I'd use either no name at all, or a tuple with all folded names. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From Vladimir.Marangozov at inrialpes.fr Fri Apr 7 20:40:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 7 Apr 2000 20:40:03 +0200 (CEST) Subject: [Python-Dev] the regression test suite Message-ID: <200004071840.UAA27606@python.inrialpes.fr> My kitchen programs show that regrtest.py keeps requesting more and more memory until it finishes all tests. IOW, it doesn't finalize properly each test. It keeps importing modules, without deleting them after each test. I think that before a particular test is run, we need to save the value of sys.modules, then restore it after the test (before running the next one). In a module enabled interpreter, this reduces the memory consumption almost by half... Patch? Think about the number of new tests that will be added in the future. I don't want to tolerate a silently approaching useless disk swapping :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From ping at lfw.org Fri Apr 7 20:47:45 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 13:47:45 -0500 (CDT) Subject: [Python-Dev] Round Bug in Python 1.6? Message-ID: Tim Peters wrote: > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > 3.141599999999999948130380289512686431407928466796875 Let's call this number 'A' for the sake of discussion. > so the output you got is correctly rounded to 17 significant digits. IOW, > it's a feature. Clearly there is something very wrong here: Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> 3.1416 3.1415999999999999 >>> Now you say that 17 significant digits are required to ensure that eval(repr(x)) == x, but we surely know that 17 digits are *not* required when x is A because i *just typed in* 3.1416 and the best choice of double value was A. I haven't gone and figured it out, but i'll take your word for it that 17 digits may be required in *certain* cases to ensure that eval(repr(x)) == x. They're just not required in all cases. It's very jarring to type something in, and have the interpreter give you back something that looks very different. It breaks a fundamental rule of consistency, and that damages the user's trust in the system or their understanding of the system. (What do you do then, start explaining the IEEE double representation to your CP4E beginner?) What should really happen is that floats intelligently print in the shortest and simplest manner possible, i.e. the fewest number of digits such that the decimal representation will convert back to the actual value. Now you may say this is a pain to implement, but i'm talking about sanity for the user here. I haven't investigated how to do this best yet. I'll go off now and see if i can come up with an algorithm that's not quite so stupid as def smartrepr(x): p = 17 while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 return '%%.%df' % p % x -- ?!ng From tismer at tismer.com Fri Apr 7 20:51:09 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 07 Apr 2000 20:51:09 +0200 Subject: [Python-Dev] Long Multiplication is not commutative. References: <000701bfa037$a4545960$6c2d153f@tim> Message-ID: <38EE2E1D.6708B43D@tismer.com> Tim Peters wrote: > > > Yes, go for it. I would appreciate a bunch of new test cases that > > exercise the new path through the code, too... 
> > FYI, a suitable test would be to add a line to function test_division_2 in > test_long.py, to verify that x*y == y*x. A variety of bitlengths for x and > y are already generated by the framework. Thanks - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at math.huji.ac.il Fri Apr 7 20:45:41 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 7 Apr 2000 20:45:41 +0200 (IST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004062130.RAA26273@eric.cnri.reston.va.us> Message-ID: On Thu, 6 Apr 2000, Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... Just checking my newly bought "Guido Channeling" kit -- you mean str() but special case the snot out of strings(TM), don't you Trademark probably belong to Tim Peters. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From guido at python.org Fri Apr 7 21:18:40 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:18:40 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 20:45:41 +0200." References: Message-ID: <200004071918.PAA27474@eric.cnri.reston.va.us> > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you Except I'm not sure what kind of special-casing should be happening. Put quotes around it without worrying if that makes it a valid string literal is one thought that comes to mind. Another approach might be what Tk's text widget does -- pass through certain control characters (LF, TAB) and all (even non-ASCII) printing characters, but display other control characters as \x.. escapes rather than risk putting the terminal in a weird mode. No quotes though. Hm, I kind of like this: when used as intended, it will just display the text, with newlines and umlauts etc.; but when printing binary gibberish, it will do something friendly. There's also the issue of what to do with lists (or tuples, or dicts) containing strings. If we agree on this: >>> "hello\nworld\n\347" # octal 347 is a cedilla hello world ? >>> Then what should ("hello\nworld", "\347") show? I've got enough serious complaints that I don't want to propose that it use repr(): >>> ("hello\nworld", "\347") ('hello\nworld', '\347') >>> Other possibilities: >>> ("hello\nworld", "\347") ('hello world', '?') >>> or maybe >>> ("hello\nworld", "\347") ('''hello world''', '?') >>> Of course there's also the Unicode issue -- the above all assumes Latin-1 for stdout. Still no closure, I think... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:35:32 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:35:32 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Your message of "Fri, 07 Apr 2000 13:47:45 CDT." References: Message-ID: <200004071935.PAA27541@eric.cnri.reston.va.us> > Tim Peters wrote: > > The best possible IEEE-754 double approximation to 3.1416 is (exactly) > > > > 3.141599999999999948130380289512686431407928466796875 > > Let's call this number 'A' for the sake of discussion. 
> > > so the output you got is correctly rounded to 17 significant digits. IOW, > > it's a feature. > > Clearly there is something very wrong here: > > Python 1.5.2+ (#2, Mar 28 2000, 18:27:50) > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> 3.1416 > 3.1415999999999999 > >>> > > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, but we surely know that 17 digits are > *not* required when x is A because i *just typed in* 3.1416 and > the best choice of double value was A. Ping has a point! > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It breaks a > fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. (What > do you do then, start explaining the IEEE double representation > to your CP4E beginner?) > > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. > > I haven't investigated how to do this best yet. I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x Have a look at what Java does; it seems to be doing this right: & jpython JPython 1.1 on java1.2 (JIT: sunwjit) Copyright (C) 1997-1999 Corporation for National Research Initiatives >>> import java.lang >>> x = java.lang.Float(3.1416) >>> x.toString() '3.1416' >>> ^D & Could it be as simple as converting x +/- one bit and seeing how many differing digits there were? (Not that +/- one bit is easy to calculate...) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:37:26 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:37:26 -0400 Subject: [Python-Dev] the regression test suite In-Reply-To: Your message of "Fri, 07 Apr 2000 20:40:03 +0200." <200004071840.UAA27606@python.inrialpes.fr> References: <200004071840.UAA27606@python.inrialpes.fr> Message-ID: <200004071937.PAA27552@eric.cnri.reston.va.us> > My kitchen programs show that regrtest.py keeps requesting more and > more memory until it finishes all tests. IOW, it doesn't finalize > properly each test. It keeps importing modules, without deleting them > after each test. I think that before a particular test is run, we need to > save the value of sys.modules, then restore it after the test (before > running the next one). In a module enabled interpreter, this reduces > the memory consumption almost by half... > > Patch? > > Think about the number of new tests that will be added in the future. > I don't want to tolerate a silently approaching useless disk swapping :-) I'm not particularly concerned, but it does make some sense. (And is faster than starting a fresh interpreter for each test.) So why don't you give it a try! 
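A minimal sketch of the kind of save/restore being discussed -- the helper and its
arguments are hypothetical, not regrtest's actual code:

    import sys

    def run_isolated(run_one_test, test_name):
        # Snapshot sys.modules before the test; afterwards drop anything
        # the test imported, so its memory can be reclaimed.
        saved = sys.modules.copy()
        try:
            run_one_test(test_name)
        finally:
            for name in sys.modules.keys():
                if not saved.has_key(name):
                    del sys.modules[name]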
--Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 7 21:49:52 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 15:49:52 -0400 Subject: [Python-Dev] Unicode as argument for 8-bit format strings In-Reply-To: Your message of "Fri, 07 Apr 2000 16:48:31 +0200." <38EDF53F.94071785@lemburg.com> References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> Message-ID: <200004071949.PAA27635@eric.cnri.reston.va.us> > No problem... :-) Its a simple fix: once %s in an 8-bit string > sees a Unicode object it will stop processing the string and > restart using the unicode formatting algorithm. But the earlier items might already have incurred side effects (e.g. when rendering user code)... Unless you save all the strings you got for reuse, which seems a pain as well. --Guido van Rossum (home page: http://www.python.org/~guido/) From ping at lfw.org Fri Apr 7 22:00:09 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Fri, 7 Apr 2000 15:00:09 -0500 (CDT) Subject: [Python-Dev] str() for interpreter output Message-ID: Guido van Rossum wrote: > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... You do NOT want this. I'm against this change -- quite strongly, in fact. Greg Ward wrote: > Oh, joy! oh happiness! someday soon, I may be able to type > "blah.__doc__" at the interactive prompt and get a readable result! Have repr() use triple-quotes when strings contain newlines if you like, but do *not* hide the fact that the thing being displayed is a string. Imagine the confusion this would cause! (in a hypothetical Python-with-str()...) >>> a = 1 + 1 >>> b = '2' >>> c = [1, 2, 3] >>> d = '[1, 2, 3]' ...much later... >>> a 2 >>> b 2 >>> a + 5 7 >>> b + 5 Traceback (innermost last): File "", line 1, in ? TypeError: illegal argument type for built-in operation Huh?!? >>> c [1, 2, 3] >>> d [1, 2, 3] >>> c.append(4) >>> c [1, 2, 3, 4] >>> d.append(4) Traceback (innermost last): File "", line 1, in ? AttributeError: attribute-less object Huh?!?! >>> c[1] 2 >>> d[1] 1 What?! This is guaranteed to confuse! Things that look the same should be the same. Things that are different should look different. Getting the representation of objects from the interpreter provides a very important visual cue: you can usually tell just by looking at the first character what kind of animal you've got. A digit means it's a number; a quote means a string; "[" means a list; "(" means a tuple; "{" means a dictionary; "<" means an instance or a special kind of object. Switching to str() instead of repr() completely breaks this property so you have no idea what you are getting. Intuitions go out the window. Granted, repr() cannot always produce an exact reconstruction of an object. repr() is not a serialization mechanism! We have 'pickle' for that. But the nice thing about repr() is that, in general, you can *tell* whether the representation is accurate enough to re-type: once you see a "<...>" sort of thing, you know that there is extra magic that you can't type in. "<...>" was an excellent choice because it is very clearly syntactically illegal. As a corollary, here is an important property of repr() that i think ought to be documented and preserved: eval(repr(x)) should produce an object with the same value and state as x, or it should cause a SyntaxError. We should avoid ever having it *succeed* and produce the *wrong* x. 
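A quick interactive check of that property on the float case that started this
thread (the output shown assumes IEEE-754 doubles and the 17-digit repr()
discussed above):

    >>> x = 0.1 + 0.2                # not exactly 0.3 in binary floating point
    >>> repr(x)
    '0.30000000000000004'
    >>> eval(repr(x)) == x           # the 17-digit form round-trips
    1
    >>> float('%.16g' % x) == x      # 16 digits is not always enough
    0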
* * * As Tim suggested, i did go back and read the comp.lang.python thread on "__str__ vs. __repr__". Honestly i'm really surprised that such a convoluted hack as the suggestion to "special-case the snot out of strings" would come from Tim, and more surprised that it actually got so much airtime. Doing this special-case mumbo-jumbo would be even worse! Look: (in a hypothetical Python-with-snotless-str()...) >>> a = '\\' >>> b = '\'' ...much later... >>> a '\' >>> '\' File "", line 1 '\' ^ SyntaxError: invalid token (at this point i am envisioning the user screaming, "But that's what YOU said!") >>> b ''' >>> ''' ... Wha...?!! Or, alternatively, if even more effort had been expended removing snot: >>> b "'" >>> "'" "'" >>> print b ' Okay... then: >>> c = '"\'" >>> c '"'' >>> '"'' File "", line 1 '"'' ^ SyntaxError: invalid token Oh, it should print as '"\'', you say? Well then what of: >>> c '"\'' >>> d = '"\\\'' '"\\'' >>> '"\\'' File "", line 1 '"\\'' ^ SyntaxError: invalid token Damned if you do, damned if you don't. Tim's snot-removal algorithm forces the user to *infer* the rules of snot removal, remember them, and tentatively apply them to everything they see (since they still can't be sure whether snot has been removed from what they are seeing). How are the user and the interpreter ever to get along if they can't talk to each other in the same language? * * * As for the suggestion to add an interpreter hook to __builtins__ such that you can supply your own display routine, i'm all for it. Great idea there. * * * I think Donn Cave put it best: there are THREE different kinds of convert-to-string, and we'll only confuse the issue if we try to ignore the distinctions. (a) accurate serialization (b) coerce to string (c) friendly display (a) is taken care of by 'pickle'. (b) is str(). Clearly, coercing a string to a string should not change anything -- thus str(x) is just x if x is already a string. (c) is repr(). repr() is for the human, not for the machine. (a) is for the machine. repr() is: "Please show me as much information as you reasonably can about this object in an accurate and unambiguous way, but if you can't readably show me everything, make it obvious that you're not." repr() must be unambiguous, because the interpreter must help people learn by example. -- ?!ng From gmcm at hypernet.com Fri Apr 7 22:12:29 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 7 Apr 2000 16:12:29 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <1256984142-21537727@hypernet.com> Ka-Ping Yee wrote: > repr() must be unambiguous, because the interpreter must help people > learn by example. Speaking of which: >>> class A: ... def m(self): ... pass ... >>> a = A() >>> a.m >>> m = a.m >>> m >>> m is a.m 0 >>> ambiguated-ly y'rs - Gordon From guido at python.org Fri Apr 7 22:14:53 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 07 Apr 2000 16:14:53 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Your message of "Fri, 07 Apr 2000 15:00:09 CDT." References: Message-ID: <200004072014.QAA27700@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > However, it may be time to switch so that "immediate expression" > > values are printed as str() instead of as repr()... [Ping] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Thanks for reminding me of what my original motivation was for using repr(). 
I am also still annoyed at some extension writers who violate the rule, and design a repr() that is nice to look at but lies about the type. Note that xrange() commits this sin! (I didn't write xrange() and never liked it. ;-) We still have a dilemma though... People using the interactive interpreter to perform some specific task (e.g. NumPy users), rather than to learn about Python, want str(), and actually I agree with them there. How can we give everybody wht they want? > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Maybe this is the solution... --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Apr 7 23:03:31 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 07 Apr 2000 23:03:31 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> Message-ID: <38EE4D22.CC43C664@lemburg.com> Guido van Rossum wrote: > > > No problem... :-) Its a simple fix: once %s in an 8-bit string > > sees a Unicode object it will stop processing the string and > > restart using the unicode formatting algorithm. > > But the earlier items might already have incurred side effects > (e.g. when rendering user code)... Unless you save all the strings > you got for reuse, which seems a pain as well. Oh well... I don't think it's worth getting this 100% right. We'd need quite a lot of code to store the intermediate results and then have them reused during the Unicode %-formatting -- just to catch the few cases where str(obj) does have side-effects: the code would have to pass the partially rendered string pasted together with the remaining format string to the Unicode coercion mechanism and then fiddle the arguments right. Which side-effects are you thinking about here ? Perhaps it would be better to simply raise an exception in case '%s' meets Unicode. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Sat Apr 8 00:42:01 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 00:42:01 +0200 Subject: [Python-Dev] Unicode as argument for 8-bit format strings References: <38EDCD35.DDD5EB4B@lemburg.com> <200004071301.JAA27100@eric.cnri.reston.va.us> <38EDF53F.94071785@lemburg.com> <200004071949.PAA27635@eric.cnri.reston.va.us> <38EE4D22.CC43C664@lemburg.com> Message-ID: <38EE6439.80847A06@lemburg.com> "M.-A. Lemburg" wrote: > > Guido van Rossum wrote: > > > > > No problem... :-) Its a simple fix: once %s in an 8-bit string > > > sees a Unicode object it will stop processing the string and > > > restart using the unicode formatting algorithm. > > > > But the earlier items might already have incurred side effects > > (e.g. when rendering user code)... Unless you save all the strings > > you got for reuse, which seems a pain as well. > > Oh well... I don't think it's worth getting this 100% right. Never mind -- I have a patch ready now, that doesn't restart, but instead uses what has already been formatted and then continues in Unicode mode. 
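Roughly, the intended results look like this (a sketch of the behaviour the patch
aims for, not actual output from it):

    >>> '...%d...%s...%s...' % (3, "abc", u"abc")
    u'...3...abc...abc...'
    >>> '...%(foo)s...' % {'foo': 'abc', 'bar': u'def'}
    '...abc...'

The second case follows the caveat above: a Unicode value that is never actually
interpolated should not force the whole result to Unicode.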
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tim_one at email.msn.com Sat Apr 8 03:41:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:41:48 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000201bfa0fb$9af44b40$bc2d153f@tim> [Ka-Ping Yee] > ,,, > Now you say that 17 significant digits are required to ensure > that eval(repr(x)) == x, Yes. This was first proved in Jerome Coonen's doctoral dissertation, and is one of the few things IEEE-754 guarantees about fp I/O: that input(output(x)) == x for all finite double x provided that output() produces at least 17 significant decimal digits (and 17 is minimal). In particular, IEEE-754 does *not* guarantee that either I or O are properly rounded, which latter is needed for what *you* want to see here. The std doesn't require proper rounding in this case (despite that it requires it in all other cases) because no efficient method for doing properly rounded I/O was known at the time (and, alas, that's still true). > but we surely know that 17 digits are *not* required when x is A > because i *just typed in* 3.1416 and the best choice of double value > was A. Well, x = 1.0 provides a simpler case . > I haven't gone and figured it out, but i'll take your word for > it that 17 digits may be required in *certain* cases to ensure > that eval(repr(x)) == x. They're just not required in all cases. > > It's very jarring to type something in, and have the interpreter > give you back something that looks very different. It's in the very nature of binary floating-point that the numbers they type in are often not the numbers the system uses. > It breaks a fundamental rule of consistency, and that damages the user's > trust in the system or their understanding of the system. If they're surprised by this, they indeed don't understand the arithmetic at all! This is an argument for using a different form of arithmetic, not for lying about reality. > (What do you do then, start explaining the IEEE double representation > to your CP4E beginner?) As above. repr() shouldn't be used at the interactive prompt anyway (but note that I did not say str() should be). > What should really happen is that floats intelligently print in > the shortest and simplest manner possible, i.e. the fewest > number of digits such that the decimal representation will > convert back to the actual value. Now you may say this is a > pain to implement, but i'm talking about sanity for the user here. This can be done, but only if Python does all fp I/O conversions entirely on its own -- 754-conforming libc routines are inadequate for this purpose (and, indeed, I don't believe any libc other than Sun's does do proper rounding here). For background and code, track down "How To Print Floating-Point Numbers Accurately" by Steele & White, and its companion paper (s/Print/Read/) by Clinger. Steele & White were specifically concerned with printing the "shortest" fp representation possible such that proper input could later reconstruct the value exactly. Steele, White & Clinger give relatively simple code for this that relies on unbounded int arithmetic. Excruciatingly difficult and platform-#ifdef'ed "optimized" code for this was written & refined over several years by the numerical analyst David Gay, and is available from Netlib. > I haven't investigated how to do this best yet. 
I'll go off > now and see if i can come up with an algorithm that's not > quite so stupid as > > def smartrepr(x): > p = 17 > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > return '%%.%df' % p % x This merely exposes accidents in the libc on the specific platform you run it. That is, after print smartrepr(x) on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not yield the same number platform A started with. Both platforms have to do proper rounding to make this work; there's no way to do proper rounding by using libc; so Python has to do it itself; there's no efficient way to do it regardless; nevertheless, it's a noble goal, and at least a few languages in the Lisp family require it (most notably Scheme, from whence Steele, White & Clinger's interest in the subject). you're-in-over-your-head-before-the-water-touches-your-toes-ly y'rs - tim From billtut at microsoft.com Sat Apr 8 03:45:03 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 7 Apr 2000 18:45:03 -0700 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> > There has been a bug report about the treatment of Unicode > objects together with 8-bit format strings. The current > implementation converts the Unicode object to UTF-8 and then > inserts this value in place of the %s.... > > I'm inclined to change this to have '...%s...' % u'abc' > return u'...abc...' since this is just another case of > coercing data to the "bigger" type to avoid information loss. > > Thoughts ? Suddenly returning a Unicode string from an operation that was an 8-bit string is likely to give some code exterme fits of despondency. Converting to UTF-8 didn't give you any data loss, however it certainly might be unexpected to now find UTF-8 characters in what the user originally thought was a binary string containing whatever they had wanted it to contain. Throwing an exception would at the very least force the user to make a decision one way or the other about what they want to do with the data. They might want to do a codepage translation, or something else. (aka Hey, here's a bug I just found for you!) In what other cases are you suddenly returning a Unicode string object from which previouslly returned a string object? Bill From tim_one at email.msn.com Sat Apr 8 03:49:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 7 Apr 2000 21:49:03 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000301bfa0fc$9e452e80$bc2d153f@tim> [Guido] > Thanks for reminding me of what my original motivation was for using > repr(). I am also still annoyed at some extension writers who violate > the rule, and design a repr() that is nice to look at but lies about > the type. ... Back when this was a hot topic on c.l.py (there are no new topics <0.1 wink>), it was very clear that many did this to class __repr__ on purpose, precisely because they wanted to get back a readable string at the interactive prompt (where a *correct* repr may yield a megabyte of info -- see my extended examples from that thread with Rationals, and lists of Rationals, and dicts w/ Rationals etc). In fact, at least one Python old-timer argued strongly that the right thing to do was to swap the descriptions of str() and repr() in the docs! 
str()-should-also-"pass-str()-down"-ly y'rs - tim From fredrik at pythonware.com Sat Apr 8 07:47:13 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 8 Apr 2000 07:47:13 +0200 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> Message-ID: <002c01bfa11d$e4608ec0$0500a8c0@secret.pythonware.com> Bill Tutt wrote: > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Suddenly returning a Unicode string from an operation that was an 8-bit > string is likely to give some code exterme fits of despondency. why is this different from returning floating point values from operations involving integers and floats? > Converting to UTF-8 didn't give you any data loss, however it certainly > might be unexpected to now find UTF-8 characters in what the user originally > thought was a binary string containing whatever they had wanted it to contain. the more I've played with this, the stronger my opinion that the "now it's an ordinary string, now it's a UTF-8 string, now it's an ordinary string again" approach doesn't work. more on this in a later post. (am I the only one here that has actually tried to write code that handles both unicode strings and ordinary strings? if not, can anyone tell me what I'm doing wrong?) > Throwing an exception would at the very least force the user to make a > decision one way or the other about what they want to do with the data. > They might want to do a codepage translation, or something else. (aka Hey, > here's a bug I just found for you!) > In what other cases are you suddenly returning a Unicode string object from > which previouslly returned a string object? if unicode is ever to be a real string type in python, and not just a nifty extension type, it must be okay to return a unicode string from any operation that involves a unicode argument... From billtut at microsoft.com Sat Apr 8 08:24:06 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 7 Apr 2000 23:24:06 -0700 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF04@RED-MSG-50> > From: Fredrik Lundh [mailto:fredrik at pythonware.com] > > Bill Tutt wrote: > > > There has been a bug report about the treatment of Unicode > > > objects together with 8-bit format strings. The current > > > implementation converts the Unicode object to UTF-8 and then > > > inserts this value in place of the %s.... > > > > > > I'm inclined to change this to have '...%s...' % u'abc' > > > return u'...abc...' since this is just another case of > > > coercing data to the "bigger" type to avoid information loss. > > > > > > Thoughts ? > > > > Suddenly returning a Unicode string from an operation that > was an 8-bit > > string is likely to give some code exterme fits of despondency. > > why is this different from returning floating point values from > operations involving integers and floats? 
> > > Converting to UTF-8 didn't give you any data loss, however > it certainly > > might be unexpected to now find UTF-8 characters in what > the user originally > > thought was a binary string containing whatever they had > wanted it to contain. > > the more I've played with this, the stronger my opinion that > the "now it's an ordinary string, now it's a UTF-8 string, now > it's an ordinary string again" approach doesn't work. more on > this in a later post. > Well, unicode string/UTF-8 string, but I definately agree with you. Pick one or the other and make the user convert betwixt the two. > (am I the only one here that has actually tried to write code > that handles both unicode strings and ordinary strings? if not, > can anyone tell me what I'm doing wrong?) > In C++, yes. :) Autoconverting into or out of unicode is bound to lead to trouble for someone. Look at the various messes that misused C++ operator overloading can get you into. Whether its the code that wasn't expecting UTF-8 in a normal string type, or a formatting operation that used to return a normal string type now returning a Unicode string. > > Throwing an exception would at the very least force the > user to make a > > decision one way or the other about what they want to do > with the data. > > They might want to do a codepage translation, or something > else. (aka Hey, > > here's a bug I just found for you!) > > > In what other cases are you suddenly returning a Unicode > string object from > > which previouslly returned a string object? > > if unicode is ever to be a real string type in python, and not just a > nifty extension type, it must be okay to return a unicode string from > any operation that involves a unicode argument... Err. I'm not sure what you're getting at here. If your saying that it'd be nice if we could ditch the current string type and just use the Unicode string type, then I agree with you. However, that doesn't mean you should change the semantics of an operation that existed before unicode came into the picture, since it would break backward compatability. +1 for '%s' % u'\u1234' throwing a TypeError exception. Bill From tim_one at email.msn.com Sat Apr 8 09:23:16 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 8 Apr 2000 03:23:16 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071935.PAA27541@eric.cnri.reston.va.us> Message-ID: <000001bfa12b$4f5501e0$6b2d153f@tim> [Guido] > Have a look at what Java does; it seems to be doing this right: > > & jpython > JPython 1.1 on java1.2 (JIT: sunwjit) > Copyright (C) 1997-1999 Corporation for National Research Initiatives > >>> import java.lang > >>> x = java.lang.Float(3.1416) > >>> x.toString() > '3.1416' > >>> That Java does this is not an accident: Guy Steele pushed for the same rules he got into Scheme, although a) The Java rules are much tighter than Scheme's. and b) He didn't prevail on this point in Java until version 1.1 (before then Java's double/float->string never produced more precision than ANSI C's default %g format, so was inadequate to preserve equality under I/O). I suspect there was more than a bit of internal politics behind the delay, as the 754 camp has never liked the "minimal width" gimmick(*), and Sun's C and Fortran numerics (incl. their properly-rounding libc I/O routines) were strongly influenced by 754 committee members. > Could it be as simple as converting x +/- one bit and seeing how many > differing digits there were? (Not that +/- one bit is easy to > calculate...) 
Sorry, it's much harder than that. See the papers (and/or David Gay's code) I referenced before. (*) Why the minimal-width gimmick is disliked: If you print a (32-bit) IEEE float with minimal width, then read it back in as a (64-bit) IEEE double, you may not get the same result as if you had converted the original float to a double directly. This is because "minimal width" here is *relative to* the universe of 32-bit floats, and you don't always get the same minimal width if you compute it relative to the universe of 64-bit doubles instead. In other words, "minimal width" can lose accuracy needlessly -- but this can't happen if you print the float to full precision instead. From mal at lemburg.com Sat Apr 8 11:51:32 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 08 Apr 2000 11:51:32 +0200 Subject: [Python-Dev] re: Unicode as argument for 8-bit strings References: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF03@RED-MSG-50> Message-ID: <38EF0124.F5032CB2@lemburg.com> Bill Tutt wrote: > > > There has been a bug report about the treatment of Unicode > > objects together with 8-bit format strings. The current > > implementation converts the Unicode object to UTF-8 and then > > inserts this value in place of the %s.... > > > > I'm inclined to change this to have '...%s...' % u'abc' > > return u'...abc...' since this is just another case of > > coercing data to the "bigger" type to avoid information loss. > > > > Thoughts ? > > Suddenly returning a Unicode string from an operation that was an 8-bit > string is likely to give some code exterme fits of despondency. > > Converting to UTF-8 didn't give you any data loss, however it certainly > might be unexpected to now find UTF-8 characters in what the user originally > thought was > a binary string containing whatever they had wanted it to contain. Well, the design is to always coerce to Unicode when 8-bit string objects and Unicode objects meet. This is done for all string methods and that's the reason I'm also implementing this for %-formatting (internally this is just another string method). > Throwing an exception would at the very least force the user to make a > decision one way or the other about what they want to do with the data. > They might want to do a codepage translation, or something else. (aka Hey, > here's a bug I just found for you!) True; but Guido's intention was to have strings and Unicode interoperate without too much user intervention. > In what other cases are you suddenly returning a Unicode string object from > which previouslly returned a string object? All string methods automatically coerce to Unicode when they see a Unicode argument, e.g. " ".join(("abc", u"def")) will return u"abc def". -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 8 13:01:00 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 8 Apr 2000 13:01:00 +0200 (CEST) Subject: [Python-Dev] python -O weirdness In-Reply-To: <38EE2B02.1E6F3CB8@tismer.com> from "Christian Tismer" at Apr 07, 2000 08:37:54 PM Message-ID: <200004081101.NAA28756@python.inrialpes.fr> > > > [GvR] > > > ... > > > > The function name is not taken into account for the comparison. Maybe > > > > it should? > > > > > > [CT] > > > Absolutely, please! > > > > [VM] > > Honestly, no. -O is used for speed, so showing the wrong symbols is > > okay. It's the same in C. > > [CT] > Not ok, IMHO. 
If the name is not guaranteed to be valid, why > should it be there at all? If I write code that relies on > inspecting those things, then I'm hosed. I think that you don't want to rely on inspecting the symbol<->code bindings of an optimized program. In general. Python is different in this regard, though, because of the standard introspection facilities. One expects that f.func_code.co_name == 'f' is always true, although it's not for -O. A perfect example of a name `conflict' due to object sharing. The const array optimization is well known. It folds object constants which have the same value. In this particular case, however, they don't have the same value, because of the hardcoded function name. So in the end, it turns out that Chris is right (although not for the same reason ;-) and it would be nice to fix code_compare. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Sun Apr 9 03:26:23 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 8 Apr 2000 21:26:23 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071714.TAA27347@python.inrialpes.fr> Message-ID: <000001bfa1c2$9e403a80$18a2143f@tim> [Vladimir Marangozov] > I'm not sure that this will be of any interest to you, number crunchers, > but a research team in computer arithmetics here reported some major > results lately: they claim that they "solved" the Table Maker's Dilemma > for most common functions in IEEE-754 double precision arithmetic. > (and no, don't ask me what this means ;-) Back in the old days, some people spent decades making tables of various function values. A common way was to laboriously compute high-precision values over a sparse grid, using e.g. series expansions, then extend that to a fine grid via relatively simple interpolation formulas between the high-precision results. You have to compute the sparse grid to *some* "extra" precision in order to absorb roundoff errors in the interpolated values. The "dilemma" is figuring out how *much* extra precision: too much and it greatly slows the calculations, too little and the interpolated values are inaccurate. The "problem cases" for a function f(x) are those x such that the exact value of f(x) is very close to being exactly halfway between representable numbers. In order to round correctly, you have to figure out which representable number f(x) is closest to. How much extra precision do you need to use to resolve this correctly in all cases? Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. That's not enough to know whether it *should* round to 41 or 42. So you need to try again with more precision. But how much? You might try 5 digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? Might be a waste of effort. Try 20 next? Might *still* not be enough -- or could just as well be that 7 would have been enough and you did 10x the work you needed to do. Etc. It turns out that for most functions there's no general way known to answer the "how much?" question in advance: brute force is the best method known. For various IEEE double precision functions, so far it's turned out that you need in the ballpark of 40-60 extra accurate bits (beyond the native 53) in order to round back correctly to 53 in all cases, but there's no *theory* supporting that. It *could* require millions of extra bits. 
For those wondering "why bother?", the practical answer is this: if a std could require correct rounding, functions would be wholly portable across machines ("correctly rounded" is precisely defined by purely mathematical means). That's where IEEE-754 made its huge break with tradition, by requiring correct rounding for + - * / and sqrt. The places it left fuzzy (like string<->float, and all transcendental functions) are the places your program produces different results when you port it. Irritating one: MS VC++ on Intel platforms generates different code for exp() depending on the optimization level. They often differ in the last bit they compute. This wholly accounts for why Dragon's speech recognition software sometimes produces subtly (but very visibly!) different results depending on how it was compiled. Before I got tossed into this pit, it was assumed for a year to be either a -O bug or somebody fetching uninitialized storage. that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 06:39:09 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:09 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000101bfa1dd$8c382f80$172d153f@tim> [Guido van Rossum] > However, it may be time to switch so that "immediate expression" > values are printed as str() instead of as repr()... [Ka-Ping Yee] > You do NOT want this. > > I'm against this change -- quite strongly, in fact. Relax, nobody wants that. The fact is that neither str() nor repr() is reasonable today for use at the interactive prompt. repr() *appears* adequate only so long as you stick to the builtin types, where the difference between repr() and str() is most often non-existent(!). But repr() has driven me (& not only me) mad for years at the interactive prompt in my own (and extension) types, since a *faithful* representation of a "large" object is exactly what you *don't* want to see scrolling by. You later say (echoing Donn Cave) > repr() is for the human, not for the machine but that contradicts the docs and the design. What you mean to say is "the thing that the interactive prompt uses by default *should* be for the human, not for the machine" -- which repr() is not. That's why repr() sucks here, despite that it's certainly "more for the human" than a pickle is. str() isn't suitable either, alas, despite that (by design and by the docs) it was *intended* to be, at least because str() on a container invokes repr() on the containees. Neither str() nor repr() can be used to get a human-friendly string form of nested objects today (unless, as is increasingly the *practice*, people misuse __repr__() to do what __str__() was *intended* to do -- c.f. Guido's complaint about that). > ... > Have repr() use triple-quotes when strings contain newlines > if you like, but do *not* hide the fact that the thing being > displayed is a string. Nobody wants to hide this (or, if someone does, set yourself up for merciless poking before it's too late). > ... > Getting the representation of objects from the interpreter provides > a very important visual cue: you can usually tell just by looking > at the first character what kind of animal you've got. A digit means > it's a number; a quote means a string; "[" means a list; "(" means a > tuple; "{" means a dictionary; "<" means an instance or a special > kind of object. Switching to str() instead of repr() completely > breaks this property so you have no idea what you are getting. 
> Intuitions go out the window. This is way oversold: str() also supplies "[" for lists, "(" for tuples, "{" for dicts, and "<" for instances of classes that don't override __str__. The only difference between repr() and str() in this listing of faux terror is when they're applied to strings. > Granted, repr() cannot always produce an exact reconstruction of an > object. repr() is not a serialization mechanism! To the contrary, many classes and types implement repr() for that very purpose. It's not universal but doesn't need to be. > We have 'pickle' for that. pickles are unreadable by humans; that's why repr() is often preferred. > ... > As a corollary, here is an important property of repr() that > i think ought to be documented and preserved: > > eval(repr(x)) should produce an object with the same value > and state as x, or it should cause a SyntaxError. > > We should avoid ever having it *succeed* and produce the *wrong* x. Fine by me. > ... > Honestly i'm really surprised that such a convoluted hack as the > suggestion to "special-case the snot out of strings" would come > from Tim, and more surprised that it actually got so much airtime. That thread tapped into real and widespread unhappiness with what's displayed at an interactive prompt today. That's why it got so much airtime -- no mystery there. As above, your objections to str() reduce to its behavior for strings specifically (I have more objections than just that -- str() should "get passed down" too), hence "str() special-casing the snot out of strings" was a direct hack to address that specific complaint. > Doing this special-case mumbo-jumbo would be even worse! Look: > > (in a hypothetical Python-with-snotless-str()...) > > >>> a = '\\' > >>> b = '\'' I'd actually like to use euroquotes for str(string) -- don't throw the Latin-1 away with your outrage . Whatever, examples with backslashes are non-starters, since newbies can't make any sense out of their doubling under repr() today either (if it's not a FAQ, it should be -- I've certainly had to explain it often enough!). > ...much later... > > >>> a > '\' > >>> '\' > File "", line 1 > '\' > ^ > SyntaxError: invalid token > > (at this point i am envisioning the user screaming, "But that's > what YOU said!") Nobody ever promised that eval(str(x)) == x -- if they want that, they should use repr() or backticks. Today they get >>> a '\\' and scream "Huh?! I thought that was only supposed to be ONE backslash!". Or someone in Europe tries to look at a list of strings, or a simple dict keyed by names, and gets back a god-awful mish-mash of octal backslash escapes (and str() can't be used today to stop that either, since str() "isn't passed down"). Compared to that, confusion over explicit backslashes strikes me as trivial. > [various examples of ambiguous output] That's why it's called a hack . Last time I corresponded with Guido about it, he was leaning toward using angle brackets (<>) instead. That would take away the temptation to believe you should be able to type the same thing back in and have it do something reasonable. > Tim's snot-removal algorithm forces the user to *infer* the rules > of snot removal, remember them, and tentatively apply them to > everything they see (since they still can't be sure whether snot > has been removed from what they are seeing). Not at all. "Tim's snot-removal algorithm" didn't remove anything ("removal" is an adjective I don't believe I've seen applied to it before). 
At the time it simply did str() and stuck a pair of quotes around the result. The (passed down) str() was the important part; how it's decorated to say "and, btw, it's a string" is the teensy tail of a flea that's killing the whole dog <0.9 wink>. If we had Latin-1, we could use euroquotes for this. If we had control over the display, we could use a different color or font. If we stick to 7-bit ASCII, we have to do *something* irritating. So here's a different idea for SSCTSOOS: escape quote chars and backslashes (like repr()) as needed, but leave everything else alone (like str()). Then you can have fun stringing N adjacent backslashes together , and other people can use non-ASCII characters without going mad. What I want *most*, though, is for ssctsoos() to get passed down (from container to containee), and for it to be the default action. > ... > As for the suggestion to add an interpreter hook to __builtins__ > such that you can supply your own display routine, i'm all for it. > Great idea there. Same here! But I reject going on from there to say "and since Python lets you do it yourself, Python isn't obligated to try harder itself". anything-to-keep-octal-escapes-out-of-a-unicode-world-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 06:39:17 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 00:39:17 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: <200004072014.QAA27700@eric.cnri.reston.va.us> Message-ID: <000201bfa1dd$90581800$172d153f@tim> [Guido] > ... > We still have a dilemma though... People using the interactive > interpreter to perform some specific task (e.g. NumPy users), rather > than to learn about Python, want str(), and actually I agree with them > there. And if they're using something fancier than NumPy arrays, they want str() to get passed down from containers to containees too. BTW, boosting the number of digits repr displays is likely to make NumPy users even unhappier so long as repr() is used at the prompt (they'll be very happy to be able to transport doubles exactly across machines via repr(), but won't want to see all the noise digits all the time). > How can we give everybody what they want? More than one display function, user-definable and user-settable, + a change in the default setting. From gstein at lyra.org Sun Apr 9 11:28:18 2000 From: gstein at lyra.org (Greg Stein) Date: Sun, 9 Apr 2000 02:28:18 -0700 (PDT) Subject: [Python-Dev] PYTHON_API_VERSION and threading In-Reply-To: <200004071313.JAA27132@eric.cnri.reston.va.us> Message-ID: On Fri, 7 Apr 2000, Guido van Rossum wrote: > > Something that just struck me: couldn't we use a couple of bits in the > > PYTHON_API_VERSION to check various other things that make dynamic modules > > break? WITH_THREAD is the one I just ran in to, but there's a few others such > > as the object refcounting statistics and platform-dependent things like the > > debug/nodebug compilation on Windows. > > I'm curious what combination didn't work? The thread APIs are > supposed to be designed so that all combinations work -- the APIs are > always present, they just don't do anything in the unthreaded > version. If an extension is compiled without threads, well, then it > won't release the interpreter lock, of course, but otherwise there > should be no bad effects. But if you enable "free threading" or "trace refcounts", then the combinations will not work. This is because these two options modify very basic things like Py_INCREF/DECREF. 
To help prevent mismatches, they do some monkey work with redefining a Python symbol (the InitModule thingy). Jack's idea of using PYTHON_API_VERSION is a cleaner approach to preventing imcompatibilities. > The debug issue on Windows is taken care of by a DLL naming > convention: the debug versions are named spam_d.dll (or .pyd). It would be nice to have it at the code level, too. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Sun Apr 9 12:46:41 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:46:41 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000201bfa0fb$9af44b40$bc2d153f@tim> Message-ID: In a previous message, i wrote: > > It's very jarring to type something in, and have the interpreter > > give you back something that looks very different. [...] > > It breaks a fundamental rule of consistency, and that damages the user's > > trust in the system or their understanding of the system. Then on Fri, 7 Apr 2000, Tim Peters replied: > If they're surprised by this, they indeed don't understand the arithmetic at > all! This is an argument for using a different form of arithmetic, not for > lying about reality. This is not lying! If you type in "3.1416" and Python says "3.1416", then indeed it is the case that "3.1416" is a correct way to type in the floating-point number being expressed. So "3.1415999999999999" is not any more truthful than "3.1416" -- it's just more annoying. I just tried this in Python 1.5.2+: >>> .1 0.10000000000000001 >>> .2 0.20000000000000001 >>> .3 0.29999999999999999 >>> .4 0.40000000000000002 >>> .5 0.5 >>> .6 0.59999999999999998 >>> .7 0.69999999999999996 >>> .8 0.80000000000000004 >>> .9 0.90000000000000002 Ouch. I wrote: > > (What do you do then, start explaining the IEEE double representation > > to your CP4E beginner?) Tim replied: > As above. repr() shouldn't be used at the interactive prompt anyway (but > note that I did not say str() should be). What, then? Introduce a third conversion routine and further complicate the issue? I don't see why it's necessary. I wrote: > > What should really happen is that floats intelligently print in > > the shortest and simplest manner possible Tim replied: > This can be done, but only if Python does all fp I/O conversions entirely on > its own -- 754-conforming libc routines are inadequate for this purpose Not "all fp I/O conversions", right? Only repr(float) needs to be implemented for this particular purpose. Other conversions like "%f" and "%g" can be left to libc, as they are now. I suppose for convenience's sake it may be nice to add another format spec so that one can ask for this behaviour from the "%" operator as well, but that's a separate issue (perhaps "%r" to insert the repr() of an argument of any type?). > For background and code, track down "How To Print Floating-Point Numbers > Accurately" by Steele & White, and its companion paper (s/Print/Read/) Thanks! I found 'em. Will read... I suggested: > > def smartrepr(x): > > p = 17 > > while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 > > return '%%.%df' % p % x Tim replied: > This merely exposes accidents in the libc on the specific platform you run > it. That is, after > > print smartrepr(x) > > on IEEE-754 platform A, reading that back in on IEEE-754 platform B may not > yield the same number platform A started with. That is not repr()'s job. Once again: repr() is not for the machine. It is not part of repr()'s contract to ensure the kind of platform-independent conversion you're talking about. 
It prints out the number in a way that upholds the eval(repr(x)) == x contract for the system you are currently interacting with, and that's good enough. If you wanted platform-independent serialization, you would use something else. As long as the language reference says "These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture and C implementation for the accepted range and handling of overflow." and until Python specifies the exact sizes and behaviours of its floating-point numbers, you can't expect these kinds of cross-platform guarantees anyway. Here are the expectations i've come to have: str()'s contract: - if x is a string, str(x) == x - otherwise, str(x) is a reasonable string coercion from x repr()'s contract: - if repr(x) is syntactically valid, eval(repr(x)) == x - repr(x) displays x in a safe and readable way - for objects composed of basic types, repr(x) reflects what the user would have to say to produce x pickle's contract: - pickle.dumps(x) is a platform-independent serialization of the value and state of object x -- ?!ng From ping at lfw.org Sun Apr 9 12:33:00 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 03:33:00 -0700 (PDT) Subject: [Python-Dev] str() for interpreter output In-Reply-To: <000101bfa1dd$8c382f80$172d153f@tim> Message-ID: On Sun, 9 Apr 2000, Tim Peters wrote: > You later say (echoing Donn Cave) > > > repr() is for the human, not for the machine > > but that contradicts the docs and the design. What you mean to say > is "the thing that the interactive prompt uses by default *should* be for > the human, not for the machine" -- which repr() is not. No, what i said is what i said. Let's try this again: repr() is not for the machine. The documentation for __repr__ says: __repr__(self) Called by the repr() built-in function and by string conversions (reverse quotes) to compute the "official" string representation of an object. This should normally look like a valid Python expression that can be used to recreate an object with the same value. It only suggests that the output "normally look like a valid Python expression". It doesn't require it, and certainly doesn't imply that __repr__ should be the standard way to turn an object into a platform-independent serialization. > This is way oversold: str() also supplies "[" for lists, "(" for tuples, > "{" for dicts, and "<" for instances of classes that don't override __str__. > The only difference between repr() and str() in this listing of faux terror > is when they're applied to strings. Right, and that is exactly the one thing that breaks everything: because strings are the most dangerous things to display raw, they can appear like anything, and break all the rules in one fell swoop. > > Granted, repr() cannot always produce an exact reconstruction of an > > object. repr() is not a serialization mechanism! > > To the contrary, many classes and types implement repr() for that very > purpose. It's not universal but doesn't need to be. If they want to, that's fine. In general, however, repr() is not for the machine. If you are using repr(), it's because you are expecting a human to look at the thing at some point. > > We have 'pickle' for that. > > pickles are unreadable by humans; that's why repr() is often preferred. Precisely. You just said it yourself: repr() is for humans. That is why repr() cannot be mandated as a serialization mechanism. There are two goals at odds here: readability and serialization. 
You can't have both, so you must prioritize. Pickles are more about serialization than about readability; repr is more about readability than about serialization. repr() is the interpreter's way of communicating with the human. It makes sense that e.g. the repr() of a string that you see printed by the interpreter looks just like what you would type in to produce the same string, because the interpreter and the human should speak and understand the same language as much as possible. > > >>> a = '\\' > > >>> b = '\'' > > I'd actually like to use euroquotes for str(string) -- don't throw the > Latin-1 away with your outrage . And no, even if you argue that we need to have something else, whatever you want to call it, it's not called 'str'. 'str' is "coerce to string". If you coerce an object into the type it's already in, it must not change. So, if x is a string, then str(x) must == x. > Whatever, examples with backslashes > are non-starters, since newbies can't make any sense out of their doubling > under repr() today either (if it's not a FAQ, it should be -- I've certainly > had to explain it often enough!). It may not be easy, but at least it's *consistent*. Eventually, you can't avoid the problem of escaping characters, and you just have to learn how that works, and that's that. Introducing yet a different way of escaping things won't help. Or, to put it another way: to write Python, it is required that you understand how to read and write escaped strings. Either you learn just that, or you learn that plus another, different way to read escaped-strings-as-printed-by-the-interpreter. The second case clearly requires you to learn and remember more. > Nobody ever promised that eval(str(x)) == x -- if they want that, they > should use repr() or backticks. Today they get > > >>> a > '\\' > > and scream "Huh?! I thought that was only supposed to be ONE backslash!". You have to understand this at some point. You can't get around it. Changing the way the interpreter prints things won't save anyone the trouble of learning it. > Or someone in Europe tries to look at a list of strings, or a simple dict > keyed by names, and gets back a god-awful mish-mash of octal backslash > escapes (and str() can't be used today to stop that either, since str() > "isn't passed down"). This is a pretty sensible complaint to me. I don't use characters beyond 0x7f often, but i can empathize with the hassle. As you suggested, this could be solved by having the built-in container types do something nicer with str(), such as repr without escaping characters beyond 0x7f. (However, characters below 0x20 are definitely dangerous to the terminal, and would have to be escaped regardless.) > Not at all. "Tim's snot-removal algorithm" didn't remove anything > ("removal" is an adjective I don't believe I've seen applied to it before). Well, if you "special-case the snot OUT of strings", then you're removing snot, aren't you? :) > What I want *most*, though, is for ssctsoos() to get passed down (from > container to containee), and for it to be the default action. Getting it passed down as str() seems okay to me. Making it the default action, in my (naturally) subjective opinion, is Right Out if it means that eval(what_the_interpreter_prints_for(x)) == x no longer holds for objects composed of the basic built-in types. -- ?!ng From tismer at tismer.com Sun Apr 9 15:07:53 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 15:07:53 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? 
References: Message-ID: <38F080A9.16DE05B8@tismer.com> Ok, just a word (carefully:) Ka-Ping Yee wrote: ... > I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 Agreed that this is not good. ... > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x This sounds reasonable. BTW my problem did not come up by typing something in, but I just rounded a number down to 3 digits past the dot. Then, as usual, I just let the result drop from the prompt, without prefixing it with "print". repr() was used, and the result was astonishing. Here is the problem, as I see it: You say if you type 3.1416, you want to get exactly this back. But how should Python know that you typed it in? Same in my case: I just rounded to 3 digits, but how should Python know about this? And what do you expect when you type in 3.14160, do you want the trailing zero preserved or not? Maybe we would need to carry exactness around for numbers. Or even have a different float type for cases where we want exact numbers? Keyboard entry and rounding produce exact numbers. Simple operations between exact numbers would keep exactness, higher level functions would probably not. I think we dlved into a very difficult domain here. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping at lfw.org Sun Apr 9 19:24:07 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 9 Apr 2000 10:24:07 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: On Sun, 9 Apr 2000, Christian Tismer wrote: > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > But how should Python know that you typed it in? > Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? It's okay for the zero to go away, because it doesn't affect the value of the number. (Carrying around a significant-digit count or error range with numbers is another issue entirely, and a very thorny one at that.) I think "fewest digits needed to distinguish the correct value" will give good and least-surprising results here. This method guarantees: - If you just type a number in and the interpreter prints it back, it will never respond with more junk digits than you typed. - If you type in what the interpreter displays for a float, you can be assured of getting the same value. > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. If you mean a decimal representation, yes, perhaps we need to explore that possibility a little more. -- ?!ng "All models are wrong; some models are useful." -- George Box From tismer at tismer.com Sun Apr 9 20:53:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 09 Apr 2000 20:53:51 +0200 Subject: [Python-Dev] Round Bug in Python 1.6? 
References: Message-ID: <38F0D1BF.E5ECA4E5@tismer.com> Ka-Ping Yee wrote: > > On Sun, 9 Apr 2000, Christian Tismer wrote: > > Here is the problem, as I see it: > > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? > > Same in my case: I just rounded to 3 digits, but how > > should Python know about this? > > > > And what do you expect when you type in 3.14160, do you want > > the trailing zero preserved or not? > > It's okay for the zero to go away, because it doesn't affect > the value of the number. (Carrying around a significant-digit > count or error range with numbers is another issue entirely, > and a very thorny one at that.) > > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: Hmm, I hope I understood. Oh, wait a minute! What is the method? What is the correct value? If I type >>> 0.1 0.10000000000000001 >>> 0.10000000000000001 0.10000000000000001 >>> There is only one value: The one which is in the machine. Would you think it is ok to get 0.1 back, when you actually *typed* 0.10000000000000001 ? -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one at email.msn.com Sun Apr 9 21:42:11 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:11 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F080A9.16DE05B8@tismer.com> Message-ID: <000101bfa25b$b39567e0$812d153f@tim> [Christian Tismer] > ... > Here is the problem, as I see it: > You say if you type 3.1416, you want to get exactly this back. > > But how should Python know that you typed it in? > Same in my case: I just rounded to 3 digits, but how > should Python know about this? > > And what do you expect when you type in 3.14160, do you want > the trailing zero preserved or not? > > Maybe we would need to carry exactness around for numbers. > Or even have a different float type for cases where we want > exact numbers? Keyboard entry and rounding produce exact numbers. > Simple operations between exact numbers would keep exactness, > higher level functions would probably not. > > I think we dlved into a very difficult domain here. "This kind of thing" is hopeless so long as Python uses binary floating point. Ping latched on to "shortest" conversion because it appeared to solve "the problem" in a specific case. But it doesn't really solve anything -- it just shuffles the surprises around. For example, >>> 3.1416 - 3.141 0.00059999999999993392 >>> Do "shorest conversion" (relative to the universe of IEEE doubles) instead, and it would print 0.0005999999999999339 Neither bears much syntactic resemblance to the 0.0006 the numerically naive "expect". Do anything less than the 16 significant digits shortest conversion happens to produce in this case, and eval'ing the string won't return the number you started with. So "0.0005999999999999339" is the "best possible" string repr can produce (assuming you think "best" == "shortest faithful, relative to the platform's universe of possibilities", which is itself highly debatable). 
If you don't want to see that at the interactive prompt, one of two things has to change: A) Give up on eval(repr(x)) == x for float x, even on a single machine. or B) Stop using repr by default. There is *no* advantage to #A over the long haul: lying always extracts a price, and unlike most of you , I appeared to be the lucky email recipient of the passionate gripes about repr(float)'s inadequacy in 1.5.2 and before. Giving a newbie an illusion of comfort at the cost of making it useless for experts is simply nuts. The desire for #B pops up from multiple sources: people trying to use native non-ASCII chars in strings; people just trying to display docstrings without embedded "\012" (newline) and "\011" (tab) escapes; and people using "big" types (like NumPy arrays or rationals) where repr() can produce unboundedly more info than the interactive user typically wants to see. It *so happens* that str() already "does the right thing" in all 3 of the last three points, and also happens to produce "0.0006" for the example above. This is why people leap to: C) Use str by default instead of repr. But str doesn't pass down to containees, and *partly* does a wrong thing when applied to strings, so it's not suitable either. It's *more* suitable than repr, though! trade-off-ing-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 21:42:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 15:42:19 -0400 Subject: [Python-Dev] str() for interpreter output In-Reply-To: Message-ID: <000201bfa25b$b7e7ab00$812d153f@tim> [Ping] > No, what i said is what i said. > > Let's try this again: > > repr() is not for the machine. Ping, believe me, I heard that the first 42 times . If it wasn't clear before, I'll spell it out: we don't agree on this, and I didn't agree with Donn Cave when he first went down this path. repr() is a noble attempt to be usable by both human and machine. > The documentation for __repr__ says: > > __repr__(self) Called by the repr() built-in function and by > string conversions (reverse quotes) to compute the "official" > string representation of an object. This should normally look > like a valid Python expression that can be used to recreate an > object with the same value. Additional docs are in the Built-in Functions section of the Library Ref (for repr() and str()). > It only suggests that the output "normally look like a valid > Python expression". It doesn't require it, and certainly doesn't > imply that __repr__ should be the standard way to turn an object > into a platform-independent serialization. Alas, the docs for repr and str are vague to the point of painfulness. Guido's *intent* is more evident in later c.l.py posts, and especially in what the implementation *does*: for at least all of ints, longs, floats, complex numbers and strings, and dicts, lists and tuples composed of those recursively, the 1.6 repr produces a faithful and platform-independent eval'able string composed of 7-bit ASCII printable characters. For floats and complex numbers, bit-for-bit reproducibility relies on the assumption that the platforms are IEEE-754, but all current Windows, Mac and Unix platforms (even Psion's EPOC32) *are*. So when you later say > There are two goals at odds here: readability and serialization. 
> You can't have both, sorry, but the 1.6 repr() implementation already meets both goals for a great many builtin types (as well as for dozens of classes & types I've implemented, and likely hundreds of classes & types others have implemented -- and there would be twice as many if people weren't abusing repr() to do what str() was intended to do so that the interactive prompt hehaves reasonably). > If you are using repr(), it's because you are expecting a human to > look at the thing at some point. Often, yes. More often it's because I expect a human to *edit* it (dump repr to a text file, fiddle it, then read it back in and eval it -- poor man's database), which they can't reasonably be expected to do with a pickle. Often also it's just a way to send a data structure in email, without needing to attach tedious instructions for how to use pickle to decipher it. >> pickles are unreadable by humans; that's why repr() is often preferred. > Precisely. You just said it yourself: repr() is for humans. *Partly*, yes. You assume an either/or here that I reject: repr() works best when it's designed for both == as Python itself does whenever possible. > That is why repr() cannot be mandated as a serialization mechanism. I haven't suggested to mandate it. It's a goal, and one which is often achievable, and appreciated when it is achieved. Nobody expects repr() to capture the state of an open file object -- but then they don't expect pickle to do that either . > There are two goals at odds here: readability and serialization. > You can't have both, so you must prioritize. Pickles are more > about serialization than about readability; repr is more about > readability than about serialization. Pickles are more about *efficient* machine serialization, sacrificing all readability to run as fast as possible. Sometimes that's the best choice; other times not. > repr() is the interpreter's way of communicating with the human. It is *a* way, sure, but for things like NumPy arrays and Rationals (and probably also for IEEE doubles) it's rarely the *best* way. > It makes sense that e.g. the repr() of a string that you see > printed by the interpreter looks just like what you would type > in to produce the same string, Yes, that's repr's job. But it's often *not* what the interactive user *wants*. You don't want it either! You later say > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. and that implies the shortest string the prompt can display for 3.1416 - 3.141 is 0.0005999999999999339 (see reply to Christian for details on that example). Do you really want to get that string at the prompt? If you have a NumPy array with a million elements, do you really want the interpreter to display all of them -- and in ~17 different widths? If you're using one of my Rational classes, do you really want to see a ratio of multi-thousand digit longs instead of a nice 12-digit floating approximation? I use the interactive prompt a *lot* -- the current behavior plain sucks, starting about 10 minutes after you finish the Python Tutorial <0.7 wink>. > And no, even if you argue that we need to have something else, > whatever you want to call it, it's not called 'str'. Yes, I've said repeatedly that both str() and repr() are unsuitable. That's where SSCTSOOS started, as str() is *more* suitable for more people more of the time than is repr() -- but still isn't enough. > ... 
> Or, to put it another way: to write Python, it is required that > you understand how to read and write escaped strings. Either > you learn just that, or you learn that plus another, different > way to read escaped-strings-as-printed-by-the-interpreter. The > second case clearly requires you to learn and remember more. You need to learn whatever it takes to get the job done. Since the current alternatives do not get the job done, yes, if anything is ever introduced that *does* get the job done, there's more to learn. Complexity isn't necessarily evil; gratuitous complexity is evil. > ... > (However, characters below 0x20 are definitely dangerous to the terminal, > and would have to be escaped regardless.) They're no danger on any platform I use, and at least in MS-DOS they're mapped to useful graphics characters. Python has no way to know what's dangerous, and gets in the way by trying to guess. Even if x does have control characters that are dangerous, the user will get screwed as soon as they do print x unless you want (the implied) str() to start escaping "dangerous" characters too. Safety and usefulness are definitely at odds here, and I favor usefulness. If they want saftey, let 'em use Java . > Getting it passed down as str() seems okay to me. Making it > the default action, in my (naturally) subjective opinion, is > Right Out if it means that > > eval(what_the_interpreter_prints_for(x)) == x > > no longer holds for objects composed of the basic built-in types. Whereas in my daily use, this property is usually a *wrong* thing to shoot for at an interactive prompt (but is a great thing for repr() to shoot for). When I want eval'ability, it's just a pair of backticks away; by default, I'd rather see something *friendly*. If I type "ping" at the prompt, I don't want to see a second-by-second account of your entire life history . the-best-thing-to-do-with-most-info-is-to-suppress-it-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 22:14:17 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:17 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000301bfa260$2f161640$812d153f@tim> [Tim] >> If they're surprised by this, they indeed don't understand the >> arithmetic at all! This is an argument for using a different form of >> arithmetic, not for lying about reality. > This is not lying! Yes, I overstated that. It's not lying, but I defy anyone to explain the full truth of it in a way even Guido could understand <0.9 wink>. "Shortest conversion" is a subtle concept, requiring knowledge not only of the mathematical value, but of details of the HW representation. Plain old "correct rounding" is HW-independent, so is much easier to *fully* understand. And in things floating-point, what you don't fully understand will eventually burn you. Note that in a machine with 2-bit floating point, the "shortest conversion" for 0.75 is the string "0.8": this should suggest the sense in which "shortest conversion" can be actively misleading too. > If you type in "3.1416" and Python says "3.1416", then indeed it is the > case that "3.1416" is a correct way to type in the floating-point number > being expressed. So "3.1415999999999999" is not any more truthful than > "3.1416" -- it's just more annoying. Yes, shortest conversion is *defensible*. But Python has no code to implement that now, so it's not an option today. 
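A minimal sketch of what such a shortest-faithful conversion could look like (an illustration, not code from this thread, and not the Steele & White machinery): lean on the platform's %g formatting and search for the fewest significant digits that still round-trip, assuming IEEE-754 doubles, where 17 digits always suffice. The name shortest_faithful is made up here.

    def shortest_faithful(x):
        # try successively more significant digits until the decimal
        # string converts back to exactly the double we started with
        for ndigits in range(1, 17):
            s = '%.*g' % (ndigits, x)
            if float(s) == x:
                return s
        return '%.17g' % x   # 17 significant digits always round-trip for IEEE doubles

Like Ping's smartrepr() above, this inherits whatever accidents the local libc conversions have; on a correctly rounding 754 platform it should yield '0.1' for 0.1 and the 16-digit string shown above for 3.1416 - 3.141.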
> I just tried this in Python 1.5.2+: > > >>> .1 > 0.10000000000000001 > >>> .2 > 0.20000000000000001 > >>> .3 > 0.29999999999999999 > >>> .4 > 0.40000000000000002 > >>> .5 > 0.5 > >>> .6 > 0.59999999999999998 > >>> .7 > 0.69999999999999996 > >>> .8 > 0.80000000000000004 > >>> .9 > 0.90000000000000002 > > Ouch. As shown in my reply to Christian, shortest conversion is not a cure for this "gosh, it printed so much more than I expected it to"; it only appears to "fix it" in the simplest examples. So long as you want eval(what's_diplayed) == what's_typed, this is unavoidable. The only ways to avoid that are to use a different arithmetic, or stop using repr() at the prompt. >> As above. repr() shouldn't be used at the interactive prompt >> anyway (but note that I did not say str() should be). > What, then? Introduce a third conversion routine and further > complicate the issue? I don't see why it's necessary. Because I almost never want current repr() or str() at the prompt, and even you don't want 3.1416-3.141 to display 0.0005999999999999339 (which is the least you can print and have eval return the true answer). >>> What should really happen is that floats intelligently print in >>> the shortest and simplest manner possible >> This can be done, but only if Python does all fp I/O conversions >> entirely on its own -- 754-conforming libc routines are inadequate >> for this purpose > Not "all fp I/O conversions", right? Only repr(float) needs to > be implemented for this particular purpose. Other conversions > like "%f" and "%g" can be left to libc, as they are now. No, all, else you risk %f and %g producing results that are inconsistent with repr(), which creates yet another set of incomprehensible surprises. This is not an area that rewards half-assed hacks! I'm intimately familiar with just about every half-assed hack that's been tried here over the last 20 years -- they never work in the end. The only approach that ever bore fruit was 754's "there is *a* mathematically correct answer, and *that's* the one you return". Unfortunately, they dropped the ball here on float<->string conversions (and very publicly regret that today). > I suppose for convenience's sake it may be nice to add another > format spec so that one can ask for this behaviour from the "%" > operator as well, but that's a separate issue (perhaps "%r" to > insert the repr() of an argument of any type?). %r is cool! I like that. >>> def smartrepr(x): >>> p = 17 >>> while eval('%%.%df' % (p - 1) % x) == x: p = p - 1 >>> return '%%.%df' % p % x >> This merely exposes accidents in the libc on the specific >> platform you run it. That is, after >> >> print smartrepr(x) >> >> on IEEE-754 platform A, reading that back in on IEEE-754 ?> platform B may not yield the same number platform A started with. > That is not repr()'s job. Once again: > > repr() is not for the machine. And once again, I didn't and don't agree with that, and, to save the next seven msgs, never will . > It is not part of repr()'s contract to ensure the kind of > platform-independent conversion you're talking about. It > prints out the number in a way that upholds the eval(repr(x)) == x > contract for the system you are currently interacting with, and > that's good enough. It's not good enough for Java and Scheme, and *shouldn't* be good enough for Python. 
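A quick illustration of what "enough digits" means here (an aside, not part of the 1.6 code), assuming an IEEE-754 double and a correctly rounding string-to-float conversion:

    x = 0.1
    s = '%.17g' % x                  # '0.10000000000000001' on an IEEE-754 box
    print float(s) == x              # 1 -- 17 significant digits always round-trip
    print float('%.12g' % x) == x    # happens to hold for 0.1, but not for every double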
The 1.6 repr(float) is already platform-independent across IEEE-754 machines (it's not correctly rounded on most platforms, but *does* print enough that 754 guarantees bit-for-bit reproducibility) -- and virtually all Python platforms are IEEE-754 (I don't know of an exception -- perhaps Python is running on some ancient VAX?). The std has been around for 15+ years, virtually all platforms support it fully now, and it's about time languages caught up. BTW, the 1.5.2 text-mode pickle was *not* sufficient for reproducing floats either, even on a single machine. It is now -- but thanks to the change in repr. > If you wanted platform-independent serialization, you would > use something else. There is nothing else. In 1.5.2 and before, people mucked around with binary dumps hoping they didn't screw up endianness. > As long as the language reference says > > "These represent machine-level double precision floating > point numbers. You are at the mercy of the underlying > machine architecture and C implementation for the accepted > range and handling of overflow." > > and until Python specifies the exact sizes and behaviours of > its floating-point numbers, you can't expect these kinds of > cross-platform guarantees anyway. There's nothing wrong with exceeding expectations . Despite what the reference manual says, virtually all machines use identical fp representations today (this wasn't true when the text above was written). > str()'s contract: > - if x is a string, str(x) == x > - otherwise, str(x) is a reasonable string coercion from x The last is so vague as to say nothing. My counterpart-- at least equally vague --is - otherwise, str(x) is a string that's easy to read and contains a compact summary indicating x's nature and value in general terms > repr()'s contract: > - if repr(x) is syntactically valid, eval(repr(x)) == x > - repr(x) displays x in a safe and readable way I would say instead: - every character c in repr(x) has ord(c) in range(32, 128) - repr(x) should strive to be easily readable by humans > - for objects composed of basic types, repr(x) reflects > what the user would have to say to produce x Given your first point, does this say something other than "for basic types, repr(x) is syntactically valid"? Also unclear what "basic types" means. > pickle's contract: > - pickle.dumps(x) is a platform-independent serialization > of the value and state of object x Since pickle can't handle all objects, this exaggerates the difference between it and repr. Give a fuller description, like - If pickle.dumps(x) is defined, pickle.loads(pickle.dumps(x)) == x and it's the same as the first line of your repr() contract, modulo s/syntactically valid/is defined/ s/eval/pickle.loads/ s/repr/pickle.dumps/ The differences among all these guys remain fuzzy to me. but-not-surprising-when-talking-about-what-people-like-to-look-at-ly y'rs - tim From tim_one at email.msn.com Sun Apr 9 22:14:25 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:14:25 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: Message-ID: <000401bfa260$33e6ff40$812d153f@tim> [Ping] > ... > I think "fewest digits needed to distinguish the correct value" > will give good and least-surprising results here. This method > guarantees: > > - If you just type a number in and the interpreter > prints it back, it will never respond with more > junk digits than you typed. Note the example from another reply of a machine with 2-bit floats. 
There the user would see: >>> 0.75 # happens to be exactly representable on this machine 0.8 # because that's the shortest string needed on this machine # to get back 0.75 internally >> This kind of surprise is inherent in the approach, not specific to 2-bit machines . BTW, I don't know that it will never print more digits than you type: did you prove that? It's plausible, but many plausible claims about fp turn out to be false. > - If you type in what the interpreter displays for a > float, you can be assured of getting the same value. This isn't of value for most interactive use -- in general you want to see the range of a number, not enough to get 53 bits exactly (that's beyond the limits of human "number sense"). It also has one clearly bad aspect: when printing containers full of floats, the number of digits printed for each will vary wildly from float to float. Makes for an unfriendly display. If the prompt's display function were settable, I'd probably plug in pprint! From tim_one at email.msn.com Sun Apr 9 22:25:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 16:25:19 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <38F0D1BF.E5ECA4E5@tismer.com> Message-ID: <000501bfa261$b9b5f3a0$812d153f@tim> [Christian] > Hmm, I hope I understood. > Oh, wait a minute! What is the method? What is the correct value? > > If I type > >>> 0.1 > 0.10000000000000001 > >>> 0.10000000000000001 > 0.10000000000000001 > >>> > > There is only one value: The one which is in the machine. > Would you think it is ok to get 0.1 back, when you > actually *typed* 0.10000000000000001 ? Yes, this is the kind of surprise I sketched with the "2-bit machine" example. It can get more surprising than the above (where, as you suspect, "shortest conversion" yields "0.1" for both -- which, btw, is why reading it back in to a float type with more precision loses accuracy needlessly, which in turn is why 754 True Believers dislike it). repetitively y'rs - tim From akuchlin at mems-exchange.org Mon Apr 10 00:00:24 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Sun, 9 Apr 2000 18:00:24 -0400 (EDT) Subject: [Python-Dev] SRE: regex.set_syntax In-Reply-To: <14573.61191.486890.43591@seahag.cnri.reston.va.us> References: <200004061343.PAA20218@python.inrialpes.fr> <005e01bfa06d$ed80bda0$0500a8c0@secret.pythonware.com> <14573.61191.486890.43591@seahag.cnri.reston.va.us> Message-ID: <14576.64888.59263.386826@newcnri.cnri.reston.va.us> Fred L. Drake, Jr. writes: >maintained code. I would be surprised if Grail is the only large >application which uses "regex" for performance reasons, and we don't Zope is another, and there's even a ts_regex module hiding in Zope which tries to provide thread-safety on top of regex. --amk From tim_one at email.msn.com Mon Apr 10 04:40:03 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sun, 9 Apr 2000 22:40:03 -0400 Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <200004071918.PAA27474@eric.cnri.reston.va.us> Message-ID: <000401bfa296$13876e20$7da0143f@tim> [Moshe Zadka] > Just checking my newly bought "Guido Channeling" kit -- you mean str() > but special case the snot out of strings(TM), don't you [Guido] > Except I'm not sure what kind of special-casing should be happening. Welcome to the club. > Put quotes around it without worrying if that makes it a valid string > literal is one thought that comes to mind. If nothing else , Ping convinced me the temptation to type that back in will prove overwhelming. 
> Another approach might be what Tk's text widget does -- pass through > certain control characters (LF, TAB) and all (even non-ASCII) printing > characters, but display other control characters as \x.. escapes > rather than risk putting the terminal in a weird mode. This must be platform-dependent? Just tried this loop in Win95 IDLE, using Courier: >>> for i in range(256): print i, chr(i), Across the whole range, it just showed what Windows always shows in the Courier font (which is usually a (empty or filled) rectangle for most "control characters"). No \x escapes at all. BTW, note that Tk unhelpfully translates a request for "Courier New" into a request for "Courier", which aren't the same fonts under Windows! So if anyone tries this with the IDLE Windows defaults, and doesn't see all the special characters Windows assigns to the range 128-159 in Courier New, that's why -- most of them aren't assigned under Courier. > No quotes though. Hm, I kind of like this: when used as intended, it will > just display the text, with newlines and umlauts etc.; but when printing > binary gibberish, it will do something friendly. Can't be worse than what happens now . > There's also the issue of what to do with lists (or tuples, or dicts) > containing strings. If we agree on this: > > >>> "hello\nworld\n\347" # octal 347 is a cedilla > hello > world > ? > >>> I don't think there is agreement on this, because nothing in the output says "btw, this thing was a string". Is that worth preserving? "It depends" is the only answer I've got to that. > Then what should ("hello\nworld", "\347") show? I've got enough serious > complaints that I don't want to propose that it use repr(): > > >>> ("hello\nworld", "\347") > ('hello\nworld', '\347') > >>> > > Other possibilities: > > >>> ("hello\nworld", "\347") > ('hello > world', '?') > >>> > > or maybe > > >>> ("hello\nworld", "\347") > ('''hello > world''', '?') > >>> I like the last best. > Of course there's also the Unicode issue -- the above all assumes > Latin-1 for stdout. > > Still no closure, I think... It's curious how you invoke "closure" when and only when you don't know what *you* want to do . a-guido-divided-against-himself-cannot-stand-ly y'rs - tim From mhammond at skippinet.com.au Mon Apr 10 06:32:53 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 10 Apr 2000 14:32:53 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. Message-ID: [Im re-sending as the attachment caused this to be held up for administrative approval. Ive forwarded the attachement to Chris - anyone else just mail me for it] Ive struck a crash in the new trashcan mechanism (so I guess Chris is gunna pay the most attention here). Although I can only provoke this reliably in debug builds, I believe it also exists in release builds, but is just far more insidious. Unfortunately, I also can not create a simple crash case. But I _can_ provide info on how you can reliably cause the crash. Obviously only tested on Windows... * Go to http://lima.mudlib.org/~rassilon/p2c/, and grab the download, and unzip. * Replace "transformer.py" with the attached version (multi-arg append bites :-) * Ensure you have a Windows "debug" build available, built from CVS. * From the p2c directory, Run "python_d.exe gencode.py gencode.py" You will get a crash, and the debugger will show you are destructing a list, with an invalid object. 
The crash occurs about 1000 times after this code is first hit, and I can't narrow the crash condition down :-( If you open object.h, and disable the trashcan mechanism (by changing the "xx", as the comments suggest) then it runs fine. Hope this helps someone - Im afraid I havent a clue :-( Mark. From gstein at lyra.org Mon Apr 10 10:14:59 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:14:59 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils sysconfig.py In-Reply-To: <200004100117.VAA16514@kaluha.cnri.reston.va.us> Message-ID: Why aren't we getting diffs on these things? Is it because of the "distutils" root instead of the Python root? Just curious... thx, -g On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16499 > > Modified Files: > sysconfig.py > Log Message: > Added optional 'prefix' arguments to 'get_python_inc()' and > 'get_python_lib()'. > > > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 10 10:18:20 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 01:18:20 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils cmd.py In-Reply-To: <200004100130.VAA16590@kaluha.cnri.reston.va.us> Message-ID: [ damn... can't see the code... went and checked it out... ] On Sun, 9 Apr 2000, Greg Ward wrote: > Update of /projects/cvsroot/distutils/distutils > In directory kaluha:/tmp/cvs-serv16575 > > Modified Files: > cmd.py > Log Message: > Added a check for the 'force' attribute in '__getattr__()' -- better than > crashing when self.force not defined. This seems a bit silly. Why don't you simply define .force in the __init__ method? Better yet: make the other guys crash -- the logic is bad if they are using something that isn't supposed to be defined on that particular Command object. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Mon Apr 10 11:25:03 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 10 Apr 2000 11:25:03 +0200 (CEST) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000001bfa1c2$9e403a80$18a2143f@tim> from "Tim Peters" at Apr 08, 2000 09:26:23 PM Message-ID: <200004100925.LAA03689@python.inrialpes.fr> Tim Peters wrote: > > Suppose you're computing f(x) to 2 significant decimal digits, using 4-digit > arithmetic, and for some specific x0 f(x0) turns out to be 41.49 +- 3. > That's not enough to know whether it *should* round to 41 or 42. So you > need to try again with more precision. But how much? You might try 5 > digits next, and might get 41.501 +- 3, and you're still stuck. Try 6 next? > Might be a waste of effort. Try 20 next? Might *still* not be enough -- or > could just as well be that 7 would have been enough and you did 10x the work > you needed to do. Right. From what I understand, the dilemma is this: In order to round correctly, how much extra precision do we need, so that the range of uncertainity (+-3 in your example) does not contain the middle of two consecutive representable numbers (say 41.49 and 41.501). "Solving" the dilemma is predicting this extra precision so that the ranges of uncertainity does not contain the middle of two consecutive floats. 
Which in turn equals to calculating the min distance between the image of a number and the middle of two consecutive machine numbers. And that's what these guys have calculated for common functions in IEEE-754 double precision, with brute force, using an apparently original algorithm they have proposed. > > that's-what-you-get-when-you-refuse-to-define-results-ly y'rs - tim > I haven't asked for anything. It was just passive echoing with a good level of uncertainity :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Mon Apr 10 11:53:48 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 02:53:48 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 In-Reply-To: <38F1A430.D70DF89@lemburg.com> Message-ID: On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > The attached patch includes the following fixes and additions: >... > * '...%s...' % u"abc" now coerces to Unicode just like > string methods. Care is taken not to reevaluate already formatted > arguments -- only the first Unicode object appearing in the > argument mapping is looked up twice. Added test cases for > this to test_unicode.py. >... I missed a chance to bring this up on the first round of discussion, but is this really the right thing to do? We never coerce the string on the left based on operands. For example: if the operands are class instances, we call __str__ -- we don't call __coerce__. It seems a bit weird to magically revise the left operand. In many cases, a Unicode used as a string is used as a UTF-8 value. Why is that different in this case? Seems like a wierd special case. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Apr 10 12:55:50 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 12:55:50 +0200 Subject: [Python-Dev] Re: [Patches] Unicode Patch Set 2000-04-10 References: Message-ID: <38F1B336.12B6707@lemburg.com> Greg Stein wrote: > > On Mon, 10 Apr 2000, M.-A. Lemburg wrote: > > The attached patch includes the following fixes and additions: > >... > > * '...%s...' % u"abc" now coerces to Unicode just like > > string methods. Care is taken not to reevaluate already formatted > > arguments -- only the first Unicode object appearing in the > > argument mapping is looked up twice. Added test cases for > > this to test_unicode.py. > >... > > I missed a chance to bring this up on the first round of discussion, but > is this really the right thing to do? We never coerce the string on the > left based on operands. For example: if the operands are class instances, > we call __str__ -- we don't call __coerce__. > > It seems a bit weird to magically revise the left operand. > > In many cases, a Unicode used as a string is used as a UTF-8 value. Why is > that different in this case? Seems like a wierd special case. It's not a special case: % works just like a method call and all string methods auto-coerce to Unicode in case a Unicode object is found among the arguments. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Mon Apr 10 13:19:51 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 10 Apr 2000 13:19:51 +0200 Subject: [Python-Dev] UTF-8 is no fun... 
References: Message-ID: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Greg Stein wrote: > In many cases, a Unicode used as a string is used as a UTF-8 value. Why is > that different in this case? Seems like a wierd special case. the whole "sometimes it's UTF-8, sometimes it's not" concept is one big mess (try using some existing string crunching code with unicode strings if you don't believe me -- using non-US input strings, of course). among other things, it's very hard to get things to work properly when string slicing and indexing no longer works as expected... I see two possible ways to solve this; rough proposals follow: ----------------------------------------------------------------------- 1. a java-like approach ----------------------------------------------------------------------- a) define *character* in python to be a unicode character b) provide two character containers: 8-bit strings and unicode strings. the former can only hold unicode characters in the range 0-255, the latter can hold characters from the full unicode character set (not entirely true for the current implementation, but that's not relevant here) given a string "s" of any string type, s[i] is *always* the i'th character. len(s) is always the number of characters in the string. len(s[i]) is 1. etc. c) string operations involving mixed types use the larger type for the return value. d) they raise TypeError if (c) doesn't make any sense. e) as before, 8-bit strings can also be used to store binary data, hold- ing *bytes* instead of characters. given an 8-bit string "b" used as a buffer, b[i] is always the i'th byte. len(b) is always the number of bytes in the buffer. binary buffers can be used to hold any external unicode encodings (utf-8, utf-16, etc), as well as non-unicode 8-bit encodings (iso-8859-x, cyrillic, far east, etc). there are no implicit conversions from buffers to strings; it's up to the programmer to spell that out when necessary. f) it's up to the programmer to keep track of what a given 8-bit string actually contains (strings, encoded characters, or some other kind of binary data). g) (optionally) change the language definition to say that source code is written in unicode, and provide an "encoding pragma" to tell the com- piler how to interpret any given source file. (maybe in 1.7?) (there are more issues here, but let's start with these) ----------------------------------------------------------------------- 2. a tcl-like approach ----------------------------------------------------------------------- a) change slicing, 8-bit regular expressions (etc) to handle UTF-8 byte sequences as characters. this opens one big can of worms... b) kill the worms. ----------------------------------------------------------------------- comments? (for obvious reasons, I'm especially interested in comments from people using non-ASCII characters on a daily basis...) 
Return-Path: Delivered-To: python-dev at python.org Received: from mr14.vic-remote.bigpond.net.au (mr14.vic-remote.bigpond.net.au [24.192.1.29]) by dinsdale.python.org (Postfix) with ESMTP id B8DCF1CD40 for ; Sun, 9 Apr 2000 20:53:33 -0400 (EDT) Received: from bobcat (CPE-144-132-23-166.vic.bigpond.net.au [144.132.23.166]) by mr14.vic-remote.bigpond.net.au (Pro-8.9.3/8.9.3) with SMTP id KAA21301 for ; Mon, 10 Apr 2000 10:55:59 +1000 (EST) From: "Mark Hammond" To: Date: Mon, 10 Apr 2000 10:55:39 +1000 Message-ID: X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2919.6700 Subject: [Python-Dev] Crash in new "trashcan" mechanism. Sender: python-dev-admin at python.org Errors-To: python-dev-admin at python.org X-BeenThere: python-dev at python.org X-Mailman-Version: 2.0beta2 Precedence: bulk List-Id: Python core developers Ive struck a crash in the new trashcan mechanism (so I guess Chris is gunna pay the most attention here). Although I can only provoke this reliably in debug builds, I believe it also exists in release builds, but is just far more insidious. Unfortunately, I also can not create a simple crash case. But I _can_ provide info on how you can reliably cause the crash. Obviously only tested on Windows... * Go to http://lima.mudlib.org/~rassilon/p2c/, and grab the download, and unzip. * Replace "transformer.py" with the attached version (multi-arg append bites :-) * Ensure you have a Windows "debug" build available, built from CVS. * From the p2c directory, Run "python_d.exe gencode.py gencode.py" You will get a crash, and the debugger will show you are destructing a list, with an invalid object. The crash occurs about 1000 times after this code is first hit, and I can't narrow the crash condition down :-( If you open object.h, and disable the trashcan mechanism (by changing the "xx", as the comments suggest) then it runs fine. Hope this helps someone - Im afraid I havent a clue :-( Mark. begin 666 transformer.py M(R!#;W!Y2 M+2!T6-L960N;F5T*0T*(R!&96)R=6%R M>2 Q.3DW+ at T*(PT*(PT*(R!4:&4@;W5T<'5T('1R964@:&%S('1H92!F;VQL M;W=I;F<@;F]D97,Z#0HC#0HC(%-O=7)C92!0>71H;VX@;&EN92 C)W, at 87!P M96%R(&%T('1H92!E;F0@;V8 at 96%C:"!O9B!A;&P@;V8@=&AE'!R,4YO M9&4L(&5X<'(R3F]D92P at 97AP'!R M,TYO9&4-"B,@=')Y9FEN86QL>3H)=')Y4W5I=&5.;V1E+"!F:6Y3nT94YO M9&4-"B,@=')Y97AC97!T. at ET'!R3F]D92P at 871T3$L('9A;#$I+" N+BXL("AK M97E.+"!V86Q.*2!=#0HC(&YO=#H)"65X<').;V1E#0HC(&-O;7!A'!R, at T*(PT*(R!#;VUP:6QE9"!A&]R. at E;(&YO9&4Q+" N+BXL(&YO9&5.(%T-"B, at 8FET86YD.@E;(&YO9&4Q M+" N+BXL(&YO9&5.(%T-"B,-"B, at 3W!E'!R3F]D92P@2LZ"6YO9&4-"B,@=6YA'!R*'1E>'0I#0H)=')E92 ]('!AR!]#0H@(" @9F]R('9A;'5E+"!N M86UE(&EN('-Y;6)O;"YS>6U?;F%M92YI=&5M7!E*'1R964I("$]('1Y<&4H6UTI. at T*(" @(" @ M=')E92 ]('!A'0I. at T*(" @("(B(E)E='5R;B!A(&UO9&EF:65D('!A7!E*&9I;&4I(#T]('1Y<&4H)R6UB;VPN9FEL95]I;G!U=#H-"@D@(')E='5R;B!S96QF+F9I M;&5?:6YPpH;F]D95LQ.ETI#0H):68@;B ]/2!S>6UB;VPN979A;%]I;G!U M=#H-"@D@(')E='5R;B!S96QF+F5V86Q?:6YPpH;F]D95LQ.ETI#0H):68@ M;B ]/2!S>6UB;VPN;&%M8F1E9CH-"@D@(')E='5R;B!S96QF+FQA;6)D968H M;F]D95LQ.ETI#0H):68@;B ]/2!S>6UB;VPN9G5N8V1E9CH-"@D@(')E='5R M;B!S96QF+F9U;F-D968H;F]D95LQ.ETI#0H):68@;B ]/2!S>6UB;VPN8VQA M7!E)RP@;BD-"@T* M("!D968@71H:6YG(&%B;W5T(&)E:6YG(")I;G1E'!R3F]D92D-"B @("!N;V1E'!R,BP at 97AP'!R M75T-"B @("!E>'!R,2 ]('-E;&8N8V]M7VYO9&4H;F]D96QI'!R,R ]('-E;&8N8V]M7VYO9&4H;F]D96QI M'!R,R ]($YO;F4-"B @ M("!E;'-E. 
From: gward at mems-exchange.org (Greg Ward) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils cmd.py In-Reply-To: ; from gstein@lyra.org on Mon, Apr 10, 2000 at 01:18:20AM -0700 References: <200004100130.VAA16590@kaluha.cnri.reston.va.us> Message-ID: <20000410091101.B406@mems-exchange.org> On 10 April 2000, Greg Stein said: > On Sun, 9 Apr 2000, Greg Ward wrote: > > Modified Files: > > cmd.py > > Log Message: > > Added a check for the 'force' attribute in '__getattr__()' -- better than > > crashing when self.force not defined. > > This seems a bit silly.
> Why don't you simply define .force in the __init__ > method? Duhh, 'cause I'm stupid? No, that's not it. 'Cause I was doing this on a lazy Sunday evening and not really thinking about it? Yeah, I think that's it. There, I now define self.force in the Command class constructor. A wee bit cheesy (not all Distutils command classes need or use self.force, and it wouldn't always mean the same thing), but it means minimal code upheaval for now. > [ damn... can't see the code... went and checked it out... ] Oops, that was a CVS config thing. Fixed now -- I'll go check in that change and we'll all see if it worked. Just as well it was off though -- I checked in a couple of big documentation updates this weekend, and who wants to see 30k of LaTeX patches in their inbox on Monday morning? ;-) Greg -- Greg Ward - software developer gward at mems-exchange.org MEMS Exchange / CNRI voice: +1-703-262-5376 Reston, Virginia, USA fax: +1-703-262-5367 From guido at python.org Mon Apr 10 16:01:58 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 10:01:58 -0400 Subject: [Python-Dev] "takeuchi": a unicode string on IDLE shell Message-ID: <200004101401.KAA00238@eric.cnri.reston.va.us> Can anyone answer this? I can reproduce the output side of this, and I believe he's right about the input side. Where should Python migrate with respect to Unicode input? I think that what Takeuchi is getting is actually better than in Pythonwin or command line (where he gets Shift-JIS)... --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Mon, 10 Apr 2000 22:49:45 +0900 From: "takeuchi" To: Subject: a unicode string on IDLE shell Dear Guido, I played with your latest CPython (Python 1.6a1) on the Japanese version of Win98 and found a strange IDLE shell behavior. I'm not sure whether this is a bug or a feature, so I'll report my story anyway. When typing a Japanese string into the IDLE shell with the IME, Tk8.3 seems to convert it to a UTF-8 representation. Unfortunately Python does not know this, so it is treated as an ordinary string. >>> s = raw_input(">>>") Type Japanese characters with the IME, for example あ (this is the first character of the Japanese hiragana alphabet) >>> s '\343\201\202' # UTF-8 encoded >>> print s あ # the proper glyph appears on the screen The print statement in the IDLE shell works fine with a UTF-8 encoded string; however, slicing or len() does not work. # I know this is the right result So I have to convert this string with unicode(). >>> u = unicode(s) >>> u u'\u3042' >>> print u あ # the proper glyph appears on the screen Do you think this conversion is awkward? I think this behavior is inconsistent with command-line Python and PythonWin. If I want the same result in the command-line Python shell or the PythonWin shell, I have to code as follows: >>> s = raw_input(">>>") Type Japanese characters with the IME, for example あ >>> s '\202\240' # Shift-JIS encoded >>> print s あ # the proper glyph appears on the screen >>> u = unicode(s,"mbcs") # if I use unicode(s) then UnicodeError is raised! >>> print u.encode("mbcs") # if I use print u then the wrong glyph appears あ # the proper glyph appears on the screen This difference is confusing! I do not have the best solution for this annoyance; I hope at least the IDLE shell and the PythonWin shell will have the same behavior. Thank you for reading.
Best Regards, takeuchi ------- End of Forwarded Message From tismer at tismer.com Mon Apr 10 16:24:24 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 10 Apr 2000 16:24:24 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F1E418.FF191AEE@tismer.com> About extensions and Trashcan. Mark Hammond wrote: ... > Ive struck a crash in the new trashcan mechanism (so I guess Chris > is gunna pay the most attention here). Although I can only provoke > this reliably in debug builds, I believe it also exists in release > builds, but is just far more insidious. > > Unfortunately, I also can not create a simple crash case. But I > _can_ provide info on how you can reliably cause the crash. > Obviously only tested on Windows... ... > You will get a crash, and the debugger will show you are destructing > a list, with an invalid object. The crash occurs about 1000 times > after this code is first hit, and I can't narrow the crash condition > down :-( The trashcan is built in a quite simple manner. It uses a List to delay deletions if the nesting level is deep. The list operations are not thread safe. A special case is handled: It *could* happen on destruction of the session, that trashcan cannot handle errors, since the thread state is already undefined. But the general case of no interpreter lock is undefined and forbidden. In a discussion with Guido, we first thought that we would need some thread safe object for the delay. Later on it turned out that it must be generally *forbidden* to destroy an object when the interpreter lock is not held. Reason: An instance destruction might call __del__, and that would run an interpreter without lock. Forbidden. For that reason, I kept the list in place. I think it is fine that it crashed. There are obviously extension modules left where the interpreter lock rule is violated. The builtin Python code has been checked, there are most probably no holes, including tkinter. Or, I made a mistake in this little code: void _PyTrash_deposit_object(op) PyObject *op; { PyObject *error_type, *error_value, *error_traceback; if (PyThreadState_GET() != NULL) PyErr_Fetch(&error_type, &error_value, &error_traceback); if (!_PyTrash_delete_later) _PyTrash_delete_later = PyList_New(0); if (_PyTrash_delete_later) PyList_Append(_PyTrash_delete_later, (PyObject *)op); if (PyThreadState_GET() != NULL) PyErr_Restore(error_type, error_value, error_traceback); } void _PyTrash_destroy_list() { while (_PyTrash_delete_later) { PyObject *shredder = _PyTrash_delete_later; _PyTrash_delete_later = NULL; ++_PyTrash_delete_nesting; Py_DECREF(shredder); --_PyTrash_delete_nesting; } } ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Mon Apr 10 16:40:19 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 10:40:19 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 10:20:34 EDT." 
<200004101420.KAA00291@eric.cnri.reston.va.us> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> Message-ID: <200004101440.KAA00324@eric.cnri.reston.va.us> Thinking about entering Japanese into raw_input() in IDLE more, I thought I figured a way to give Takeuchi a Unicode string when he enters Japanese characters. I added an experimental patch to the readline method of the PyShell class: if the line just read, when converted to Unicode, has fewer characters but still compares equal (and no exceptions happen during this test) then return the Unicode version. This doesn't currently work because the built-in raw_input() function requires that the readline() call it makes internally returns an 8-bit string. Should I relax that requirement in general? (I could also just replace __builtin__.[raw_]input with more liberal versions supplied by IDLE.) I also discovered that the built-in unicode() function is not idempotent: unicode(unicode('a')) returns u'\000a'. I think it should special-case this and return u'a' ! Finally, I believe we need a way to discover the encoding used by stdin or stdout. I have to admit I know very little about the file wrappers that Marc wrote -- is it easy to get the encoding out of them? IDLE should probably emulate this, as it's encoding is clearly UTF-8 (at least when using Tcl 8.1 or newer). --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 17:16:58 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:16:58 -0400 Subject: [Python-Dev] int division proposal in idle-dev Message-ID: <200004101516.LAA00442@eric.cnri.reston.va.us> David Scherer posted an interesting proposal to the idle-dev list for dealing with the incompatibility issues around int division. Bruce Sherwood also posted an interesting discussion there on how to deal with incompatibilities in general (culminating in a recommendation of David's solution). In brief, David abuses the "global" statement at the module level to implement a pragma. Not ideal, but kind of cute and backwards compatible -- this can be added to Python 1.5 or even 1.4 code without breaking! He proposes that you put "global olddivision" at the top of any file that relies on int/int yielding an int; a newer Python can then default to new division semantics. (He does this by generating a different opcode, which is also smart.) It's time to start thinking about a transition path -- Bruce's discussion and David's proposal are a fine starting point, I think. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Mon Apr 10 17:32:17 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 17:32:17 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> Message-ID: <38F1F401.45535C23@lemburg.com> Guido van Rossum wrote: > > Thinking about entering Japanese into raw_input() in IDLE more, I > thought I figured a way to give Takeuchi a Unicode string when he > enters Japanese characters. > > I added an experimental patch to the readline method of the PyShell > class: if the line just read, when converted to Unicode, has fewer > characters but still compares equal (and no exceptions happen during > this test) then return the Unicode version. 
> > This doesn't currently work because the built-in raw_input() function > requires that the readline() call it makes internally returns an 8-bit > string. Should I relax that requirement in general? (I could also > just replace __builtin__.[raw_]input with more liberal versions > supplied by IDLE.) > > I also discovered that the built-in unicode() function is not > idempotent: unicode(unicode('a')) returns u'\000a'. I think it should > special-case this and return u'a' ! Good idea. I'll fix this in the next round. > Finally, I believe we need a way to discover the encoding used by > stdin or stdout. I have to admit I know very little about the file > wrappers that Marc wrote -- is it easy to get the encoding out of > them? I'm not sure what you mean: the name of the input encoding ? Currently, only the names of the encoding and decoding functions are available to be queried. > IDLE should probably emulate this, as it's encoding is clearly > UTF-8 (at least when using Tcl 8.1 or newer). It should be possible to redirect sys.stdin/stdout using the codecs.EncodedFile wrapper. Some tests show that raw_input() doesn't seem to use the redirected sys.stdin though... >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() ??? >>> s '\344\366\374' >>> s = sys.stdin.read() ??? >>> s '\303\244\303\266\303\274\012' -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 10 17:38:58 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:38:58 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 17:32:17 +0200." <38F1F401.45535C23@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> Message-ID: <200004101538.LAA00486@eric.cnri.reston.va.us> > > Finally, I believe we need a way to discover the encoding used by > > stdin or stdout. I have to admit I know very little about the file > > wrappers that Marc wrote -- is it easy to get the encoding out of > > them? > > I'm not sure what you mean: the name of the input encoding ? > Currently, only the names of the encoding and decoding functions > are available to be queried. Whatever is helpful for a module or program that wants to know what kind of encoding is used. > > IDLE should probably emulate this, as it's encoding is clearly > > UTF-8 (at least when using Tcl 8.1 or newer). > > It should be possible to redirect sys.stdin/stdout using > the codecs.EncodedFile wrapper. Some tests show that raw_input() > doesn't seem to use the redirected sys.stdin though... > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > >>> s = raw_input() > ??? > >>> s > '\344\366\374' > >>> s = sys.stdin.read() > ??? > >>> s > '\303\244\303\266\303\274\012' This deserves more looking into. The code for raw_input() in bltinmodule.c certainly *tries* to use sys.stdin. (I think that because your EncodedFile object is not a real stdio file object, it will take the second branch, near the end of the function; this calls PyFile_GetLine() which attempts to call readline().) Aha! It actually seems that your read() and readline() are inconsistent! 
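A toy wrapper (not the actual codecs.EncodedFile/StreamRecoder code) that reproduces this kind of asymmetry: only read() recodes the data, while readline() falls straight through to the raw stream, so the two methods hand back differently encoded bytes:

    # Sketch only -- a deliberately incomplete recoding wrapper.
    class RecodingWrapper:
        def __init__(self, stream, data_encoding, file_encoding):
            self.stream = stream
            self.data_encoding = data_encoding
            self.file_encoding = file_encoding
        def read(self, size=-1):
            # decode from the file encoding, re-encode to the data encoding
            data = self.stream.read(size)
            return unicode(data, self.file_encoding).encode(self.data_encoding)
        def readline(self):
            # No recoding here: callers such as raw_input(), which reaches
            # readline() via PyFile_GetLine(), see the raw file encoding.
            return self.stream.readline()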
I don't know your API well enough to know which string is "correct" (\344\366\374 or \303\244\303\266\303\274) but when I call sys.stdin.readline() I get the same as raw_input() returns: >>> from codecs import * >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') >>> s = raw_input() ??? >>> s '\344\366\374' >>> s = sys.stdin.read() ??? >>> >>> s '\303\244\303\266\303\274\012' >>> unicode(s) u'\344\366\374\012' >>> s = sys.stdin.readline() ??? >>> s '\344\366\374\012' >>> Didn't you say that your wrapper only wraps read()? Maybe you need to revise that decision! (Note that PyShell doesn't even define read() -- it only defines readline().) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Mon Apr 10 17:45:29 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Mon, 10 Apr 2000 11:45:29 -0400 (EDT) Subject: [Python-Dev] test_fork1 on Linux Message-ID: <14577.63257.956728.228174@seahag.cnri.reston.va.us> I've just checked in changes to test_fork1.py make the test a little more sensible on Linux (where the assumption that the thread pids are the same as the controlling process doesn't hold). However, I'm still observing some serious weirdness with this test. As far as I've been able to tell, the os.fork() call always succeeds, but sometimes the parent process segfaults, and sometimes it locks up. It does seem to get to the os.waitpid() call, which isi appearantly where the failure actually occurs. (And sometimes everything works as expected!) If anyone here is particularly familiar with threading on Linux, I'd appreciate a little help, or even a pointer to someone who understands enough of the low-level aspects of threading on Linux that I can communicate with them to figure this out. Thanks! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at python.org Mon Apr 10 17:52:43 2000 From: bwarsaw at python.org (Barry Warsaw) Date: Mon, 10 Apr 2000 11:52:43 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods Message-ID: <14577.63691.561040.281577@anthem.cnri.reston.va.us> A number of people have played FAST and loose with function and method docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are handy because they are the one attribute on funcs and methods that are easily writable. But as more people overload the semantics for docstrings, we'll get collisions. I've had a number of discussions with folks about adding attribute dictionaries to functions and methods so that you can essentially add any attribute. Namespaces are one honking great idea -- let's do more of those! Below is a very raw set of patches to add an attribute dictionary to funcs and methods. It's only been minimally tested, but if y'all like the idea, I'll clean it up, sanity check the memory management, and post the changes to patches at python.org. Here's some things you can do: -------------------- snip snip -------------------- Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def a(): pass ... >>> a.publish = 1 >>> a.publish 1 >>> a.__doc__ >>> a.__doc__ = 'a doc string' >>> a.__doc__ 'a doc string' >>> a.magic_string = a.__doc__ >>> a.magic_string 'a doc string' >>> dir(a) ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] >>> class F: ... def a(self): pass ... >>> f = F() >>> f.a.publish Traceback (most recent call last): File "", line 1, in ? 
AttributeError: publish >>> f.a.publish = 1 >>> f.a.publish 1 >>> f.a.__doc__ >>> f.a.__doc__ = 'another doc string' >>> f.a.__doc__ 'another doc string' >>> f.a.magic_string = f.a.__doc__ >>> f.a.magic_string 'another doc string' >>> dir(f.a) ['__dict__', '__doc__', '__name__', 'im_class', 'im_func', 'im_self', 'magic_string', 'publish'] >>> -------------------- snip snip -------------------- -Barry [1] Aycock, "Compiling Little Languages in Python", http://www.foretec.com/python/workshops/1998-11/proceedings/papers/aycock-little/aycock-little.html [2] http://classic.zope.org:8080/Documentation/Reference/ORB P.S. I promised to add a little note about setattr and getattr vs. setattro and getattro. There's very little documentation about the differences, and searching on python.org doesn't seem to turn up anything. The differences are simple. setattr/getattr take a char* argument naming the attribute to change, while setattro/getattro take a PyObject* (hence the trailing `o' -- for Object). This stuff should get documented in the C API, but at least now, it'll turn up in a SIG search. :) -------------------- snip snip -------------------- Index: funcobject.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/funcobject.h,v retrieving revision 2.16 diff -c -r2.16 funcobject.h *** funcobject.h 1998/12/04 18:48:02 2.16 --- funcobject.h 2000/04/07 21:30:40 *************** *** 44,49 **** --- 44,50 ---- PyObject *func_defaults; PyObject *func_doc; PyObject *func_name; + PyObject *func_dict; } PyFunctionObject; extern DL_IMPORT(PyTypeObject) PyFunction_Type; Index: classobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/classobject.c,v retrieving revision 2.84 diff -c -r2.84 classobject.c *** classobject.c 2000/04/10 13:03:19 2.84 --- classobject.c 2000/04/10 15:27:15 *************** *** 1550,1577 **** /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, {NULL} /* Sentinel */ }; static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! PyObject *name; { ! char *sname = PyString_AsString(name); ! if (sname[0] == '_') { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! if (strcmp(sname, "__name__") == 0 || ! strcmp(sname, "__doc__") == 0) ! return PyObject_GetAttr(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; } ! return PyMember_Get((char *)im, instancemethod_memberlist, sname); } static void --- 1550,1608 ---- /* Dummies that are not handled by getattr() except for __members__ */ {"__doc__", T_INT, 0}, {"__name__", T_INT, 0}, + {"__dict__", T_INT, 0}, {NULL} /* Sentinel */ }; + static int + instancemethod_setattr(im, name, v) + register PyMethodObject *im; + char *name; + PyObject *v; + { + int rtn; + + if (PyEval_GetRestricted() || + strcmp(name, "im_func") == 0 || + strcmp(name, "im_self") == 0 || + strcmp(name, "im_class") == 0) + { + PyErr_Format(PyExc_TypeError, "read-only attribute: %s", name); + return -1; + } + return PyObject_SetAttrString(im->im_func, name, v); + } + + static PyObject * instancemethod_getattr(im, name) register PyMethodObject *im; ! char *name; { ! PyObject *rtn; ! ! if (strcmp(name, "__name__") == 0 || ! 
strcmp(name, "__doc__") == 0) { /* Inherit __name__ and __doc__ from the callable object implementing the method */ ! return PyObject_GetAttrString(im->im_func, name); } if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "instance-method attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return PyObject_GetAttrString(im->im_func, name); + + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyObject_GetAttrString(im->im_func, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static void *************** *** 1662,1669 **** 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! 0, /*tp_getattr*/ ! 0, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ --- 1693,1700 ---- 0, (destructor)instancemethod_dealloc, /*tp_dealloc*/ 0, /*tp_print*/ ! (getattrfunc)instancemethod_getattr, /*tp_getattr*/ ! (setattrfunc)instancemethod_setattr, /*tp_setattr*/ (cmpfunc)instancemethod_compare, /*tp_compare*/ (reprfunc)instancemethod_repr, /*tp_repr*/ 0, /*tp_as_number*/ *************** *** 1672,1678 **** (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! (getattrofunc)instancemethod_getattr, /*tp_getattro*/ 0, /*tp_setattro*/ }; --- 1703,1709 ---- (hashfunc)instancemethod_hash, /*tp_hash*/ 0, /*tp_call*/ 0, /*tp_str*/ ! 0, /*tp_getattro*/ 0, /*tp_setattro*/ }; Index: funcobject.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/funcobject.c,v retrieving revision 2.18 diff -c -r2.18 funcobject.c *** funcobject.c 1998/05/22 00:55:34 2.18 --- funcobject.c 2000/04/07 22:15:33 *************** *** 62,67 **** --- 62,68 ---- doc = Py_None; Py_INCREF(doc); op->func_doc = doc; + op->func_dict = PyDict_New(); } return (PyObject *)op; } *************** *** 133,138 **** --- 134,140 ---- {"__name__", T_OBJECT, OFF(func_name), READONLY}, {"func_defaults",T_OBJECT, OFF(func_defaults)}, {"func_doc", T_OBJECT, OFF(func_doc)}, + {"func_dict", T_OBJECT, OFF(func_dict)}, {"__doc__", T_OBJECT, OFF(func_doc)}, {NULL} /* Sentinel */ }; *************** *** 142,153 **** PyFunctionObject *op; char *name; { if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; } ! return PyMember_Get((char *)op, func_memberlist, name); } static int --- 144,167 ---- PyFunctionObject *op; char *name; { + PyObject* rtn; + if (name[0] != '_' && PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not accessible in restricted mode"); return NULL; + } + if (strcmp(name, "__dict__") == 0) + return op->func_dict; + + rtn = PyMember_Get((char *)op, func_memberlist, name); + if (rtn == NULL) { + PyErr_Clear(); + rtn = PyDict_GetItemString(op->func_dict, name); + if (rtn == NULL) + PyErr_SetString(PyExc_AttributeError, name); } ! return rtn; } static int *************** *** 156,161 **** --- 170,177 ---- char *name; PyObject *value; { + int rtn; + if (PyEval_GetRestricted()) { PyErr_SetString(PyExc_RuntimeError, "function attributes not settable in restricted mode"); *************** *** 178,185 **** } if (value == Py_None) value = NULL; } ! 
return PyMember_Set((char *)op, func_memberlist, name, value); } static void --- 194,214 ---- } if (value == Py_None) value = NULL; + } + else if (strcmp(name, "func_dict") == 0) { + if (value == NULL || !PyDict_Check(value)) { + PyErr_SetString( + PyExc_TypeError, + "func_dict must be set to a dict object"); + return -1; + } + } + rtn = PyMember_Set((char *)op, func_memberlist, name, value); + if (rtn < 0) { + PyErr_Clear(); + rtn = PyDict_SetItemString(op->func_dict, name, value); } ! return rtn; } static void *************** *** 191,196 **** --- 220,226 ---- Py_DECREF(op->func_name); Py_XDECREF(op->func_defaults); Py_XDECREF(op->func_doc); + Py_XDECREF(op->func_dict); PyMem_DEL(op); } From mal at lemburg.com Mon Apr 10 18:01:52 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 18:01:52 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F1FAF0.4821AE6C@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? > > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. > > > > IDLE should probably emulate this, as it's encoding is clearly > > > UTF-8 (at least when using Tcl 8.1 or newer). > > > > It should be possible to redirect sys.stdin/stdout using > > the codecs.EncodedFile wrapper. Some tests show that raw_input() > > doesn't seem to use the redirected sys.stdin though... > > > > >>> sys.stdin = EncodedFile(sys.stdin, 'utf-8', 'latin-1') > > >>> s = raw_input() > > ??? > > >>> s > > '\344\366\374' > > >>> s = sys.stdin.read() > > ??? > > >>> s > > '\303\244\303\266\303\274\012' The latter is the "correct" output, BTW. > This deserves more looking into. The code for raw_input() in > bltinmodule.c certainly *tries* to use sys.stdin. (I think that > because your EncodedFile object is not a real stdio file object, it > will take the second branch, near the end of the function; this calls > PyFile_GetLine() which attempts to call readline().) > > Aha! It actually seems that your read() and readline() are > inconsistent! They are because I haven't yet found a way to implement readline() without buffering read-ahead data. The only way I can think of to implement it without buffering would be to read one char at a time which is much too slow. Buffering is hard to implement right when assuming that streams are stacked... every level would have its own buffering scheme and mixing .read() and .readline() wouldn't work too well. Anyway, I'll give it try... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Mon Apr 10 17:56:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 11:56:26 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:01:52 +0200." 
<38F1FAF0.4821AE6C@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> Message-ID: <200004101556.LAA00578@eric.cnri.reston.va.us> > > Aha! It actually seems that your read() and readline() are > > inconsistent! > > They are because I haven't yet found a way to implement > readline() without buffering read-ahead data. The only way > I can think of to implement it without buffering would be > to read one char at a time which is much too slow. > > Buffering is hard to implement right when assuming that > streams are stacked... every level would have its own > buffering scheme and mixing .read() and .readline() > wouldn't work too well. Anyway, I'll give it try... Since you're calling methods on the underlying file object anyway, can't you avoid buffering by calling the *corresponding* underlying method and doing the conversion on that? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 18:02:36 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 12:02:36 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: Your message of "Mon, 10 Apr 2000 11:52:43 EDT." <14577.63691.561040.281577@anthem.cnri.reston.va.us> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <200004101602.MAA00590@eric.cnri.reston.va.us> > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! > > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, I'll clean it up, sanity check the memory management, and > post the changes to patches at python.org. Here's some things you can > do: > > -------------------- snip snip -------------------- > Python 1.6a2 (#10, Apr 10 2000, 11:27:59) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> def a(): pass > ... > >>> a.publish = 1 > >>> a.publish > 1 > >>> a.__doc__ > >>> a.__doc__ = 'a doc string' > >>> a.__doc__ > 'a doc string' > >>> a.magic_string = a.__doc__ > >>> a.magic_string > 'a doc string' > >>> dir(a) > ['__doc__', '__name__', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name', 'magic_string', 'publish'] > >>> class F: > ... def a(self): pass > ... > >>> f = F() > >>> f.a.publish > Traceback (most recent call last): > File "", line 1, in ? > AttributeError: publish > >>> f.a.publish = 1 > >>> f.a.publish > 1 Here I have a question. Should this really change F.a, or should it change the method bound to f only? You implement the former, but I'm not sure if those semantics are right -- if I have two instances, f1 and f2, and you change f2.a.spam, I'd be surprised if f1.a.spam got changed as well (since f1.a and f2.a are *not* the same thing -- they are not shared. 
f1.a.im_func and f2.a.im_func are the same thing, but f1.a and f2.a are distinct! I would suggest that you only allow setting attributes via the class or via a function. (This means that you must still implement the pass-through on method objects, but reject it if the method is bound to an instance.) --Guido van Rossum (home page: http://www.python.org/~guido/) From jeremy at cnri.reston.va.us Mon Apr 10 18:05:14 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 10 Apr 2000 12:05:14 -0400 (EDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F1E418.FF191AEE@tismer.com> References: <38F1E418.FF191AEE@tismer.com> Message-ID: <14577.64442.47034.907133@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> I think it is fine that it crashed. There are obviously CT> extension modules left where the interpreter lock rule is CT> violated. The builtin Python code has been checked, there are CT> most probably no holes, including tkinter. Or, I made a mistake CT> in this little code: I think have misunderstood at least one of Mark's bug report and your response. Does the problem Mark reported rely on extension code? I thought the bug was triggered by running pure Python code. If that is the case, then it can never be fine that it crashed. If the problem relies on extension code, then there ought to be a way to write the extension so that it doesn't cause a crash. Jeremy PS Mark: Is the transformer.py you attached different from the one in the nondist/src/Compiler tree? It looks like the only differences are with the whitespace. From pf at artcom-gmbh.de Mon Apr 10 18:54:09 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Mon, 10 Apr 2000 18:54:09 +0200 (MEST) Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: from David Scherer at "Apr 10, 2000 9:54:35 am" Message-ID: Hi! David Scherer on idle-dev at python.org: [...] > in the interpreter* is fast. In principle, one could put THREE operators in > the language: one with the new "float division" semantics, one that divided > only integers, and a "backward compatibility" operator with EXACTLY the old > semantics: [...] > An outline of what I did: [...] Yes, this really clever. I like the ideas. [me]: > > 2. What should the new Interpreter do, if he sees a source file without a > > pragma defining the language level? There are two possibilities: [...] > > 2. Assume, it is a new source file and apply language level 2 to it. > > This has the disadvantage, that it will break any existing code. > I think the answer is 2. A high-quality script for adding the pragma to > existing files, with CLI and GUI interfaces, should be packaged with Python. > Running it on your existing modules would be part of the installation > process. Okay. But what is with the Python packages available on the Internet? May be the upcoming dist-utils should handle this? Or should the Python core distribution contain a clever installer program, which handles this? > Long-lived modules should always have a language level, since it makes them > more robust against changes and also serves as documentation. A version > statement could be encouraged at the top of any nontrivial script, e.g: > > python 1.6 [...] 
global python_1_5 #implies global old_division or global python_1_6 #implies global old_division or global python_1_7 #may be implies global new_division may be we can solve another issue just discussed on python_dev with global source_iso8859_1 or global source_utf_8 Cute idea... but we should keep the list of such pragmas short. > Personally, I think that it makes more sense to talk about ways to > gracefully migrate individual changes into the language than to put off > every backward-incompatible change to a giant future "flag day" that will > break all existing scripts. Versioning of some sort should be encouraged > starting *now*, and incorporated into 1.6 before it goes final. Yes. > Indeed, but Guido has spoken: > > > Great ideas there, Bruce! I hope you will post these to an > > appropriate mailing list (perhaps idle-dev, as there's no official SIG > > to discuss the Python 3000 transition yet, and python-dev is closed). May be someone can invite you into 'python-dev'? However the archives are open to anyone and writing to the list is also open to anybody. Only subscription is closed. I don't know why. Regards, Peter P.S.: Redirected Reply-To: to David and python-dev at python.org ! -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mal at lemburg.com Mon Apr 10 18:39:45 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 18:39:45 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> Message-ID: <38F203D1.4A0038F@lemburg.com> Guido van Rossum wrote: > > > > Aha! It actually seems that your read() and readline() are > > > inconsistent! > > > > They are because I haven't yet found a way to implement > > readline() without buffering read-ahead data. The only way > > I can think of to implement it without buffering would be > > to read one char at a time which is much too slow. > > > > Buffering is hard to implement right when assuming that > > streams are stacked... every level would have its own > > buffering scheme and mixing .read() and .readline() > > wouldn't work too well. Anyway, I'll give it try... > > Since you're calling methods on the underlying file object anyway, > can't you avoid buffering by calling the *corresponding* underlying > method and doing the conversion on that? The problem here is that Unicode has far more line break characters than plain ASCII. The underlying API would break on ASCII lines (or even worse on those CRLF sequences defined by the C lib), not the ones I need for Unicode. BTW, I think that we may need a new Codec class layer here: .readline() et al. are all text based methods, while the Codec base classes clearly work on all kinds of binary and text data. 
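A tiny example of the extra Unicode line breaks in question (a sketch: u"\u2028" is the Unicode LINE SEPARATOR, which a byte-oriented readline() keyed to \n or CRLF would never treat as a line boundary, while the Unicode splitlines() does):

    text = u"first\u2028second\nthird"
    print text.splitlines()    # [u'first', u'second', u'third']
    print text.split(u"\n")    # [u'first\u2028second', u'third']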
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 10 20:04:31 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:04:31 -0700 (PDT) Subject: [Python-Dev] CVS: distutils/distutils cmd.py In-Reply-To: <20000410091101.B406@mems-exchange.org> Message-ID: On Mon, 10 Apr 2000, Greg Ward wrote: > On 10 April 2000, Greg Stein said: >... > > [ damn... can't see the code... went and checked it out... ] > > Oops, that was a CVS config thing. Fixed now -- I'll go checkin that > change and we'll all see if it worked. Just as well it was off though > -- I checked in a couple of big documentation updates this weekend, and > who wants to see 30k of LaTeX patches in their inbox on Monday morning? > ;-) Cool. The CVS diffs appear to work quite fine now! Note: you might not get a 30k patch since the system elides giant diffs. Of course, if you patch 10 files, each with 3k diffs... :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Mon Apr 10 20:13:08 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:13:08 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Barry Warsaw wrote: >... > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, +1 on concept, -1 on the patch :-) >... > P.S. I promised to add a little note about setattr and getattr > vs. setattro and getattro. There's very little documentation about > the differences, and searching on python.org doesn't seem to turn up > anything. The differences are simple. setattr/getattr take a char* > argument naming the attribute to change, while setattro/getattro take > a PyObject* (hence the trailing `o' -- for Object). This stuff should > get documented in the C API, but at least now, it'll turn up in a SIG > search. :) And note that the getattro/setattro is preferred. It is easy to extract the char* from them; the other direction requires construction of an object. >... > + static int > + instancemethod_setattr(im, name, v) > + register PyMethodObject *im; > + char *name; > + PyObject *v; IMO, this should be instancemethod_setattro() and take a PyObject *name. In the function, you can extract the string for comparison. >... > + { > + int rtn; This variable isn't used. >... > static PyObject * > instancemethod_getattr(im, name) > register PyMethodObject *im; > ! char *name; IMO, this should remain a getattro function. (and fix the name) In your update, note how many GetAttrString calls there are. The plain GetAttr is typically faster. >... > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Why do you mask this second error with the AttributeError? Seems that you should just leave whatever is there (typically an AttributeError, but maybe not!). >... 
> --- 144,167 ---- > PyFunctionObject *op; > char *name; > { > + PyObject* rtn; > + > if (name[0] != '_' && PyEval_GetRestricted()) { > PyErr_SetString(PyExc_RuntimeError, > "function attributes not accessible in restricted mode"); > return NULL; > + } > + if (strcmp(name, "__dict__") == 0) > + return op->func_dict; This is superfluous. The PyMember_Get will do this. > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); Again, with the masking... >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); This raises an interesting thought. Why not just require the mapping protocol? Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 20:11:29 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:11:29 -0400 Subject: [Python-Dev] Unicode input issues In-Reply-To: Your message of "Mon, 10 Apr 2000 18:39:45 +0200." <38F203D1.4A0038F@lemburg.com> References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> <38F203D1.4A0038F@lemburg.com> Message-ID: <200004101811.OAA02323@eric.cnri.reston.va.us> > > Since you're calling methods on the underlying file object anyway, > > can't you avoid buffering by calling the *corresponding* underlying > > method and doing the conversion on that? > > The problem here is that Unicode has far more line > break characters than plain ASCII. The underlying API would > break on ASCII lines (or even worse on those CRLF sequences > defined by the C lib), not the ones I need for Unicode. Hm, can't we just use \n for now? > BTW, I think that we may need a new Codec class layer > here: .readline() et al. are all text based methods, > while the Codec base classes clearly work on all kinds of > binary and text data. Not sure what you mean here. Can you explain through an example? --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 20:27:03 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:27:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: <200004101702.NAA01141@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Lib > In directory eric:/projects/python/develop/guido/src/Lib > > Modified Files: > urlparse.py > Log Message: > Some cleanup -- don't use splitfields/joinfields, standardize > indentation (tabs only), rationalize some code in urljoin... Why not use string methods? (the patch still imports from string) Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 20:22:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:22:26 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Lib urlparse.py,1.22,1.23 In-Reply-To: Your message of "Mon, 10 Apr 2000 11:27:03 PDT." References: Message-ID: <200004101822.OAA02423@eric.cnri.reston.va.us> > Why not use string methods? 
(the patch still imports from string) I had the patch sitting in my directory for who knows how long -- I just wanted to flush it to the CVS repository. I didn't really want to thing about all the great changes I *could* make to the code... --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Apr 10 20:44:01 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 14:44:01 -0400 Subject: [Python-Dev] Getting ready for 1.6 alpha 2 Message-ID: <200004101844.OAA02610@eric.cnri.reston.va.us> I'm getting ready for the release of alpha 2. Tomorrow afternoon (around 5:30pm east coast time) I'm going on vacation for the rest of the week, followed by a business trip most of the week after. Obviously, I'm anxious to release a solid alpha tomorrow. Please, send only simple or essential patches between now and the release date! --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 20:57:01 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 11:57:01 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101844.OAA02610@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > I'm getting ready for the release of alpha 2. Tomorrow afternoon > (around 5:30pm east coast time) I'm going on vacation for the rest of > the week, followed by a business trip most of the week after. > > Obviously, I'm anxious to release a solid alpha tomorrow. > > Please, send only simple or essential patches between now and the > release date! Jeremy reminded me that my new httplib.py is still pending integration. There are two possibilities: 1) My httplib.py uses a new name, or goes into a "net" package. We check it in today, and I follow up with patches to fold in post-1.5.2 compatibility items (such as the SSL stuff). 2) httplib.py will remain in the same place, so the compat changes must happen first. In both cases, I will also need to follow up with test and doc. IMO, we go with "net.httplib" and check it in today. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 21:00:08 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:00:08 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 11:57:01 PDT." References: Message-ID: <200004101900.PAA02692@eric.cnri.reston.va.us> > > Please, send only simple or essential patches between now and the > > release date! > > Jeremy reminded me that my new httplib.py is still pending integration. There will be another alpha release after I'm back -- I think this isn't that urgent. (Plus, just because you're you, you'd have to mail me a wet signature. :-) I am opposed to a net.* package until the reorganization discussion has resulted in a solid design. --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 21:19:57 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:19:57 -0700 (PDT) Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: <200004101900.PAA02692@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > > Please, send only simple or essential patches between now and the > > > release date! > > > > Jeremy reminded me that my new httplib.py is still pending integration. 
> > There will be another alpha release after I'm back -- I think this > isn't that urgent. True, but depending on location, it also has zero impact on the release. In other words: added functionality for testing, with no potential for breakage. > (Plus, just because you're you, you'd have to mail > me a wet signature. :-) You've got one on file already :-) [ I sent it back in December; was it misplaced, and I need to resend? ] > I am opposed to a net.* package until the reorganization discussion > has resulted in a solid design. Not a problem. Mine easily replaces httplib.py in its current location. It is entirely backwards compat. A new class is used to get the new functionality, and a compat "HTTP" class is provided (leveraging the new HTTPConnection class). Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 21:20:31 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 15:20:31 -0400 Subject: [Python-Dev] httplib again (was: Getting ready for 1.6 alpha 2) In-Reply-To: Your message of "Mon, 10 Apr 2000 12:19:57 PDT." References: Message-ID: <200004101920.PAA02957@eric.cnri.reston.va.us> > > > Jeremy reminded me that my new httplib.py is still pending integration. > > > > There will be another alpha release after I'm back -- I think this > > isn't that urgent. > > True, but depending on location, it also has zero impact on the release. > In other words: added functionality for testing, with no potential for > breakage. You're just asking for exposure. But unless it's installed as httplib.py, it won't get much more exposure than if you put it on your website and post an announcement to c.l.py, I bet. > > (Plus, just because you're you, you'd have to mail > > me a wet signature. :-) > > You've got one on file already :-) > > [ I sent it back in December; was it misplaced, and I need to resend? ] I was just teasing. Our lawyer believes that you cannot send in a signature for code that you will contribute in the future; but I really don't care enough to force you to send another one... > > I am opposed to a net.* package until the reorganization discussion > > has resulted in a solid design. > > Not a problem. Mine easily replaces httplib.py in its current location. It > is entirely backwards compat. A new class is used to get the new > functionality, and a compat "HTTP" class is provided (leveraging the new > HTTPConnection class). I thought you said there was some additional work on compat changes? I quote: | 2) httplib.py will remain in the same place, so the compat changes must | happen first. Oh well, send it to Jeremy and he'll check it in if it's ready. But not without a test suite and documentation. --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at tismer.com Mon Apr 10 21:47:12 2000 From: tismer at tismer.com (Christian Tismer) Date: Mon, 10 Apr 2000 21:47:12 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <38F1E418.FF191AEE@tismer.com> <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: <38F22FC0.C975290C@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> I think it is fine that it crashed. There are obviously > CT> extension modules left where the interpreter lock rule is > CT> violated. The builtin Python code has been checked, there are > CT> most probably no holes, including tkinter. Or, I made a mistake > CT> in this little code: > > I think have misunderstood at least one of Mark's bug report and your > response. 
Does the problem Mark reported rely on extension code? I > thought the bug was triggered by running pure Python code. If that is > the case, then it can never be fine that it crashed. If the problem > relies on extension code, then there ought to be a way to write the > extension so that it doesn't cause a crash. Oh! If it is so, then there is in fact a problem left in the Kernel. Mark, did you use an extension? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From andy at reportlab.com Mon Apr 10 21:46:25 2000 From: andy at reportlab.com (Andy Robinson) Date: Mon, 10 Apr 2000 20:46:25 +0100 Subject: [Python-Dev] Re: [I18n-sig] "takeuchi": a unicode string on IDLE shell References: <200004101401.KAA00238@eric.cnri.reston.va.us> Message-ID: <008a01bfa325$79b92f00$01ac2ac0@boulder> ----- Original Message ----- From: Guido van Rossum To: Cc: Sent: 10 April 2000 15:01 Subject: [I18n-sig] "takeuchi": a unicode string on IDLE shell > Can anyone answer this? I can reproduce the output side of this, and > I believe he's right about the input side. Where should Python > migrate with respect to Unicode input? I think that what Takeuchi is > getting is actually better than in Pythonwin or command line (where he > gets Shift-JIS)... > > --Guido van Rossum (home page: http://www.python.org/~guido/) I think what he wants, as you hinted, is to be able to specify a 'system wide' default encoding of Shift-JIS rather than UTF8. UTF-8 has a certain purity in that it equally annoys every nation, and is nobody's default encoding. What a non-ASCII user needs is a site-wide way of setting the default encoding used for standard input and output. I think this could be done with something (config file? registry key) which site.py looks at, and wraps stream encoders around stdin, stdout and stderr. To illustrate why it matters, I often used to parse data files and do queries on a Japanese name and address database; I could print my lists and tuples in interactive mode and check they worked, or initialise functions with correct data, since the OS uses Shift-JIS as its native encoding and I was manipulating Shift-JIS strings. I've lost that ability now due to the Unicode stuff and would need to do >>> for thing in mylist: >>> ....print mylist.encode('shift_jis') to see the contents of a database row, rather than just >>> mylist BTW, Pythonwin stopped working in this regard when Scintilla came along; it prints a byte at a time now, although kanji input is fine, as is kanji pasted into a source file, as long as you specify a Japanese font. However, this is fixable - I just need to find a spare box to run Japanese windows on and find out where the printing goes wrong. Andy Robinson ReportLab From gstein at lyra.org Mon Apr 10 21:53:22 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 12:53:22 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004101920.PAA02957@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: >... > You're just asking for exposure. But unless it's installed as > httplib.py, it won't get much more exposure than if you put it on your > website and post an announcement to c.l.py, I bet. Hmm. Good point :-) >... 
> > > (Plus, just because you're you, you'd have to mail > > > me a wet signature. :-) > > > > You've got one on file already :-) > > > > [ I sent it back in December; was it misplaced, and I need to resend? ] > > I was just teasing. :-) >... > > > I am opposed to a net.* package until the reorganization discussion > > > has resulted in a solid design. > > > > Not a problem. Mine easily replaces httplib.py in its current location. It > > is entirely backwards compat. A new class is used to get the new > > functionality, and a compat "HTTP" class is provided (leveraging the new > > HTTPConnection class). > > I thought you said there was some additional work on compat changes? Oops. Yah. It would become option (2) (add compat stuff first) by dropping it over the current one. Mostly, I'm concerned about the SSL stuff that was added, but there may be other things (need to check the CVS logs). For example, there was all that stuff dealing with the errors (which never went in, I believe?). >... > Oh well, send it to Jeremy and he'll check it in if it's ready. But > not without a test suite and documentation. Ah. Well, then it definitely won't go in now :-). It'll take a bit to set up the tests and docco. Well... thanx for the replies. When I get the stuff ready, I'll speak up again. And yes, I do intend to ensure this stuff is ready in time for 1.6. Cheers, -g p.s. and I retract my request for inclusion of davlib. I think there is still some design work to do on that guy. -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 22:01:16 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 16:01:16 -0400 Subject: [Python-Dev] httplib again In-Reply-To: Your message of "Mon, 10 Apr 2000 12:53:22 PDT." References: Message-ID: <200004102001.QAA03201@eric.cnri.reston.va.us> > p.s. and I retract my request for inclusion of davlib. I think there is > still some design work to do on that guy. But it should at least be available outside the distro! The Vaults of Parnassus don't list it -- so it don't exist! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Mon Apr 10 22:50:26 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 13:50:26 -0700 (PDT) Subject: [Python-Dev] httplib again In-Reply-To: <200004102001.QAA03201@eric.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000, Guido van Rossum wrote: > > p.s. and I retract my request for inclusion of davlib. I think there is > > still some design work to do on that guy. > > But it should at least be available outside the distro! The Vaults of > Parnassus don't list it -- so it don't exist! :-) D'oh! I forgot to bring it over from my alternate plane of reality. ... Okay. I've synchronized the universes. Parnassus now contains a number of records for my Python stuff (well, submitted at least). Thanx for the nag :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Mon Apr 10 22:34:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 22:34:12 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> Message-ID: <38F23AC4.12CBE187@lemburg.com> Guido van Rossum wrote: > > > > Finally, I believe we need a way to discover the encoding used by > > > stdin or stdout. 
I have to admit I know very little about the file > > > wrappers that Marc wrote -- is it easy to get the encoding out of > > > them? > > > > I'm not sure what you mean: the name of the input encoding ? > > Currently, only the names of the encoding and decoding functions > > are available to be queried. > > Whatever is helpful for a module or program that wants to know what > kind of encoding is used. Hmm, you mean something like file.encoding ? I'll add some additional attributes holding the encoding names to the wrapper classes (they will then be set by the wrapper constructor functions). BTW, I've just added .readline() et al. to the codecs... all except .readline() are easy to do. For .readline() I simply delegated line breaking to the underlying stream's .readline() method. This is far from optimal, but better than not having the method at all. I also adjusted the interfaces of the .splitlines() methods: they now take a different optional argument: """ S.splitlines([keepends]]) -> list of strings Return a list of the lines in S, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. """ This made implementing the above methods very simple and also allows writing codecs working with other basic storage types (UserString.py anyone ;-). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Mon Apr 10 23:00:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Mon, 10 Apr 2000 23:00:53 +0200 Subject: [Python-Dev] Unicode input issues References: <002601bfa2f3$a1f67720$8f133c81@pflab.ecl.ntt.co.jp> <200004101420.KAA00291@eric.cnri.reston.va.us> <200004101440.KAA00324@eric.cnri.reston.va.us> <38F1F401.45535C23@lemburg.com> <200004101538.LAA00486@eric.cnri.reston.va.us> <38F1FAF0.4821AE6C@lemburg.com> <200004101556.LAA00578@eric.cnri.reston.va.us> <38F203D1.4A0038F@lemburg.com> <200004101811.OAA02323@eric.cnri.reston.va.us> Message-ID: <38F24105.28ADB5EA@lemburg.com> Guido van Rossum wrote: > > > > Since you're calling methods on the underlying file object anyway, > > > can't you avoid buffering by calling the *corresponding* underlying > > > method and doing the conversion on that? > > > > The problem here is that Unicode has far more line > > break characters than plain ASCII. The underlying API would > > break on ASCII lines (or even worse on those CRLF sequences > > defined by the C lib), not the ones I need for Unicode. > > Hm, can't we just use \n for now? > > > BTW, I think that we may need a new Codec class layer > > here: .readline() et al. are all text based methods, > > while the Codec base classes clearly work on all kinds of > > binary and text data. > > Not sure what you mean here. Can you explain through an example? Well, the line concept is really only applicable to text data. Binary data doesn't have lines and e.g. a ZIP codec (probably) couldn't implement this kind of method. As it turns out, only the .writelines() method needs to know what kinds of input/output data objects are used (and then only to be able to specify a joining seperator). I'll just leave things as they are for now: quite shallow w/r to the class hierarchy. 
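A quick demo of the S.splitlines([keepends]) interface described earlier in this exchange (a sketch; 1 serves as the true value):

    s = u"one\r\ntwo\nthree"
    print s.splitlines()     # [u'one', u'two', u'three']
    print s.splitlines(1)    # [u'one\r\n', u'two\n', u'three']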
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Mon Apr 10 23:34:07 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 14:34:07 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: <200004102114.RAA07027@eric.cnri.reston.va.us> Message-ID: Euh... this is the incorrect fix. The 0 is wrong to begin with. Mark Favas submitted a proper patch for this. See his "Revised Patches for bug report 258" posted to patches at python.org on April 4th. Cheers, -g On Mon, 10 Apr 2000, Guido van Rossum wrote: > Update of /projects/cvsroot/python/dist/src/Modules > In directory eric:/projects/python/develop/guido/src/Modules > > Modified Files: > mmapmodule.c > Log Message: > I've had complaints about the comparison "where >= 0" before -- on > IRIX, it doesn't even compile. Added a cast: "where >= (char *)0". > > > Index: mmapmodule.c > =================================================================== > RCS file: /projects/cvsroot/python/dist/src/Modules/mmapmodule.c,v > retrieving revision 2.5 > retrieving revision 2.6 > diff -C2 -r2.5 -r2.6 > *** mmapmodule.c 2000/04/05 14:15:31 2.5 > --- mmapmodule.c 2000/04/10 21:14:05 2.6 > *************** > *** 2,6 **** > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.5 2000/04/05 14:15:31 fdrake Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > --- 2,6 ---- > / Author: Sam Rushing > / Hacked for Unix by A.M. Kuchling > ! / $Id: mmapmodule.c,v 2.6 2000/04/10 21:14:05 guido Exp $ > > / mmapmodule.cpp -- map a view of a file into memory > *************** > *** 119,123 **** > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= 0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > --- 119,123 ---- > char * where = (self->data+self->pos); > CHECK_VALID(NULL); > ! if ((where >= (char *)0) && (where < (self->data+self->size))) { > value = (char) *(where); > self->pos += 1; > > > _______________________________________________ > Python-checkins mailing list > Python-checkins at python.org > http://www.python.org/mailman/listinfo/python-checkins > -- Greg Stein, http://www.lyra.org/ From guido at python.org Mon Apr 10 23:43:03 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 17:43:03 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules mmapmodule.c,2.5,2.6 In-Reply-To: Your message of "Mon, 10 Apr 2000 14:34:07 PDT." References: Message-ID: <200004102143.RAA07181@eric.cnri.reston.va.us> > Euh... this is the incorrect fix. The 0 is wrong to begin with. > > Mark Favas submitted a proper patch for this. See his "Revised Patches for > bug report 258" posted to patches at python.org on April 4th. Sigh. You're right. I've seen two patches to mmapmodule.c since he posted that patch, and no comments on his patch, so I thought his patch was already incorporated. I was wrong. Note that this module still gives 6 warnings on VC6.0, all C4018: '>' or '>=' signed/unsigned mismatch. I wish someone gave me a patch for that too. Unrelated: _sre.c also has a bunch of VC6 warnings -- all C4761, integral size mismatch in argument: conversion supplied. This is all about the calls to SRE_IS_DIGIT and SRE_IS_SPACE. 
The error occurs 8 times on 4 different lines, and is reported in a cyclic fashion: 106, 108, 110, 112, 106, 108, ..., etc., probably due to sre's recursive self-include tricks? --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 11 00:11:26 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 10 Apr 2000 18:11:26 -0400 Subject: [Python-Dev] 1.6a2 prerelease for Windows Message-ID: <200004102211.SAA07363@eric.cnri.reston.va.us> I've made a prerelease of the Windows installer available through the python.org/1.6 webpage (the link is in the paragraph *below* the a1 downloads). This is mostly to give Mark Hammond an opportunity to prepare win32all build 131, to deal with the changed location of the python16.dll file. Hey, it's still alpha software! --Guido van Rossum (home page: http://www.python.org/~guido/) From mhammond at skippinet.com.au Tue Apr 11 01:00:48 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:00:48 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F22FC0.C975290C@tismer.com> Message-ID: > If it is so, then there is in fact a problem left > in the Kernel. > Mark, did you use an extension? I tried to explain this in private email: This is pure Python code. The parser module is the only extension being used. The crash _always_ occurs as a frame object is being de-allocated, and _always_ happens as a builtin list object (a local variable) is de-alloced by the frame. Always the same line of Python code, always the same line of C code, always the exact same failure. Mark. From mhammond at skippinet.com.au Tue Apr 11 01:41:16 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue, 11 Apr 2000 09:41:16 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <14577.64442.47034.907133@goon.cnri.reston.va.us> Message-ID: [Sorry - missed this bit] > PS Mark: Is the transformer.py you attached different > from the one in > the nondist/src/Compiler tree? It looks like the only > differences are > with the whitespace. The attached version is simply the "release" P2C transformer.py with .append args fixed. I imagine it is very close to the CVS version (and indeed I know for a fact that the CVS version also crashes). My initial testing showed the CVS compiler did _not_ trigger this bug (even though code that uses an identical transformer.py does), so I just dropped back to P2C and stopped when I saw it :-) Mark. From bwarsaw at python.org Tue Apr 11 01:48:51 2000 From: bwarsaw at python.org (bwarsaw at python.org) Date: Mon, 10 Apr 2000 19:48:51 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> Message-ID: <14578.26723.857270.63150@anthem.cnri.reston.va.us> > Below is a very raw set of patches to add an attribute dictionary to > funcs and methods. It's only been minimally tested, but if y'all like > the idea, >>>>> "GS" == Greg Stein writes: GS> +1 on concept, -1 on the patch :-) Well, that's good, because I /knew/ the patch was a quick hack (which is why I posted it to python-dev and not patches :). Since there's been generally positive feedback on the idea, I think I'll flesh it out a bit. GS> And note that the getattro/setattro is preferred. It is easy GS> to extract the char* from them; the other direction requires GS> construction of an object. Good point. >... 
> + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyObject_GetAttrString(im->im_func, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Why do you mask this second error with the AttributeError? GS> Seems that you should just leave whatever is there (typically GS> an AttributeError, but maybe not!). Good point here, but... > + rtn = PyMember_Get((char *)op, func_memberlist, name); > + if (rtn == NULL) { > + PyErr_Clear(); > + rtn = PyDict_GetItemString(op->func_dict, name); > + if (rtn == NULL) > + PyErr_SetString(PyExc_AttributeError, name); GS> Again, with the masking... ...here I don't want the KeyError to leak through the getattr() call. If you do "print func.non_existent_attr" wouldn't you want an AttributeError instead of a KeyError? Maybe it should explicitly test for KeyError rather than masking any error coming back from PyDict_GetItemString()? Or better yet (based on your suggestion below), it should do a PyMapping_HasKey() test, raise an AttributeError if not, then just return PyMapping_GetItemString(). >... > + else if (strcmp(name, "func_dict") == 0) { > + if (value == NULL || !PyDict_Check(value)) { > + PyErr_SetString( > + PyExc_TypeError, > + "func_dict must be set to a dict object"); GS> This raises an interesting thought. Why not just require the GS> mapping protocol? Good point again. -Barry From gstein at lyra.org Tue Apr 11 03:37:45 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:37:45 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw at python.org wrote: >... > GS> And note that the getattro/setattro is preferred. It is easy > GS> to extract the char* from them; the other direction requires > GS> construction of an object. > > Good point. Oh. Also, I noticed that you removed a handy optimization from the getattr function. Testing a character for '_' *before* calling strcmp() will save a good chunk of time, especially considering how often this function is used. Basically, review whether a quick test can save a strmp() call (and can be easily integrated). Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Tue Apr 11 03:12:10 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 18:12:10 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14578.26723.857270.63150@anthem.cnri.reston.va.us> Message-ID: On Mon, 10 Apr 2000 bwarsaw at python.org wrote: >... > >... > > + rtn = PyMember_Get((char *)im, instancemethod_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyObject_GetAttrString(im->im_func, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Why do you mask this second error with the AttributeError? > GS> Seems that you should just leave whatever is there (typically > GS> an AttributeError, but maybe not!). > > Good point here, but... > > > + rtn = PyMember_Get((char *)op, func_memberlist, name); > > + if (rtn == NULL) { > > + PyErr_Clear(); > > + rtn = PyDict_GetItemString(op->func_dict, name); > > + if (rtn == NULL) > > + PyErr_SetString(PyExc_AttributeError, name); > > GS> Again, with the masking... > > ...here I don't want the KeyError to leak through the getattr() call. Ah! Subtle difference in the code there :-) I agree with you, on remapping the second one. 
I don't think the first needs to be remapped, however. > If you do "print func.non_existent_attr" wouldn't you want an > AttributeError instead of a KeyError? Maybe it should explicitly test > for KeyError rather than masking any error coming back from > PyDict_GetItemString()? Or better yet (based on your suggestion > below), it should do a PyMapping_HasKey() test, raise an > AttributeError if not, then just return PyMapping_GetItemString(). Seems that you could just do the PyMapping_GetItemString() and remap the error *if* it occurs. Presumably, the exception is the infrequent case and can stand to be a bit slower. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Tue Apr 11 02:58:39 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 10 Apr 2000 17:58:39 -0700 (PDT) Subject: [Python-Dev] transformer.py changes? (was: Crash in new "trashcan" mechanism) In-Reply-To: Message-ID: On Tue, 11 Apr 2000, Mark Hammond wrote: > [Sorry - missed this bit] > > > PS Mark: Is the transformer.py you attached different > > from the one in > > the nondist/src/Compiler tree? It looks like the only > > differences are > > with the whitespace. > > The attached version is simply the "release" P2C transformer.py with > .append args fixed. I imagine it is very close to the CVS version > (and indeed I know for a fact that the CVS version also crashes). > > My initial testing showed the CVS compiler did _not_ trigger this > bug (even though code that uses an identical transformer.py does), > so I just dropped back to P2C and stopped when I saw it :-) Hrm. I fixed those things in the P2C CVS version. Guess I'll have to do a diff to see if there are any other changes... Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at python.org Tue Apr 11 07:08:49 2000 From: bwarsaw at python.org (bwarsaw at python.org) Date: Tue, 11 Apr 2000 01:08:49 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004101602.MAA00590@eric.cnri.reston.va.us> Message-ID: <14578.45921.289078.190085@anthem.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> Here I have a question. Should this really change F.a, or GvR> should it change the method bound to f only? You implement GvR> the former, but I'm not sure if those semantics are right -- GvR> if I have two instances, f1 and f2, and you change f2.a.spam, GvR> I'd be surprised if f1.a.spam got changed as well (since f1.a GvR> and f2.a are *not* the same thing -- they are not shared. GvR> f1.a.im_func and f2.a.im_func are the same thing, but f1.a GvR> and f2.a are distinct! As are f1.a and f1.a! :) GvR> I would suggest that you only allow setting attributes via GvR> the class or via a function. (This means that you must still GvR> implement the pass-through on method objects, but reject it GvR> if the method is bound to an instance.) Given that, Python should probably raise a TypeError if an attempt is made to set an attribute on a bound method object. However, it should definitely succeed to /get/ an attribute on a bound method object. I'm not 100% sure that setting bound-method-attributes should be illegal, but we can be strict about it now and see if it makes sense to loosen the restriction later. Here's a candidate for Lib/test/test_methattr.py which should print a bunch of `1's. I'll post the revised diffs (taking into account GvR's and GS's suggestions) tomorrow after I've had a night to sleep on it. 
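For reference, the error remapping discussed above is the same pattern a
pure-Python __getattr__ uses: look the name up in the attribute dict and
turn a failed lookup into an AttributeError, leaving other errors alone.
A minimal sketch with invented names (the Python-level equivalent, not the
C patch itself):

    class FuncWithAttrs:
        def __init__(self):
            self.__dict__['func_dict'] = {}

        def __setattr__(self, name, value):
            self.func_dict[name] = value

        def __getattr__(self, name):
            # only reached when normal attribute lookup fails
            try:
                return self.func_dict[name]
            except KeyError:
                raise AttributeError, name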
-Barry -------------------- snip snip -------------------- from test_support import verbose class F: def a(self): pass def b(): pass # setting attributes on functions try: b.blah except AttributeError: pass else: print 'did not get expected AttributeError' b.blah = 1 print b.blah == 1 print 'blah' in dir(b) # setting attributes on unbound methods try: F.a.blah except AttributeError: pass else: print 'did not get expected AttributeError' F.a.blah = 1 print F.a.blah == 1 print 'blah' in dir(F.a) # setting attributes on bound methods is illegal f1 = F() try: f1.a.snerp = 1 except TypeError: pass else: print 'did not get expected TypeError' # but accessing attributes on bound methods is fine print f1.a.blah print 'blah' in dir(f1.a) f2 = F() print f1.a.blah == f2.a.blah F.a.wazoo = F f1.a.wazoo is f2.a.wazoo # try setting __dict__ illegally try: F.a.__dict__ = (1, 2, 3) except TypeError: pass else: print 'did not get expected TypeError' F.a.__dict__ = {'one': 111, 'two': 222, 'three': 333} print f1.a.two == 222 from UserDict import UserDict d = UserDict({'four': 444, 'five': 555}) F.a.__dict__ = d try: f2.a.two except AttributeError: pass else: print 'did not get expected AttributeError' print f2.a.four is f1.a.four is F.a.four From tim_one at email.msn.com Tue Apr 11 08:01:15 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:15 -0400 Subject: [Python-Dev] Re: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <001f01bfa37b$58df5740$27a2143f@tim> [Peter Funk] > ... > May be someone can invite you into 'python-dev'? However the archives > are open to anyone and writing to the list is also open to anybody. > Only subscription is closed. I don't know why. The explanation is to be found at the very start of the list -- before it became public . The idea was to have a much smaller group than c.l.py, and composed of people who had contributed non-trivial stuff to Python's implementation. Also a group that felt comfortable arguing with each other (any heat you may perceive on this list is purely illusory ). So the idea was definitely to discourage participation(!), but never to do things in secret. Keeping subscription closed has served its purposes pretty well, despite that the only mechanism enforcing civility here is the lack of an invitation. Elitist social manipulation at its finest . From tim_one at email.msn.com Tue Apr 11 08:01:19 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 11 Apr 2000 02:01:19 -0400 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: Message-ID: <002101bfa37b$5b2acde0$27a2143f@tim> [Peter Funk] > ... > 2. What should the new Interpreter do, if he sees a source file without a > pragma defining the language level? Python has tens of thousands of users now -- if it doesn't default to "Python 1.5.2" (however that's spelled), approximately 79.681% of them will scream. Had the language done this earlier, it would have been much more sellable to default to the current version. However, a default is *just* "a default", and platform-appropriate config mechanism (from Windows registry to cmdline flag) could be introduced to change the default. That is, 1.7 comes out and all my code runs fine without changing a thing. Then I set a global config option to "pretend every module that doesn't say otherwise has a 1.7 pragma in it", and run things again to see what breaks. 
As part of the process of editing the files that need to be fixed, I can take that natural opportunity to dump in a 1.7 pragma in the modules I've changed, or a 1.6 pragma in the broken modules I can't (for whatever reason) alter just yet. Two pleasant minutes later, I'll have 6,834 .py files all saying "1.7" at the top. Hmm! So when 1.8 comes out, not a one of them will use any incompatible 1.8 features. So I'll also need a global config option that says "pretend every module has a 1.8 pragma in it, *regardless* of whether it has some other pragma in it already". But that will also screw up the one .py file I forgot that had a 1.5.2 pragma in it. Iterate this process a half dozen times, and I'm afraid the end result is intractable. Seems it would be much more tractable over the long haul to default to the current version. Then every incompatible change will require changing every file that relied on the old behavior (to dump in a "no, I can't use the current version's semantics" pragma) -- but that's the situation today too. The difference is that the minimal change required to get unstuck would be trivial. A nice user (like me ) would devote their life to keeping up with incompatible changes, so would never ever have a version pragma in any file. So I vote "default to current version" -- but, *wow*, that's going to be hard to sell. Tech note: Python's front end is not structured today in such a way that it's feasible to have the parser deal with a change in the set of keywords keying off a pragma -- any given identifier today is either always or never a keyword, and that choice is hardwired into the generated parse tables. Not a reason to avoid starting this process with 1.6, just a reason to avoid adding new keywords in 1.6 (it will take some real work to overcome the front end's limitations here). go-for-it!-ly y'rs - tim From pf at artcom-gmbh.de Tue Apr 11 12:15:20 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 12:15:20 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function Message-ID: Hi! Currently the wrapper classes UserList und UserString contain the following method: def __repr__(self): return repr(self.data) I wonder about the following alternatives: def __repr__(self): return self.__class__.__name__ + "(" + repr(self.data) + ")" or even more radical (here only for lists as an example): def __repr__(self): result = [self.__class__.__name__, "("] for item in self.data: result.append(repr(item)) result.append(", ") result.append(")") return "".join(result) Just a thought which jumped into my mind during the recent discussion about the purpose of the 'repr' function (float representation). Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From mhammond at skippinet.com.au Tue Apr 11 17:15:16 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed, 12 Apr 2000 01:15:16 +1000 Subject: [Python-Dev] 1.6a2 prerelease for Windows In-Reply-To: <200004102211.SAA07363@eric.cnri.reston.va.us> Message-ID: [Guido wrote] > downloads). This is mostly to give Mark Hammond an opportunity to > prepare win32all build 131, to deal with the changed > location of the > python16.dll file. Thanks! After consideration like that, how could I do anything other than get it out straight away (and if starship wasnt down it would have been a few hours ago :-) 131 is up on starship now. 
Actually, it looks like starship is down again (or at least under serious stress!) so the pages may not reflect this. It should be at http://starship.python.net/crew/mhammond/downloads/win32all-131.exe Mark. From guido at python.org Tue Apr 11 17:33:15 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 11:33:15 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Your message of "Tue, 11 Apr 2000 12:15:20 +0200." References: Message-ID: <200004111533.LAA08163@eric.cnri.reston.va.us> > Currently the wrapper classes UserList und UserString contain the > following method: > > def __repr__(self): return repr(self.data) > > I wonder about the following alternatives: > > def __repr__(self): > return self.__class__.__name__ + "(" + repr(self.data) + ")" Yes and no. It would make them behave less like their "theoretical" base class, but you're right that it's better to be honest in repr(). Their str() could still look like self.data. > or even more radical (here only for lists as an example): > > def __repr__(self): > result = [self.__class__.__name__, "("] > for item in self.data: > result.append(repr(item)) > result.append(", ") > result.append(")") > return "".join(result) What's the advantage of this? It seems designed to be faster, but I doubt that it really is -- have you timed it? I'd go for simple -- how time-critical can repr() be...? --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 11 17:48:46 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 17:48:46 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Include unicodeobject.h,2.7,2.8 References: <200004111539.LAA08510@eric.cnri.reston.va.us> Message-ID: <01ed01bfa3cd$6d324f20$34aab5d4@hagrid> > Changed PyUnicode_Splitlines() maxsplit argument to keepends. shouldn't that be "PyUnicode_SplitLines" ? (and TailMatch, IsLineBreak, etc.) From effbot at telia.com Tue Apr 11 17:57:58 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 17:57:58 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> Message-ID: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> > comments? (for obvious reasons, I'm especially interested in comments > from people using non-ASCII characters on a daily basis...) nobody? maybe all problems are gone after the last round of checkins? oh well, I'll rebuild again, and see what happens if I remove all kludges in my test code... From tismer at tismer.com Tue Apr 11 18:12:32 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 18:12:32 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F34EF0.73099769@tismer.com> Mark Hammond wrote: > > > Can you perhaps tell me what the call stack says? > > Is it somewhere, or are we in finalization code of the > > interpreter? > > The crash is in _Py_Dealloc - op is a pointer, but all fields > (ob_type, ob_refcnt, etc) are all 0 - hence the crash. > > Next up is list_dealloc - op is also trashed - ob_item != NULL > (hence we are in the if condition, and calling Py_XDECREF() (which > triggers the _Py_Dealloc) - ob_size ==9, but all other fields are 0. > > Next up is Py_Dealloc() > > Next up is _PyTrash_Destroy() > > Next up is frame_dealloc() > > _Py_Dealloc() > > Next up is eval_code2() - the second last line - Py_DECREF(f) to > cleanup the frame it just finished executing. 
> 
> Up the stack are lots more eval_code2() - we are just running the
> code - not shutting down or anything.

And you obviously do not have any threads, right? And you are in the
middle of a simple, heavy computing application. Nothing with GUI
events happening?

That can only mean there is a bug in the Python core or in the parser
module. That happens to be exposed by trashcan, but isn't trashcan's
fault. Well. Trashcan might change the order of destruction a little.
This *should* not be a problem. But here is a constructed situation
where I can think of a problem, if we have buggy code somewhere:

Assume you have something like a tuple that holds other elements. If
there is a bug, like someone decref'ing an argument in an arg tuple,
which is always an error. This error can hide for a million years:

a contains (b, c, d)

The C function decref's a first, and then erroneously also one of the
contained elements. If b is already deallocated by decref'ing a, it has
refcount zero, but that doesn't hurt, since the dead object is still
there, and no mallocs have taken place (unless a __del__ is triggered,
of course). This error would never be detected.

With trashcan, it could happen that destruction of a is deferred, but
by chance now the delayed erroneous decref of b might happen before a's
decref, and there may be mallocs in between, since I have a growing
list.

If my code is valid (and it appears so), then I guess we have such a
situation somewhere in the core code.

I-smell-some-long-nightshifts-again - ly y'rs - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com

From akuchlin at mems-exchange.org Tue Apr 11 18:19:21 2000
From: akuchlin at mems-exchange.org (Andrew M. Kuchling)
Date: Tue, 11 Apr 2000 12:19:21 -0400 (EDT)
Subject: [Python-Dev] Extensible library packages
Message-ID: <200004111619.MAA05881@amarok.cnri.reston.va.us>

For 1.6, the XML-SIG wants to submit a few more things, mostly a small
SAX implementation. This currently lives in xml.sax.*. There are other
subpackages around such as xml.dom, xml.utils, and so forth, but those
aren't being proposed for inclusion (too large, too specialized, or
whatever reason).

The problem is that, if the Python standard library includes a package
named 'xml', that package name can't be extended by add-on modules
(unless they install themselves into Python's library directory, which
is evil). Let's say Sean McGrath or whoever creates a new subpackage;
how can he install it so that the code is accessible as xml.pyxie?

One option that comes to mind is to have the xml package in the
standard library automatically import all the names and modules from
some other package ('xml_ext'? 'xml2') in site-packages. This means
that all the third-party products install on top of the same location,
$(prefix)/site-packages/xml/, which is only slightly less evil. I
can't think of a good way to loop through everything in site-packages/*
and detect some set of the available packages as XML-related, short of
importing every single package, which isn't going to fly.

Can anyone suggest a good solution? Fixing this may not require
changing the core in any way, but the cleanest solution isn't obvious.

-- A.M.
Kuchling http://starship.python.net/crew/amk/ The mind of man, though perhaps the most splendid achievement of evolution, is not, surely, that answer to every problem of the universe. Hamlet suffers, but the Gravediggers go right on with their silly quibbles. -- Robertson Davies, "Opera and Humour" From mal at lemburg.com Tue Apr 11 18:35:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:35:23 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> Message-ID: <38F3544B.57AF8C42@lemburg.com> Fredrik Lundh wrote: > > > comments? (for obvious reasons, I'm especially interested in comments > > from people using non-ASCII characters on a daily basis...) > > nobody? FYI, there currently is a discussion emerging about this on the i18n-sig list. > maybe all problems are gone after the last round of checkins? Probably not :-/ ... the last round only fixed some minor things. > oh well, I'll rebuild again, and see what happens if I remove all > kludges in my test code... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 11 18:41:26 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 18:41:26 +0200 Subject: [Python-Dev] Extensible library packages References: <200004111619.MAA05881@amarok.cnri.reston.va.us> Message-ID: <38F355B6.DD1FD387@lemburg.com> "Andrew M. Kuchling" wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. This currently lives in xml.sax.*. There are > other subpackages around such as xml.dom, xml.utils, and so forth, but > those aren't being proposed for inclusion (too large, too specialized, > or whatever reason). > > The problem is that, if the Python standard library includes a package > named 'xml', that package name can't be extended by add-on modules > (unless they install themselves into Python's library directory, which > is evil). Let's say Sean McGrath or whoever creates a new subpackage; > how can he install it so that the code is accessible as xml.pyxie? You could make use of the __path__ trick in packages and then redirect the imports of subpackages to look in some predefined other areas as well (e.g. a non-package dir .../site-packages/xml-addons/). Here is how I do this in the compatibility packages for my mx series: DateTime/__init__.py: # Redirect all imports to the corresponding mx package def _redirect(mx_subpackage): global __path__ import os,mx __path__ = [os.path.join(mx.__path__[0],mx_subpackage)] _redirect('DateTime') ... Greg won't like this, but __path__ does have its merrits ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Tue Apr 11 18:33:23 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 18:33:23 +0200 Subject: [Python-Dev] Extensible library packages References: <200004111619.MAA05881@amarok.cnri.reston.va.us> Message-ID: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Andrew M. Kuchling wrote: > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > SAX implementation. > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. saxlib.py ? 
(yes, I'm serious) From Vladimir.Marangozov at inrialpes.fr Tue Apr 11 18:37:42 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 11 Apr 2000 18:37:42 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F1E418.FF191AEE@tismer.com> from "Christian Tismer" at Apr 10, 2000 04:24:24 PM Message-ID: <200004111637.SAA01941@python.inrialpes.fr> Christian Tismer wrote: > > About extensions and Trashcan. > ... > Or, I made a mistake in this little code: > > void > _PyTrash_deposit_object(op) > PyObject *op; > { > PyObject *error_type, *error_value, *error_traceback; > > if (PyThreadState_GET() != NULL) > PyErr_Fetch(&error_type, &error_value, &error_traceback); > > if (!_PyTrash_delete_later) > _PyTrash_delete_later = PyList_New(0); > if (_PyTrash_delete_later) > PyList_Append(_PyTrash_delete_later, (PyObject *)op); > > if (PyThreadState_GET() != NULL) > PyErr_Restore(error_type, error_value, error_traceback); > } Maybe unrelated, but this code does not handle the case when PyList_Append fails. If it fails, the object needs to be deallocated as usual. Looking at the macros, I don't see how you can do that because Py_TRASHCAN_SAFE_END, which calls the above function, occurs after the finalization code... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From pf at artcom-gmbh.de Tue Apr 11 18:39:45 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Tue, 11 Apr 2000 18:39:45 +0200 (MEST) Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: <200004111533.LAA08163@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 11, 2000 11:33:15 am" Message-ID: Hi! [me:] > > or even more radical (here only for lists as an example): > > > > def __repr__(self): > > result = [self.__class__.__name__, "("] > > for item in self.data: > > result.append(repr(item)) > > result.append(", ") > > result.append(")") > > return "".join(result) Guido van Rossum: > What's the advantage of this? It seems designed to be faster, but I > doubt that it really is -- have you timed it? I'd go for simple -- > how time-critical can repr() be...? I feel sorry: The example above was nonsense. I confused 'str' with 'repr' as I quickly hacked the function above in. I erroneously thought 'repr(some_list)' calls 'str()' on the items. If I only had checked more carefully before, I would have remembered that indeed the opposite is true: Currently lists don't have '__str__' and so fall back to 'repr' on the items when 'str([....])' is used. All this is related to the recent discussion about the new annoying behaviour of Python 1.6 when (mis?)used as a Desktop calculator: Python 1.6a1 (#6, Apr 3 2000, 10:32:06) [GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> print [0.1, 0.2] [0.10000000000000001, 0.20000000000000001] >>> print 0.1 0.1 >>> print (0.1, 0.2) (0.10000000000000001, 0.20000000000000001) >>> print (0.1, 0.2)[0] 0.1 >>> print (0.1, 0.2)[1] 0.2 So if default behaviour of the interactive interpreter would be changed not to use 'repr()' for objects typed at the prompt (I believe Tim Peters suggested that), this wouldn't help to make lists, tuples and dictionaries containing floats more readable. I don't know how to fix this, though. 
:-(

Regards, Peter

From tismer at tismer.com Tue Apr 11 18:57:09 2000
From: tismer at tismer.com (Christian Tismer)
Date: Tue, 11 Apr 2000 18:57:09 +0200
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
References: <200004111637.SAA01941@python.inrialpes.fr>
Message-ID: <38F35965.CA28C845@tismer.com>

Vladimir Marangozov wrote:
> 
> Christian Tismer wrote:
> > 
> > About extensions and Trashcan.
> > ...
> > Or, I made a mistake in this little code:
> Maybe unrelated, but this code does not handle the case when
> PyList_Append fails. If it fails, the object needs to be deallocated
> as usual. Looking at the macros, I don't see how you can do that
> because Py_TRASHCAN_SAFE_END, which calls the above function,
> occurs after the finalization code...

Yes, it does not handle this case for the following reasons:

Reason 1) If the append does not work, then the system is apparently in
an incredibly bad state, most probably broken! Note that these actions
only take place when we have a recursion depth of 50 or so. That means,
we already freed some memory, and now we have trouble with this
probably small list. I won't touch broken memory management.

Reason 2) If the append does not work, then we are not allowed to
deallocate the element at all. Trashcan was written in order to avoid
crashes for too deeply nested objects. The current nesting level of 20
or 50 is of course very low, but generally I would assume that the
limit is chosen for good reasons, and any deeper recursion might cause
a machine crash. Under this assumption, the only thing you can do is to
forget about the object.

Remark ad 1): I had once changed the strategy to use a tuple construct
instead. Thinking of memory problems when the shredder list must be
grown, this could give an advantage. The optimum would be if the
destructor data structure is never bigger than the smallest nested
object. This would even allow me to recycle these for the destruction,
without any malloc at all.

ciao - chris

-- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com

From Vladimir.Marangozov at inrialpes.fr Tue Apr 11 18:59:07 2000
From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov)
Date: Tue, 11 Apr 2000 18:59:07 +0200 (CEST)
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
In-Reply-To: <38F35965.CA28C845@tismer.com> from "Christian Tismer" at Apr 11, 2000 06:57:09 PM
Message-ID: <200004111659.SAA02051@python.inrialpes.fr>

Christian Tismer wrote:
> 
> Vladimir Marangozov wrote:
> > 
> > Maybe unrelated, but this code does not handle the case when
> > PyList_Append fails. If it fails, the object needs to be deallocated
> > as usual. Looking at the macros, I don't see how you can do that
> > because Py_TRASHCAN_SAFE_END, which calls the above function,
> > occurs after the finalization code...
> 
> Yes, it does not handle this case for the following reasons:
> ...

Not enough good reasons to segfault. I suggest you move the
call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert
the condition there.
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tismer at tismer.com Tue Apr 11 19:20:36 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 11 Apr 2000 19:20:36 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004111659.SAA02051@python.inrialpes.fr> Message-ID: <38F35EE4.7E741801@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Vladimir Marangozov wrote: > > > > > > Maybe unrelated, but this code does not handle the case when > > > PyList_Append fails. If it fails, the object needs to be deallocated > > > as usual. Looking at the macros, I don't see how you can do that > > > because Py_TRASHCAN_SAFE_END, which calls the above function, > > > occurs after the finalization code... > > > > Yes, it does not handle this case for the following reasons: > > ... > > Not enough good reasons to segfault. I suggest you move the > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > the condition there. Sorry, I don't see what you are suggesting, I'm distracted. Maybe you want to submit a patch, and a few more words on what you mean and why you prefer to core dump with stack overflow? I'm busy seeking a bug in the core, not in that ridiculous code. Somewhere is a real bug, probably the one which I was seeking many time before, when I got weird crashes in the small block heap of Windows. It was never solved, and never clear if it was Python or Windows memory management. Maybe we just found another entrance to this. It smells so very familiar: many many small tuples and we crash. busy-ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From effbot at telia.com Tue Apr 11 19:22:25 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 11 Apr 2000 19:22:25 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> Message-ID: <004d01bfa3da$820c37a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > > nobody? > > FYI, there currently is a discussion emerging about this on the > i18n-sig list. okay, I'll catch up with that one later. > > maybe all problems are gone after the last round of checkins? > > Probably not :-/ ... the last round only fixed some minor > things. hey, aren't you supposed to say "don't worry, the design is rock solid"? ;-) From mal at lemburg.com Tue Apr 11 21:25:28 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 21:25:28 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: <004901bfa2de$b12d5200$0500a8c0@secret.pythonware.com> <020d01bfa3ce$bb5280c0$34aab5d4@hagrid> <38F3544B.57AF8C42@lemburg.com> <004d01bfa3da$820c37a0$34aab5d4@hagrid> Message-ID: <38F37C28.4E6D99F@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > nobody? > > > > FYI, there currently is a discussion emerging about this on the > > i18n-sig list. > > okay, I'll catch up with that one later. > > > > maybe all problems are gone after the last round of checkins? > > > > Probably not :-/ ... the last round only fixed some minor > > things. > > hey, aren't you supposed to say "don't worry, the design > is rock solid"? 
;-) Things are hard to get right when you have to deal with backward *and* forward compatibility, interoperability and user-friendliness all at the same time... but we'll keep trying ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From collins at seal.aero.org Tue Apr 11 22:03:47 2000 From: collins at seal.aero.org (Jeff Collins) Date: Tue, 11 Apr 2000 13:03:47 -0700 Subject: [Python-Dev] Python for small platforms Message-ID: <14579.25763.434844.257544@malibu.aero.org> I've just had the chance to examine the unicode implementation and was surprised by the size of the code introduced - not just by the size of the database extension module (which I understand Christian Tismer is optimizing and which I assume can be configured away), but in particular by the size of the additional objects (unicodeobject.c, unicodetype.c). These additional objects alone contribute approximately 100K to the resulting executable. On desktop systems, this is not of much concern and suggestions have been made previously to reduce this if necessary (shared extension modules and possibly a shared VM - libpython.so). However, on small embedded systems (eg, PalmIII), this additional code is tremendous. The current size of the python-1.5.2-pre-unicode VM (after removal of float and complex objects with more reductions to come) on the PalmIII is 240K (already huge by Palm standards). (For reference, the size of python-1.5.1 on the PalmIII is 160K, after removal of the compiler, parser, float/long/complex objects.) With the unicode additions, this value jumps to 340K. The upshot of this is that for small platforms on which I am working, unicode support will have to be removed. My immediated concern is that unicode is getting so embedded in python that it will be difficult to extract. The approach I've taken for removing "features" (like float objects): 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, where XXX denotes the removable feature (configurable in config.h) 2) preserves the python API: builtin functions, C API, PyArg_Parse, print format specifiers, etc., raise MissingFeatureError if attempts are made to use them. Of course, the API associated with the removed feature is no longer present. 3) protects the reduced VM: all reads (via marshal, compile, etc.) involving source/compiled python code will fail with a MissingFeatureError if the reduced VM doesn't support it. 4) does not yet support a MissingFeatureError in the tokenizer if, say, 2.2 (for removed floats) is entered on the python command line. This instead results in a SyntaxError indicating a problem with the decimal point. It appears that another error token would have to be added to support this error. Of course, I may have missed something, but if the above appears to be a reasonable approach, I can supply patches (at least for floats and complexes) for further discussion. In the longer term, it would be helpful if developers would follow this (or a similar agreed upon approach) when adding new features. This would reduce the burden of maintaining python for small embedded platforms. 
Thanks, Jeff From guido at python.org Tue Apr 11 22:29:16 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 16:29:16 -0400 Subject: [Python-Dev] ANNOUNCE: Python 1.6 alpha 2 Message-ID: <200004112029.QAA09762@eric.cnri.reston.va.us> I've just released a source tarball and a Windows installer for Python 1.6 alpha 2 to the Python website: http://www.python.org/1.6/ If you missed the announcement for 1.6a1, probably the biggest news is Unicode support. More news is on the above webpage; Unicode is being discussed in the i18n-sig. Most changes since 1.6a1 affect either details of the Unicode support, or details of what the Windows installer installs where. Note: this is an alpha release. Some of the code is very rough! Please give it a try with your favorite Python application, but don't trust it for production use yet. I plan to release several more alpha and beta releases over the next two months, culminating in an 1.6 final release before June first. We need your help to make the final 1.6 release as robust as possible -- please test this alpha release!!! --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Apr 11 23:18:02 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 11 Apr 2000 17:18:02 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils README.txt,1.9,1.10 In-Reply-To: Your message of "Tue, 11 Apr 2000 17:17:01 EDT." <200004112117.RAA02446@thrak.cnri.reston.va.us> References: <200004112117.RAA02446@thrak.cnri.reston.va.us> Message-ID: <200004112118.RAA09957@eric.cnri.reston.va.us> You realize that that README didn't make it into 1.6a2, right? Shouldn't be a problem. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Tue Apr 11 23:31:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 11 Apr 2000 23:31:46 +0200 Subject: [Python-Dev] Python for small platforms References: <14579.25763.434844.257544@malibu.aero.org> Message-ID: <38F399C2.642C1127@lemburg.com> Jeff Collins wrote: > > The approach I've taken for removing "features" (like float objects): > 1) removes the feature with WITHOUT_XXX #ifdef/#endif decorations, > where XXX denotes the removable feature (configurable in config.h) > 2) preserves the python API: builtin functions, C API, PyArg_Parse, > print format specifiers, etc., raise MissingFeatureError if > attempts are made to use them. Of course, the API associated > with the removed feature is no longer present. > 3) protects the reduced VM: all reads (via marshal, compile, etc.) > involving source/compiled python code will fail with > a MissingFeatureError if the reduced VM doesn't support it. > 4) does not yet support a MissingFeatureError in the tokenizer > if, say, 2.2 (for removed floats) is entered on the python > command line. This instead results in a SyntaxError > indicating a problem with the decimal point. It appears that > another error token would have to be added to support > this error. Wouldn't it be simpler to replace the parts in question with dummy replacements ? The dummies could then raise appropriate exceptions as needed. This would work for float, complex and Unicode objects which all have a defined API. The advantage of this approach is that you don't need to maintain separate patches for these parts (which is a pain) and that you can provide drop-in archives which are easy to install: simply unzip over the full source tree and recompile. 
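The same idea at the Python level, purely as an illustration (the dummies
Marc-Andre has in mind would really be C replacements for floatobject.c,
complexobject.c and the Unicode object); a drop-in stub keeps the public
names importable, but every use fails with a clear error. All names below
are invented for the example, with MissingFeatureError borrowed from
Jeff's proposal:

    # cmathstub.py -- dummy stand-in for a feature left out of a small build

    class MissingFeatureError(Exception):
        pass

    class _Missing:
        def __init__(self, name):
            self.name = name
        def __call__(self, *args, **kw):
            raise MissingFeatureError('%s is not available in this build'
                                      % self.name)

    # same public names as the real module, but every call raises
    sqrt = _Missing('cmath.sqrt')
    exp = _Missing('cmath.exp')
    log = _Missing('cmath.log')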
> Of course, I may have missed something, but if the above appears to be > a reasonable approach, I can supply patches (at least for floats and > complexes) for further discussion. In the longer term, it would be > helpful if developers would follow this (or a similar agreed upon > approach) when adding new features. This would reduce the burden of > maintaining python for small embedded platforms. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Wed Apr 12 01:28:01 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 16:28:01 -0700 (PDT) Subject: [Python-Dev] Re: [Patches] add string precisions to PyErr_Format calls In-Reply-To: <14579.11701.733010.789688@amarok.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Andrew M. Kuchling wrote: > Greg Stein writes: > >Wouldn't it be best to simply fix PyErr_Format so that we don't have to > >continue to worry about buffer overruns? > > A while ago I suggested using nsprintf() in PyErr_Format, but that > means stealing the implementation from Apache for those platforms > where libc doesn't include nsprintf(). Haven't done it yet... Seems like it would be cake to write one that took *only* the %d and %s (unadorned) modifiers. We wouldn't need anything else, would we? [ ... grep'ing the source ... ] I see the following format codes which would need to change: %.###s -- switch to %s %i -- switch to %d %c -- hrm. probably need to support this (in stringobject.c) %x -- maybe switch to %d? (in stringobject.c) The last two are used once, both in stringobject.c. I could see a case for revising that call use just %s and %d. One pass to count the length, alloc, then one pass to fill in. The second pass could actually be handled by vsprintf() since we know the buffer is large enough. The only tricky part would be determining the max length for %d. For a 32-bit value, it is 10 digits; for 64-bit value, it is 20 digits. I'd say allocate room for 20 digits regardless of platform and be done with it. Maybe support %%, but I didn't see that anywhere. Somebody could add support when the need arises. Last problem: backwards compat for third-party modules using PyErr_Format. IMO, leave PyErr_Format for them (they're already responsible for buffer overruns (or not) since PyErr_Format isn't helping them). The new one would be PyErr_SafeFormat. Recommend the Safe version, deprecate the unsafe one. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Wed Apr 12 01:22:14 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 19:22:14 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes Message-ID: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Here's the second go at adding arbitrary attribute support to function and method objects. Note that this time it's illegal (TypeError) to set an attribute on a bound method object; getting an attribute on a bound method object returns the value on the underlying function object. First the diffs, then the test case and test output. Enjoy, -Barry -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: methdiff.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: test_funcattrs.py
URL: 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test_funcattrs
URL: 

From mhammond at skippinet.com.au Wed Apr 12 01:54:50 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 12 Apr 2000 09:54:50 +1000
Subject: [Python-Dev] UTF-8 is no fun...
In-Reply-To: <020d01bfa3ce$bb5280c0$34aab5d4@hagrid>
Message-ID: 

> > comments? (for obvious reasons, I'm especially interested in comments
> > from people using non-ASCII characters on a daily basis...)
> 
> nobody?

Almost certainly not. a) Unicode objects are very new and not everyone
has the time to fiddle with them, and b) many of us only speak English.

So we need _you_ to tell us what the problems were/are. Don't wait for
us to find them - explain them to us. At least we then have a chance of
sympathizing, even if we cannot directly relate to the experiences...

> maybe all problems are gone after the last round of checkins?
> oh well, I'll rebuild again, and see what happens if I remove all
> kludges in my test code...

OK - but be sure to let us know :-)

Mark.

From mhammond at skippinet.com.au Wed Apr 12 02:04:22 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 12 Apr 2000 10:04:22 +1000
Subject: [Python-Dev] Crash in new "trashcan" mechanism.
In-Reply-To: <38F35EE4.7E741801@tismer.com>
Message-ID: 

To answer Chris' earlier question: No threads, no gui, no events. The
"parser" module is the only builtin module (apart from the obvious -
ntpath etc)

Greg and/or Bill can correct me if I am wrong - it is just P2C, and it
is just console based, mainline procedural code. It _is_ highly
recursive tho (and I believe this will turn out to be the key factor in
the crash)

> Somewhere is a real bug, probably the one which I was
> seeking many time before, when I got weird crashes in the small
> block heap of Windows. It was never solved, and never clear if
> it was Python or Windows memory management.

I am confident that this problem was my fault, in that I was releasing
a different version of the MFC DLLs than I had actually built with. At
least everyone with a test case couldn't repro it after the DLL update.
This new crash is so predictable and always with the same data that I
seriously doubt the problem is in any way related.

> Maybe we just found another entrance to this.
> It smells so very familiar: many many small tuples and we crash.

Lists this time, but I take your point. I've got a busy next few days,
so if it still exists after that I will put some more effort into it.

Mark.

From mhammond at skippinet.com.au Wed Apr 12 02:07:43 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 12 Apr 2000 10:07:43 +1000
Subject: [Python-Dev] UTF-8 is no fun...
In-Reply-To: <38F37C28.4E6D99F@lemburg.com>
Message-ID: 

[Marc]
> Things are hard to get right when you have to deal with
> backward *and* forward compatibility, interoperability and
> user-friendliness all at the same time... but we'll keep
> trying ;-)

Let me say publicly that I think you have done a fine job, and
obviously have put lots of thought and effort into it. If parts of the
design turn out to be less than ideal (and subsequently changed before
1.6 is real) then this will not detract from your excellent work. Well
done!

[And also to Fredrik, whose code was the basis for the Unicode object
itself - that was a nice piece of code too!]

Aww-heck-I-love-all-you-guys--ly,

Mark.
From gward at mems-exchange.org Wed Apr 12 04:10:18 2000 From: gward at mems-exchange.org (Greg Ward) Date: Tue, 11 Apr 2000 22:10:18 -0400 Subject: [Python-Dev] How *does* Python determine sys.prefix? Message-ID: <20000411221018.A2587@mems-exchange.org> Ooh, here's a yucky problem. Last night, I installed Oliver Andrich's Python 1.5.2 RPM on my Linux box at home, so now I have two Python installations there: * my build, in /usr/local/python and /usr/local/python.i86-linux (I need to test Distutils in the prefix != exec_prefix case) * Oliver's RPM, in /usr I have a symlink /usr/local/bin/python pointing to ../../python.i86-linux/bin/python, and /usr/local/bin is first in my path: $ ls -lF `which python` lrwxrwxrwx 1 root root 30 Aug 28 1999 /usr/local/bin/python -> ../python.i86-linux/bin/python* Since I installed the RPM, /usr/local/bin/python reports an incorrect prefix: $ /usr/local/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/bin/../python.i86-linux') Essentially the same thing if I run it directly, not through the symlink: $ /usr/local/python.i86-linux/bin/python Python 1.5.2 (#1, Jun 20 1999, 19:56:42) [GCC 2.7.2.3] on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr/local/python.i86-linux') /usr/bin/python gets it right, though: $ /usr/bin/python Python 1.5.2 (#1, Apr 18 1999, 16:03:16) [GCC pgcc-2.91.60 19981201 (egcs-1.1.1 on linux2 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys ; sys.prefix, sys.exec_prefix ('/usr', '/usr') This strikes me as a pretty reasonable and straightforward way to have multiple Python installations; if Python is fooled into getting the wrong sys.prefix, then the Distutils are going to have a much tougher job! Don't tell me I have to write my own prefix-finding code now... (And no, I have not tried this under 1.6 yet.) Damn and blast my last-minute pre-release testing... I should have just released the bloody thing and let the bugs fly. Oh hell, I think I will anyways. Greg -- Greg Ward - software developer gward at mems-exchange.org MEMS Exchange / CNRI voice: +1-703-262-5376 Reston, Virginia, USA fax: +1-703-262-5367 From janssen at parc.xerox.com Wed Apr 12 04:17:38 2000 From: janssen at parc.xerox.com (Bill Janssen) Date: Tue, 11 Apr 2000 19:17:38 PDT Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: Your message of "Tue, 11 Apr 2000 13:29:51 PDT." <200004112029.QAA09762@eric.cnri.reston.va.us> Message-ID: <00Apr11.191729pdt."3438"@watson.parc.xerox.com> ILU seems to work fine with it. Bill From bwarsaw at cnri.reston.va.us Wed Apr 12 04:34:04 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 11 Apr 2000 22:34:04 -0400 (EDT) Subject: [Python-Dev] How *does* Python determine sys.prefix? References: <20000411221018.A2587@mems-exchange.org> Message-ID: <14579.57500.195708.720145@anthem.cnri.reston.va.us> >>>>> "GW" == Greg Ward writes: GW> Ooh, here's a yucky problem. Last night, I installed Oliver GW> Andrich's Python 1.5.2 RPM on my Linux box at home, so now I GW> have two Python installations there: Greg, I don't know why it's finding the wrong landmark. Perhaps the first test for running out of the build directory is tripping up? What happens if you remove /usr/lib/python$VERSION/string.py? 
If possible you should step through calculate_path() in getpath.c -- this implements the search through the file system for the landmarks. -Barry From gward at python.net Wed Apr 12 04:34:12 2000 From: gward at python.net (Greg Ward) Date: Tue, 11 Apr 2000 22:34:12 -0400 Subject: [Python-Dev] ANNOUNCE: Distutils 0.8 released Message-ID: <20000411223412.A643@beelzebub> Python Distribution Utilities release 0.8 April 11, 2000 The Python Distribution Utilities, or Distutils for short, are a collection of modules that aid in the development, distribution, and installation of Python modules. (It is intended that ultimately the Distutils will grow up into a system for distributing and installing whole Python applications, but for now their scope is limited to module distributions.) The Distutils are a standard part of Python 1.6; if you are running 1.6, you don't need to install the Distutils separately. This release is primarily so that you can add the Distutils to a Python 1.5.2 installation -- you will then be able to install modules that require the Distutils, or use the Distutils to distribute your own modules. More information is available at the Distutils web page: http://www.python.org/sigs/distutils-sig/ and in the README.txt included in the Distutils source distribution. You can download the Distutils from http://www.python.org/sigs/distutils-sig/download.html Trivial patches can be sent to me (Greg Ward) at gward at python.net. Larger patches should be discussed on the Distutils mailing list: distutils-sig at python.org. Here are the changes in release 0.8, if you're curious: * some incompatible naming changes in the command classes -- both the classes themselves and some key class attributes were renamed (this will break some old setup scripts -- see README.txt) * half-hearted, unfinished moves towards backwards compatibility with Python 1.5.1 (the 0.1.4 and 0.1.5 releases were done independently, and I still have to fold those code changes in to the current code) * added ability to search the Windows registry to find MSVC++ (thanks to Robin Becker and Thomas Heller) * renamed the "dist" command to "sdist" and introduced the "manifest template" file (MANIFEST.in), used to generate the actual manifest * added "build_clib" command to build static C libraries needed by Python extensions * fixed the "install" command -- we now have a sane, usable, flexible, intelligent scheme for doing standard, alternate, and custom installations (and it's even documented!) (thanks to Fred Drake and Guido van Rossum for design help) * straightened out the incompatibilities between the UnixCCompiler and MSVCCompiler classes, and cleaned up the whole mechanism for compiling C code in the process * reorganized the build directories: now build to either "build/lib" or "build/lib.", with temporary files (eg. compiler turds) in "build/temp." * merged the "install_py" and "install_ext" commands into "install_lib" -- no longer any sense in keeping them apart, since pure Python modules and extension modules build to the same place * added --debug (-g) flag to "build_*" commands, and make that carry through to compiler switches, names of extensions on Windows, etc. 
* fixed many portability bugs on Windows (thanks to many people) * beginnings of support for Mac OS (I'm told that it's enough for the Distutils to install itself) (thanks to Corran Webster) * actually pay attention to the "--rpath" option to "build_ext" (thanks to Joe Van Andel for spotting this lapse) * added "clean" command (thanks to Bastien Kleineidam) * beginnings of support for creating built distributions: changes to the various build and install commands to support it, and added the "bdist" and "bdist_dumb" commands * code reorganization: split core.py up into dist.py and cmd.py, util.py into *_util.py * removed global "--force" option -- it's now up to individual commands to define this if it makes sense for them * better error-handling (fewer extravagant tracebacks for errors that really aren't the Distutils' fault -- Greg Ward - just another Python hacker gward at python.net http://starship.python.net/~gward/ All the world's a stage and most of us are desperately unrehearsed. From jon at dgs.monash.edu.au Wed Apr 12 04:40:23 2000 From: jon at dgs.monash.edu.au (Jonathan Giddy) Date: Wed, 12 Apr 2000 12:40:23 +1000 (EST) Subject: [Python-Dev] Re: ANNOUNCE: Python 1.6 alpha 2 In-Reply-To: <"00Apr11.191729pdt.3438"@watson.parc.xerox.com> from "Bill Janssen" at Apr 11, 2000 07:17:38 PM Message-ID: <200004120240.MAA11342@nexus.csse.monash.edu.au> Bill Janssen declared: > >ILU seems to work fine with it. > >Bill Without wishing to jinx this good news, isn't the release of 1.6 the appropriate time to remove the redundant thread.h file? Jon. From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:18:26 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:18:26 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F35EE4.7E741801@tismer.com> from "Christian Tismer" at Apr 11, 2000 07:20:36 PM Message-ID: <200004120318.FAA06750@python.inrialpes.fr> Christian Tismer wrote: > > Vladimir Marangozov wrote: > > > > Not enough good reasons to segfault. I suggest you move the > > call to _PyTrash_deposit_object in TRASHCAN_BEGIN and invert > > the condition there. > > Sorry, I don't see what you are suggesting, I'm distracted. I was thinking about the following. Change the macros in object.h from: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ else \ _PyTrash_deposit_object((PyObject*)op);\ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ to: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ where _PyTrash_deposit_object returns 0 on success, -1 on failure. This gives another last chance to the system to finalize the object, hoping that the stack won't overflow. :-) My point is that it is better to control whether _PyTrash_deposit_object succeeds or not (and it may fail because of PyList_Append). If this doesn't sound acceptable (because of the possible stack overflow) it would still be better to abort in _PyTrash_deposit_object with an exception "stack overflow on recursive finalization" when PyList_Append fails. 
Leaving it unchecked is not nice -- especially in such extreme situations. Currently, if something fails, the object is not finalized (leaking memory). Ok, so be it. What's not nice is that this happens silently which is not the kind of tolerance I would accept from the Python runtime. As to the bug: it's curious that, as Mark reported, without the trashcan logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) some erroneous situation. I'd expect that if the trashcan macros are implemented as above, the crash will go away (which wouldn't solve the problem and would obviate the trashcan in the first place :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:34:48 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:34:48 +0200 (CEST) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <200004120318.FAA06750@python.inrialpes.fr> from "Vladimir Marangozov" at Apr 12, 2000 05:18:26 AM Message-ID: <200004120334.FAA06784@python.inrialpes.fr> Of course, this Vladimir Marangozov wrote: > > to: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > was meant to be this: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 05:54:13 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 05:54:13 +0200 (CEST) Subject: [Python-Dev] trashcan and PR#7 Message-ID: <200004120354.FAA06834@python.inrialpes.fr> While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher? -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Wed Apr 12 06:09:19 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 11 Apr 2000 21:09:19 -0700 (PDT) Subject: [Python-Dev] Extensible library packages In-Reply-To: <025e01bfa3d3$aa182800$34aab5d4@hagrid> Message-ID: On Tue, 11 Apr 2000, Fredrik Lundh wrote: > Andrew M. Kuchling wrote: > > For 1.6, the XML-SIG wants to submit a few more things, mostly a small > > SAX implementation. > > > Can anyone suggest a good solution? Fixing this may not require > > changing the core in any way, but the cleanest solution isn't obvious. > > saxlib.py ? > > (yes, I'm serious) +1 When we solve the problem of installing items into "core" Python packages, then we can move saxlib.py (along with the rest of the modules in the standard library). Cheers, -g -- Greg Stein, http://www.lyra.org/ From pf at artcom-gmbh.de Wed Apr 12 07:43:59 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 07:43:59 +0200 (MEST) Subject: [Python-Dev] Extensible library packages In-Reply-To: <200004111619.MAA05881@amarok.cnri.reston.va.us> from "Andrew M. Kuchling" at "Apr 11, 2000 12:19:21 pm" Message-ID: Hi! Andrew M. Kuchling: [...] 
> The problem is that, if the Python standard library includes a package > named 'xml', ... [...] > Can anyone suggest a good solution? Fixing this may not require > changing the core in any way, but the cleanest solution isn't obvious. I dislike the idea of having user visible packages in the standard library too. As Fredrik already suggested, putting a file 'saxlib.py' into the lib, which exposes all what a user needs to know about 'sax' seems to be the best solution. Regards, Peter From tim_one at email.msn.com Wed Apr 12 09:52:01 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 03:52:01 -0400 Subject: [Python-Dev] The purpose of the 'repr' builtin function In-Reply-To: Message-ID: <000401bfa453$fca9f1e0$ae2d153f@tim> [Peter Funk] > ... > So if default behaviour of the interactive interpreter would be changed > not to use 'repr()' for objects typed at the prompt (I believe Tim > Peters suggested that), this wouldn't help to make lists, tuples and > dictionaries containing floats more readable. Or lists, tuples and dicts of anything else either: that's what I'm getting at when I keep saying containers should "pass str() down" to containees. That it doesn't has frustrated me for years; newbies aren't bothered by it because before 1.6 str == repr for almost all builtin types, and newbies (by definition ) don't have any classes of their own overriding __str__ or __repr__. But I do, and their repr is rarely what I want to see in the shell. This is a different issue than (but related to) what the interactive prompt should use by default to format expression results. They have one key conundrum in common, though: if str() is simply passed down with no other change, then e.g. print str({"a:": "b, c", "a, b": "c"}) and (same thing in disguise) print {"a:": "b, c", "a, b": "c"} would display {a:: b, c, a, b: c} and that's darned unreadable. As far as I can tell, the only reason str(container) invokes repr on the containees today is simply to get some string quotes in output like this. That's fine so far as it goes, but leads to miserably bloated displays for containees of many types *other* than the builtin ones -- and even for string containees leads to embedded octal escape sequences all over the place. > I don't know how to fix this, though. :-( Sure you do! And we look forward to your patch . From gstein at lyra.org Wed Apr 12 10:09:30 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 01:09:30 -0700 (PDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes In-Reply-To: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: On Tue, 11 Apr 2000, Barry A. Warsaw wrote: > Here's the second go at adding arbitrary attribute support to function > and method objects. Note that this time it's illegal (TypeError) to > set an attribute on a bound method object; getting an attribute on a > bound method object returns the value on the underlying function > object. First the diffs, then the test case and test output. In the instancemethod_setattro function, it might be nice to do the speed optimization and test for sname[0] == 'i' before hitting the strcmp() calls. Oh: policy question: I would think that these attributes *should* be available in restricted mode. They aren't "sneaky" like the builtin attributes. Rather than PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should be used. They apply to mappings and will be faster. 
Note that (internally) the PyMapping_Get/SetItemString use the latter forms (after constructing a string object(!)). ... whoops. I see that the function object doesn't use the ?etattro() variants. hrm. The stuff is looking really good! Cheers, -g -- Greg Stein, http://www.lyra.org/ From andy at reportlab.com Wed Apr 12 10:18:40 2000 From: andy at reportlab.com (Andy Robinson) Date: Wed, 12 Apr 2000 09:18:40 +0100 Subject: [Python-Dev] UTF-8 is no fun... In-Reply-To: <20000412035101.F38D71CE29@dinsdale.python.org> Message-ID: > > Things are hard to get right when you have to deal with > > backward *and* forward compatibility, interoperability and > > user-friendliness all at the same time... but we'll keep > > trying ;-) > > Let me say publically that I think you have done a fine job, and > obviously have put lots of thought and effort into it. If parts of > the design turn out to be less than ideal (and subsequently changed > before 1.6 is real) then this will not detract from your excellent > work. > > Well done! > > [And also to Fredrik, whose code was the basis for the Unicode > object itself - that was a nice piece of code too!] > Mark I've spent a fair bit of time converting strings and files the last few days, and I'd add that what we have now seems both rock solid and very easy to use. The remaining issues are entirely a matter of us end users trying to figure out what we should have asked for in the first place. Whether we achieve that finally before 1.6 is our problem; Marc-Andr\u00C9 and Fredrik have done a great job, and I think we are on track for providing something much more useful and extensible than (say) Java. As proof of this, someone has already contributed Japanese codecs based on the spec. - Andy Robinson From pf at artcom-gmbh.de Wed Apr 12 10:11:23 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 10:11:23 +0200 (MEST) Subject: [Python-Dev] Improving readability of interpreter expression output (was The purpose of 'repr'...) In-Reply-To: <000401bfa453$fca9f1e0$ae2d153f@tim> from Tim Peters at "Apr 12, 2000 3:52: 1 am" Message-ID: Hi! Tim Peters: [...] > This is a different issue than (but related to) what the interactive prompt > should use by default to format expression results. They have one key > conundrum in common, though: if str() is simply passed down with no other > change, then e.g. > > print str({"a:": "b, c", "a, b": "c"}) > and (same thing in disguise) > print {"a:": "b, c", "a, b": "c"} > > would display > > {a:: b, c, a, b: c} > > and that's darned unreadable. Would you please elaborate a bit more, what you have in mind with "other change" in your sentence above? > As far as I can tell, the only reason > str(container) invokes repr on the containees today is simply to get some > string quotes in output like this. That's fine so far as it goes, but leads > to miserably bloated displays for containees of many types *other* than the > builtin ones -- and even for string containees leads to embedded octal > escape sequences all over the place. > > > I don't know how to fix this, though. :-( > > Sure you do! And we look forward to your patch . No. Serious. I don't see how to fix the 'darned unreadable' output. passing 'str' down seems to be simple. But how to fix the problem above isn't obvious to me. Regards, Peter From mal at lemburg.com Wed Apr 12 10:17:02 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 12 Apr 2000 10:17:02 +0200 Subject: [Python-Dev] #pragmas in Python source code Message-ID: <38F430FE.BAF40AB8@lemburg.com> There currently is a discussion about how to write Python source code in different encodings on i18n. The (experimental) solution so far has been to add a command line switch to Python which tells the compiler which encoding to expect for u"...strings..." ("...8-bit strings..." will still be used as is -- it's the user's responsibility to use the right encoding; the Unicode implementation will still assume them to be UTF-8 encoded in automatic conversions). In the end, a #pragma should be usable to tell the compiler which encoding to use for decoding the u"..." strings. What we need now, is a good proposal for handling these #pragmas... does anyone have experience with these ? Any ideas ? Here's a simple strawman for the syntax: # pragma key: value parser = re.compile( '^#\s*pragma\s+' '([a-zA-Z_][a-zA-Z0-9_]*):\s*' '(.+)' ) For the encoding this would be something like: # pragma encoding: unicode-escape The compiler would scan these pragma defs, add them to an internal temporary dictionary and use them for all subsequent code it finds during the compilation process. The dictionary would have to stay around until the original compile() call has completed (spanning recursive calls). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Wed Apr 12 11:24:09 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 02:24:09 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: Sorry, i'm a little behind on this. I'll try to catch up over the next day or two. On Sun, 9 Apr 2000, Tim Peters wrote: > > Note the example from another reply of a machine with 2-bit floats. There > the user would see: > > >>> 0.75 # happens to be exactly representable on this machine > 0.8 # because that's the shortest string needed on this machine > # to get back 0.75 internally > >> > > This kind of surprise is inherent in the approach, not specific to 2-bit > machines . Okay, okay. But on a 2-bit machine you ought to be no more surprised by the above than by >>> 0.1 + 0.1 0.0 >>> 0.4 + 0.4 1.0 In fact, i suppose one could argue that 0.8 is just as honest as 0.75, as you could get 0.8 from anything in (0.625, 0.825)... or even *more* honest than 0.75, since "0.75" shows more significant digits than the precision of machine would justify. It could be argued either way. I don't see this as a fatal flaw of the 'smartrepr' method, though. After looking at the spec for java.lang.Float.toString() and the Clinger paper you mentioned, it appears to me that both essentially describe 'smartrepr', which seems encouraging. > BTW, I don't know that it will never print more digits than you type: did > you prove that? It's plausible, but many plausible claims about fp turn out > to be false. Indeed, fp *is* tricky, but i think in this case the proof actually is pretty evident -- The 'smartrepr' routine i suggested prints the representation with the fewest number of digits which converts back to the actual value. Since the thing that you originally typed converted to that value the first time around, certainly no *more* digits than what you typed are necessary to produce that value again. QED. 
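A minimal sketch of that routine, just to pin the idea down (it assumes IEEE doubles, where 17 significant digits always suffice to round-trip; this is an illustration, not code from an actual patch):

    def smartrepr(x):
        # Return the shortest decimal string that evaluates back to
        # exactly x.  17 significant digits always suffice for an IEEE
        # double, so the loop is bounded.
        for precision in range(1, 18):
            s = "%.*g" % (precision, x)
            if float(s) == x:
                return s
        return repr(x)    # safety net; shouldn't be reached for doubles

    print smartrepr(0.1)     # prints 0.1
    print smartrepr(0.25)    # prints 0.25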
> > - If you type in what the interpreter displays for a > > float, you can be assured of getting the same value. > > This isn't of value for most interactive use -- in general you want to see > the range of a number, not enough to get 53 bits exactly (that's beyond the > limits of human "number sense"). What do you mean by "the range of a number"? > It also has one clearly bad aspect: when > printing containers full of floats, the number of digits printed for each > will vary wildly from float to float. Makes for an unfriendly display. Yes, this is something you want to be able to control -- read on. > If the prompt's display function were settable, I'd probably plug in pprint! Since i've managed to convince Guido that such a hook might be nice, i seem to have worked myself into the position of being responsible for putting together a patch to do so... Configurability is good. It won't solve everything, but at least the flexibility provided by a "display" hook will let everybody have the ability to play whatever tricks they want. (Or, equivalently: to anyone who complains about the interpreter display, at least we have plausible grounds on which to tell them to go fix it themselves.) :) Here is what i have in mind: provide two hooks __builtins__.display(object) and __builtins__.displaytb(traceback, exception) that are called when the interpreter needs to display a result or when the top level catches an exception. Protocol is simple: 'display' gets one argument, an object, and can do whatever the heck it wants. 'displaytb' gets a traceback and an exception, and can do whatever the heck it wants. -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From fredrik at pythonware.com Wed Apr 12 11:39:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 11:39:03 +0200 Subject: [Python-Dev] UTF-8 is no fun... References: Message-ID: <007801bfa462$f0f399f0$0500a8c0@secret.pythonware.com> Andy Robinson wrote: > I've spent a fair bit of time converting strings and files the > last few days, and I'd add that what we have now seems both rock solid > and very easy to use. I'm not worried about the core string types or the conversion machinery; what disturbs me is mostly the use of automagic conversions to UTF-8, which breaks the fundamental assumption that a string is a sequence of len(string) characters. "The items of a string are characters. There is no separate character type; a character is represented by a string of one item" (from the language reference) I still think the "all strings are sequences of unicode characters" strawman I posted earlier would simplify things for everyone in- volved (programmers, users, and the interpreter itself). more on this later. gotta ship some code first. From Vladimir.Marangozov at inrialpes.fr Wed Apr 12 11:47:56 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Wed, 12 Apr 2000 11:47:56 +0200 (CEST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14577.63691.561040.281577@anthem.cnri.reston.va.us> from "Barry Warsaw" at Apr 10, 2000 11:52:43 AM Message-ID: <200004120947.LAA02067@python.inrialpes.fr> Barry Warsaw wrote: > > A number of people have played FAST and loose with function and method > docstrings, including John Aycock[1], Zope's ORB[2]. Docstrings are > handy because they are the one attribute on funcs and methods that are > easily writable. But as more people overload the semantics for > docstrings, we'll get collisions. 
I've had a number of discussions > with folks about adding attribute dictionaries to functions and > methods so that you can essentially add any attribute. Namespaces are > one honking great idea -- let's do more of those! Barry, I wonder what for... Just because there's a Python entity implemented as a C structure in which we can easily include a dict + access functions? I don't see the purpose of attaching state (vars) to an algorithm (i.e. a function). What are the benefits compared to class instances? And these special assignment rules because of the real overlap with real instances... Grrr, all this is pretty dark, conceptually. Okay, I inderstood: modules become classes, functions become instances, module variables are class variables, and classes become ... 2-nd order instances of modules. The only missing piece of the puzzle is a legal way to instantiate modules for obtaining functions and classes dynamically, because using eval(), the `new' module or The Hook is perceived as very hackish and definitely not OO. Once the puzzle would be solved, we'll discover that there would be only one additional little step towards inheritance for modules. How weird! Sounds like we're going to metaclass again... -1 until P3K. This is no so cute as it is dangerous. It opens the way to mind abuse. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From fredrik at pythonware.com Wed Apr 12 12:04:32 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 12:04:32 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <009601bfa466$807da2c0$0500a8c0@secret.pythonware.com> Vladimir Marangozov wrote: > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). What are the benefits compared to class instances? > > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. > > -1 until P3K. I agree. From tismer at tismer.com Wed Apr 12 14:08:51 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:08:51 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> Message-ID: <38F46753.3759A7B6@tismer.com> Vladimir Marangozov wrote: > > While I'm at it, maybe the same recursion control logic could be > used to remedy (most probably in PyObject_Compare) PR#7: > "comparisons of recursive objects" reported by David Asher? Hey, what a good idea. You know what's happening? We are moving towards tail recursion. If we do this everywhere, Python converges towards Stackless Python. and-most-probably-a-better-one-than-mince - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul at prescod.net Wed Apr 12 14:20:18 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 07:20:18 -0500 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <38F46A02.3AB10147@prescod.net> Vladimir Marangozov wrote: > > ... > > I don't see the purpose of attaching state (vars) to an algorithm > (i.e. a function). 
A function is also an object. > What are the benefits compared to class instances? If I follow you, you are saying that whenever you need to associate information with a function, you should wrap up the function and object into a class. But the end result of this transformation could be a program in which every single function is a class. That would be incredibly annoying, especially with Python's scoping rules. In general, it may not even be possible. Consider the following cases: * I need to associate a Java-style type declaration with a method so that it can be recognized based on its type during Java method dispatch. How would you do that with instances? * I need to associate a "grammar rule" with a Python method so that the method is invoked when the parser recognizes a syntactic construct in the input data. * I need to associate an IDL declaration with a method so that a COM interface definition can be generated from the source file. * I need to associate an XPath "pattern string" with a Python method so that the method can be invoked when a tree walker discovers a particular pattern in an XML DOM. * I need to associate multiple forms of documentation with a method. They are optimized for different IDEs, environments or languages. > And these special assignment rules because of the real overlap with > real instances... Grrr, all this is pretty dark, conceptually. I don't understand what you are saying here. > Once the puzzle would be solved, we'll discover that there would be only > one additional little step towards inheritance for modules. How weird! > Sounds like we're going to metaclass again... I don't see what any of this has to do with Barry's extremely simple idea. Functions *are objects* in Python. It's too late to change that. Objects can have properties. Barry is just allowing arbitrary properties to be associated with functions. I don't see where there is anything mysterious here. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From akuchlin at mems-exchange.org Wed Apr 12 14:22:26 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 08:22:26 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <200004120947.LAA02067@python.inrialpes.fr> References: <14577.63691.561040.281577@anthem.cnri.reston.va.us> <200004120947.LAA02067@python.inrialpes.fr> Message-ID: <14580.27266.683908.216344@newcnri.cnri.reston.va.us> Vladimir Marangozov writes: >Barry, I wonder what for... In the two quoted examples, docstrings are used to store additional info about a function. SPARK uses them to contain grammar rules and the regular expressions for matching tokens. The object publisher in Zope uses the presence of a docstring to indicate whether a function or method is publicly accessible. As a third example, the optional type info being thrashed over in the Types-SIG would be another annotation for a function (though doing def f(): ... f.type = 'void' would be really clunky. >Once the puzzle would be solved, we'll discover that there would be only >one additional little step towards inheritance for modules. How weird! >Sounds like we're going to metaclass again... No, that isn't why Barry is experimenting with this -- instead, it's simply because annotating functions seems useful, but everyone uses the docstring because it's the only option. 
--amk From tismer at tismer.com Wed Apr 12 14:43:40 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 14:43:40 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120318.FAA06750@python.inrialpes.fr> Message-ID: <38F46F7C.94D29561@tismer.com> Vladimir Marangozov wrote: > > Christian Tismer wrote: > > > > Vladimir Marangozov wrote: [yup, good looking patch] > where _PyTrash_deposit_object returns 0 on success, -1 on failure. This > gives another last chance to the system to finalize the object, hoping > that the stack won't overflow. :-) > > My point is that it is better to control whether _PyTrash_deposit_object > succeeds or not (and it may fail because of PyList_Append). > If this doesn't sound acceptable (because of the possible stack overflow) > it would still be better to abort in _PyTrash_deposit_object with an > exception "stack overflow on recursive finalization" when PyList_Append > fails. Leaving it unchecked is not nice -- especially in such extreme > situations. You bet that I *would* raise an exception if I could. Unfortunately the destructors have no way to report an error, and they are always called in a context where no error is expected (Py_DECREF macro). I believe this *was* quite ok, until __del__ was introduced. After that, it looks to me like a design flaw. IMHO there should not be a single function in a system that needs heap memory, and cannot report an error. > Currently, if something fails, the object is not finalized (leaking > memory). Ok, so be it. What's not nice is that this happens silently > which is not the kind of tolerance I would accept from the Python runtime. Yes but what can I do? This isn't worse than before. deletion errors die silently, this is the current concept. I don't agree with it, but I'm not the one to change policy. In that sense, trashcan was just compliant to a concept, without saying this is a good concept. :-) > As to the bug: it's curious that, as Mark reported, without the trashcan > logic, things seem to run fine. The trashcan seems to provoke (ok, detect ;) > some erroneous situation. I'd expect that if the trashcan macros are > implemented as above, the crash will go away (which wouldn't solve the > problem and would obviate the trashcan in the first place :-) I think trashcan can be made *way* smarter: Much much more better would be to avoid memory allocation in trashcan at all. I'm wondering if that would be possible. The idea is to catch a couple of objects in an earlier recursion level, and use them as containers for later objects-to-be-deleted. Not using memory at all, that's what I want. And it would avoid all messing with errors in this context. I hate Java dieing silently, since it has not enough memory to tell me that it has not enough memory :-) but-before-implementing-this-*I*-will-need-to-become-*way*-smarter - ly y'rs - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From fredrik at pythonware.com Wed Apr 12 14:50:21 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 14:50:21 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> Message-ID: <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Paul Prescod wrote: > * I need to associate a Java-style type declaration with a method so > that it can be recognized based on its type during Java method dispatch. class foo: typemap = {} def myfunc(self): pass typemap[myfunc] = typeinfo > * I need to associate a "grammar rule" with a Python method so that the > method is invoked when the parser recognizes a syntactic construct in > the input data. class foo: rules = [] def myfunc(self): pass rules.append(pattern, myfunc) > * I need to associate an IDL declaration with a method so that a COM > interface definition can be generated from the source file. class foo: idl = {} def myfunc(self): pass idl[myfunc] = "declaration" > * I need to associate an XPath "pattern string" with a Python method so > that the method can be invoked when a tree walker discovers a particular > pattern in an XML DOM. class foo: xpath = [] def myfunc(self): pass xpath.append("pattern", myfunc) From tismer at tismer.com Wed Apr 12 15:00:39 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 15:00:39 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: <38F47377.91306DA1@tismer.com> Mark, I know you are very busy. But I have no chance to build a debug version, and probably there are more differences. Can you perhaps try Vlad's patch? and tell me if the outcome changes? This would give me much more insight. The change affects the macros and the function _PyTrash_deposit_object which now must report an error via the return value. 
The macro code should be: #define Py_TRASHCAN_SAFE_BEGIN(op) \ { \ ++_PyTrash_delete_nesting; \ if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ _PyTrash_deposit_object((PyObject*)op) != 0) { \ #define Py_TRASHCAN_SAFE_END(op) \ ;} \ --_PyTrash_delete_nesting; \ if (_PyTrash_delete_later && _PyTrash_delete_nesting <= 0) \ _PyTrash_destroy_list(); \ } \ And the _PyTrash_deposit_object code should be (untested): int _PyTrash_deposit_object(op) PyObject *op; { PyObject *error_type, *error_value, *error_traceback; if (PyThreadState_GET() != NULL) PyErr_Fetch(&error_type, &error_value, &error_traceback); if (!_PyTrash_delete_later) _PyTrash_delete_later = PyList_New(0); if (_PyTrash_delete_later) return PyList_Append(_PyTrash_delete_later, (PyObject *)op); else return -1; if (PyThreadState_GET() != NULL) PyErr_Restore(error_type, error_value, error_traceback); return 0; } The result of this would be really enlighting :-) ciao - chris Vladimir Marangozov wrote: > > Of course, this > > Vladimir Marangozov wrote: > > > > to: > > > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > > { \ > > ++_PyTrash_delete_nesting; \ > > if (_PyTrash_delete_nesting >= PyTrash_UNWIND_LEVEL && \ > > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > > > was meant to be this: > > #define Py_TRASHCAN_SAFE_BEGIN(op) \ > { \ > ++_PyTrash_delete_nesting; \ > if (_PyTrash_delete_nesting < PyTrash_UNWIND_LEVEL || \ > _PyTrash_deposit_object((PyObject*)op) != 0) { \ > > -- > Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr > http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Wed Apr 12 16:43:30 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 16:43:30 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <38F48B92.94477DE9@tismer.com> Fredrik Lundh wrote: > > Paul Prescod wrote: > > * I need to associate a Java-style type declaration with a method so > > that it can be recognized based on its type during Java method dispatch. > > class foo: > typemap = {} > def myfunc(self): > pass > typemap[myfunc] = typeinfo Yes, I know that nearly everything is possible to be emulated via classes. But what is so bad about an arbitrary function attribute? ciao - chris p.s.: Paul, did you know that you can use *anything* for __doc__? You could use a class instance instead which still serves as a __doc__ but has your attributes and more. Yes I know this is ugly :-)) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Wed Apr 12 16:47:22 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed, 12 Apr 2000 10:47:22 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F430FE.BAF40AB8@lemburg.com> References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <14580.35962.86559.128123@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Here's a simple strawman for the syntax: ... > The compiler would scan these pragma defs, add them to an > internal temporary dictionary and use them for all subsequent > code it finds during the compilation process. The dictionary > would have to stay around until the original compile() call has > completed (spanning recursive calls). Marc-Andre, The problem with this proposal is that the pragmas are embedded in the comments; I'd rather see a new keyword and statement. It could be defined something like: pragma_atom: NAME | NUMBER | STRING+ pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* The biggest problem with embedding it in comments is that it is no longer part of the syntax tree generated by the parser. The pragmas become global to the module on a de-facto basis. While this is probably reasonable for the sorts of pragmas we've thought of so far, this seems an unnecessary restriction; future tools may support scoped pragmas to help out with selection of optimization strategies, for instance, or other applications. If we were to go with a strictly global view of pragmas, we'd need to expose the dictionary created by the parser. The parser module would need to be able to expose the dictionary and accept a dictionary when receiving a parse tree for compilation. The internals just can't be *too* internal! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gvwilson at nevex.com Wed Apr 12 16:55:55 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 10:55:55 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: Is there any way to unify Barry's proposal for enriching doc strings with Marc-Andre's proposal for pragmas? I.e., can pragmas be doc dictionary entries on modules that have particular keys? This would make them part of the parse tree (as per Fred Drake's comments), but not require (extra) syntax changes. Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 17:37:06 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 11:37:06 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> Message-ID: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Functions and methods are first class objects, and they already have attributes, some of which are writable. Why should __doc__ be special? Just because it was the first such attribute to have syntactic support for easily defining? Think about my proposal this way: it actual removes a restriction. What I don't like about /F's approach is that if you were building a framework, you'd now have two conventions you'd have to describe: where to find the mapping, and what keys to use in that mapping. With attributes, you've already got the former: getattr(). Plus, let's say you're handed a method object `x', would you rather do: if x.im_class.typemap[x.im_func] == 'int': ... or if x.__type__ == 'int': ... And what about function objects (as opposed to unbound methods). Where do you stash the typemap? In the module, I supposed. 
And if you can be passed either type of object, do you now have to do this? if hasattr(x, 'im_class'): if hasattr(x.im_class, 'typemap'): if x.im_class.typemap[x.im_func] == 'int': ... elif hasattr(x, 'func_globals'): if x.func_globals.has_key('typemap'): if x.func_globals['typemap'][x] == 'int': ... instead of the polymorphic elegance of if x.__type__ == 'int': ... Finally, think of this proposal as an evolutionary step toward enabling all kinds of future frameworks. At some point, there may be some kind of optional static type system. There will likely be some syntactic support for easily specifying the contents of the __type__ attribute. With the addition of func/meth attrs now, we can start to play with prototypes of this system, define conventions and standards, and then later when there is compiler support, simplify the definitions, but not have to change code that uses them. -Barry From akuchlin at mems-exchange.org Wed Apr 12 17:39:54 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Wed, 12 Apr 2000 11:39:54 -0400 (EDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: References: <000401bfa260$33e6ff40$812d153f@tim> Message-ID: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Ka-Ping Yee writes: >Here is what i have in mind: provide two hooks > __builtins__.display(object) >and > __builtins__.displaytb(traceback, exception) Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't want to add new display() and displaytb() built-ins, do we? --amk From pf at artcom-gmbh.de Wed Apr 12 17:37:05 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 17:37:05 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> from "Fred L. Drake, Jr." at "Apr 12, 2000 10:47:22 am" Message-ID: Hi! Fred L. Drake, Jr.: > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... > > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* This would defeat an important goal: backward compatibility: You can't add 'pragma division: old' or something like this to a source file, which should be able to run with both Python 1.5.2 and Py3k. This would make this mechanism useless for several important applications of pragmas. Here comes David Scherers idea into play. The relevant emails of this thread are in the archive at: > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. [...] IMO this is overkill. 
For all real applications that have been discussed so far, global pragmas are sufficient: - source file character encoding - language level - generated division operator byte codes - generated comparision operators byte codes (comparing strings and numbers) I really like Davids idea to use 'global' at module level for the purpose of pragmas. And this idea has also the advantage that Guido already wrote the idea is "kind of cute and backwards compatible". Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From bwarsaw at cnri.reston.va.us Wed Apr 12 17:56:18 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 11:56:18 -0400 (EDT) Subject: [Python-Dev] Second round: arbitrary function and method attributes References: <14579.45990.603625.434317@anthem.cnri.reston.va.us> Message-ID: <14580.40098.690512.903519@anthem.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GS> In the instancemethod_setattro function, it might be nice to GS> do the speed optimization and test for sname[0] == 'i' before GS> hitting the strcmp() calls. Yeah, you could do that, but it complicates the code and the win seems negligable. GS> Oh: policy question: I would think that these attributes GS> *should* be available in restricted mode. They aren't "sneaky" GS> like the builtin attributes. Hmm, good point. That does simplify the code too. I wonder if the __dict__ itself should be restricted, but that doesn't seem like it would buy you much. We don't need to restrict them in classobject anyway, because they are already restricted in funcobject (which ends up getting the call anyway). It might be reasonable to relax that for arbitrary func attrs. GS> Rather than GS> PyMapping_Get/SetItemString()... PyObject_Get/SetItem() should GS> be used. They apply to mappings and will be faster. Note that GS> (internally) the PyMapping_Get/SetItemString use the latter GS> forms (after constructing a string object(!)). ... whoops. I GS> see that the function object doesn't use the ?etattro() GS> variants. hrm. Okay cool. Made these changes and `attro'd 'em too. GS> The stuff is looking really good! Thanks! -Barry From mal at lemburg.com Wed Apr 12 17:52:34 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 17:52:34 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F430FE.BAF40AB8@lemburg.com> <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <38F49BC2.9C192C63@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Here's a simple strawman for the syntax: > ... > > The compiler would scan these pragma defs, add them to an > > internal temporary dictionary and use them for all subsequent > > code it finds during the compilation process. The dictionary > > would have to stay around until the original compile() call has > > completed (spanning recursive calls). > > Marc-Andre, > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > The biggest problem with embedding it in comments is that it is no > longer part of the syntax tree generated by the parser. The pragmas > become global to the module on a de-facto basis. 
While this is > probably reasonable for the sorts of pragmas we've thought of so far, > this seems an unnecessary restriction; future tools may support scoped > pragmas to help out with selection of optimization strategies, for > instance, or other applications. Fine with me, but this probably isn't going to make it into 1.7 and I don't want to wait until Py3K... perhaps there is another way to implement this without adding a new keyword, e.g. we could first use some kind of hack to implement "# pragma ..." and then later on allow dropping the "#" to make full use of the new mechanism. > If we were to go with a strictly global view of pragmas, we'd need > to expose the dictionary created by the parser. The parser module > would need to be able to expose the dictionary and accept a dictionary > when receiving a parse tree for compilation. The internals just can't > be *too* internal! ;) True :-) BTW, while poking around in the tokenizer/compiler I found a serious bug in the way concatenated strings are implemented: right now the compiler expects to always find string objects, yet it could just as well receive Unicode objects or even mixed string and Unicode objects. Try it: u = (u"abc" u"abc") dumps core ! I'll fix this with the next patch set. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Wed Apr 12 18:12:59 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 18:12:59 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4A08B.A855E69D@lemburg.com> Peter Funk wrote: > > Fred L. Drake, Jr.: > > M.-A. Lemburg writes: > > > Here's a simple strawman for the syntax: > > ... > > > The compiler would scan these pragma defs, add them to an > > > internal temporary dictionary and use them for all subsequent > > > code it finds during the compilation process. The dictionary > > > would have to stay around until the original compile() call has > > > completed (spanning recursive calls). > > > > Marc-Andre, > > The problem with this proposal is that the pragmas are embedded in > > the comments; I'd rather see a new keyword and statement. It could be > > defined something like: > > > > pragma_atom: NAME | NUMBER | STRING+ > > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* > > This would defeat an important goal: backward compatibility: You > can't add 'pragma division: old' or something like this to a source > file, which should be able to run with both Python 1.5.2 and Py3k. > This would make this mechanism useless for several important > applications of pragmas. Hmm, I don't get it: these pragmas would set variabels which make Python behave in a different way -- how do you plan to achieve backward compatibility here ? I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at cnri.reston.va.us Wed Apr 12 18:37:20 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 12:37:20 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.42560.713427.885436@goon.cnri.reston.va.us> >>>>> "BAW" == Barry A Warsaw writes: BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. Why should BAW> __doc__ be special? Just because it was the first such BAW> attribute to have syntactic support for easily defining? I don't have a principled argument about why doc strings should be special, but I think that they should be. I think it's weird that you can change __doc__ at runtime; I would prefer that it be constant. BAW> Think about my proposal this way: it actually removes a BAW> restriction. I think this is really the crux of the matter! The proposal removes a useful restriction. The alternatives /F suggested seem clearer to me that sticking new attributes on functions and methods. Three things I like about the approach: It affords an opportunity to be very clear about how the attributes are intended to be used. I suspect it would be easier to describe with a static type system. It prevents confusion and errors that might result from unprincipled use of function attributes. Jeremy From gmcm at hypernet.com Wed Apr 12 18:56:24 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 12:56:24 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.42560.713427.885436@goon.cnri.reston.va.us> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <1256563909-46814536@hypernet.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. > > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. Having to be explicit about the method <-> regex / rule would severely damage SPARK's elegance. It would make Tim's doctest useless. > It prevents confusion and errors > that might result from unprincipled use of function attributes. While I'm sure I will be properly shocked and horrified when you come up with an example, in my naivety, I can't imagine what it will look like ;-). 
- Gordon From skip at mojam.com Wed Apr 12 19:28:04 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 12:28:04 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.38946.206846.261405@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <14580.45604.756928.858721@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. (Trying to read Fredrik's mind...) By extension, we should allow writable attributes to work for other objects. To pollute this discussion with an example from another one: i = 3.1416 i.__precision__ = 4 I haven't actually got anything against adding attributes to functions (or numbers, if it's appropriate). Just wondering out loud and playing a bit of a devil's advocate. Skip From ping at lfw.org Wed Apr 12 19:35:59 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:35:59 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > The problem with this proposal is that the pragmas are embedded in > the comments; I'd rather see a new keyword and statement. It could be > defined something like: > > pragma_atom: NAME | NUMBER | STRING+ > pragma_stmt: 'pragma' NAME ':' pragma_atom (',' pragma_atom)* Wa-wa-wa-wa-wait... i thought the whole point of pragmas was that they were supposed to control the operation of the parser itself (you know, set the source character encoding and so on). So by definition they would have to happen at a different level, above the parsing. Or do we need to separate out two categories of pragmas -- pre-parse and post-parse pragmas? -- ?!ng From tismer at tismer.com Wed Apr 12 19:39:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 19:39:34 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <38F4B4D6.6F954CDF@tismer.com> Skip Montanaro wrote: > > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. > > (Trying to read Fredrik's mind...) takes too long since it isn't countable infinite... > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. please let me join your hexensabbat (de|en)lighted-ly -y'rs - rapunzel -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From fdrake at acm.org Wed Apr 12 19:38:26 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Wed, 12 Apr 2000 13:38:26 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <14580.35962.86559.128123@seahag.cnri.reston.va.us> Message-ID: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Wa-wa-wa-wa-wait... i thought the whole point of pragmas was > that they were supposed to control the operation of the parser > itself (you know, set the source character encoding and so on). > So by definition they would have to happen at a different level, > above the parsing. Hmm. That's one proposed use, which doesn't seem to fit well with my proposal. But I don't know that I'd think of that as a "pragma" in the general sense. I'll think about this one. I think encoding is a very special case, and I'm not sure I like dealing with it as a pragma. Are there any other (programming) languages that attempt to deal with multiple encodings? Perhaps I missed a message about it. > Or do we need to separate out two categories of pragmas -- > pre-parse and post-parse pragmas? Eeeks! We don't need too many special forms! That's ugly! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From moshez at math.huji.ac.il Wed Apr 12 19:36:14 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:36:14 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > And voila! Numbers are no longer immutable. Using any numbers as keys in dicts? Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Wed Apr 12 19:45:15 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 12:45:15 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <14580.46226.990025.459426@seahag.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng From effbot at telia.com Wed Apr 12 19:42:24 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 19:42:24 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: Message-ID: <002401bfa4a6$778fc360$34aab5d4@hagrid> Moshe Zadka wrote: > > To pollute this discussion with an example from another one: > > > > i = 3.1416 > > i.__precision__ = 4 > > And voila! Numbers are no longer immutable. Using any > numbers as keys in dicts? so? you can use methods as keys today, you know... 
From skip at mojam.com Wed Apr 12 19:47:01 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 12:47:01 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.46741.757469.645439@beluga.mojam.com> Moshe> On Wed, 12 Apr 2000, Skip Montanaro wrote: >> To pollute this discussion with an example from another one: >> >> i = 3.1416 >> i.__precision__ = 4 >> Moshe> And voila! Numbers are no longer immutable. Using any numbers as Moshe> keys in dicts? Yes, and I use functions on occasion as dict keys as well. >>> def foo(): pass ... >>> d = {foo: 1} >>> print d[foo] 1 I suspect adding methods to functions won't invalidate their use in that context, nor would adding attributes to numbers. At any rate, it was just an example. Skip From moshez at math.huji.ac.il Wed Apr 12 19:44:50 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:44:50 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > so? you can use methods as keys today, you know... Actually, I didn't know. What hapens if you use a method as a key, and then change it's doc string? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy at cnri.reston.va.us Wed Apr 12 19:51:32 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 13:51:32 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> Message-ID: <14580.47012.646862.615623@goon.cnri.reston.va.us> >>>>> "GMcM" == Gordon McMillan writes: [please imagine that the c is raised] BAW> Think about my proposal this way: it actually removes a BAW> restriction. [Jeremy Hylton wrote:] >> I think this is really the crux of the matter! The proposal >> removes a useful restriction. >> >> The alternatives /F suggested seem clearer to me that sticking >> new attributes on functions and methods. Three things I like >> about the approach: It affords an opportunity to be very clear >> about how the attributes are intended to be used. I suspect it >> would be easier to describe with a static type system. GMcM> Having to be explicit about the method <-> regex / rule would GMcM> severely damage SPARK's elegance. It would make Tim's doctest GMcM> useless. Do either of these examples modify the __doc__ attribute? I am happy to think of both of them as elegant abuses of the doc string. (Not sure what semantics I mean for "elegant abuse" but not pejorative.) I'm not arguing that we should change the language to prevent them from using doc strings. Fred and I were just talking, and he observed that a variant of Python that included a syntactic mechanism to specify more than one attribute (effectively, a multiple doc string syntax) might be less objectionable than setting arbitrary attributes at runtime. Neither of us could imagine just what that syntax would be. >> It prevents confusion and errors that might result from >> unprincipled use of function attributes. GMcM> While I'm sure I will be properly shocked and horrified when GMcM> you come up with an example, in my naivety, I can't imagine GMcM> what it will look like ;-). It would look really, really bad ;-). 
I couldn't think of a good example, so I guess this is a FUD argument. A rough sketch, though, would be a program that assigned attribute X to all functions that were to be used in a certain way. If the assignment is a runtime operation, rather than a syntactic construct that defines a static attribute, it would be possible to accidentally assign attribute X to a function that was not intended to be used that way. This connection between a group of functions and a particular behavior would depend entirely on some runtime magic with settable attributes. Jeremy From mal at lemburg.com Wed Apr 12 19:55:19 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 19:55:19 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.42560.713427.885436@goon.cnri.reston.va.us> Message-ID: <38F4B887.2C16FF03@lemburg.com> Jeremy Hylton wrote: > BAW> Think about my proposal this way: it actually removes a > BAW> restriction. > > I think this is really the crux of the matter! The proposal removes > a useful restriction. Not sure... I wouldn't mind having the ability to add attributes to all Python objects at my own liking. Ok, maybe a bit far fetched, but the idea would certainly be useful in some cases, e.g. to add new methods to built-in types or to add encoding name information to strings... > The alternatives /F suggested seem clearer to me that sticking new > attributes on functions and methods. Three things I like about the > approach: It affords an opportunity to be very clear about how the > attributes are intended to be used. I suspect it would be easier to > describe with a static type system. It prevents confusion and errors > that might result from unprincipled use of function attributes. The nice side-effect of having these function/method instance dictionaries is that they follow class inheritance. Something which is hard to do right with Fredrik's approach. I suspect that in Py3K we'll only have one type of class system: everything inherits from one global base class -- seen in that light, method attributes are really nothing unusual, since all instances would have instance dictionaries anyway (well maybe only on demand, but that's another story). Anyway, more power to you Barry :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gmcm at hypernet.com Wed Apr 12 19:56:18 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Wed, 12 Apr 2000 13:56:18 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4B4D6.6F954CDF@tismer.com> Message-ID: <1256560314-47031192@hypernet.com> Christian Tismer wrote: > > > Skip Montanaro wrote: > > (Trying to read Fredrik's mind...) > > takes too long since it isn't countable infinite... Bounded, however. And therefore, um, dense ... - Gordon From paul at prescod.net Wed Apr 12 19:57:01 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 12:57:01 -0500 Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> Message-ID: <38F4B8ED.8BC64F69@prescod.net> About a month ago I wrote (but did not publish) a proposal that combined #pragmas and method attributes. 
The reason I combined them is that in a lot of cases method "attributes" are supposed to be available in the parse tree, before the program begins to run. Here is my rough draft. ---- We've been discussing method attributes for a long time and I think that it might be worth hashing out in more detail, especially for type declaration experimentation. I'm proposing a generalization of the "decl" keyword that has been kicked around in the types-sig. Other applications include Spark grammar strings, XML pattern-trigger strings, multiple language doc-strings, IDE "hints", optimization hints, associated multimedia (down with glass ttys!), IDL definitions, thread locking declarations, method visibility declarations, ... Of course some subset of attributes might migrate into Python's "core language". Decl gives us a place to experiment and get them right before we do that migration. Declarations would always be associated with functions, classes or modules. They would be simple string-keyed values in a dictionary attached to the function, class or module called __decls__. The syntax would be decl { <key>:"value", <key>:"value" } Key would be a Python name. Value would be any Python string. In the case of a type declaration it might be: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() No string interpolation or other runtime-ish evaluation is done by the compiler on those strings. Neither the keys nor the values are evaluated as Python expressions. We could have a feature that would allow values to be dictionary-ish strings themselves: decl {type:"def(myint: int) returns bar", doc : "Bonjour", languages:{ french: "Hello"} } That would presumably be rare (if we allow it at all). Again, there would be no evaluation or interpolation. The left hand must be a name. The right must be a string. Code which depended on the declaration can do whatever it wants...if it has some idea of "execution context" and it wants to (e.g.) do interpolation with things that have percent signs, nobody would stop it. A decl that applies to a function or class immediately precedes the function or class. A decl that applies to a module precedes all other statements other than the docstring (which can be before or after). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "I and my companions suffer from a disease of the heart that can only be cured with gold", Hernan Cortes From moshez at math.huji.ac.il Wed Apr 12 19:55:54 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 19:55:54 +0200 (IST) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256560314-47031192@hypernet.com> Message-ID: On Wed, 12 Apr 2000, Gordon McMillan wrote: > Bounded, however. And therefore, um, dense ... I sorta imagined it more like the Cantor set. Nowhere dense, but perfect sorry-but-he-started-with-the-maths-ly y'rs, Z. -- Moshe Zadka .
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Wed Apr 12 20:00:52 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:00:52 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> Message-ID: <14580.47572.794837.109290@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Functions and methods are first class objects, and they BAW> already have attributes, some of which are writable. SM> (Trying to read Fredrik's mind...) SM> By extension, we should allow writable attributes to work for SM> other objects. To pollute this discussion with an example SM> from another one: | i = 3.1416 | i.__precision__ = 4 SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering SM> out loud and playing a bit of a devil's advocate. Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> i = 3.1416 >>> dir(i) [] Floats don't currently have attributes. -Barry From moshez at math.huji.ac.il Wed Apr 12 20:01:13 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Wed, 12 Apr 2000 20:01:13 +0200 (IST) Subject: [Python-Dev] #pragmas and method attributes In-Reply-To: <38F4B8ED.8BC64F69@prescod.net> Message-ID: On Wed, 12 Apr 2000, Paul Prescod wrote: > About a month ago I wrote (but did not publish) a proposal that combined > #pragmas and method attributes. The reason I combined them is that in a > lot of cases method "attributes" are supposed to be available in the > parse tree, before the program begins to run. Here is my rough draft. FWIW, I really really like this. def func(...): decl {zorb: 'visible', spark: 'some grammar rule'} pass Right on! But maybe even def func(...): decl zorb='visible' decl spark='some grammar rule' pass BTW: Why force the value to be a string? Any immutable basic type should do fine, no?? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From jeremy at cnri.reston.va.us Wed Apr 12 20:08:29 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 14:08:29 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <38F46753.3759A7B6@tismer.com> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> Message-ID: <14580.48029.512656.911718@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Vladimir Marangozov wrote: >> While I'm at it, maybe the same recursion control logic could be >> used to remedy (most probably in PyObject_Compare) PR#7: >> "comparisons of recursive objects" reported by David Asher? CT> Hey, what a good idea. CT> You know what's happening? We are moving towards tail recursion. CT> If we do this everywhere, Python converges towards Stackless CT> Python. It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. 
This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one. I have tried to come up with a solution in the same style as the repr solution. repr maintains a list of objects currently being repred. If it encounters a recursive request to repr the same object, it just prints "...". (There are better solutions, but this one is fairly simple.) I always get hung up on a cmp that works this way because at some point you discover a recursive cmp of two objects and you need to decide what to do. You can't just print "..." :-). So the real problem is defining some reasonable semantics for comparison of recursive objects. I checked what Scheme and Common Lisp, thinking that these languages must have dealt with the issue before. The answer, at least in Scheme, is infinite loop. R5RS notes: "'Equal?' may fail to terminate if its arguments are circular data structures. " http://www-swiss.ai.mit.edu/~jaffer/r5rs_8.html#SEC49 For eq? and eqv?, the answer is #f. The issue was also discussed in some detail by the ANSI commitee X3J13. A summary of the discussion is at here: http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss143-writeup.html The result was to "Clarify that EQUAL and EQUALP do not descend any structures or data types other than the ones explicitly specified here:" [both descend for cons, bit-vectors, and strings; equalp has some special rules for hashtables and arrays] I believe this means that Common Lisp behaves the same way that Scheme does: comparison of circular data structures does not terminate. I don't think an infinite loop is any better than a core dump. At least with the core dump, you can inspect the core file and figure out what went wrong. In the infinite loop case, you'd wonder for a while why your program doesn't terminate, then kill it and inspect the core file anway :-). I think the comparison ought to return false or raise a ValueError. I'm not sure which is right. It seems odd to me that comparing two builtin lists could ever raise an exception, but it may be more Pythonic to raise an exception in the face of ambiguity. As the X3J13 committee noted: Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. So, in the end, I propose ValueError. Jeremy From bwarsaw at cnri.reston.va.us Wed Apr 12 20:19:47 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:19:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <002401bfa4a6$778fc360$34aab5d4@hagrid> Message-ID: <14580.48707.373146.936232@anthem.cnri.reston.va.us> >>>>> "MZ" == Moshe Zadka writes: >> so? you can use methods as keys today, you know... MZ> Actually, I didn't know. What hapens if you use a method as a MZ> key, and then change it's doc string? Nothing. Python 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): ... 'a doc string' ... >>> d = {} >>> d[foo] = foo >>> foo.__doc__ = 'norwegian blue' >>> d[foo].__doc__ 'norwegian blue' The hash of a function object is hash(func_code) ^ id(func_globals): Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ... 
>>> hash(foo) 557536160 >>> hash(foo.func_code) 557215928 >>> id(foo.func_globals) 860952 >>> hash(foo.func_code) ^ id(foo.func_globals) 557536160 So in the words of Mr. Praline: The plumage don't enter into it. :) But you can still get quite evil: Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> def foo(): pass ... >>> def bar(): print 1 ... >>> d = {} >>> d[foo] = foo >>> d[foo] >>> foo.func_code = bar.func_code >>> d[foo] Traceback (most recent call last): File "", line 1, in ? KeyError: Mwah, ha, ha! Gimme-lists-as-keys-and-who-really-/does/-need-tuples-after-all?-ly y'rs, -Barry From gvwilson at nevex.com Wed Apr 12 20:19:52 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 14:19:52 -0400 (EDT) Subject: [Python-Dev] Processing XML with Perl (interesting article) (fwd) Message-ID: http://www.xml.com/pub/2000/04/05/feature/index.html is Michael Rodriguez' summary of XML processing modules for Perl. It opens with: "Perl is one of the most powerful (and even the most devout Python zealots will agree here) and widely used text processing languages." Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 20:20:40 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:20:40 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: <14580.48760.957536.805522@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> Fred and I were just talking, and he observed that a variant JH> of Python that included a syntactic mechanism to specify more JH> than one attribute (effectively, a multiple doc string syntax) JH> might be less objectionable than setting arbitrary attributes JH> at runtime. Neither of us could imagine just what that syntax JH> would be. So it's the writability of the attributes that bothers you? Maybe we need WORM-attrs? :) -Barry From skip at mojam.com Wed Apr 12 20:27:38 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 13:27:38 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47572.794837.109290@anthem.cnri.reston.va.us> References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> Message-ID: <14580.49178.341131.766028@beluga.mojam.com> BAW> Functions and methods are first class objects, and they already BAW> have attributes, some of which are writable. SM> I haven't actually got anything against adding attributes to SM> functions (or numbers, if it's appropriate). Just wondering out SM> loud and playing a bit of a devil's advocate. BAW> Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 BAW> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>>> i = 3.1416 >>>> dir(i) BAW> [] BAW> Floats don't currently have attributes. True enough, but why can't they? I see no reason that your writable function attributes proposal requires that functions already have attributes. Modifying my example, how about: >>> l = [1,2,3] >>> l.__type__ = "int" Like functions, lists do have (readonly) attributes. 
Why not allow them to have writable attributes as well? Awhile ago, Paul Prescod proposed something I think he called a super tuple, which allowed you to address tuple elements using attribute names: >>> t = ("x": 1, "y": 2, "z": 3) >>> print t.x 1 >>> print t[1] 2 (or something like that). I'm sure Paul or others will chime in if they think it's relevant. Your observation was that functions have a __doc__ attribute that is being abused in multiple, conflicting ways because it's the only function attribute people have to play with. I have absolutely no quibble with that. See: http://www.python.org/pipermail/doc-sig/1999-December/001671.html (Note that it apparently fell on completely deaf ears... ;-) I like your proposal. I was just wondering out loud if it should be more general. Skip From bwarsaw at cnri.reston.va.us Wed Apr 12 20:31:27 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:31:27 -0400 (EDT) Subject: [Python-Dev] #pragmas and method attributes References: <38F430FE.BAF40AB8@lemburg.com> <38F4B8ED.8BC64F69@prescod.net> Message-ID: <14580.49407.807617.750146@anthem.cnri.reston.va.us> >>>>> "PP" == Paul Prescod writes: PP> About a month ago I wrote (but did not publish) a proposal PP> that combined #pragmas and method attributes. The reason I PP> combined them is that in a lot of cases method "attributes" PP> are supposed to be available in the parse tree, before the PP> program begins to run. Here is my rough draft. Very cool. Combine them with Greg Wilson's approach and you've got my +1 on the idea. I still think it's fine that the func attr dictionary is writable. -Barry From mal at lemburg.com Wed Apr 12 20:31:16 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 20:31:16 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4C0F4.E0A8B01@lemburg.com> Ka-Ping Yee wrote: > > On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > > Or do we need to separate out two categories of pragmas -- > > > pre-parse and post-parse pragmas? > > > > Eeeks! We don't need too many special forms! That's ugly! > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). But you're right, i've never > heard of another language that can handle configurable encodings > right in the source code. Is it really necessary to tackle that here? Yes. > Gak, what do Japanese programmers do? Has anyone seen any of that > kind of source code? It's not intended for use by Asian programmers, it must be seen as a way to equally support all those different languages and scripts for which Python provides codecs. Note that Fred's argument is not far fetched: if you look closely at the way the compiler works it seems that adding a new keyword would indeed be the simplest solution. If done right, we could add some nifty lookup optimizations to the byte code compiler, e.g. a module might declare all globals as being constant or have all could allow the compiler to assume that all global lookups return constants allowing it to cache them or even rewrite the byte code at run-time... But the concepts are still not 100% right -- if we want to add scope to pragmas, we ought to follow the usual Python lookup scheme: locals, globals, built-ins. This would introduce the need to pass locals and globals to all APIs compiling Python source code. 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From pf at artcom-gmbh.de Wed Apr 12 20:17:25 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 12 Apr 2000 20:17:25 +0200 (MEST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4A08B.A855E69D@lemburg.com> from "M.-A. Lemburg" at "Apr 12, 2000 6:12:59 pm" Message-ID: Hi! [me:] > > This would defeat an important goal: backward compatibility: You > > can't add 'pragma division: old' or something like this to a source > > file, which should be able to run with both Python 1.5.2 and Py3k. > > This would make this mechanism useless for several important > > applications of pragmas. M.-A. Lemburg: > Hmm, I don't get it: these pragmas would set variabels which > make Python behave in a different way -- how do you plan to > achieve backward compatibility here ? > > I mean, u = u"abc" raises a SyntaxError in Python 1.5.2 too... Okay. What I mean is for example changing the behaviour of the division operator: if 1/2 becomes 0.5 instead of 0 in some future version of Python, it is a must to be able to put in a pragma with the meaning "use the old style division in this module" into a source file without breaking the usability of this source file on older versions of Python. Regards, Peter -- Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany, Fax:+49 4222950260 office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen) From Mike.Da.Silva at uk.fid-intl.com Wed Apr 12 20:37:56 2000 From: Mike.Da.Silva at uk.fid-intl.com (Da Silva, Mike) Date: Wed, 12 Apr 2000 19:37:56 +0100 Subject: [Python-Dev] #pragmas in Python source code Message-ID: Java uses ResourceBundles, which are identified by basename + 2 character locale id (eg "en", "fr" etc). The content of the resource bundle is essentially a dictionary of name value pairs. MS Visual C++ uses pragma code_page(windows_code_page_id) in resource files to indicate what code page was used to generate the subsequent text. In both cases, an application would rely on a fixed (7 bit ASCII) subset to give the well-known key to find the localized text for the current locale. Any "hardcoded" string literals would be mangled when attempting to display them using an alternate locale. So essentially, one could take the view that correct support for localization is a runtime issue affecting the user of an application, not the developer. Hence, myfile.py may contain 8 bit string literals encoded in my current windows encoding (1252) but my user may be using Japanese Windows in code page 932. All I can guarantee is that the first 128 characters (notwithstanding BACKSLASH) will be rendered correctly - other characters will be interpreted as half width Katakana or worse. Any literal strings one embeds in code should be purely for the benefit of the code, not for the end user, who should be seeing properly localized text, pulled back from a localized text resource file _NOT_ python code, and automatically pumped through the appropriate native <--> unicode translations as required by the code. So to sum up, 1 Hardcoded strings are evil in source code unless they use the invariant ASCII (and by extension UTF8) character set. 2 A proper localized resource loading mechanism is required to fetch genuine localized text from a static resource file (ie not myfile.py). 
3 All transformations of 8 bit strings to and from unicode should explicitly specify the 8 bit encoding for the source/target of the conversion, as appropriate. 4 Assume that a Japanese / Chinese programmer will find it easier to code using the invariant ASCII subset than a Western European / American will be able to read hanzi in source code. Regards, Mike da Silva -----Original Message----- From: Ka-Ping Yee [mailto:ping at lfw.org] Sent: Wednesday, April 12, 2000 6:45 PM To: Fred L. Drake, Jr. Cc: Python Developers @ python.org Subject: Re: [Python-Dev] #pragmas in Python source code On Wed, 12 Apr 2000, Fred L. Drake, Jr. wrote: > > Or do we need to separate out two categories of pragmas -- > > pre-parse and post-parse pragmas? > > Eeeks! We don't need too many special forms! That's ugly! Eek indeed. I'm tempted to suggest we drop the multiple-encoding issue (i can hear the screams now). But you're right, i've never heard of another language that can handle configurable encodings right in the source code. Is it really necessary to tackle that here? Gak, what do Japanese programmers do? Has anyone seen any of that kind of source code? -- ?!ng _______________________________________________ Python-Dev mailing list Python-Dev at python.org http://www.python.org/mailman/listinfo/python-dev From bwarsaw at cnri.reston.va.us Wed Apr 12 20:43:01 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:43:01 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <14580.45604.756928.858721@beluga.mojam.com> <14580.47572.794837.109290@anthem.cnri.reston.va.us> <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.50101.747669.794035@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Floats don't currently have attributes. SM> True enough, but why can't they? Skip, didn't you realize I was setting you up to ask that question? :) I don't necessarily think other objects shouldn't have such attributes, but I thought it might be easier to shove this one tiny little pill down peoples' throats first. Once they realize it tastes good, /they'll/ want more :) SM> Awhile ago, Paul Prescod proposed something I think he called SM> a super tuple, which allowed you to address tuple elements SM> using attribute names: >> t = ("x": 1, "y": 2, "z": 3) print t.x | 1 | >>> print t[1] | 2 SM> (or something like that). I'm sure Paul or others will chime SM> in if they think it's relevant. Might be. I thought that was a cool idea too at the time. SM> Your observation was that functions have a __doc__ attribute SM> that is being abused in multiple, conflicting ways because SM> it's the only function attribute people have to play with. I SM> have absolutely no quibble with that. See: SM> SM> http://www.python.org/pipermail/doc-sig/1999-December/001671.html SM> (Note that it apparently fell on completely deaf ears... ;-) I SM> like your proposal. I was just wondering out loud if it SM> should be more general. Perhaps so. 
-Barry From effbot at telia.com Wed Apr 12 20:43:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 20:43:55 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Mike wrote: > Any literal strings one embeds in code should be purely for the benefit of > the code, not for the end user, who should be seeing properly localized > text, pulled back from a localized text resource file _NOT_ python code, and > automatically pumped through the appropriate native <--> unicode > translations as required by the code. that's hardly a CP4E compatible solution, is it? Ping wrote: > > But you're right, i've never heard of another language that can handle > > configurable encodings right in the source code. XML? From glyph at twistedmatrix.com Wed Apr 12 21:46:24 2000 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 14:46:24 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: Language pragmas are all fine and good, BUT ... Here in the Real World(TM) we have to deal with version incompatibilities out the wazoo. I am currently writing a java application that has to run on JDK 1.1, and 1.2, and microsoft's half-way JDK 1.1+1/2 thingy. Python comes installed on many major linux distributions, and the installed base is likely to be even larger than it is now by the time Python 1.6 is ready for the big time. I'd like to tell people who still have RedHat 6.2 installed in six months that they can just download a 40k script and not a 5M interpreter source tarball (which will be incompatible with their previous python installation, which they need for other stuff) when I deploy an end-user application. (Sorry, I can't think of another way to say that, I'm still recovering from java-isms...) :-) What I'm saying is that it would be great if you could write an app that would still function with existing versions of the interpreter, but would be missing certain features that were easier to implement with the new language semantics or required a new core library feature. Backward compatibility is as important to me as forward compatibility, and I'd prefer not to achieve it by writing exclusively to python 1.5.2 for the rest of my life. The way I would like to see this happen is NOT with language pragmas ('global' strikes me as particularly inappropriate, since that already means something else...) but with file-extensions. For example, if you have a python file which uses 1.6 features, I call it 'foo.1_6.py'. I also have a version that will work with 1.5, albeit slightly slower/less featureful: so I call it 'foo.py'. 'import foo' will work correctly. Or, if I only have 'foo.1_6.py' it will break, which I gather would be the desired behavior. As long as we're talking about versioning issues, could we perhaps introduce a slightly more robust introspective facility than assert(sys.version[:3])=='1.5' ? And finally, I appreciate that some physics students may find it confusing that 1/2 yields 0 instead of 0.5, but I think it would be easier to just teach them to do 1./2 rather than changing the semantics of integer constants completely ... I use python to do a lot of GUI work right now (and it's BEAUTIFUL for interfacing with Gtk/Tk/Qt, so I'm looking forward to doing more of it) and when I divide *1* by *2*, that's what I mean. I want integers, because I'm talking about pixels.
It would be a real drag to go through all of my code and insert int(1/2) because there's no way to do integer math in python anymore... (Besides, what about 100000000000000000000L/200000000000000000000L, which I believe will shortly be lacking the Ls...?) Maybe language features that are like this could be handled by a pseudo-module? I.E. import syntax syntax.floating_point_division() or somesuch ... I'm not sure how you'd implement this so it would be automatic in certain contexts (merging it into your 'site' module maybe? that has obvious problems though), but considering that such features may be NOT the behavior desired by everyone, it seems strange to move the language in that direction unilaterally. ______ __ __ _____ _ _ | ____ | \_/ |_____] |_____| |_____| |_____ | | | | @ t w i s t e d m a t r i x . c o m http://www.twistedmatrix.com/~glyph/ From effbot at telia.com Wed Apr 12 20:52:09 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 20:52:09 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr><38F46A02.3AB10147@prescod.net><001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> Message-ID: <002f01bfa4b0$39df5440$34aab5d4@hagrid> Barry A. Warsaw wrote: > Finally, think of this proposal as an evolutionary step toward > enabling all kinds of future frameworks. /.../ With the addition > of func/meth attrs now, we can start to play with prototypes > of this system, define conventions and standards /.../ does this mean that this feature will be labelled as "experimental" (and hopefully even "interpreter specific"). if so, +0. From bwarsaw at cnri.reston.va.us Wed Apr 12 20:56:32 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 14:56:32 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> Message-ID: <14580.50912.543239.347566@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: GL> As long as we're talking about versioning issues, could we GL> perhaps introduce a slightly more robust introspective GL> facility than GL> assert(sys.version[:3])=='1.5' sys.hexversion? Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> import sys >>> sys.hexversion 17170593 >>> hex(sys.hexversion) '0x10600a1' From bwarsaw at cnri.reston.va.us Wed Apr 12 20:57:47 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 14:57:47 -0400 (EDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <200004120947.LAA02067@python.inrialpes.fr> <38F46A02.3AB10147@prescod.net> <001701bfa47d$aaa68410$0500a8c0@secret.pythonware.com> <14580.38946.206846.261405@anthem.cnri.reston.va.us> <002f01bfa4b0$39df5440$34aab5d4@hagrid> Message-ID: <14580.50987.10065.518955@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> does this mean that this feature will be labelled as FL> "experimental" (and hopefully even "interpreter specific"). Do you mean "don't add it to JPython whenever I actually get around to making it compatible with CPython 1.6"? 
-Barry From tismer at tismer.com Wed Apr 12 21:03:33 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:03:33 +0200 Subject: [Python-Dev] trashcan and PR#7 References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <38F4C885.D75DABF2@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Vladimir Marangozov wrote: > >> While I'm at it, maybe the same recursion control logic could be > >> used to remedy (most probably in PyObject_Compare) PR#7: > >> "comparisons of recursive objects" reported by David Asher? > > CT> Hey, what a good idea. > > CT> You know what's happening? We are moving towards tail recursion. > CT> If we do this everywhere, Python converges towards Stackless > CT> Python. > > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. Well, I actually didn't read PR#7 before replying. Thought it was about comparing deeply nested structures. What about this? For one, we do an improved comparison, which is of course towards tail recursion, since we push part of the work after the "return". Second, we can guess the number of actually existing objects, and limit the number of comparisons by this. If we need more comparisons than we have objects, then we raise an exception. Might still take some time, but a bit less than infinite. ciao - chris (sub-cantor-set-minded) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Wed Apr 12 21:06:00 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:06:00 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: <14580.38946.206846.261405@anthem.cnri.reston.va.us> <1256563909-46814536@hypernet.com> <14580.47012.646862.615623@goon.cnri.reston.va.us> <14580.48760.957536.805522@anthem.cnri.reston.va.us> Message-ID: <38F4C918.A1344D68@tismer.com> bwarsaw at cnri.reston.va.us wrote: > > >>>>> "JH" == Jeremy Hylton writes: > > JH> Fred and I were just talking, and he observed that a variant > JH> of Python that included a syntactic mechanism to specify more > JH> than one attribute (effectively, a multiple doc string syntax) > JH> might be less objectionable than setting arbitrary attributes > JH> at runtime. Neither of us could imagine just what that syntax > JH> would be. > > So it's the writability of the attributes that bothers you? Maybe we > need WORM-attrs? :) Why don't you just use WORM programming style. Write it once (into the CVS) and get many complaints :-) chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From mal at lemburg.com Wed Apr 12 21:02:25 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Wed, 12 Apr 2000 21:02:25 +0200 Subject: [Python-Dev] #pragmas and method attributes References: Message-ID: <38F4C841.7CE3FB32@lemburg.com> Moshe Zadka wrote: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > > FWIW, I really really like this. > > def func(...): > decl {zorb: 'visible', spark: 'some grammar rule'} > pass > > Right on! > > But maybe even > > def func(...): > decl zorb='visible' > decl spark='some grammar rule' > pass Hmm, this is not so far away from simply letting function/method attribute use the compiled-in names of all locals as basis, e.g. def func(x): a = 3 print func.a func.a would look up 'a' in func.func_code.co_names and return the corresponding value found in func.func_code.co_consts. Note that subsequent other assignments to 'a' are not recognized by this technique, since co_consts and co_names are written sequentially. For the same reason, writing things like 'a = 2 + 3' will break this lookup technique. This would eliminate any need for added keywords and probably provide the best programming comfort and the attributes are immutable per se. We would still have to come up with a way to declare these attributes for builtin methods and modules... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From jeremy at cnri.reston.va.us Wed Apr 12 21:07:41 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 12 Apr 2000 15:07:41 -0400 (EDT) Subject: [Python-Dev] trashcan and PR#7 In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> References: <200004120354.FAA06834@python.inrialpes.fr> <38F46753.3759A7B6@tismer.com> <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <14580.51581.31775.233843@goon.cnri.reston.va.us> Just after I sent the previous message, I realized that the "trashcan" approach is needed in addition to some application-specific logic for what to do when recursive traversals of objects occur. This is true for repr and for a compare that fixes PR#7. Current recipe for repr coredump: original = l = [] for i in range(1000000): new = [] l.append(new) l = new l.append(original) repr(l) Jeremy From glyph at twistedmatrix.com Wed Apr 12 22:06:17 2000 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 12 Apr 2000 15:06:17 -0500 (EST) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Barry A. Warsaw wrote: > sys.hexversion? Thank you! I stand corrected (and embarrassed) but perhaps this could be a bit better documented? a search of Google comes up with only one hit for this on the entire web: http://www.python.org/1.5/NEWS-152b2.txt ... 
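Since sys.hexversion keeps coming up as undocumented, a short sketch of how the packed value can be taken apart. The field layout follows the PY_VERSION_HEX definition in Include/patchlevel.h (major, minor, micro, release level, release serial, most to least significant); the threshold in the comparison is simply the value the interpreter session earlier in the thread printed:

    import sys

    # 0xMMmmuuRS: major, minor, micro, release level (0xA alpha, 0xB beta,
    # 0xC gamma, 0xF final) and release serial.
    major  = (sys.hexversion >> 24) & 0xff
    minor  = (sys.hexversion >> 16) & 0xff
    micro  = (sys.hexversion >>  8) & 0xff
    level  = (sys.hexversion >>  4) & 0xf
    serial =  sys.hexversion        & 0xf

    print major, minor, micro, hex(level), serial

    if sys.hexversion >= 0x010600a1:
        pass    # at least the 1.6 alpha shown above; safe to rely on its features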
From gstein at lyra.org Wed Apr 12 21:20:55 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:20:55 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.45604.756928.858721@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: > BAW> Functions and methods are first class objects, and they already > BAW> have attributes, some of which are writable. > > (Trying to read Fredrik's mind...) > > By extension, we should allow writable attributes to work for other objects. > To pollute this discussion with an example from another one: > > i = 3.1416 > i.__precision__ = 4 > > I haven't actually got anything against adding attributes to functions (or > numbers, if it's appropriate). Just wondering out loud and playing a bit of > a devil's advocate. Numbers have no attributes right now. Functions have mutable attributes (__doc__). Barry is allowing them to be annotated (without crushing the values into __doc__ in some nasty way). Paul gave some great examples. IMO, the Zope "visibility by use of __doc__" is the worst kind of hack :-) "Let me be a good person and doc all my functions. Oh, crap! Somebody hacked my system!" And the COM thing was great. Here is what we do today: class MyCOMServer: _public_methods_ = ['Hello'] def private_method(self, args): ... def Hello(self, args) ... The _public_methods_ thing is hacky. I'd rather see a "Hello.public = 1" in there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gvwilson at nevex.com Wed Apr 12 21:16:40 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 15:16:40 -0400 (EDT) Subject: [Python-Dev] re: #pragmas and method attributes Message-ID: > > On Wed, 12 Apr 2000, Paul Prescod wrote: > > About a month ago I wrote (but did not publish) a proposal that combined > > #pragmas and method attributes. The reason I combined them is that in a > > lot of cases method "attributes" are supposed to be available in the > > parse tree, before the program begins to run. Here is my rough draft. > Moshe Zadka wrote: > BTW: Why force the value to be a string? Any immutable basic type > should do fine, no?? If attributes can be objects other than strings, then programmers can implement hierarchical nesting directly using: def func(...): decl { 'zorb' : 'visible', 'spark' : { 'rule' : 'some grammar rule', 'doc' : 'handle quoted expressions' } 'info' : { 'author' : ('Greg Wilson', 'Allun Smythee'), 'date' : '2000-04-12 14:08:20 EDT' } } pass instead of: def func(...): decl { 'zorb' : 'visible', 'spark-rule' : 'some grammar rule', 'spark-doc' : 'handle quoted expressions' 'info-author' : 'Greg Wilson, Allun Smythee', 'info-date' : '2000-04-12 14:08:20 EDT' } pass In my experience, every system for providing information has eventually wanted/needed to be hierarchical --- code blocks, HTML, the Windows registry, you name it. This can be faked up using some convention like semicolon-separated lists, but processing (and escaping insignificant uses of separator characters) quickly becomes painful. (Note that if Python supported multi-dicts, or if something *ML-ish was being used for decl's, the "author" tag in "info" could be listed twice, instead of requiring programmers to fall back on char-separated lists.) 
Just another random, Greg From bwarsaw at cnri.reston.va.us Wed Apr 12 21:21:16 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 15:21:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <14580.52396.923837.488505@anthem.cnri.reston.va.us> >>>>> "GL" == Glyph Lefkowitz writes: BAW> sys.hexversion? GL> Thank you! GL> I stand corrected (and embarrassed) but perhaps this could be GL> a bit better documented? a search of Google comes up with GL> only one hit for this on the entire web: GL> http://www.python.org/1.5/NEWS-152b2.txt ... Yup, it looks like it's missing from Fred's 1.6 doc tree too. Do python-devers think we also need to make the other patchlevel.h constants available through sys? If so, and because sys.hexversion is currently undocumented, I'd propose making sys.hexversion a tuple of (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) or leaving sys.hexversion as is and crafting a new sys variable which is the [1:] of the tuple above. Prolly need to expose PY_RELEASE_LEVEL_{ALPHA,BETA,GAMMA,FINAL} as constants too. -Barry From effbot at telia.com Wed Apr 12 21:21:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 21:21:50 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> <14580.50912.543239.347566@anthem.cnri.reston.va.us> Message-ID: <007001bfa4b4$6216e780$34aab5d4@hagrid> > sys.hexversion? > > Python 1.6a2 (#26, Apr 12 2000, 13:53:57) [GCC 2.8.1] on sunos5 > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam > >>> import sys > >>> sys.hexversion > 17170593 > >>> hex(sys.hexversion) > '0x10600a1' bitmasks!? (ouch. python is definitely not what it used to be. wonder if the right answer to this is "wouldn't a tuple be much more python-like?" or "I'm outta here...") From gstein at lyra.org Wed Apr 12 21:29:04 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:29:04 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: On Wed, 12 Apr 2000, Skip Montanaro wrote: >... > BAW> Floats don't currently have attributes. > > True enough, but why can't they? I see no reason that your writable > function attributes proposal requires that functions already have > attributes. Modifying my example, how about: > > >>> l = [1,2,3] > >>> l.__type__ = "int" > > Like functions, lists do have (readonly) attributes. Why not allow them to > have writable attributes as well? Lists, floats, etc are *data*. There is plenty of opportunity for creating data structures that contain whatever you want, organized in any fashion. Functions are (typically) not data. Applying these attributes is a way to define program semantics, not record data. There are two entirely separate worlds here. Adding attributes makes great sense, as a way to enhance the definition of your program's semantics and operation. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 12 21:33:18 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 12:33:18 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <14580.47012.646862.615623@goon.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Jeremy Hylton wrote: >... 
> It would look really, really bad ;-). I couldn't think of a good > example, so I guess this is a FUD argument. A rough sketch, though, > would be a program that assigned attribute X to all functions that > were to be used in a certain way. If the assignment is a runtime > operation, rather than a syntactic construct that defines a static > attribute, it would be possible to accidentally assign attribute X to > a function that was not intended to be used that way. This connection > between a group of functions and a particular behavior would depend > entirely on some runtime magic with settable attributes. This is a FUD argument also. I could just as easily mis-label a function when using __doc__ strings, when using mappings in a class object, or using some runtime structures to record the attribute. Your "label" can be recorded in any number of ways. It can be made incorrect in all of them. There is nothing intrinsic to function attributes that makes them more prone to error. Being able to place them into function attributes means that you have a better *model* for how you record these values. Why place them into a separate mapping if your intent is to enhance the semantics of a function? If the semantics apply to a function, then bind it right there. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bwarsaw at cnri.reston.va.us Wed Apr 12 21:29:11 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 15:29:11 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <002101bfa37b$5b2acde0$27a2143f@tim> <14580.50912.543239.347566@anthem.cnri.reston.va.us> <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: <14580.52871.763195.168373@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> (ouch. python is definitely not what it used to be. wonder FL> if the right answer to this is "wouldn't a tuple be much more FL> python-like?" or "I'm outta here...") Yeah, pulling the micro version number out of sys.hexversion is ugly and undocumented, hence my subsequent message. The basically idea is pretty cool though, and I've adopted it to Mailman. It allows me to do this: previous_version = last_hex_version() this_version = mm_cfg.HEX_VERSION if previous_version < this_version: # I'm upgrading -Barry From tismer at tismer.com Wed Apr 12 21:37:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 12 Apr 2000 21:37:27 +0200 Subject: [Python-Dev] Arbitrary attributes on funcs and methods References: Message-ID: <38F4D077.AEE37C@tismer.com> Greg Stein wrote: ... > Being able to place them into function attributes means that you have a > better *model* for how you record these values. Why place them into a > separate mapping if your intent is to enhance the semantics of a function? > If the semantics apply to a function, then bind it right there. BTW., is then there also a way for the function *itself* so look into its attributes? If it should be able to take special care about its attributes, it would be not nice if it had to know its own name for that? Some self-like surrogate? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From effbot at telia.com Wed Apr 12 21:34:44 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 21:34:44 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> > If so, and because sys.hexversion is currently undocumented, I'd > propose making sys.hexversion a tuple of > > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) thanks. I feel better now ;-) but wouldn't something like (1, 6, 0, "a1") be easier to understand and use? From fdrake at acm.org Wed Apr 12 21:46:07 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 15:46:07 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.53887.525513.603276@seahag.cnri.reston.va.us> Fredrik Lundh writes: > but wouldn't something like (1, 6, 0, "a1") be easier > to understand and use? Yes! (But you knew that....) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From ping at lfw.org Wed Apr 12 22:06:03 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:06:03 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: On Wed, 12 Apr 2000, Fredrik Lundh wrote: > Ping wrote: > > > But you're right, i've never heard of another language that can handle > > > configurable encodings right in the source code. > > XML? Don't get me started. XML is not a language. It's a serialization format for trees (isomorphic to s-expressions, but five times more verbose). It has no semantics. Anyone who tries to tell you otherwise is probably a marketing drone or has been brainwashed by the buzzword brigade. -- ?!ng From bwarsaw at cnri.reston.va.us Wed Apr 12 22:04:45 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Wed, 12 Apr 2000 16:04:45 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us> <14580.52396.923837.488505@anthem.cnri.reston.va.us> <008a01bfa4b6$2baca0c0$34aab5d4@hagrid> Message-ID: <14580.55005.924001.146052@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> but wouldn't something like (1, 6, 0, "a1") be easier FL> to understand and use? I wasn't planning on splitting PY_VERSION, just in exposing the other #define ints in patchlevel.h -Barry From fdrake at acm.org Wed Apr 12 22:08:35 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 12 Apr 2000 16:08:35 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <14580.55235.6235.662297@seahag.cnri.reston.va.us> Ka-Ping Yee writes: > Don't get me started. XML is not a language. It's a serialization And XML was exactly why I asked about *programming* languages. XML just doesn't qualify in any way I can think of as a language. Unless it's also called "Marketing-speak." ;) XML, as you point out, is a syntactic aspect of tree encoding. Harrumph. -Fred -- Fred L. 
Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Wed Apr 12 22:10:21 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:10:21 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <00ad01bfa4bb$24129360$34aab5d4@hagrid> Ka-Ping Yee wrote: > > XML? > > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). call it whatever you want -- my point was that their way of handling configurable encodings in the source code is good enough for python. (briefly, it's all unicode on the inside, and either ASCII/UTF-8 or something compatible enough to allow the parser to find the "en- coding" attribute without too much trouble... except for the de- fault encoding, the same approach should work for python) From effbot at telia.com Wed Apr 12 22:20:26 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:20:26 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> <14580.55235.6235.662297@seahag.cnri.reston.va.us> Message-ID: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Fred L. Drake, Jr. wrote: > > Don't get me started. XML is not a language. It's a serialization > > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. oh, come on. in what way is "Python source code" more expressive than XML, if you don't have anything that inter- prets it? does the Python parser create "better" trees than an XML parser? > XML, as you point out, is a syntactic aspect of tree encoding. just like a Python source file is a syntactic aspect of a Python (parse) tree encoding, right? ;-) ... but back to the real issue -- the point is that XML provides a mechanism for going from an external representation to an in- ternal (unicode) token stream, and that mechanism is good enough for python source code. why invent yet another python-specific wheel? From effbot at telia.com Wed Apr 12 22:25:56 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 12 Apr 2000 22:25:56 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility References: <14580.50912.543239.347566@anthem.cnri.reston.va.us><14580.52396.923837.488505@anthem.cnri.reston.va.us><008a01bfa4b6$2baca0c0$34aab5d4@hagrid> <14580.55005.924001.146052@anthem.cnri.reston.va.us> Message-ID: <00c901bfa4bd$4ff82560$34aab5d4@hagrid> Barry wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> but wouldn't something like (1, 6, 0, "a1") be easier > FL> to understand and use? > > I wasn't planning on splitting PY_VERSION, just in exposing the other > #define ints in patchlevel.h neither was I. I just want Python to return those values in a form suitable for a Python programmer, not a C preprocessor. in other words: char release[2+1]; sprintf(release, "%c%c", PY_RELEASE_LEVEL - 0x0A + 'a', PY_RELEASE_SERIAL + '0'); sys.longversion = BuildTuple("iiis", PY_MAJOR_VERSION, PY_MINOR_VERSION, PY_MICRO_VERSION, release) (this assumes that the release serial will never exceed 9, but I think that's a reasonable restriction...) 
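For illustration only, the same construction done at the Python level, using the values from the 1.6a1 example earlier in the thread; the ALL_CAPS names below just transliterate the C #defines and are not real Python attributes:

    PY_MAJOR_VERSION = 1
    PY_MINOR_VERSION = 6
    PY_MICRO_VERSION = 0
    PY_RELEASE_LEVEL = 0xA      # 0xA -> 'a' (alpha), per the sprintf trick above
    PY_RELEASE_SERIAL = 1

    release = "%c%d" % (ord('a') + PY_RELEASE_LEVEL - 0xA, PY_RELEASE_SERIAL)
    versiontuple = (PY_MAJOR_VERSION, PY_MINOR_VERSION,
                    PY_MICRO_VERSION, release)

    print versiontuple              # (1, 6, 0, 'a1')
    print versiontuple >= (1, 6)    # 1 -- tuple comparison does the right thing

The point of the tuple form is that ordinary tuple comparison gives readable version checks without any bit fiddling.
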
From skip at mojam.com Wed Apr 12 22:33:22 2000 From: skip at mojam.com (Skip Montanaro) Date: Wed, 12 Apr 2000 15:33:22 -0500 (CDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: References: <14580.49178.341131.766028@beluga.mojam.com> Message-ID: <14580.56722.404953.614718@beluga.mojam.com> me> >>> l = [1,2,3] me> >>> l.__type__ = "int" Greg> Lists, floats, etc are *data*. There is plenty of opportunity for Greg> creating data structures that contain whatever you want, organized Greg> in any fashion. Yeah, but there's no reason you wouldn't want to reason about them. They are, after all, first-class objects. If you consider these other attributes as meta-data, allowing data attributes to hang off lists, tuples, ints or regex objects makes perfect sense to me. I believe someone else during this thread suggested that one use of function attributes might be to record the function's return type. My example above is not really any different. Simpleminded, yes. Part of the value of l, no. Skip From ping at lfw.org Wed Apr 12 22:54:49 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 12 Apr 2000 15:54:49 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <00bd01bfa4bc$8ad9ce00$34aab5d4@hagrid> Message-ID: Fred L. Drake, Jr. wrote: > And XML was exactly why I asked about *programming* languages. XML > just doesn't qualify in any way I can think of as a language. I'm harumphing right along with you, Fred. :) On Wed, 12 Apr 2000, Fredrik Lundh wrote: > oh, come on. in what way is "Python source code" more > expressive than XML, if you don't have anything that inter- > prets it? does the Python parser create "better" trees than > an XML parser? Python isn't just a parse tree. It has semantics. XML has no semantics. It's content-free content. :) > but back to the real issue -- the point is that XML provides a > mechanism for going from an external representation to an in- > ternal (unicode) token stream, and that mechanism is good > enough for python source code. You have a point. I'll go look at what they do. -- ?!ng From gvwilson at nevex.com Wed Apr 12 23:01:04 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 12 Apr 2000 17:01:04 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: > Ka-Ping Yee wrote: > Python isn't just a parse tree. It has semantics. > XML has no semantics. It's content-free content. :) Python doesn't even have a parse tree (never mind semantics) unless you have a Python parser handy. XML gives my application a way to parse your information, even if I can't understand it, which is a big step over (for example) comments or strings embedded in Python/Perl/Java source files, colon (or is semi-colon?) separated lists in .ini and .rc files, etc. (I say this having wrestled with large Fortran programs in which a sizeable fraction of the semantics was hidden in comment-style pragmas. Having seen the demands this style of coding places on compilers, and compiler writers, I'm willing to walk barefoot through the tundra to get something more structured. Hanging one of Barry's doc dict's off a module ensures that key information is part of the parse tree, and that anyone who wants to extend the mechanism can do so in a structured way. I'd still rather have direct embedding of XML, but I think doc dicts are still a big step forward.) Greg p.s. This has come up as a major issue in the Software Carpentry competition. 
On the one hand, you want (the equivalent of) makefiles to be language neutral, so that (for example) you can write processors in Perl and Java as well as Python. On the other hand, you want to have functions, lists, and all the other goodies associated with a language. From DavidA at ActiveState.com Wed Apr 12 23:10:49 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 14:10:49 -0700 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: > The basically idea is pretty cool though, and I've adopted it to > Mailman. It allows me to do this: > > previous_version = last_hex_version() > this_version = mm_cfg.HEX_VERSION > > if previous_version < this_version: > # I'm upgrading Why can't you do that with tuples? --david From bwarsaw at cnri.reston.va.us Wed Apr 12 23:44:16 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 12 Apr 2000 17:44:16 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: <14580.52871.763195.168373@anthem.cnri.reston.va.us> Message-ID: <14580.60976.200757.562690@anthem.cnri.reston.va.us> >>>>> "DA" == David Ascher writes: >> The basically idea is pretty cool though, and I've adopted it >> to Mailman. It allows me to do this: previous_version = >> last_hex_version() this_version = mm_cfg.HEX_VERSION if >> previous_version < this_version: # I'm upgrading DA> Why can't you do that with tuples? How do you know they aren't tuples? :) (no, Moshe, you do not need to answer :) -Barry From DavidA at ActiveState.com Thu Apr 13 00:51:36 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 12 Apr 2000 15:51:36 -0700 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: Gordon McMillan: > Jeremy Hylton wrote: > > It prevents confusion and errors > > that might result from unprincipled use of function attributes. > > While I'm sure I will be properly shocked and horrified when > you come up with an example, in my naivety, I can't imagine > what it will look like ;-). I'm w/ Gordon & Barry on this one. I've wanted method and function attributes in the past and had to rely on building completely new classes w/ __call__ methods just to 'fake it'. There's a performance cost to having to do that, but most importantly there's a big increase in code complexity, readability, maintanability, yaddability, etc. I'm surprised that Jeremy sees it as such a problem area -- if I wanted to play around with static typing, having a version of Python which let me store method metadata cleanly would make me jump with joy. FWIW, I'm perfectly willing to live in a world where 'unprincipled use of method and function attributes' means that my code can't get optimized, just like I don't expect my code which modifies __class__ to get optimized (as long as someone defines what those principles are!). --david From paul at prescod.net Wed Apr 12 21:33:14 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 12 Apr 2000 14:33:14 -0500 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F4CF7A.8F99562F@prescod.net> Ka-Ping Yee wrote: > >... > > Eek indeed. I'm tempted to suggest we drop the multiple-encoding > issue (i can hear the screams now). The XML rule is one encoding per file. 
One thing that I think that they did innovate in (I had nothing to do with that part) is that entities encoded in something other than UTF-8 or UTF-16 must start with the declaration: "". This has two benefits: By looking at the first four bytes of the file we can differentiate between several different encoding "families" (Shift-JIS-like, UTF-8-like, UTF-16-like, ...) and then we can tell the *precise* encoding by looking at the encoding attribute. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond at skippinet.com.au Thu Apr 13 02:15:08 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:15:08 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: > Do python-devers think we also need to make the other patchlevel.h > constants available through sys? Can't see why, but also can't see why not! > If so, and because sys.hexversion is currently undocumented, Since when has that ever stopped anyone :-) > I'd > propose making sys.hexversion a tuple of > > (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, > PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) > > or leaving sys.hexversion as is and crafting a new sys > variable which > is the [1:] of the tuple above. My code already uses sys.hexversion to differentiate between 1.5 and 1.6, so if we do anything I would vote for a new name. Personally however, I think the hexversion gives all the information you need - ie, you either want a printable version - sys.version - or a machine comparable version - sys.hexversion. Can't really think of a reason you would want the other attributes... Mark. From mhammond at skippinet.com.au Thu Apr 13 02:20:12 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 10:20:12 +1000 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility In-Reply-To: <007001bfa4b4$6216e780$34aab5d4@hagrid> Message-ID: > > >>> hex(sys.hexversion) > > '0x10600a1' > > bitmasks!? Nah - a comparable number :-) if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! Seems perfectly reasonable and understandable to me. And much cleaner than a tuple: if tuple_version[0] > 1 or tuple_version[0] == 1 and tuple_version[6] >= 1: etc Unless Im missing the point - but I can't see any case other than version comparisons in which hexversion is useful - so it seems perfect to me. > (ouch. python is definitely not what it used to be. wonder > if the right answer to this is "wouldn't a tuple be much more > python-like?" or "I'm outta here...") Be sure to let us know. Mark. 
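For reference, a minimal sketch of how the packed layout behind 0x10600a1 can be picked apart from Python; the bit positions are my reading of patchlevel.h and should be treated as an assumption rather than documentation:

    import sys

    # major, minor and micro sit in the top three bytes; release level and
    # serial share the low byte, so plain integer comparison orders releases.
    if sys.hexversion >= 0x010600a1:        # 1.6.0 alpha 1 or later
        print "new enough"

    major = (sys.hexversion >> 24) & 0xff
    minor = (sys.hexversion >> 16) & 0xff
    micro = (sys.hexversion >>  8) & 0xff
    print "%d.%d.%d" % (major, minor, micro)
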
From akuchlin at mems-exchange.org Thu Apr 13 02:46:21 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Wed, 12 Apr 2000 20:46:21 -0400 (EDT) Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backwardcompatibility In-Reply-To: References: <14580.52396.923837.488505@anthem.cnri.reston.va.us> Message-ID: <14581.6365.234022.976395@newcnri.cnri.reston.va.us> Mark Hammond quoted Barry Warsaw: >> I'd >> propose making sys.hexversion a tuple of >> (PY_VERSION_HEX, PY_MAJOR_VERSION, PY_MINOR_VERSION, >> PY_MICRO_VERSION, PY_RELEASE_LEVEL, PY_RELEASE_SERIAL) If it's a tuple, the name "hexversion" makes absolutely no sense. Call it version_tuple or something like that. --amk From gstein at lyra.org Thu Apr 13 03:10:54 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:10:54 -0700 (PDT) Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <38F4D077.AEE37C@tismer.com> Message-ID: On Wed, 12 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: > ... > > Being able to place them into function attributes means that you have a > > better *model* for how you record these values. Why place them into a > > separate mapping if your intent is to enhance the semantics of a function? > > If the semantics apply to a function, then bind it right there. > > BTW., is then there also a way for the function *itself* > so look into its attributes? If it should be able to take > special care about its attributes, it would be not nice > if it had to know its own name for that? > Some self-like surrogate? Separate problem. Functions can't do that today with their own __doc__ attribute. Feel free to solve this issue, but it is distinct from the attributes-on-functions issue being discussed. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Thu Apr 13 03:07:45 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Thu, 13 Apr 2000 11:07:45 +1000 Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <200004120334.FAA06784@python.inrialpes.fr> Message-ID: The trashcan bug turns out to be trivial to describe, but not so trivial to fix. Put simply, the trashcan mechanism conflicts horribly with PY_TRACE_REFS :-( The problem stems from the fact that the trashcan resurrects objects. An object added to the trashcan has its ref count as zero, but is then added to the trash list, transitioning its ref-count back to 1. Deleting the trashcan then does a second deallocation of the object, again taking the ref count back to zero, and this time actually doing the destruction. By pure fluke, this works without Py_DEBUG defined! With Py_DEBUG defined, this first causes problems due to ob_type being NULL. _Py_Dealloc() sets the ob_type element to NULL before it calls the object de-allocater. Thus, the trash object first hits a zero refcount, and its ob_type is zapped. It is then resurrected, but the ob_type value remains NULL. When the second deallocation for the object happens, this NULL type forces the crash. Changing the Py_DEBUG version of _Py_Dealloc() to not zap the type doesnt solve the problem. The whole _Py_ForgetReference() linked-list management also dies. Second time we attempt to deallocate the object the code that removes the object from the "alive objects" linked list fails - the object was already removed first time around. I see these possible solutions: * The trash mechanism is changed to keep a list of (address, deallocator) pairs. 
This is a "cleaner" solution, as the list is not considered holding PyObjects as such, just blocks of memory to be freed with a custom allocator. Thus, we never end up in a position where a Python objects are resurrected - we just defer the actual memory deallocation, rather than attempting a delayed object destruction. This may not be as trivial to implement as to describe :-) * Debug builds disable the trash mechanism. Not desired as the basic behaviour of the interpreter will change, making bug tracking with debug builds difficult! If we went this way, I would (try to :-) insist that the Windows debug builds dropped Py_DEBUG, as I really want to avoid the scenario that switching to a debug build changes the behaviour to this extent. * Perform further hacks, so that Py_ForgetReference() gracefully handles NULL linked-list elements etc. Any thoughts? Mark. From gstein at lyra.org Thu Apr 13 03:25:41 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 12 Apr 2000 18:25:41 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Mark Hammond wrote: >... > I see these possible solutions: > > * The trash mechanism is changed to keep a list of (address, > deallocator) pairs. This is a "cleaner" solution, as the list is > not considered holding PyObjects as such, just blocks of memory to > be freed with a custom allocator. Thus, we never end up in a > position where a Python objects are resurrected - we just defer the > actual memory deallocation, rather than attempting a delayed object > destruction. This may not be as trivial to implement as to describe > :-) > > * Debug builds disable the trash mechanism. Not desired as the > basic behaviour of the interpreter will change, making bug tracking > with debug builds difficult! If we went this way, I would (try to > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > really want to avoid the scenario that switching to a debug build > changes the behaviour to this extent. > > * Perform further hacks, so that Py_ForgetReference() gracefully > handles NULL linked-list elements etc. > > Any thoughts? Option 4: lose the trashcan mechanism. I don't think the free-threading issue was ever resolved. Cheers, -g -- Greg Stein, http://www.lyra.org/ From esr at thyrsus.com Thu Apr 13 04:56:38 2000 From: esr at thyrsus.com (esr at thyrsus.com) Date: Wed, 12 Apr 2000 22:56:38 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> Message-ID: <20000412225638.E9002@thyrsus.com> Ka-Ping Yee : > > XML? > > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Heh. What he said. Squared. Describing XML as a "language" around an old-time LISPer like me (or a new-time one like Ping) is a damn good way to get your eyebrows singed. -- Eric S. Raymond "...quemadmodum gladius neminem occidit, occidentis telum est." [...a sword never kills anybody; it's a tool in the killer's hand.] -- (Lucius Annaeus) Seneca "the Younger" (ca. 
4 BC-65 AD), From tim_one at email.msn.com Thu Apr 13 05:54:15 2000 From: tim_one at email.msn.com (Tim Peters) Date: Wed, 12 Apr 2000 23:54:15 -0400 Subject: [Python-Dev] Arbitrary attributes on funcs and methods In-Reply-To: <1256563909-46814536@hypernet.com> Message-ID: <000001bfa4fb$f07e0340$3e2d153f@tim> Lisp systems for 40+ years traditionally supported user-muckable "property lists" on all symbols, which were basically arbitrary dicts w/ a clumsy syntax. No disaster ensued; to the contrary, it was often handy. So +0 from me on the add-attrs-to-funcs idea. The same idea applies to all objects, of course, but attrs-on-funcs has some bang for the buck (adding a dict to e.g. each int object would be a real new burden with little payback). -1 on any notion of restricting attr values to be immutable. [Gordon] > Having to be explicit about the method <-> regex / rule would > severely damage SPARK's elegance. That's why I'm only +0 instead of +1: SPARK won't switch to use the new method anyway, because the beauty of abusing docstrings is that it's syntactically *obvious*. There already exist any number of other ways to associate arbitrary info with arbitrary objects, and it's no mystery why SPARK and Zope avoided all of them in favor of docstring abuse. > It would make Tim's doctest useless. This one not so: doctest is *not* meant to warp docstrings toward testing purposes; it's intended that docstrings remain wholly for human-friendly documentation. What doctest does is give you a way to guarantee that the elucidating examples good docstrings *should have anyway* work exactly as advertised (btw, doctest examples found dozens of places in my modules that needed to be fixed to recover from 1.6 alpha no longer sticking a trailing "L" on str(long) -- if you're not using doctest every day, you're an idiot ). If I could add an attr to funcs, though, *then* I'd think about changing doctest to also run examples in any e.g. func.doctest attrs it could find, and that *new* mechanism would be warped toward testing purposes. Indeed, I think that would be an excellent use for the new facility. namespaces-are-one-honking-great-etc-ly y'rs - tim From tim_one at email.msn.com Thu Apr 13 07:00:29 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 01:00:29 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14580.48029.512656.911718@goon.cnri.reston.va.us> Message-ID: <000701bfa505$31008380$4d2d153f@tim> [Jeremy Hylton]> > It doesn't seem like tail-recursion is the issue, rather we need to > define some rules about when to end the recursion. If I understand > what is being suggest, it is to create a worklist of subobjects to > compare instead of making recursive calls to compare. This change > would turn the core dump into an infinite loop; I guess that's an > improvement, but not much of one. > > ... > > So the real problem is defining some reasonable semantics for > comparison of recursive objects. I think this is exactly a graph isomorphism problem, since Python always compares "by value" (so isomorphism is the natural generalization). This isn't hard (!= tedious, alas) to define or to implement naively, but a straightforward implementation would be very expensive at runtime compared to the status quo. That's why "real languages" would rather suffer an infinite loop. It's expensive because there's no cheap way to know whether you have a loop in an object. 
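To make the idea concrete, here is a toy loop-aware comparison, restricted to lists and written purely as a sketch (nothing like it exists in the interpreter): it drags along a memo of object pairs already under comparison, machinery much like copy.deepcopy's, so comparing two self-referential lists terminates instead of recursing forever.

    def iso_equal(x, y, memo=None):
        # illustration only: a pair already being compared is assumed equal,
        # which is what makes the self-referential case terminate
        if memo is None:
            memo = {}
        key = (id(x), id(y))
        if memo.has_key(key):
            return 1
        if type(x) is not type(y):
            return x == y
        if type(x) is type([]):
            if len(x) != len(y):
                return 0
            memo[key] = 1
            for i in range(len(x)):
                if not iso_equal(x[i], y[i], memo):
                    return 0
            return 1
        return x == y

    a = []; a.append(a)
    b = []; b.append(b)
    print iso_equal(a, b)       # 1, and no bottomless recursion
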
An anal compromise would be to run comparisons full speed without trying to detect loops, but if the recursion got "too deep" break out and start over with an expensive alternative that does check for loops. The later requires machinery similar to copy.deepcopy's. > ... > I think the comparison ought to return false or raise a ValueError. After a = [] b = [] a.append(a) b.append(b) it certainly "ought to be" the case that a == b in Python. "false" makes no sense. ValueError makes no sense either unless we incur the expense of proving first that at least one object does contain a loop (as opposed to that it's just possibly nested thousands of levels deep) -- but then we may as well implement an isomorphism discriminator. > I'm not sure which is right. It seems odd to me that comparing two > builtin lists could ever raise an exception, but it may be more > Pythonic to raise an exception in the face of ambiguity. As the > X3J13 committee noted: Lisps have more notions of "equality" than Python 1.6 has flavors of strings . Python has only one notion of equality (conveniently ignoring that it actually has two ). The thing the Lisp people argue about is which of the three or four notions of equality to apply at varying levels when trying to compute one of their *other* notions of equality -- there *can't* be a universally "good" answer to that mess. Python's life is easier here. in-concept-if-not-in-implementation-ly y'rs - tim From effbot at telia.com Thu Apr 13 08:24:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 08:24:17 +0200 Subject: [Python-Dev] RE: [Idle-dev] Forward progress with full backward compatibility References: Message-ID: <003101bfa511$06c89920$34aab5d4@hagrid> Mark Hammond wrote: > Nah - a comparable number :-) tuples can also be compared. > if sys.hexversion >= 0x01060100: # Require Python 1.6 or later! if sys.versiontuple >= (1, 6, 1): ... From moshez at math.huji.ac.il Thu Apr 13 09:10:30 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 13 Apr 2000 09:10:30 +0200 (IST) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Message-ID: [Ping] > But you're right, i've never heard of another language that can handle > configurable encodings right in the source code. [The eff-bot] > XML? [Ping] > Don't get me started. XML is not a language. It's a serialization > format for trees (isomorphic to s-expressions, but five times more > verbose). It has no semantics. Anyone who tries to tell you otherwise > is probably a marketing drone or has been brainwashed by the buzzword > brigade. Of coursem but "everything is a tree". If you put Python in XML by having the parse-tree serialized, then you can handle any encoding in the source file, by snarfing it from XML. not-in-favour-of-Python-in-XML-but-this-is-sure-to-encourage-Greg-Wilson-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Thu Apr 13 12:50:05 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 12:50:05 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5A65D.5C2666B5@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Mark Hammond wrote: > >... > > I see these possible solutions: > > > > * The trash mechanism is changed to keep a list of (address, > > deallocator) pairs. 
This is a "cleaner" solution, as the list is > > not considered holding PyObjects as such, just blocks of memory to > > be freed with a custom allocator. Thus, we never end up in a > > position where a Python objects are resurrected - we just defer the > > actual memory deallocation, rather than attempting a delayed object > > destruction. This may not be as trivial to implement as to describe > > :-) This one sounds quite hard to implement. > > * Debug builds disable the trash mechanism. Not desired as the > > basic behaviour of the interpreter will change, making bug tracking > > with debug builds difficult! If we went this way, I would (try to > > :-) insist that the Windows debug builds dropped Py_DEBUG, as I > > really want to avoid the scenario that switching to a debug build > > changes the behaviour to this extent. I vote for this one at the moment. > > * Perform further hacks, so that Py_ForgetReference() gracefully > > handles NULL linked-list elements etc. > > > > Any thoughts? > > Option 4: lose the trashcan mechanism. I don't think the free-threading > issue was ever resolved. Option 5: Forget about free threading, change trashcan in a way that it doesn't change the order of destruction, doesn't need memory at all, and therefore does not change anything if it is disabled in debug mode. cheers - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From ping at lfw.org Thu Apr 13 13:22:56 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:22:56 -0700 (PDT) Subject: [Python-Dev] Round Bug in Python 1.6? In-Reply-To: <14580.39114.631398.101252@amarok.cnri.reston.va.us> Message-ID: On Wed, 12 Apr 2000, Andrew M. Kuchling wrote: > Ka-Ping Yee writes: > >Here is what i have in mind: provide two hooks > > __builtins__.display(object) > >and > > __builtins__.displaytb(traceback, exception) > > Shouldn't these be in sys, along with sys.ps1 and sys.ps2? We don't > want to add new display() and displaytb() built-ins, do we? Yes, you're right, they belong in sys. For a while i was under the delusion that you could customize more than one sub-interpreter by giving each one a different modified __builtins__, but that's an rexec thing and completely the wrong approach. Looks like the right approach to customizing sub-interpreters is to generalize the interface of code.InteractiveInterpreter and add more options to code.InteractiveConsole. sys.display and sys.displaytb would then be specifically for tweaking the main interactive interpreter only (just like sys.ps1 and sys.ps2). Still quite worth it, i believe, so i'll proceed. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From effbot at telia.com Thu Apr 13 13:06:57 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 13:06:57 +0200 Subject: [Python-Dev] if key in dict? Message-ID: <014901bfa538$637c37e0$34aab5d4@hagrid> now that we have the sq_contains slot, would it make sense to add support for "key in dict" ? after all, if key in dict: ... is a bit more elegant than: if dict.has_key(key): ... and much faster than: if key in dict.keys(): ... (the drawback is that once we add this, some people might ex- pect dictionaries to behave like sequences in others ways too...) 
(and yes, this might break code that looks for tp_as_sequence before looking for tp_as_mapping. haven't found any code like that, but I might have missed something). whaddyathink? From gstein at lyra.org Thu Apr 13 13:14:56 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:14:56 -0700 (PDT) Subject: [Python-Dev] Crash in new "trashcan" mechanism. In-Reply-To: <38F5A65D.5C2666B5@tismer.com> Message-ID: On Thu, 13 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > issue was ever resolved. > > Option 5: Forget about free threading, change trashcan in a way > that it doesn't change the order of destruction, doesn't need > memory at all, and therefore does not change anything if it is > disabled in debug mode. hehe... :-) Definitely possible. Seems like you could just statically allocate an array of PyObject* and drop the pointers in there (without an INCREF or anything). Place them there, in order. Dunno about the debug stuff, and how that would affect it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Thu Apr 13 13:19:32 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 04:19:32 -0700 (PDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <014901bfa538$637c37e0$34aab5d4@hagrid> Message-ID: On Thu, 13 Apr 2000, Fredrik Lundh wrote: > now that we have the sq_contains slot, would it make > sense to add support for "key in dict" ? > > after all, > > if key in dict: > ... The counter has always been, "but couldn't that be read as 'if value in dict' ??" Or maybe 'if (key, value) in dict' ?? People have different impressions of what "in" should mean for a dict. And some people change their impression from one function to the next :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From mal at lemburg.com Thu Apr 13 11:22:27 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 11:22:27 +0200 Subject: [Python-Dev] #pragmas in Python source code References: Message-ID: <38F591D3.32CD3B2A@lemburg.com> I think we should put the discussion back on track again... We were originally talking about proposals to integrate #pragmas into Python source. These pragmas are (for now) intended to provide information to the Python byte code compiler, so that it can make certain assumptions on a per file basis. So far, there have been numerous proposals for all kinds of declarations and decorations of files, functions, methods, etc. As usual in Python Space, things got generalized to a point where people forgot about the original intent ;-) The current need for #pragmas is really very simple: to tell the compiler which encoding to assume for the characters in u"...strings..." (*not* "...8-bit strings..."). The idea behind this is that programmers should be able to use other encodings here than the default "unicode-escape" one. Perhaps someone has a better idea on how to signify this to the compiler ? Could be that we don't need this pragma discussion at all if there is a different, more elegant solution to this... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From ping at lfw.org Thu Apr 13 13:50:02 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 04:50:02 -0700 (PDT) Subject: [Python-Dev] if key in dict? 
In-Reply-To: Message-ID: On Thu, 13 Apr 2000, Greg Stein wrote: > On Thu, 13 Apr 2000, Fredrik Lundh wrote: > > now that we have the sq_contains slot, would it make > > sense to add support for "key in dict" ? > > > > after all, > > > > if key in dict: > > ... > > The counter has always been, "but couldn't that be read as 'if value in > dict' ??" I've been quite happy with "if key in dict". I forget if i already made this analogy when it came up in regard to the issue of supporting a "set" type, but if you think of it like a real dictionary -- when someone asks you if a particular word is "in the dictionary", you look it up in the keys of the dictionary, not in the definitions. And it does read much better than has_key, and makes it easier to use dicts like sets. So i think it would be nice, though i've seen this meet opposition before. -- ?!ng "You should either succeed gloriously or fail miserably. Just getting by is the worst thing you can do." -- Larry Smith From effbot at telia.com Thu Apr 13 13:50:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 13:50:17 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <017b01bfa53e$748cc080$34aab5d4@hagrid> M.-A. Lemburg wrote: > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). why not? why keep on pretending that strings and strings are two different things? it's an artificial distinction, and it only causes problems all over the place. > Could be that we don't need this pragma discussion at all > if there is a different, more elegant solution to this... here's one way: 1. standardize on *unicode* as the internal character set. use an encoding marker to specify what *external* encoding you're using for the *entire* source file. output from the tokenizer is a stream of *unicode* strings. 2. if the user tries to store a unicode character larger than 255 in an 8-bit string, raise an OverflowError. 3. the default encoding is "none" (instead of XML's "utf-8"). in this case, treat the script as an ascii superset, and store each string literal as is (character-wise, not byte-wise). additional notes: -- item (3) is for backwards compatibility only. might be okay to change this in Py3K, but not before that. -- leave the implementation of (1) to 1.7. for now, assume that scripts have the default encoding, which means that (2) cannot happen. -- we still need an encoding marker for ascii supersets (how about ;-). however, it's up to the tokenizer to detect that one, not the parser. the parser only sees unicode strings. From tismer at tismer.com Thu Apr 13 13:56:18 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 13 Apr 2000 13:56:18 +0200 Subject: [Python-Dev] Crash in new "trashcan" mechanism. References: Message-ID: <38F5B5E2.40B20B53@tismer.com> Greg Stein wrote: > > On Thu, 13 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Option 4: lose the trashcan mechanism. I don't think the free-threading > > > issue was ever resolved. > > > > Option 5: Forget about free threading, change trashcan in a way > > that it doesn't change the order of destruction, doesn't need > > memory at all, and therefore does not change anything if it is > > disabled in debug mode. > > hehe... :-) > > Definitely possible. 
Seems like you could just statically allocate an > array of PyObject* and drop the pointers in there (without an INCREF or > anything). Place them there, in order. Dunno about the debug stuff, and > how that would affect it. I could even better use the given objects-to-be-destroyed as an explicit stack. Similar to what the debug delloc does, I may abuse the type pointer as a stack pointer. Since the refcount is zero, it can be abused to store a type code (we have only 5 types to distinguish here), and there is enough room for some state like a loop counter as well. Given that, I can build a destructor without recursion, but with an explicit stack and iteration. It would not interfere with anything, since it actually does the same thing, just in a different way, but in the same order, without mallocs. Should I try it? (say no and I'll do it anyway:) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip at mojam.com Thu Apr 13 15:34:53 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 08:34:53 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F591D3.32CD3B2A@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <14581.52477.70286.774494@beluga.mojam.com> Marc> We were originally talking about proposals to integrate #pragmas Marc> ... Minor nit... How about we lose the "#" during these discussions so we aren't all subliminally disposed to embed pragmas in comments or to add the C preprocessor to Python? ;-) -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From skip at mojam.com Thu Apr 13 15:39:47 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 08:39:47 -0500 (CDT) Subject: [Python-Dev] if key in dict? In-Reply-To: References: Message-ID: <14581.52771.512393.600949@beluga.mojam.com> Ping> I've been quite happy with "if key in dict". I forget if i Ping> already made this analogy when it came up in regard to the issue Ping> of supporting a "set" type, but if you think of it like a real Ping> dictionary -- when someone asks you if a particular word is "in Ping> the dictionary", you look it up in the keys of the dictionary, not Ping> in the definitions. Also, for many situations, "if value in dict" will be extraordinarily inefficient. In "in" semantics are added to dicts, a corollary move will be to extend this functionality to other non-dict mappings (e.g., file-based mapping objects like gdbm). Implementing "in" for them would be excruciatingly slow if the LHS was "value". To not break the rule of least astonishment when people push large dicts to disk, the only feasible implementation is "if key in dict". -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From fdrake at acm.org Thu Apr 13 15:46:44 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 09:46:44 -0400 (EDT) Subject: [Python-Dev] if key in dict? In-Reply-To: <14581.52771.512393.600949@beluga.mojam.com> References: <14581.52771.512393.600949@beluga.mojam.com> Message-ID: <14581.53188.587479.280569@seahag.cnri.reston.va.us> Skip Montanaro writes: > Also, for many situations, "if value in dict" will be extraordinarily > inefficient. 
In "in" semantics are added to dicts, a corollary move will be > to extend this functionality to other non-dict mappings (e.g., file-based > mapping objects like gdbm). Implementing "in" for them would be > excruciatingly slow if the LHS was "value". To not break the rule of least > astonishment when people push large dicts to disk, the only feasible > implementation is "if key in dict". Skip, Performance issues aside, I can see very valid reasons for the x in "x in dict" to be either the key or (key, value) pair. For this reason, I've come to consider "x in dict" a mis-feature, though I once pushed for it as well. It may be easy to explain that x is just the key, but it's not clearly the only reasonably desirable semantic. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 16:26:01 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 10:26:01 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <017b01bfa53e$748cc080$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <14581.55545.30446.471809@seahag.cnri.reston.va.us> Fredrik Lundh writes: > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. > > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. We shouldn't need to change it then; Unicode editing capabilities will be pervasive by then, right? Oh, heck, it might even be legacy support by then! ;) Seriously, I'd hesitate to change any interpretation of default encoding until Unicode support is pervasive and fully automatic in tools like Notepad, vi/vim, XEmacs, and BBedit/Alpha (or whatever people use on MacOS these days). If I can't use teco on it, we're being too pro-active! ;) > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Agreed here. But shouldn't that be: This is war, I tell you, war! ;) Now, just need to hack the exec(2) call on all the Unices so that is properly recognized and used to run the scripts properly, obviating the need for those nasty shbang lines! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Thu Apr 13 17:22:49 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 17:22:49 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) Message-ID: <200004131522.RAA05137@python.inrialpes.fr> Obviously, the user-attr proposal is not so "simple" as it looks like. I wish we all realize what's really going on here. In all cited use cases for this proposal, functions are no more perceived as functions per se, but as data structures (objects) which are the target of the computation. IOW, functions are just considered as *instances* of a class (inheriting from the builtin "PyFunction" class) with user-attributes, having a state, and eventually a set of operations bound to them. I guess that everybody realized that with this proposal, one could bind not only doc strings, but also functions to the function. def func(): pass def serialize(): ... func.pack = serialize func.pack() What is this? This is manual instance customization. 
Since nobody said that this would be done 'exceptionally', but rather on a regular basis for all functions (and generally, for all objects) in a program, the way to customize instances after the fact, makes those instances singletons of user-defined classes. You may say "so what?". Well, this is fine if it were part of the object model from the start. And there's no reason why only functions and methods can have this functionality. Stick the __dict__ slot in the object header and let me bind user-attributes to all objects. I have a favorite number, 7, so I want to label it Vlad's number. seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func to all numbers, n.is_prime(). I want to have a s.zip() method for a set of strings in my particular application, not only the builtin ones. Why is it not allowed to have this today? Think about it! How would you solve your app needs today? Through classes and instances. That's the prescribed `legal' way to do customized objects; no shortcuts. Saying that mucking with functions' __doc__ strings is the only way to implement some functionality is simply not true. In short, there's no way I can accept this proposal in its current state and make the distingo between functions/methods and other kinds of objects (including 3rd party ones). If we're to go down this road, treat all objects as equal citizens in this regard. None or all. The object model must remain consistent. This proposal opens a breach in it. And not the lightest! And this is only part of the reasons why I'm still firmly -1 until P3K. Glad to see that Barry exposed some of the truth about it, after preserving our throats, i.e. he understood that we understood that he fully understood the power of namespaces, but eventually decided to propose a fraction of a significant change reserved for the next major Python release... wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From effbot at telia.com Thu Apr 13 17:36:34 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 17:36:34 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> Message-ID: <007c01bfa55e$0faf0360$34aab5d4@hagrid> > Modified Files: > sysmodule.c > Log Message: > > Define version_info to be a tuple (major, minor, micro, level); level > is a string "a2", "b1", "c1", or '' for a final release. maybe level should be chosen so that version_info for a final release is larger than version_info for the corresponding beta ? From akuchlin at mems-exchange.org Thu Apr 13 17:39:43 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:39:43 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.59967.326442.73539@amarok.cnri.reston.va.us> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); level >> is a string "a2", "b1", "c1", or '' for a final release. >maybe level should be chosen so that version_info for a final >release is larger than version_info for the corresponding beta ? 
'a2' < 'b1' < 'c1' < 'final' --amk From fdrake at acm.org Thu Apr 13 17:41:32 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:41:32 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <007c01bfa55e$0faf0360$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.60076.525602.848031@seahag.cnri.reston.va.us> Fredrik Lundh writes: > maybe level should be chosen so that version_info for a final > release is larger than version_info for the corresponding beta ? I thought about that, but didn't like it; should it perhaps be 'final'? If the purpose is to simply make it increase monotonically like sys.hexversion, why not just use sys.hexversion? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Thu Apr 13 17:44:19 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Thu, 13 Apr 2000 11:44:19 -0400 (EDT) Subject: [Python-Dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: Message-ID: <14581.60243.557955.192783@amarok.cnri.reston.va.us> [Cc'ed to python-dev from the zope-dev mailing list; trim your follow-ups appropriately] R. David Murray writes: >So it looks like there is a problem using Zope with a large database >no matter what the platform. Has anyone figured out how to fix this? ... >But given the number of people who have said "use FreeBSD if you want >big files", I'm really wondering about this. What if later I >have an application where I really need a >2GB database? Different system calls are used for large files, because you can no longer use 32-bit ints to store file position. There's a HAVE_LARGEFILE_SUPPORT #define that turns on the use of these alternate system calls; see Python's configure.in for the test used to detect when it should be turned on. You could just hack the generated config.h to turn on large file support and recompile your copy of Python, but if the configure.in test is incorrect, that should be fixed. The test is: AC_MSG_CHECKING(whether to enable large file support) if test "$have_long_long" = yes -a \ "$ac_cv_sizeof_off_t" -gt "$ac_cv_sizeof_long" -a \ "$ac_cv_sizeof_long_long" -ge "$ac_cv_sizeof_off_t"; then AC_DEFINE(HAVE_LARGEFILE_SUPPORT) AC_MSG_RESULT(yes) else AC_MSG_RESULT(no) fi I thought you have to use the loff_t type instead of off_t; maybe this test should check for it instead? Anyone know anything about large file support? -- A.M. Kuchling http://starship.python.net/crew/amk/ When I dream, sometimes I remember how to fly. You just lift one leg, then you lift the other leg, and you're not standing on anything, and you can fly. -- Chloe Russell, in SANDMAN #43: "Brief Lives:3" From effbot at telia.com Thu Apr 13 17:44:57 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 17:44:57 +0200 Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us><007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> Message-ID: <008c01bfa55f$3af9f880$34aab5d4@hagrid> > Fredrik Lundh writes: > > maybe level should be chosen so that version_info for a final > > release is larger than version_info for the corresponding beta ? > > I thought about that, but didn't like it; should it perhaps be > 'final'? 
If the purpose is to simply make it increase monotonically > like sys.hexversion, why not just use sys.hexversion? readability? the sys.hexversion stuff isn't exactly obvious: >>> dir(sys) ... 'hexversion' ... >>> sys.hexversion 17170594 eh? is that version 1.71, or what? "final" is okay, I think. better than "f0", at least ;-) From fdrake at acm.org Thu Apr 13 17:56:38 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 11:56:38 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <008c01bfa55f$3af9f880$34aab5d4@hagrid> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.60982.631891.629922@seahag.cnri.reston.va.us> Fredrik Lundh writes: > readability? But hexversion retains the advantage that it's been there longer, and that's just too hard to change at this point. (Guido didn't leave the keys to his time machine...) > the sys.hexversion stuff isn't exactly obvious: I didn't say hexversion was pretty or that anyone liked it! Writing the docs, version_info is a *lot* easier to explain. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From mal at lemburg.com Thu Apr 13 17:55:08 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 17:55:08 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> Message-ID: <38F5EDDC.731E6740@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). > > why not? Because plain old 8-bit strings should work just as before, that is, existing scripts only using 8-bit strings should not break. > why keep on pretending that strings and strings are two > different things? it's an artificial distinction, and it only > causes problems all over the place. Sure. The point is that we can't just drop the old 8-bit strings... not until Py3K at least (and as Fred already said, all standard editors will have native Unicode support by then). So for now we're stuck with Unicode *and* 8-bit strings and have to make the two meet somehow -- which isn't all that easy, since 8-bit strings carry no encoding information. > > Could be that we don't need this pragma discussion at all > > if there is a different, more elegant solution to this... > > here's one way: > > 1. standardize on *unicode* as the internal character set. use > an encoding marker to specify what *external* encoding you're > using for the *entire* source file. output from the tokenizer is > a stream of *unicode* strings. Yep, that would work in Py3K... > 2. if the user tries to store a unicode character larger than 255 > in an 8-bit string, raise an OverflowError. There are no 8-bit strings in Py3K -- only 8-bit data buffers which don't have string methods ;-) > 3. the default encoding is "none" (instead of XML's "utf-8"). in > this case, treat the script as an ascii superset, and store each > string literal as is (character-wise, not byte-wise). Uhm. I think UTF-8 will be the standard for text file formats by then... so why not make it UTF-8 ? > additional notes: > > -- item (3) is for backwards compatibility only. might be okay to > change this in Py3K, but not before that. 
> > -- leave the implementation of (1) to 1.7. for now, assume that > scripts have the default encoding, which means that (2) cannot > happen. I'd say, leave all this to Py3K. > -- we still need an encoding marker for ascii supersets (how about > ;-). however, it's up to > the tokenizer to detect that one, not the parser. the parser only > sees unicode strings. Hmm, the tokenizer doesn't do any string -> object conversion. That's a task done by the parser. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Apr 13 18:06:53 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 13 Apr 2000 18:06:53 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> Message-ID: <38F5F09D.53E323EF@lemburg.com> Skip Montanaro wrote: > > Marc> We were originally talking about proposals to integrate #pragmas > Marc> ... > > Minor nit... How about we lose the "#" during these discussions so we > aren't all subliminally disposed to embed pragmas in comments or to add the > C preprocessor to Python? ;-) Hmm, anything else would introduce a new keyword, I guess. And new keywords cause new scripts to fail in old interpreters even when they don't use Unicode at all and only include per convention. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From skip at mojam.com Thu Apr 13 18:16:55 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 11:16:55 -0500 (CDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.62199.122899.126940@beluga.mojam.com> Marc> Skip Montanaro wrote: >> Minor nit... How about we lose the "#" during these discussions so >> we aren't all subliminally disposed to embed pragmas in comments or >> to add the C preprocessor to Python? ;-) Marc> Hmm, anything else would introduce a new keyword, I guess. And new Marc> keywords cause new scripts to fail in old interpreters even when Marc> they don't use Unicode at all and only include is> per convention. My point was only that using "#pragma" (or even "pragma") sort of implies we have our eye on a solution, but I don't think we're far enough down the path of answering what we want to have any concrete ideas about how to implement it. I think this thread started (more-or-less) when Guido posted an idea that originally surfaced on the idle-dev list about using "global ..." to implement functionality like this. It's not clear to me at this point what the best course might be. Skip From fdrake at acm.org Thu Apr 13 18:31:50 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:31:50 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F5F09D.53E323EF@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> Message-ID: <14581.63094.538920.187344@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Hmm, anything else would introduce a new keyword, I guess. 
And > new keywords cause new scripts to fail in old interpreters > even when they don't use Unicode at all and only include > per convention. Only if the new keyword is used in the script or anything it imports. This is exactly like using new syntax (u'...') or new library features (unicode('abc', 'iso-8859-1')). I can't think of anything that gets included "by convention" that breaks anything. I don't recall a proposal that we should casually add pragmas to our scripts if there's no need to do so. Adding pragmas to library modules is *not* part of the issue; they'd only be there if the version of Python they're part of supports the syntax. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 18:47:52 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 12:47:52 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F4CF7A.8F99562F@prescod.net> References: <38F4CF7A.8F99562F@prescod.net> Message-ID: <14581.64056.727047.412805@seahag.cnri.reston.va.us> Paul Prescod writes: > The XML rule is one encoding per file. One thing that I think that they > did innovate in (I had nothing to do with that part) is that entities I think an important part of this is that the location of the encoding declaration is completely fixed; it can't start five lines down (after all, it might be hard to know what a line is!). If we say, "The first character of a Python source file must be '#', or assume native encoding.", we go a long way to figuring out what's a line (CR/LF/CRLF can be dealt with in a "universal" fashion), so we can deal with something a little farther down, but I'd hate to be so flexible that it became too tedious to implement. I'd be more accepting of encoding declarations embedded in comments than pragmas. (Not that I *like* abusing comments like that.) So perhaps a Python encoding declaration becomes: #?python encoding="iso-8859-7"?# ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Thu Apr 13 18:51:35 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 18:51:35 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F4CF7A.8F99562F@prescod.net> <14581.64056.727047.412805@seahag.cnri.reston.va.us> Message-ID: <003901bfa568$890a5200$34aab5d4@hagrid> Fred wrote: > #?python encoding="iso-8859-7"?# like in: #!/usr/bin/python #?python encoding="utf-8" tabsize=5 if __name__ == "__main__": print "hello!" I've seen worse... From effbot at telia.com Thu Apr 13 18:52:44 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 18:52:44 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> Message-ID: <003a01bfa568$b190c560$34aab5d4@hagrid> M.-A. Lemburg wrote: > Fredrik Lundh wrote: > > > > M.-A. Lemburg wrote: > > > The current need for #pragmas is really very simple: to tell > > > the compiler which encoding to assume for the characters > > > in u"...strings..." (*not* "...8-bit strings..."). > > > > why not? > > Because plain old 8-bit strings should work just as before, > that is, existing scripts only using 8-bit strings should not break. but they won't -- if you don't use an encoding directive, and don't use 8-bit characters in your string literals, everything works as before. 
(that's why the default is "none" and not "utf-8")

if you use 8-bit characters in your source code and wish to add an encoding directive, you need to add the right encoding directive...

> > why keep on pretending that strings and strings are two
> > different things? it's an artificial distinction, and it only
> > causes problems all over the place.
>
> Sure. The point is that we can't just drop the old 8-bit
> strings... not until Py3K at least (and as Fred already
> said, all standard editors will have native Unicode support
> by then).

I discussed that in my original "all characters are unicode characters" proposal. in my proposal, the standard string type will have two roles: a string either contains unicode characters, or binary bytes.

-- if it contains unicode characters, python guarantees that methods like strip, lower (etc), and regular expressions work as expected.

-- if it contains binary data, you can still use indexing, slicing, find, split, etc. but they then work on bytes, not on chars.

it's still up to the programmer to keep track of what a certain string object is (a real string, a chunk of binary data, an encoded string, a jpeg image, etc). if the programmer wants to convert between a unicode string and an external encoding to use a certain unicode encoding, she needs to spell it out. the codecs are never called "under the hood".

(note that if you encode a unicode string into some other encoding, the result is a binary buffer. operations like strip, lower et al do *not* work on encoded strings).

> So for now we're stuck with Unicode *and* 8-bit strings
> and have to make the two meet somehow -- which isn't all
> that easy, since 8-bit strings carry no encoding information.

in my proposal, both string types hold unicode strings. they don't need to carry any encoding information, because they're not encoded.

> > > Could be that we don't need this pragma discussion at all
> > > if there is a different, more elegant solution to this...
> >
> > here's one way:
> >
> > 1. standardize on *unicode* as the internal character set. use
> > an encoding marker to specify what *external* encoding you're
> > using for the *entire* source file. output from the tokenizer is
> > a stream of *unicode* strings.
>
> Yep, that would work in Py3K...

or 1.7 -- see below.

> > 2. if the user tries to store a unicode character larger than 255
> > in an 8-bit string, raise an OverflowError.
>
> There are no 8-bit strings in Py3K -- only 8-bit data
> buffers which don't have string methods ;-)

oh, you've seen the Py3K specification?

> > 3. the default encoding is "none" (instead of XML's "utf-8"). in
> > this case, treat the script as an ascii superset, and store each
> > string literal as is (character-wise, not byte-wise).
>
> Uhm. I think UTF-8 will be the standard for text file formats
> by then... so why not make it UTF-8 ?

in time for 1.6? or you mean Py3K? sure! I said that in my first "additional note", didn't I:

> > additional notes:
> >
> > -- item (3) is for backwards compatibility only. might be okay to
> > change this in Py3K, but not before that.
> >
> > -- leave the implementation of (1) to 1.7. for now, assume that
> > scripts have the default encoding, which means that (2) cannot
> > happen.
>
> I'd say, leave all this to Py3K.

do you mean it's okay to settle for a broken design in 1.6, since we can fix it in Py3K? that's scary.

fixing the design is not that hard, and can be done now.
implementing all parts of it is harder, and require extensive changes to the compiler/interpreter architecture. but iirc, such changes are already planned for 1.7... > > -- we still need an encoding marker for ascii supersets (how about > > ;-). however, it's up to > > the tokenizer to detect that one, not the parser. the parser only > > sees unicode strings. > > Hmm, the tokenizer doesn't do any string -> object conversion. > That's a task done by the parser. "unicode string" meant Py_UNICODE*, not PyUnicodeObject. if the tokenizer does the actual conversion doesn't really matter; the point is that once the code has passed through the tokenizer, it's unicode. From bwarsaw at cnri.reston.va.us Thu Apr 13 18:59:03 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 12:59:03 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> Message-ID: <14581.64727.928889.239985@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> maybe level should be chosen so that version_info for a final FL> release is larger than version_info for the corresponding beta FL> ? Yes, absolutely. Please don't break the comparability of version_info or the connection with the patchversion.h macros. -Barry From fdrake at acm.org Thu Apr 13 19:05:17 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:05:17 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.64727.928889.239985@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> Message-ID: <14581.65101.110813.343483@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > Yes, absolutely. Please don't break the comparability of version_info > or the connection with the patchversion.h macros. So I'm the only person here today who prefers the release level of a final version to be '' instead of 'final'? Or did I miss all the messages of enthusiastic support for '' from my screaming fans? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bwarsaw at cnri.reston.va.us Thu Apr 13 19:04:40 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:04:40 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> Message-ID: <14581.65064.13261.43476@anthem.cnri.reston.va.us> >>>>> "AMK" == Andrew M Kuchling writes: AMK> Fredrik Lundh writes: >> Define version_info to be a tuple (major, minor, micro, level); >> level is a string "a2", "b1", "c1", or '' for a final release. >> maybe level should be chosen so that version_info for a final >> release is larger than version_info for the corresponding beta >> ? AMK> 'a2' < 'b1' < 'c1' < 'final' Another reason I don't like the strings: 'b9' > 'b10' :( I can imagine a remote possibility of more than 9 pre-releases (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has to fit in 4 bits), so at the very least, make that string 'a02', 'a03', etc. -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:07:54 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Thu, 13 Apr 2000 13:07:54 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> Message-ID: <14581.65258.431992.820885@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> readability? Yup. FL> "final" is okay, I think. better than "f0", at least ;-) And I think (but am not 100% positive) that once a final release comes out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead starts incrementing PY_MICRO_VERSION. If that's not the case, then it complicates things a bit. -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:08:51 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 13:08:51 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> Message-ID: <14581.65315.489980.275044@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I didn't say hexversion was pretty or that anyone liked Fred> it! Writing the docs, version_info is a *lot* easier to Fred> explain. So is it easier to explain that the empty string means a final release or that 'final' means a final release? :) -Barry From fdrake at acm.org Thu Apr 13 19:11:19 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:11:19 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65064.13261.43476@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <14581.65463.994272.442725@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits), so at the very least, make that string 'a02', > 'a03', etc. Doesn't this further damage the human readability of the value? I thought that was an important reason to break it up from sys.hexversion. (Note also that you're not just saying more than 9 pre-releases, but more than 9 at any one of alpha, beta, or release candidate stages. 1-9 at each stage is already 27 pre-release packages.) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From gmcm at hypernet.com Thu Apr 13 19:11:14 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:11:14 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131522.RAA05137@python.inrialpes.fr> Message-ID: <1256476619-52065132@hypernet.com> Vladimir Marangozov wrote: > > > Obviously, the user-attr proposal is not so "simple" as it looks like. This is not obvious to me. Both the concept and implementation appear fairly simple to me. > I wish we all realize what's really going on here. > > In all cited use cases for this proposal, functions are no more > perceived as functions per se, but as data structures (objects) > which are the target of the computation. 
IOW, functions are just > considered as *instances* of a class (inheriting from the builtin > "PyFunction" class) with user-attributes, having a state, and > eventually a set of operations bound to them. I don't see why they aren't still functions. Putting a rack on my bicycle doesn't make it a pickup truck. I think it's a real stretch to say they would become "instances of a class". There's no inheritance, and the "state" isn't visible inside the function (probably unfortunately ). Just like today, they are objects of type PyFunction, and they get called the same old way. You'll be able to hang extra stuff off them, just like today you can hang extra stuff off a module object without the module's knowledge or cooperation. > I guess that everybody realized that with this proposal, one could > bind not only doc strings, but also functions to the function. > > def func(): pass > def serialize(): ... > func.pack = serialize > func.pack() > > What is this? This is manual instance customization. What is "def"? What is f.__doc__ = ... ? > Since nobody said that this would be done 'exceptionally', but rather > on a regular basis for all functions (and generally, for all objects) > in a program, the way to customize instances after the fact, makes > those instances singletons of user-defined classes. Only according to a very loose definition of "instance" and "user-defined class". More accurately, they are objects as they always have been (oops, Barry screwed up the time- machine again; please adjust the tenses of the above). > You may say "so what?". Well, this is fine if it were part of the > object model from the start. And there's no reason why only functions > and methods can have this functionality. Stick the __dict__ slot in > the object header and let me bind user-attributes to all objects. Perceived need is part of this. > I have a favorite number, 7, so I want to label it Vlad's number. > seven = 7; seven.fanclub = ['Vlad']. I want to add a boolean func > to all numbers, n.is_prime(). I want to have a s.zip() method for a > set of strings in my particular application, not only the builtin ones. > > Why is it not allowed to have this today? Think about it! This is apparently a reducto ad absurdum argument. It's got the absurdum, but not much reducto. I prefer this one: Adding attributes to functions is immoral. Therefore func.__doc__ is immoral and should be removed. For another thing, we'll need a couple generations to argue about what to do with those 100 singleton int objects . > How would you solve your app needs today? Through classes and instances. > That's the prescribed `legal' way to do customized objects; no shortcuts. > Saying that mucking with functions' __doc__ strings is the only way to > implement some functionality is simply not true. No, it's a matter of convenience. Remember, Pythonistas are from Yorkshire ("You had Python??... You had assembler??.. You had front-panel toggle switches??.. You had wire- wrapping tools??.."). > In short, there's no way I can accept this proposal in its current > state and make the distingo between functions/methods and other kinds > of objects (including 3rd party ones). If we're to go down this road, > treat all objects as equal citizens in this regard. None or all. They are all first class objects already. Adding capabilities to one of them doesn't subtract them from any other. > The object model must remain consistent. This proposal opens a breach in it. > And not the lightest! 
In any sense in which you can apply the word "consistent" to Python's object model, I fail to see how this makes it less so. > And this is only part of the reasons why I'm still firmly -1 until P3K. > Glad to see that Barry exposed some of the truth about it, after preserving > our throats, i.e. he understood that we understood that he fully understood > the power of namespaces, but eventually decided to propose a fraction of > a significant change reserved for the next major Python release... wink > > >>> wink.fraction = 1e+-1 > >>> wink.fraction.precision = 1e-+1 > >>> wink.compute() > 0.0 I don't see anything here but an argument that allowing attributes on function objects makes them vaguely similar to instance objects. To the extent that I can agree with that, I fail to see any harm in it. - Gordon From fdrake at acm.org Thu Apr 13 19:16:15 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:16:15 -0400 (EDT) Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14581.65258.431992.820885@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.60076.525602.848031@seahag.cnri.reston.va.us> <008c01bfa55f$3af9f880$34aab5d4@hagrid> <14581.60982.631891.629922@seahag.cnri.reston.va.us> <14581.65315.489980.275044@anthem.cnri.reston.va.us> <14581.65258.431992.820885@anthem.cnri.reston.va.us> Message-ID: <14582.223.861189.614634@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > And I think (but am not 100% positive) that once a final release comes > out, Guido stops incrementing the PY_RELEASE_SERIAL's and instead > starts incrementing PY_MICRO_VERSION. If that's not the case, then > it complicates things a bit. patchlevel.h includes a comment that indicates serial should be 0 for final releases. > So is it easier to explain that the empty string means a final release > or that 'final' means a final release? :) I think it's the same; either is a special value. The only significant advantage of 'final' is the monotonicity provided by 'final'. I'm not convinced that it's otherwise any better. It also means to create a formatter version number from this that you need to special-case the last item in sys.version_info. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Thu Apr 13 19:15:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 13 Apr 2000 19:15:14 +0200 Subject: [Python-Dev] Re: CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us><007c01bfa55e$0faf0360$34aab5d4@hagrid><14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> Message-ID: <007301bfa56b$d8ec1ee0$34aab5d4@hagrid> > I can imagine a remote possibility of more than 9 pre-releases > (counting from 1), but not more than 15 (since PY_RELEASE_SERIAL has > to fit in 4 bits) or rather, "I can imagine a remote possibility of more than 5 pre-releases (counting from 1), but not more than 9 (since PY_RELEASE_SERIAL has to fit in a single decimal digit"? in the very unlikely case that I'm wrong, feel free to break the glass and install the following patch: #define PY_RELEASE_LEVEL_DESPAIR 0xD #define PY_RELEASE_LEVEL_EXTRAMUNDANE 0xE #define PY_RELEASE_LEVEL_FINAL 0xF /* Serial should be 0 here */ From bwarsaw at cnri.reston.va.us Thu Apr 13 19:17:30 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Thu, 13 Apr 2000 13:17:30 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> Message-ID: <14582.298.938842.466851@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> So I'm the only person here today who prefers the release Fred> level of a final version to be '' instead of 'final'? Or Fred> did I miss all the messages of enthusiastic support for '' Fred> from my screaming fans? I've blocked those messages at your mta, so you would't be fooled into doing the wrong thing. I'll repost them to you, but only after you change it back to 'final' means final. Then you can be rightfully indignant at all of us losers who wanted it the other way, and caused you all that extra work! :) root-of-all-evil-ly y'rs, -Barry From fdrake at acm.org Thu Apr 13 19:20:24 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:20:24 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.298.938842.466851@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> Message-ID: <14582.472.612445.191833@seahag.cnri.reston.va.us> Barry A. Warsaw writes: > I've blocked those messages at your mta, so you would't be fooled into > doing the wrong thing. I'll repost them to you, but only after you I don't mind that, just don't stop the groupies! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From skip at mojam.com Thu Apr 13 19:24:35 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:24:35 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> Message-ID: <14582.723.427231.355475@beluga.mojam.com> Gordon> I don't see why they aren't still functions. Putting a rack on Gordon> my bicycle doesn't make it a pickup truck. Though putting a gun in the rack might... ;-) Skip From bwarsaw at cnri.reston.va.us Thu Apr 13 19:25:13 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:25:13 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> Message-ID: <14582.761.365390.946880@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> Doesn't this further damage the human readability of the Fred> value? A little, but it's a fine compromise between the various constraints. Another way you could structure that tuple is to split the PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more readable if you want, and make the latter a real int. Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), e.g. 
the form is: (major, minor, micro, level, serial) You can't use 'gamma' though because then you break comparability. Maybe use 'candidate' instead? Sigh. Fred> I thought that was an important reason to break it Fred> up from sys.hexversion. (Note also that you're not just Fred> saying more than 9 pre-releases, but more than 9 at any one Fred> of alpha, beta, or release candidate stages. 1-9 at each Fred> stage is already 27 pre-release packages.) Well, Guido hisself must have thought that there was a remote possibility of more than 9 releases at a particular level, otherwise he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no other possible explanation for his choices is there?! :) -Barry From fdrake at acm.org Thu Apr 13 19:31:04 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:31:04 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1112.750322.6958@seahag.cnri.reston.va.us> bwarsaw at cnri.reston.va.us writes: > A little, but it's a fine compromise between the various constraints. > Another way you could structure that tuple is to split the > PY_RELEASE_LEVEL and the PY_RELEASE_SERIAL. Make the former even more > readable if you want, and make the latter a real int. Thus Python > 1.6a2 would have a sys.version_info() of (1, 6, 0, 'alpha', 2), > e.g. the form is: > > (major, minor, micro, level, serial) I've thought of this as well, and certainly prefer it to the 'a01' solution. > You can't use 'gamma' though because then you break comparability. > Maybe use 'candidate' instead? Sigh. Yeah. > Well, Guido hisself must have thought that there was a remote > possibility of more than 9 releases at a particular level, otherwise > he'd have jammed PY_RELEASE_SERIAL in 3 bits. I mean, there's no > other possible explanation for his choices is there?! :) Clearly. I'll have to break his heart when I release 1.6a16 this afternoon. ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Thu Apr 13 19:32:41 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 13:32:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> Message-ID: <14582.1209.471995.242974@seahag.cnri.reston.va.us> Skip Montanaro writes: > Though putting a gun in the rack might... ;-) And make sure that rack is big enough for the dogs, we don't want them to feel left out! (Gosh, I'm feeling like I'm back in south-west Virginia already! ;) -Fred -- Fred L. Drake, Jr. 
Corporation for National Research Initiatives From skip at mojam.com Thu Apr 13 19:32:37 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:32:37 -0500 (CDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 In-Reply-To: <14582.761.365390.946880@anthem.cnri.reston.va.us> References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> Message-ID: <14582.1205.389790.293558@beluga.mojam.com> BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, 0, BAW> 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break comparability. Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Skip From bwarsaw at cnri.reston.va.us Thu Apr 13 19:35:05 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:35:05 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.64727.928889.239985@anthem.cnri.reston.va.us> <14581.65101.110813.343483@seahag.cnri.reston.va.us> <14582.298.938842.466851@anthem.cnri.reston.va.us> <14582.472.612445.191833@seahag.cnri.reston.va.us> Message-ID: <14582.1353.482124.111121@anthem.cnri.reston.va.us> >>>>> "Fred" == Fred L Drake, Jr writes: Fred> I don't mind that, just don't stop the groupies! ;) Hey, take it from me, groupies are a dime a dozen. They ask you all kinds of boring questions like what kind of strings you use (or how fast your disk drives are). It's the "gropies" you want. 'Course, tappin' away at a keyboard that only makes one kind of annoying clicking sound and isn't midi-fied won't get you any gropies. Even if you're an amazing hunk of a bass god, it's tough (so you know I'm at a /severe/ disadvantage :) -Barry From bwarsaw at cnri.reston.va.us Thu Apr 13 19:38:29 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Thu, 13 Apr 2000 13:38:29 -0400 (EDT) Subject: [Python-Dev] CVS: python/dist/src/Python sysmodule.c,2.59,2.60 References: <200004131529.LAA29446@seahag.cnri.reston.va.us> <007c01bfa55e$0faf0360$34aab5d4@hagrid> <14581.59967.326442.73539@amarok.cnri.reston.va.us> <14581.65064.13261.43476@anthem.cnri.reston.va.us> <14581.65463.994272.442725@seahag.cnri.reston.va.us> <14582.761.365390.946880@anthem.cnri.reston.va.us> <14582.1205.389790.293558@beluga.mojam.com> Message-ID: <14582.1557.128677.346938@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: BAW> Thus Python 1.6a2 would have a sys.version_info() of (1, 6, BAW> 0, 'alpha', 2), e.g. the form is: BAW> (major, minor, micro, level, serial) BAW> You can't use 'gamma' though because then you break BAW> comparability. SM> Yeah, you can. Don't use 'final'. Use 'omega'... ;-) Or how 'bout: "zats the last one yer gonna git, ya peons, now leave me ALONE" ? 
-Barry From gmcm at hypernet.com Thu Apr 13 19:39:06 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 13:39:06 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.723.427231.355475@beluga.mojam.com> References: <1256476619-52065132@hypernet.com> Message-ID: <1256474945-52165985@hypernet.com> Skip wrote: > > Gordon> I don't see why they aren't still functions. Putting a rack on > Gordon> my bicycle doesn't make it a pickup truck. > > Though putting a gun in the rack might... ;-) Nah, I live in downeast Maine. I'd need a trailer hitch and snow- plow mount to qualify. - Gordon From skip at mojam.com Thu Apr 13 19:51:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 13 Apr 2000 12:51:08 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.1209.471995.242974@seahag.cnri.reston.va.us> References: <200004131522.RAA05137@python.inrialpes.fr> <1256476619-52065132@hypernet.com> <14582.723.427231.355475@beluga.mojam.com> <14582.1209.471995.242974@seahag.cnri.reston.va.us> Message-ID: <14582.2316.638334.342115@beluga.mojam.com> Fred> Skip Montanaro writes: >> Though putting a gun in the rack might... ;-) Fred> And make sure that rack is big enough for the dogs, we don't want Fred> them to feel left out! They fit in the panniers. (They're minature german shorthair pointers...) extending-this-silliness-ly y'rs... Skip From Vladimir.Marangozov at inrialpes.fr Thu Apr 13 20:10:33 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Thu, 13 Apr 2000 20:10:33 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <200004131810.UAA05752@python.inrialpes.fr> Gordon McMillan wrote: > > I don't see anything here but an argument that allowing > attributes on function objects makes them vaguely similar to > instance objects. To the extent that I can agree with that, I fail > to see any harm in it. > To the extent it encourages confusion, I think it sucks. >>> def this(): ... sucks = "no" ... >>> this.sucks = "yes" >>> >>> print this.sucks 'yes' Why on earth 'sucks' is not the object defined in the function's namespace? Who made that deliberate decision? Clearly 'this' defines a new namespace, so it'll be also legitimate to get a NameError, or to: >>> print this.sucks 'no' Don't you think? And don't explain to me that this is because there's a code object, different from the function object, which is compiled at the function's definition, then assotiated with the function object, blah, blah, blah... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From jeremy at cnri.reston.va.us Thu Apr 13 21:08:12 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 15:08:12 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000701bfa505$31008380$4d2d153f@tim> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> Message-ID: <14582.5791.148277.87450@walden> >>>>> "TP" == Tim Peters writes: TP> [Jeremy Hylton]> >> So the real problem is defining some reasonable semantics for >> comparison of recursive objects. 
TP> I think this is exactly a graph isomorphism problem, since
TP> Python always compares "by value" (so isomorphism is the natural
TP> generalization).

I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail.

TP> This isn't hard (!= tedious, alas) to define or to implement
TP> naively, but a straightforward implementation would be very
TP> expensive at runtime compared to the status quo. That's why
TP> "real languages" would rather suffer an infinite loop.

TP> It's expensive because there's no cheap way to know whether you
TP> have a loop in an object.

My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases).

A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also uses a bunch of function calls where macros would be more efficient.

TP> An anal compromise would be to run comparisons full speed
TP> without trying to detect loops, but if the recursion got "too
TP> deep" break out and start over with an expensive alternative
TP> that does check for loops. The latter requires machinery similar
TP> to copy.deepcopy's.

It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is.
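(A rough Python rendering of the idea just described, added for illustration only; it is not part of the original message or patch, which does the equivalent in C inside PyObject_Compare. The name cyclic_eq and the lists-only restriction are purely illustrative.)

    _in_progress = {}    # set of (id, id) pairs currently being compared

    def cyclic_eq(v, w):
        # Sketch for lists only; the real patch hooks the same trick
        # into PyObject_Compare for container types in general.
        if v is w:
            return 1
        if type(v) is not type([]) or type(w) is not type([]):
            return v == w
        pair = (id(v), id(w))
        if _in_progress.has_key(pair):
            # This comparison crosses a back-edge in the object graph:
            # assume the two are equal here; a genuine difference will
            # be detected somewhere else in the traversal.
            return 1
        if len(v) != len(w):
            return 0
        _in_progress[pair] = 1
        try:
            for i in range(len(v)):
                if not cyclic_eq(v[i], w[i]):
                    return 0
            return 1
        finally:
            del _in_progress[pair]

For example, two separately built self-referential lists compare equal: a = [1, 2]; a.append(a); b = [1, 2]; b.append(b); cyclic_eq(a, b) returns 1 instead of recursing forever.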
Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", --- > "__repr__ returned non-string (type %s)", 276c276 < "__str__ returned non-string (type %.200s)", --- > "__str__ returned non-string (type %s)", 300a301,328 > static PyObject *cmp_state_key = NULL; > > static PyObject* > cmp_state_make_pair(v, w) > PyObject *v, *w; > { > PyObject *pair = PyTuple_New(2); > if (pair == NULL) > return NULL; > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > return pair; > } > > void > cmp_state_clear_pair(dict, key) > PyObject *dict, *key; > { > PyDict_DelItem(dict, key); > Py_DECREF(key); > } > > 305a334,336 > PyObject *tstate_dict, *cmp_dict, *pair; > int result; > 311a343,376 > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, > v->ob_type->tp_name, (long)w, w->ob_type->tp_name); > */ > /* XXX should initialize elsewhere */ > if (cmp_state_key == NULL) { > cmp_state_key = PyString_InternFromString("compare_state"); > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } else { > cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); > if (cmp_dict == NULL) > return NULL; > PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); > } > > pair = cmp_state_make_pair(v, w); > if (pair == NULL) { > PyErr_BadInternalCall(); > return -1; > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume they're > equal until shown otherwise > */ > Py_DECREF(pair); > return 0; > } 316a382,384 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 317a386 > cmp_state_clear_pair(cmp_dict, pair); 329a399,401 > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } 344a417 > cmp_state_clear_pair(cmp_dict, pair); 350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } --- > cmp_state_clear_pair(cmp_dict, pair); > if (PyUnicode_Check(v) || PyUnicode_Check(w)) > return PyUnicode_Compare(v, w); 372c433,434 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { > cmp_state_clear_pair(cmp_dict, pair); 374c436,439 < return (*vtp->tp_compare)(v, w); --- > } > result = (*vtp->tp_compare)(v, w); > cmp_state_clear_pair(cmp_dict, pair); > return result; From gstein at lyra.org Thu Apr 13 21:09:02 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 13 Apr 2000 12:09:02 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: It's great that you made this change! 
I hadn't got through my mail, but was going to recommend it... :-) One comment: On Thu, 13 Apr 2000, Fred Drake wrote: >... > --- 409,433 ---- > v = PyInt_FromLong(PY_VERSION_HEX)); > Py_XDECREF(v); > + /* > + * These release level checks are mutually exclusive and cover > + * the field, so don't get too fancy with the pre-processor! > + */ > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_ALPHA > + v = PyString_FromString("alpha"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_BETA > + v = PyString_FromString("beta"); > + #endif > + #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_GAMMA > + v = PyString_FromString("candidate"); > + #endif > #if PY_RELEASE_LEVEL == PY_RELEASE_LEVEL_FINAL > ! v = PyString_FromString("final"); > ! #endif > PyDict_SetItemString(sysdict, "version_info", > ! v = Py_BuildValue("iiiNi", PY_MAJOR_VERSION, > PY_MINOR_VERSION, > ! PY_MICRO_VERSION, v, > ! PY_RELEASE_SERIAL)); > Py_XDECREF(v); > PyDict_SetItemString(sysdict, "copyright", I would recommend using the "s" format code in Py_BuildValue. It simplifies the code, and it is quite a bit easier for a human to process. When I first saw the code, I thought "the level string leaks!" Then I saw the "N" code, went and looked it up, and realized what is going on. So... to avoid that, the "s" code would be great. Cheers, -g -- Greg Stein, http://www.lyra.org/ From bitz at bitdance.com Thu Apr 13 21:12:34 2000 From: bitz at bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 15:12:34 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Andrew M. Kuchling wrote: > longer use 32-bit ints to store file position. There's a > HAVE_LARGEFILE_SUPPORT #define that turns on the use of these > alternate system calls; see Python's configure.in for the test used to I just looked in my python config.h on my FreeBSD system, and I see: #define HAVE_LARGEFILE_SUPPORT 1 So it looks like it is on, and it seems to me the problem could be in either Python or FileStorage.py in Zope. This is a Zope 2.1.2 system (but I diffed filestorage.py against the 2.1.6 version and didn't see any relevant changes) running on a FreeBSD 3.1 system. A make test in Python passed all tests, but I don't know if large file support is tested by the tests. --RDM PS: anyone from the python list replying to this please CC me as I am not on that list. From gmcm at hypernet.com Thu Apr 13 21:26:13 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Thu, 13 Apr 2000 15:26:13 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM Message-ID: <1256468519-52554453@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > I don't see anything here but an argument that allowing > > attributes on function objects makes them vaguely similar to > > instance objects. To the extent that I can agree with that, I fail > > to see any harm in it. > > > > To the extent it encourages confusion, I think it sucks. > > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? Because that one is a local. Python allows the same name in different places. Used wisely, it's a handy feature of namespaces. 
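(A minimal sketch of the distinction being drawn here, added for illustration and assuming the proposed attribute assignment is allowed: the name bound inside the body is a local that exists only while a call is running, while the attribute lives in the function object's own namespace, so neither one shadows the other.)

    def this():
        sucks = "no"       # a local: bound only while the call is executing
        return sucks

    this.sucks = "yes"     # an attribute: stored on the function object itself
    print this.sucks       # prints: yes
    print this()           # prints: no   (the local is untouched)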
> Who made that deliberate decision? What decision? To put a name "sucks" both in the function's locals and as a function attribute? To print something accessed with object.attribute notation in the obvious manner? Deciding not to cause gratuitous UnboundLocalErrors? This is nowhere near as confusing as, say, putting a module named X in a package named X and then saying "from X import *", (hi, Marc-Andre!). > Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? Only if you've done "this.sucks = 'no'". Or are you saying that if functions have attributes, people will all of a sudden expect that function locals will have initialized and maintained state? We certainly get plenty of newbie confusion about namespaces, assignment and scoping; maybe I've seen one or two where people thought function.local should be legal (do Python-tutors see this?). In those cases, is it the existence of function.__doc__ that causes the confusion? If yes, and this is a serious problem, then you should be arguing for the removal of __doc__. If not, why would allowing adding more attributes exacerbate the problem? > And don't explain to me that this is because there's a code object, > different from the function object, which is compiled at the function's > definition, then assotiated with the function object, blah, blah, blah... No problem. [Actually, the best argument against this I can see is that functional-types already try to use function objects where any sane person knows you should use an instance; and since this doesn't further their agenda, the bastard's will just scream louder ]. - Gordon From fdrake at acm.org Thu Apr 13 22:05:10 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Thu, 13 Apr 2000 16:05:10 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Python sysmodule.c,2.60,2.61 In-Reply-To: References: <200004131744.NAA30726@seahag.cnri.reston.va.us> Message-ID: <14582.10358.843823.467467@seahag.cnri.reston.va.us> Greg Stein writes: > I would recommend using the "s" format code in Py_BuildValue. It > simplifies the code, and it is quite a bit easier for a human to process. > When I first saw the code, I thought "the level string leaks!" Then I saw > the "N" code, went and looked it up, and realized what is going on. Good point; 'N' is relatively obscure in my experience as well. I've made the change (and there's probably less code in the binary as well!). -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From bitz at bitdance.com Thu Apr 13 22:04:56 2000 From: bitz at bitdance.com (R. David Murray) Date: Thu, 13 Apr 2000 16:04:56 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: OK, some more info. The code in FileStorage.py looks like this: ------------------- def read_index(file, name, index, vindex, tindex, stop='\377'*8, ltid=z64, start=4, maxoid=z64): read=file.read seek=file.seek seek(0,2) file_size=file.tell() print file_size, start if file_size: if file_size < start: raise FileStorageFormatError, file.name [etc] ------------------- I stuck that print statement in there. The results of the print are: -2147248811L 4 So it looks to my uneducated eye like file.tell() is broken. The actual on-disk size of the file, by the way, is indeed 2147718485, so it looks like somebody's not using the right size data structure somewhere. 
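(A quick arithmetic check, added for illustration, that the reported number is exactly a 32-bit signed wraparound of the true file size; the snippet uses longs so it runs on a 32-bit build.)

    >>> 2147483647 < 2147718485L        # the real size exceeds a signed 32-bit int
    1
    >>> 2147718485L - 2L**32            # the same size wrapped at 32 bits
    -2147248811L

which matches the value file.tell() printed above.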
So, can anyone tell me what to look for, or am I stuck for the moment?

--RDM

PS: anyone on python-dev replying please CC me as I am only on the zope list.

From paul at prescod.net Thu Apr 13 22:55:43 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 13 Apr 2000 15:55:43 -0500 Subject: [Python-Dev] OT: XML References: <001b01bfa4af$18b1c9c0$34aab5d4@hagrid> <20000412225638.E9002@thyrsus.com> Message-ID: <38F6344F.25D344B5@prescod.net>

Well, as long as everyone else is going to be off-topic: What definition of "language" are you using? And while you're at it, what definition of "semantics" are you using? As I recall, a string is an ordered list of symbols and a language is an unordered set of strings. I know that Ka-Ping, despite going to a great university, was in Engineering, not computer science, so I'll excuse him for not knowing the Chomskian definition of language :), but what's your excuse, Eric?

Most XML people will happily admit that XML has no "semantics" but I think that's bullshit too. The mapping from the string to the abstract tree data model *is the semantic content* of the XML specification. Yes, it is a brain-dead simple mapping and so the semantic structure provided by the XML specification is minimal...but that's the whole point. It's supposed to be simple. It's supposed to not get in the way of higher level semantics.

It makes as little sense to reject XML out of hand because it is a buzzword but is not innovative as it does for people to embrace it mystically because it is Microsoft's flavor of the week. XML takes simple ideas from the Lisp and document processing communities and popularizes them so that they can achieve economies of scale. It sounds exactly like the relationship between Lisp and Python to me...

By the way, what data model or text encoding is NOT isomorphic to Lisp S-expressions? Isn't Python code isomorphic to Lisp S-expressions?

Paul Prescod

From jeremy at cnri.reston.va.us Fri Apr 14 00:06:39 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 18:06:39 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> Message-ID: <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us>

I did one more round of work on this idea, and I'm satisfied with the results. Most of the performance hit can be eliminated by doing nothing until there are at least N recursive calls to PyObject_Compare, where N is fairly large. (I picked 25000.) Non-circular objects that are not deeply nested only pay for an integer increment, a decrement, and a compare.

Background for patches-only readers: This patch appears to fix PR#7.

Comments and suggestions solicited. I think this is worth checking in.
Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -r2.52 object.h 286a287,289 > /* tstate dict key for PyObject_Compare helper */ > extern PyObject *_PyCompareState_Key; > Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -r2.91 pythonrun.c 151a152,153 > _PyCompareState_Key = PyString_InternFromString("cmp_state"); > Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 300a301,306 > PyObject *_PyCompareState_Key; > > int _PyCompareState_nesting = 0; > int _PyCompareState_flag = 0; > #define NESTING_LIMIT 25000 > 305a312,313 > int result; > 372c380 < if (vtp->tp_compare == NULL) --- > if (vtp->tp_compare == NULL) { 374c382,440 < return (*vtp->tp_compare)(v, w); --- > } > ++_PyCompareState_nesting; > if (_PyCompareState_nesting > NESTING_LIMIT) > _PyCompareState_flag = 1; > if (_PyCompareState_flag && > (vtp->tp_as_mapping || (vtp->tp_as_sequence && > !PyString_Check(v)))) > { > PyObject *tstate_dict, *cmp_dict, *pair; > > tstate_dict = PyThreadState_GetDict(); > if (tstate_dict == NULL) { > PyErr_BadInternalCall(); > return -1; > } > cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); > if (cmp_dict == NULL) { > cmp_dict = PyDict_New(); > if (cmp_dict == NULL) > return -1; > PyDict_SetItem(tstate_dict, > _PyCompareState_Key, > cmp_dict); > } > > pair = PyTuple_New(2); > if (pair == NULL) { > return -1; > } > if ((long)v <= (long)w) { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); > } else { > PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); > PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); > } > if (PyDict_GetItem(cmp_dict, pair)) { > /* already comparing these objects. assume > they're equal until shown otherwise > */ > Py_DECREF(pair); > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return 0; > } > if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { > return -1; > } > result = (*vtp->tp_compare)(v, w); > PyDict_DelItem(cmp_dict, pair); > Py_DECREF(pair); > } else { > result = (*vtp->tp_compare)(v, w); > } > --_PyCompareState_nesting; > if (_PyCompareState_nesting == 0) > _PyCompareState_flag = 0; > return result; From ping at lfw.org Fri Apr 14 00:41:44 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:41:44 -0500 (CDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > >>>>> "TP" == Tim Peters writes: > > TP> [Jeremy Hylton]> > >> So the real problem is defining some reasonable semantics for > >> comparison of recursive objects. There is a "right" way to do this, i believe, and my friend Mark Miller implemented it in E. He tells me his algorithm is inspired by the method for unification of cyclic structures in Prolog III. It's available in the E source code (in the file elib/tables/Equalizer.java). 
See interesting stuff on equality and cyclic data structures at

    http://www.erights.org/javadoc/org/erights/e/elib/tables/Equalizer.html
    http://www.erights.org/elang/same-ref.html
    http://www.erights.org/elang/blocks/defVar.html
    http://www.eros-os.org/~majordomo/e-lang/0698.html

There is also a thread about equality issues in general at:

    http://www.eros-os.org/~majordomo/e-lang/0000.html

It's long, but worth perusing.

Here is my rough Python translation of the code in the E Equalizer.

Python 1.4 (Mar 25 2000) [C] Copyright 1991-1997 Stichting Mathematisch Centrum, Amsterdam
Python Console v1.4 by Ka-Ping Yee
>>> def same(left, right, sofar={}):
...     hypothesis = (id(left), id(right))
...     if left is right or sofar.has_key(hypothesis): return 1
...     if type(left) is not type(right): return 0
...     if type(left) is type({}):
...         left, right = left.items(), right.items()
...     if type(left) is type([]):
...         sofar[hypothesis] = 1
...         try:
...             for i in range(len(left)):
...                 if not same(left[i], right[i], sofar): return 0
...             return 1
...         finally:
...             del sofar[hypothesis]
...     return left == right
...
...
>>> same([3],[4])
0
>>> same([3],[3])
1
>>> a = [1,2,3]
>>> b = [1,2,3]
>>> c = [1,2,3]
>>> same(a,b)
1
>>> a[1] = a
>>> same(a,a)
1
>>> same(a,b)
0
>>> b[1] = b
>>> same(a,b)
1
>>> b[1] = c
>>> b
[1, [1, 2, 3], 3]
>>> same(a,b)
0
>>> c[1] = b
>>> same(a,b)
1
>>> same(b,c)
1
>>>

I would like to see Python's comparisons work this way (i.e. "correct" as opposed to "we give up").

-- ?!ng

From ping at lfw.org Fri Apr 14 00:49:21 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 17:49:21 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: Message-ID:

As a reference, here is the corresponding cyclic-structure-comparison example from a message about E:

? define tight := [1, tight, "x"]
# value: [1, ***CYCLE***, x]
? define loose := [1, [1, loose, "x"], "x"]
# value: [1, ***CYCLE***, x]
? tight == loose
# value: true
? def map := [tight => "foo"]
# value: [[1, ***CYCLE***, x] => foo]
? map[loose]
# value: foo

Internally, tight and loose have very different representations. However, when both cycles are unwound, they represent the same infinite tree. One could say that tight's representation of this tree is more tightly wound than loose's representation. However, this difference is only in the implementation, not in the semantics. The value of tight and loose is only the infinite tree they represent. If these trees are the same, then tight and loose are ==.

Notice that loose prints out according to the tightest winding of the tree it represents, not according to the cycle by which it represents this tree. Only the tightest winding is finite and canonical.

-- ?!ng

From bwarsaw at cnri.reston.va.us Fri Apr 14 01:14:49 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Thu, 13 Apr 2000 19:14:49 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> Message-ID: <14582.21737.387268.332139@anthem.cnri.reston.va.us>

JH> Comments and suggestions solicitied. I think this is worth
JH> checking in.

Please regenerate with unified or context diffs!
-Barry From jeremy at cnri.reston.va.us Fri Apr 14 01:19:30 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:19:30 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: Message-ID: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important. >>> tight = [1, None, "x"] >>> tight[1] = tight >>> tight [1, [...], 'x'] >>> loose = [1, [1, None, "x"], "x"] >>> loose[1][1] = loose >>> loose [1, [1, [...], 'x'], 'x'] >>> tight [1, [...], 'x'] >>> tight == loose 1 Jeremy From jeremy at cnri.reston.va.us Fri Apr 14 01:30:02 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 19:30:02 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.21737.387268.332139@anthem.cnri.reston.va.us> References: <14580.48029.512656.911718@goon.cnri.reston.va.us> <000701bfa505$31008380$4d2d153f@tim> <14582.5791.148277.87450@walden> <14582.14222.865019.806313@bitdiddle.cnri.reston.va.us> <14582.17647.662905.959786@bitdiddle.cnri.reston.va.us> <14582.21737.387268.332139@anthem.cnri.reston.va.us> Message-ID: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> Here it is contextified. One small difference from the previous patch is that NESTING_LIMIT is now only 1000. I think this is sufficient to cover commonly occuring nested containers. Jeremy Index: Include/object.h =================================================================== RCS file: /projects/cvsroot/python/dist/src/Include/object.h,v retrieving revision 2.52 diff -c -r2.52 object.h *** object.h 2000/03/21 16:14:47 2.52 --- object.h 2000/04/13 21:50:10 *************** *** 284,289 **** --- 284,292 ---- extern DL_IMPORT(int) Py_ReprEnter Py_PROTO((PyObject *)); extern DL_IMPORT(void) Py_ReprLeave Py_PROTO((PyObject *)); + /* tstate dict key for PyObject_Compare helper */ + extern PyObject *_PyCompareState_Key; + /* Flag bits for printing: */ #define Py_PRINT_RAW 1 /* No string quotes etc. */ Index: Python/pythonrun.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Python/pythonrun.c,v retrieving revision 2.91 diff -c -r2.91 pythonrun.c *** pythonrun.c 2000/03/10 23:03:54 2.91 --- pythonrun.c 2000/04/13 21:50:25 *************** *** 149,154 **** --- 149,156 ---- /* Init Unicode implementation; relies on the codec registry */ _PyUnicode_Init(); + _PyCompareState_Key = PyString_InternFromString("cmp_state"); + bimod = _PyBuiltin_Init_1(); if (bimod == NULL) Py_FatalError("Py_Initialize: can't initialize __builtin__"); Index: Objects/object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -c -r2.67 object.c *** object.c 2000/04/10 13:42:33 2.67 --- object.c 2000/04/13 21:44:42 *************** *** 298,308 **** --- 298,316 ---- return PyInt_FromLong(c); } + PyObject *_PyCompareState_Key; + + int _PyCompareState_nesting = 0; + int _PyCompareState_flag = 0; + #define NESTING_LIMIT 1000 + int PyObject_Compare(v, w) PyObject *v, *w; { PyTypeObject *vtp, *wtp; + int result; + if (v == NULL || w == NULL) { PyErr_BadInternalCall(); return -1; *************** *** 369,377 **** /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! 
if (vtp->tp_compare == NULL) return (v < w) ? -1 : 1; ! return (*vtp->tp_compare)(v, w); } long --- 377,443 ---- /* Numerical types compare smaller than all other types */ return strcmp(vname, wname); } ! if (vtp->tp_compare == NULL) { return (v < w) ? -1 : 1; ! } ! ++_PyCompareState_nesting; ! if (_PyCompareState_nesting > NESTING_LIMIT) ! _PyCompareState_flag = 1; ! if (_PyCompareState_flag && ! (vtp->tp_as_mapping || (vtp->tp_as_sequence && ! !PyString_Check(v)))) ! { ! PyObject *tstate_dict, *cmp_dict, *pair; ! ! tstate_dict = PyThreadState_GetDict(); ! if (tstate_dict == NULL) { ! PyErr_BadInternalCall(); ! return -1; ! } ! cmp_dict = PyDict_GetItem(tstate_dict, _PyCompareState_Key); ! if (cmp_dict == NULL) { ! cmp_dict = PyDict_New(); ! if (cmp_dict == NULL) ! return -1; ! PyDict_SetItem(tstate_dict, ! _PyCompareState_Key, ! cmp_dict); ! } ! ! pair = PyTuple_New(2); ! if (pair == NULL) { ! return -1; ! } ! if ((long)v <= (long)w) { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); ! } else { ! PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); ! PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); ! } ! if (PyDict_GetItem(cmp_dict, pair)) { ! /* already comparing these objects. assume ! they're equal until shown otherwise ! */ ! Py_DECREF(pair); ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return 0; ! } ! if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { ! return -1; ! } ! result = (*vtp->tp_compare)(v, w); ! PyDict_DelItem(cmp_dict, pair); ! Py_DECREF(pair); ! } else { ! result = (*vtp->tp_compare)(v, w); ! } ! --_PyCompareState_nesting; ! if (_PyCompareState_nesting == 0) ! _PyCompareState_flag = 0; ! return result; } long From ping at lfw.org Fri Apr 14 04:09:49 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Thu, 13 Apr 2000 21:09:49 -0500 (CDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. Very, very cool. Well done. Say, when did printing get fixed? > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] -- ?!ng From jeremy at cnri.reston.va.us Fri Apr 14 04:14:11 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 13 Apr 2000 22:14:11 -0400 (EDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: References: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: <14582.32499.38092.53395@bitdiddle.cnri.reston.va.us> >>>>> "KPY" == Ka-Ping Yee writes: KPY> On Thu, 13 Apr 2000, Jeremy Hylton wrote: >> Looks like the proposed changed to PyObject_Compare matches E for >> your example. The printed representation doesn't match, but I'm >> not sure that is as important. KPY> Very, very cool. Well done. Say, when did printing get fixed? Looks like the repr checkin was pre-1.5.1. I glanced at the sameness code in E, and it looks like it is doing exactly the same thing. It keeps a mapping of comparisons seen sofar and returns true for them. It seems that E's types don't define their own methods for sameness, though. The same methods seem to understand the internals of the various E types. Or is it just a few special ones. 
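In Python terms, what both E and the patched PyObject_Compare do amounts to something like the sketch below. The names are made up, it only special-cases lists and tuples, it starts tracking as soon as the depth limit is hit instead of using a separate flag, and the limit is shrunk so it's easy to play with -- the real patch keeps the pairs in a per-thread dict and uses a limit of 1000:

    NESTING_LIMIT = 20          # the real patch uses 1000
    _nesting = 0                # recursion depth, like _PyCompareState_nesting
    _in_progress = {}           # (id, id) pairs currently being compared

    def sketch_cmp(v, w):
        # cycle-safe cmp() for lists and tuples; everything else
        # falls back to the builtin comparison
        global _nesting
        if type(v) is not type(w) or type(v) not in (type([]), type(())):
            return cmp(v, w)
        _nesting = _nesting + 1
        try:
            if _nesting > NESTING_LIMIT:
                pair = (min(id(v), id(w)), max(id(v), id(w)))
                if _in_progress.has_key(pair):
                    return 0    # back-edge: assume equal until shown otherwise
                _in_progress[pair] = 1
                try:
                    return _cmp_elements(v, w)
                finally:
                    del _in_progress[pair]
            return _cmp_elements(v, w)
        finally:
            _nesting = _nesting - 1

    def _cmp_elements(v, w):
        # plain lexicographic comparison, recursing through sketch_cmp
        for i in range(min(len(v), len(w))):
            outcome = sketch_cmp(v[i], w[i])
            if outcome:
                return outcome
        return cmp(len(v), len(w))

    tight = [1, None, "x"]; tight[1] = tight
    loose = [1, [1, None, "x"], "x"]; loose[1][1] = loose
    print sketch_cmp(tight, loose)        # prints 0, i.e. equal

The idea is the same as in the patch: a pair we are already in the middle of comparing is provisionally assumed equal, so if the objects really differ, the difference has to show up elsewhere and some other part of the comparison will fail.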
Jeremy From tim_one at email.msn.com Fri Apr 14 04:32:48 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:48 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256468519-52554453@hypernet.com> Message-ID: <000a01bfa5b9$b99a6760$182d153f@tim> [Gordon McMillan] > ... > Or are you saying that if functions have attributes, people will > all of a sudden expect that function locals will have initialized > and maintained state? I expect that they'll expect exactly what happens in JavaScript, which supports function attributes too, and where it's often used as a nicer-than-globals way to get the effect of C-like mutable statics (conceptually) local to the function. BTW, viewing this all in OO terms would make compelling sense only if Guido viewed everything in OO terms -- but he doesn't. To the extent that people must , Python doesn't stop you from adding arbitrary unique attrs to class instances today either. consistent-in-inconsistency-ly y'rs - tim From tim_one at email.msn.com Fri Apr 14 04:32:44 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 13 Apr 2000 22:32:44 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.5791.148277.87450@walden> Message-ID: <000901bfa5b9$b7f6c980$182d153f@tim> [Jeremy Hylton] > I'm not familiar with any algorithms for the graph isomorphism > problem, Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children). > but I took a stab at a simple comparison algorithm. The idea > is to detect comparisons that would cross back-edges in the object > graphs. Instead of starting a new comparison, assume they are the > same. If, in fact, the objects are not the same, they must differ in > some other way; some other part of the comparison will fail. Bingo! That's the key trick. From effbot at telia.com Fri Apr 14 06:58:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 06:58:50 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> Message-ID: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Tim Peters wrote: > [Gordon McMillan] > > ... > > Or are you saying that if functions have attributes, people will > > all of a sudden expect that function locals will have initialized > > and maintained state? > > I expect that they'll expect exactly what happens in JavaScript, which > supports function attributes too, and where it's often used as a > nicer-than-globals way to get the effect of C-like mutable statics > (conceptually) local to the function. so it's no longer an experimental feature, it's a "static variables" thing? umm. I had nearly changed my mind to a "okay, if you insist +1", but now it's back to -1 again. maybe in Py3K... From bwarsaw at cnri.reston.va.us Fri Apr 14 07:23:40 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 01:23:40 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <14582.43868.600655.132428@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so it's no longer an experimental feature, it's a "static FL> variables" thing? FL> umm. 
I had nearly changed my mind to a "okay, if you insist FL> +1", but now it's back to -1 again. maybe in Py3K... C'mon! Most people are still going to just use module globals for function statics because they're less to type (notwithstanding the sometimes-optional global decl). You can't worry about all the novel abuses people will think up for this feature -- they're already doing it with all sorts of other things Pythonic, e.g. docstrings, global as pragma, etc. Can I get at least a +0? :) -Barry From tim_one at email.msn.com Fri Apr 14 09:34:46 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 03:34:46 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <000401bfa5e3$e8c5ce60$612d153f@tim> [Tim] >> I expect that they'll expect exactly what happens in JavaScript, which >> supports function attributes too, and where it's often used as a >> nicer-than-globals way to get the effect of C-like mutable statics >> (conceptually) local to the function. [/F] > so it's no longer an experimental feature, it's a "static variables" > thing? Yes, of course people will use it to get the effect of function statics. OK by me. People do the same thing today with class data attributes (i.e., to get the effect of mutable statics w/o polluting the module namespace). They'll use it for all sorts of other stuff too -- it's mechanism, not policy. BTW, I don't think any "experimental feature" has ever been removed -- only features that weren't experimental. So if you want to see it go away ... > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. maybe in Py3K... Greg gave the voting rule as: > -1 "Veto. And is my reasoning." Vladimir has done some reasoning, but the basis of your objection remains a mystery. We should be encouraging our youth to submit patches with their crazy ideas . From gansevle at cs.utwente.nl Fri Apr 14 09:46:08 2000 From: gansevle at cs.utwente.nl (Fred Gansevles) Date: Fri, 14 Apr 2000 09:46:08 +0200 Subject: [Python-Dev] cvs-server out of sync with mailing-list ? Message-ID: <200004140746.JAA05473@localhost.localdomain> I try to keep up-to-date with the cvs-tree at cvs.python.org and receive the python-checkins at python.org mailing-list. Just now I discovered that the cvs-server and the checkins-list are out of sync. For example: according to the checkins-list the latest version of src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest version is 2.59 Am I missing something or is there some kind of a problem ? ____________________________________________________________________________ Fred Gansevles Phone: +31 53 489 4613 >>> Your one-stop-shop for Linux/WinNT/NetWare <<< Org.: Twente University, Fac. of CS, Box 217, 7500 AE Enschede, Netherlands "Bill needs more time to learn Linux" - Steve B. From mal at lemburg.com Fri Apr 14 01:05:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 01:05:12 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256476619-52065132@hypernet.com> from "Gordon McMillan" at Apr 13, 2000 01:11:14 PM <1256468519-52554453@hypernet.com> Message-ID: <38F652A8.B2F8C822@lemburg.com> Gordon McMillan wrote: > ... > This is nowhere near as confusing as, say, putting a module > named X in a package named X and then saying "from X > import *", (hi, Marc-Andre!). 
Users shouldn't bother looking into packages... only at the documented interface ;-) The hack is required to allow sibling submodules to import the packages main module (I could have also written import __init__ everywhere but that wouldn't have made things clearer), BTW. It turned out to be very convenient during development of all those mx packages. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 10:46:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:46:15 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> Message-ID: <38F6DAD7.BBAF72E5@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > Fredrik Lundh wrote: > > > > > > M.-A. Lemburg wrote: > > > > The current need for #pragmas is really very simple: to tell > > > > the compiler which encoding to assume for the characters > > > > in u"...strings..." (*not* "...8-bit strings..."). > > > > > > why not? > > > > Because plain old 8-bit strings should work just as before, > > that is, existing scripts only using 8-bit strings should not break. > > but they won't -- if you don't use an encoding directive, and > don't use 8-bit characters in your string literals, everything > works as before. > > (that's why the default is "none" and not "utf-8") > > if you use 8-bit characters in your source code and wish to > add an encoding directive, you need to add the right encoding > directive... Fair enough, but this would render all the auto-coercion code currently in 1.6 useless -- all string to Unicode conversions would have to raise an exception. > > > why keep on pretending that strings and strings are two > > > different things? it's an artificial distinction, and it only > > > causes problems all over the place. > > > > Sure. The point is that we can't just drop the old 8-bit > > strings... not until Py3K at least (and as Fred already > > said, all standard editors will have native Unicode support > > by then). > > I discussed that in my original "all characters are unicode > characters" proposal. in my proposal, the standard string > type will have to roles: a string either contains unicode > characters, or binary bytes. > > -- if it contains unicode characters, python guarantees that > methods like strip, lower (etc), and regular expressions work > as expected. > > -- if it contains binary data, you can still use indexing, slicing, > find, split, etc. but they then work on bytes, not on chars. > > it's still up to the programmer to keep track of what a certain > string object is (a real string, a chunk of binary data, an en- > coded string, a jpeg image, etc). if the programmer wants > to convert between a unicode string and an external encoding > to use a certain unicode encoding, she needs to spell it out. > the codecs are never called "under the hood". > > (note that if you encode a unicode string into some other > encoding, the result is binary buffer. operations like strip, > lower et al does *not* work on encoded strings). Huh ? If the programmer already knows that a certain string uses a certain encoding, then he can just as well convert it to Unicode by hand using the right encoding name. 
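For example, with the unicode() builtin that is already in 1.6, and assuming the programmer knows his data is Latin-1:

    data = 'f\366r'                      # 8-bit string holding Latin-1 data
    text = unicode(data, 'iso-8859-1')   # convert by hand, naming the encoding
    # the automatic coercion, by contrast, has to assume UTF-8,
    # and '\366r' is not valid UTF-8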
The whole point we are talking about here is that when having the implementation convert a string to Unicode all by itself it needs to know which encoding to use. This is where we have decided long ago that UTF-8 should be used. The pragma discussion is about a totally different issue: pragmas could make it possible for the programmer to tell the *compiler* which encoding to use for literal u"unicode" strings -- nothing more. Since "8-bit" strings currently don't have an encoding attached to them we store them as-is. I don't want to get into designing a completely new character container type here... this can all be done for Py3K, but not now -- it breaks things at too many ends (even though it would solve the issues with strings being used in different contexts). > > > -- we still need an encoding marker for ascii supersets (how about > > > ;-). however, it's up to > > > the tokenizer to detect that one, not the parser. the parser only > > > sees unicode strings. > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > That's a task done by the parser. > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > > if the tokenizer does the actual conversion doesn't really matter; > the point is that once the code has passed through the tokenizer, > it's unicode. The tokenizer would have to know which parts of the input string to convert to Unicode and which not... plus there are different encodings to be applied, e.g. UTF-8, Unicode-Escape, Raw-Unicode-Escape, etc. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 10:24:30 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 10:24:30 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> Message-ID: <38F6D5BE.924F4D62@lemburg.com> "Fred L. Drake, Jr." wrote: > > M.-A. Lemburg writes: > > Hmm, anything else would introduce a new keyword, I guess. And > > new keywords cause new scripts to fail in old interpreters > > even when they don't use Unicode at all and only include > > per convention. > > Only if the new keyword is used in the script or anything it > imports. This is exactly like using new syntax (u'...') or new > library features (unicode('abc', 'iso-8859-1')). Right, but I would guess that people would then start using these keywords in all files per convention (so as not to trip over bugs due to wrong encodings). Perhaps I'm overcautious here... > I can't think of anything that gets included "by convention" that > breaks anything. I don't recall a proposal that we should casually > add pragmas to our scripts if there's no need to do so. Adding > pragmas to library modules is *not* part of the issue; they'd only be > there if the version of Python they're part of supports the syntax. 
-- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From paul at prescod.net Fri Apr 14 11:12:08 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 04:12:08 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000a01bfa5b9$b99a6760$182d153f@tim> <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: <38F6E0E8.E336F6C6@prescod.net> Fredrik Lundh wrote: > > so it's no longer an experimental feature, it's a "static variables" > thing? > > umm. I had nearly changed my mind to a "okay, if you insist +1", > but now it's back to -1 again. maybe in Py3K... I think that we get 95% of the benefit without any of the "dangers" (though I don't agree with the arguments against) if we allow the attachment of properties only at compile time and disallow mutation of them at runtime. That will allow Spark, EventDOM, multi-lingual docstrings etc., but disallow static variables. I'm not agreeing that using function properties as static variables is a bad thing...I'm just saying that we might be able to agree on a less powerful mechanism and then revisit the more general one in Py3K. Let's not forget that Py3K is going to be a very hard exercise in trying to combine everyone's ideas "all at once". Experience gained now is golden. We should probably be more amenable to "experimental ideas" now -- secure in the knowledge that they can be killed off in Py3K. If we put ideas we are not 100% comfortable with in Py3K we will be stuck with them forever. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From mhammond at skippinet.com.au Fri Apr 14 15:01:33 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:01:33 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: > Can I get at least a +0? :) Im quite amazed this is contentious! Definately a +1 from me! Mark. From skip at mojam.com Fri Apr 14 15:05:44 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 08:05:44 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.6056.362378.834649@beluga.mojam.com> Mark> Im quite amazed this is contentious! Definately a +1 from me! +1 from the skippi in Chicago as well... Skip From mhammond at skippinet.com.au Fri Apr 14 15:11:39 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri, 14 Apr 2000 23:11:39 +1000 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F6E0E8.E336F6C6@prescod.net> Message-ID: > I think that we get 95% of the benefit without any of the > "dangers" > (though I don't agree with the arguments against) if we allow the > attachment of properties only at compile time and > disallow mutation of > them at runtime. AFAIK, this would be a pretty serious change. The compiler just generates (basically)PyObject_SetAttr() calls. 
There is no way in the current runtime to differentiate between "compile time" and "runtime" attribute references... If this was done, it would simply be ugly hacks to support what can only be described as unpythonic in the first place! [Unless of course Im missing something...] Mark. From fredrik at pythonware.com Fri Apr 14 15:34:48 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 15:34:48 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Barry Wrote: > > Can I get at least a +0? :) okay, I'll retract. here's today's opinion: +1 on an experimental future, which is not part of the language definition, and not necessarily supported by all implementations. (and where supported, not necessarily very efficient). -1 on static function variables implemented as attributes on function or method objects. def eff(): "eff" print "eff", eff.__doc__ def bot(): "bot" print "bot", bot.__doc__ eff() bot() eff, bot = bot, eff eff() bot() # or did your latest patch solve this little dilemma? # if so, -1 on your patch ;-) From fdrake at acm.org Fri Apr 14 15:46:11 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:46:11 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38F6D5BE.924F4D62@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <14581.52477.70286.774494@beluga.mojam.com> <38F5F09D.53E323EF@lemburg.com> <14581.63094.538920.187344@seahag.cnri.reston.va.us> <38F6D5BE.924F4D62@lemburg.com> Message-ID: <14583.8483.628361.523059@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > Right, but I would guess that people would then start using these > keywords in all files per convention (so as not to trip over > bugs due to wrong encodings). I don't imagine the new keywords would be used by anyone that wasn't specifically interested in their effect. Code that isn't needed tends not to get written! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Apr 14 15:55:36 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 09:55:36 -0400 (EDT) Subject: [Python-Dev] cvs-server out of sync with mailing-list ? In-Reply-To: <200004140746.JAA05473@localhost.localdomain> References: <200004140746.JAA05473@localhost.localdomain> Message-ID: <14583.9048.857826.186107@seahag.cnri.reston.va.us> Fred Gansevles writes: > Just now I discovered that the cvs-server and the checkins-list are out of > sync. For example: according to the checkins-list the latest version of > src/Python/sysmodule.c is 2.62 and according to the cvs-server the latest > version is 2.59 > > Am I missing something or is there some kind of a problem ? There's a problem, but it's highly isolated. We're updating the public CVS using rsync tunnelled through ssh, which worked greate until some of us switched to Linux workstations, where OpenSSH behaves a little differently with some private keys files. I've not figured out how to work around it yet, but will keep playing with it. I've synced the public CVS from a Solaris box for now, so all the recent changes should be visible. Until I get things fixed, I'll try to remember to sync it before I head home in the evenings. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From fdrake at acm.org Fri Apr 14 15:57:34 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 14 Apr 2000 09:57:34 -0400 (EDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: distutils/distutils unixccompiler.py,1.21,1.22 In-Reply-To: <200004141353.JAA04309@thrak.cnri.reston.va.us> References: <200004141353.JAA04309@thrak.cnri.reston.va.us> Message-ID: <14583.9166.166905.476276@seahag.cnri.reston.va.us> Greg Ward writes: > ! # Not many Unices required ranlib anymore -- SunOS 4.x is, I > ! # think the only major Unix that does. Maybe we need some You're saying that SunOS 4.x *is* a major Unix???? Not for a while, now.... -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From akuchlin at mems-exchange.org Fri Apr 14 16:15:37 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Fri, 14 Apr 2000 10:15:37 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000401bfa5e3$e8c5ce60$612d153f@tim> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> Message-ID: <14583.10249.322298.959083@amarok.cnri.reston.va.us> >Yes, of course people will use it to get the effect of function statics. OK >by me. People do the same thing today with class data attributes (i.e., to Wait, the attributes added to a function are visible inside the function? (I haven't looked that closely at the patch?) That strikes me as a much more significant change to Python's scoping, making it local, function attribute, then global scope. a I thought of the attributes as labels that could be attached to a callable object for the convenience of some external system, but the function would remain blissfully unaware of the external meaning attached to itself. -1 from me if a function's attributes are visible to code inside the function; +0 if they're not. -- A.M. Kuchling http://starship.python.net/crew/amk/ The paradox of money is that when you have lots of it you can manage life quite cheaply. Nothing so economical as being rich. -- Robertson Davies, _The Rebel Angels_ From skip at mojam.com Fri Apr 14 16:39:27 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 09:39:27 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> References: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> <000401bfa5e3$e8c5ce60$612d153f@tim> <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <14583.11679.107267.727484@beluga.mojam.com> >> Yes, of course people will use it to get the effect of function >> statics. OK by me. People do the same thing today with class data >> attributes (i.e., to AMK> Wait, the attributes added to a function are visible inside the AMK> function? (I haven't looked that closely at the patch?) No, they aren't. There is no change of Python's scoping rules using Barry's function attributes patch. In fact, they are *only* available to the function itself via the function's name in the module globals. That's why Fredrik's "eff, bot = bot, eff" trick worked as it did. 
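To make that concrete (the name 'counter' below is made up, and of course plain functions only accept new attributes with Barry's patch applied):

    def counter():
        # 'counter' here is an ordinary global lookup performed at call
        # time, not a reference to "this function"
        counter.count = counter.count + 1
        return counter.count
    counter.count = 0

    counter()                # 1
    counter()                # 2
    f, counter = counter, None
    f()                      # AttributeError -- the global 'counter' is now None

So the attribute behaves like a piece of module-level state that happens to hang off the function object; rebind the module-level name and the "static" is gone.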
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 16:41:39 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 16:41:39 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <200004141441.QAA02162@python.inrialpes.fr> Mark Hammond wrote: > > > Can I get at least a +0? :) > > Im quite amazed this is contentious! Definately a +1 from me! > > Mark. > Amazed or not, it is contentious. I have the responsability to remove my veto once my concerns are adressed. So far, I have the impression that all I get (if I get anything at all -- see above) is "conveniency" from Gordon, which is nothing else but laziness about creating instances. As long as we discuss customization of objects with builtin types, the "inconsistency" stays bound to classes and instances. Add modules if you wish, but they are just namespaces. This proposal expands the customization inconsistency to functions and methods. And I am reluctant to see this happening "under the hood", without a global vision of the problem, just because a couple of people have abused unprotected attributes and claim that they can't do what they want because Python doesn't let them to. As to the object model, together with naming and binding, I say: KISS or do it right the first time. add-more-oil-to-the-fire-and-you'll-burn-your-house--ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Fri Apr 14 17:04:51 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 10:04:51 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: <200004141441.QAA02162@python.inrialpes.fr> Message-ID: <14583.13203.900930.294033@beluga.mojam.com> Vladimir> So far, I have the impression that all I get (if I get Vladimir> anything at all -- see above) is "conveniency" from Gordon, Vladimir> which is nothing else but laziness about creating instances. No, you get function metadata. Barry's original reason for creating the patch was that the only writable attribute for functions or methods is the doc string. Multiple people are using it now to mean different things, and this leads to problems when those different uses clash. I submit that if I have to wrap methods (not functions) in classes and instantiate them to avoid being "lazy", then my code is going to look pretty horrible after applying this more than once or twice. Both Zope and John Aycock's system (SPARK?) demonstrate the usefulness of being able to attach metadata to functions and methods. All Barry is suggesting is that Python support that capability better. Finally, it's not clear to my feeble brain just how I would go about instantiating a method to get this capability today. Suppose I have class Spam: def eggs(self, a): return a and I want to attach an attribute to Spam.eggs that tells me if it is public/private in the Zope sense. Zope requires you to add a doc string to a method to declare that it's public: class Spam: def eggs(self, a): "doc" return a Fine, except that effectively prevents you from adding doc strings to your "private" methods as Greg Stein pointed out. 
Barry's proposal would allow the Spam.eggs author to attach an attribute to it: class Spam: def eggs(self, a): "doc" return a eggs.__zope_access__ = "private" I think the solution you're proposing is class Spam: class EggsMethod: def __call__(self, a): "doc" return a __zope_access__ = "private" eggs = EggsMethod() This seems to work, but also seems like a lot of extra baggage (and a performance hit to boot) to arrive at what seems like a very simple concept. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 17:30:31 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 17:30:31 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.13203.900930.294033@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 10:04:51 AM Message-ID: <200004141530.RAA02277@python.inrialpes.fr> Skip Montanaro wrote: > > Barry's proposal would allow the Spam.eggs author to attach an attribute to > it: > > class Spam: > def eggs(self, a): > "doc" > return a > eggs.__zope_access__ = "private" > > I think the solution you're proposing is > > class Spam: > class EggsMethod: > def __call__(self, a): > "doc" > return a > __zope_access__ = "private" > eggs = EggsMethod() > > This seems to work, but also seems like a lot of extra baggage (and a > performance hit to boot) to arrive at what seems like a very simple concept. > If you prefer embedded definitions, among other things, you could do: __zope_access__ = { 'Spam' : 'public' } class Spam: __zope_access__ = { 'eggs' : 'private', 'eats' : 'public' } def eggs(self, ...): ... def eats(self, ...): ... or have a completely separate class/structure for access control (which is what you would do it in C, btw, for existing objects to which you can't add slots, ex: file descriptors, mem segments, etc). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw at cnri.reston.va.us Fri Apr 14 17:52:17 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 11:52:17 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <38F6E0E8.E336F6C6@prescod.net> Message-ID: <14583.16049.933693.237302@anthem.cnri.reston.va.us> >>>>> "MH" == Mark Hammond writes: MH> AFAIK, this would be a pretty serious change. The compiler MH> just generates (basically)PyObject_SetAttr() calls. There is MH> no way in the current runtime to differentiate between MH> "compile time" and "runtime" attribute references... If this MH> was done, it would simply be ugly hacks to support what can MH> only be described as unpythonic in the first place! MH> [Unless of course Im missing something...] You're not missing anything Mark! Remember Python's /other/ motto: "we're all consenting adults here". If you don't wanna mutate your function attrs at runtime... just don't! 
:) -Barry From bwarsaw at cnri.reston.va.us Fri Apr 14 17:59:55 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 14 Apr 2000 11:59:55 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <14583.16507.268456.950881@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> # or did your latest patch solve this little dilemma? No, definitely not. >>>>> "AMK" == Andrew M Kuchling writes: AMK> Wait, the attributes added to a function are visible inside AMK> the function? My patch definitely does not change Python's scoping rules in any way. This was a 1/2 hour hack, for Guido's sake! :) -Barry From tim_one at email.msn.com Fri Apr 14 18:04:32 2000 From: tim_one at email.msn.com (Tim Peters) Date: Fri, 14 Apr 2000 12:04:32 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.10249.322298.959083@amarok.cnri.reston.va.us> Message-ID: <000501bfa62b$1ef8f560$d82d153f@tim> [Tim] > Yes, of course people will use it to get the effect of function > statics. OK by me. People do the same thing today with class data > attributes (i.e., to [Andrew M. Kuchling] > Wait, the attributes added to a function are visible inside the > function? No, same as in JavaScript, you need funcname.attr, just as you need classname.attr in Python today to fake the effect of mutable class statics (in the C++ sense). > [hysteria deleted ] > ... > +0 if they're not. From paul at prescod.net Fri Apr 14 18:21:31 2000 From: paul at prescod.net (Paul Prescod) Date: Fri, 14 Apr 2000 11:21:31 -0500 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <38F7458B.17F72652@prescod.net> Mark Hammond wrote: > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. I posted a proposal a few days back that does not use the "." SetAttr syntax and is clearly distinguisable (visually and by the compiler) from runtime property assignment. http://www.python.org/pipermail/python-dev/2000-April/004875.html The response was light but positive... -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself "Ivory towers are no longer in order. We need ivory networks. Today, sitting quietly and thinking is the world's greatest generator of wealth and prosperity." - http://www.bespoke.org/viridian/print.asp?t=140 From skip at mojam.com Fri Apr 14 18:29:34 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 11:29:34 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <38F7458B.17F72652@prescod.net> References: <38F7458B.17F72652@prescod.net> Message-ID: <14583.18286.67371.157754@beluga.mojam.com> Paul> I posted a proposal a few days back that does not use the "." Paul> SetAttr syntax and is clearly distinguisable (visually and by the Paul> compiler) from runtime property assignment. Paul> http://www.python.org/pipermail/python-dev/2000-April/004875.html Paul> The response was light but positive... Paul, I have a question. Given the following example from your note: decl {type:"def(myint: int) returns bar", french_doc:"Bonjour", english_doc: "Hello"} def func( myint ): return bar() how is the compiler supposed to associate a particular "decl {...}" with a particular function? Is it just by order in the file? 
-- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From gmcm at hypernet.com Fri Apr 14 18:32:42 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141441.QAA02162@python.inrialpes.fr> References: from "Mark Hammond" at Apr 14, 2000 11:01:33 PM Message-ID: <1256392532-57122808@hypernet.com> Vladimir Marangozov wrote: > Amazed or not, it is contentious. I have the responsability to > remove my veto once my concerns are adressed. So far, I have the > impression that all I get (if I get anything at all -- see above) > is "conveniency" from Gordon, which is nothing else but laziness > about creating instances. I have the impression that majority of changes to Python are conveniences. > As long as we discuss customization of objects with builtin types, > the "inconsistency" stays bound to classes and instances. Add modules > if you wish, but they are just namespaces. This proposal expands > the customization inconsistency to functions and methods. And I am > reluctant to see this happening "under the hood", without a global > vision of the problem, just because a couple of people have abused > unprotected attributes and claim that they can't do what they want > because Python doesn't let them to. Can you please explain how "consistency" is violated? - Gordon From gmcm at hypernet.com Fri Apr 14 18:32:42 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 12:32:42 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <003701bfa616$34c79690$0500a8c0@secret.pythonware.com> Message-ID: <1256392531-57122875@hypernet.com> Fredrik Lundh wrote: > -1 on static function variables implemented as > attributes on function or method objects. > > def eff(): > "eff" > print "eff", eff.__doc__ > > def bot(): > "bot" > print "bot", bot.__doc__ > > eff() > bot() > > eff, bot = bot, eff > > eff() > bot() > > # or did your latest patch solve this little dilemma? > # if so, -1 on your patch ;-) To belabor the obvious (existing Python allows obsfuction), I present: class eff: "eff" def __call__(self): print "eff", eff.__doc__ class bot: "bot" def __call__(self): print "bot", bot.__doc__ e = eff() b = bot() e() b() eff, bot = bot, eff e = eff() b = bot() e() b() There's nothing new here. Why does allowing the ability to obsfucate suddenly warrant a -1? - Gordon From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 19:15:09 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 19:15:09 +0200 (CEST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> from "Jeremy Hylton" at Apr 13, 2000 07:30:02 PM Message-ID: <200004141715.TAA02492@python.inrialpes.fr> Jeremy Hylton wrote: > > Here it is contextified. One small difference from the previous patch > is that NESTING_LIMIT is now only 1000. I think this is sufficient to > cover commonly occuring nested containers. > > Jeremy > > [patch omitted] Nice. I think you don't need the _PyCompareState_flag. Like in trashcan, _PyCompareState_nesting is enough to enter the sections of the code that depend on _PyCompareState_flag. 
-- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From moshez at math.huji.ac.il Fri Apr 14 19:46:12 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 14 Apr 2000 19:46:12 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004131810.UAA05752@python.inrialpes.fr> Message-ID: On Thu, 13 Apr 2000, Vladimir Marangozov wrote: > >>> def this(): > ... sucks = "no" > ... > >>> this.sucks = "yes" > >>> > >>> print this.sucks > 'yes' > > Why on earth 'sucks' is not the object defined in the function's namespace? > Who made that deliberate decision? Clearly 'this' defines a new namespace, > so it'll be also legitimate to get a NameError, or to: > > >>> print this.sucks > 'no' > > Don't you think? No. >>> def this(turing_machine): ... if stops(turing_machine): ... confusing = "yes" ... else: ... confusing = "no" ... >>> print this.confusing -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From klm at digicool.com Fri Apr 14 20:19:42 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 14:19:42 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > Can I get at least a +0? :) I want function attributes. (There are all sorts of occasions i need cues to classify functions for executives that map and apply them, and this seems like the perfect way to couple that information with the object. Much nicer than having to mangle the names of the functions, or create some external registry with the classifications.) And i think i'd want them even more if they were visible within the function, so i could do static variables. Why is that a bad thing? So i guess that means i'd give a +1 for the proposal as stands, with the understanding that you'd get *another* +1 for the additional feature - yielding a bigger, BETTER +1. Metadata, static vars, frameworks ... oh my!-) (Oh, and i'd suggest up front that documentation for this feature recommend people not use "__*__" names for their own object attributes, to avoid collisions with eventual use of them by python.) Ken klm at digicool.com From bwarsaw at cnri.reston.va.us Fri Apr 14 20:21:11 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Fri, 14 Apr 2000 14:21:11 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.24983.768952.870567@anthem.cnri.reston.va.us> >>>>> "KM" == Ken Manheimer writes: KM> (Oh, and i'd suggest up front that documentation for this KM> feature recommend people not use "__*__" names for their own KM> object attributes, to avoid collisions with eventual use of KM> them by python.) Agreed. From fdrake at acm.org Fri Apr 14 20:25:46 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) 
Date: Fri, 14 Apr 2000 14:25:46 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14582.43868.600655.132428@anthem.cnri.reston.va.us> Message-ID: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Ken Manheimer writes: > (Oh, and i'd suggest up front that documentation for this feature > recommend people not use "__*__" names for their own object attributes, to > avoid collisions with eventual use of them by python.) Isn't that a standing recommendation for all names? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Fri Apr 14 20:29:43 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 20:29:43 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256392531-57122875@hypernet.com> Message-ID: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> > To belabor the obvious (existing Python allows obsfuction), I > present: > > class eff: > "eff" > def __call__(self): > print "eff", eff.__doc__ > > class bot: > "bot" > def __call__(self): > print "bot", bot.__doc__ > > e = eff() > b = bot() > e() > b() > > eff, bot = bot, eff > e = eff() > b = bot() > e() > b() > > There's nothing new here. Why does allowing the ability to > obsfucate suddenly warrant a -1? since when did Python grow full lexical scoping? does anyone that has learned about the LGB rule expect the above to work? in contrast, my example used a name which appears to be defined in the same scope as the other names introduced on the same line of source code -- but isn't. def foo(x): foo.x = x here, "foo" doesn't refer to the same namespace as the argument "x", but to instead whatever happens to be in an entirely different namespace at the time the function is executed. in other words, this feature cannot really be used to store statics -- it only looks that way... From effbot at telia.com Fri Apr 14 20:32:26 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 20:32:26 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes onfuncs and methods) References: Message-ID: <001901bfa63f$ca0c08c0$34aab5d4@hagrid> Ken Manheimer wrote: > I want function attributes. (There are all sorts of occasions i need cues > to classify functions for executives that map and apply them, and this > seems like the perfect way to couple that information with the > object. Much nicer than having to mangle the names of the functions, or > create some external registry with the classifications.) how do you expect to find all methods that has a given attribute? > And i think i'd want them even more if they were visible within the > function, so i could do static variables. Why is that a bad thing? because it doesn't work, unless you change python in a backwards incompatible way. that's okay in py3k, it's not okay in 1.6. From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 21:07:15 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 21:07:15 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <200004141907.VAA02670@python.inrialpes.fr> Gordon McMillan wrote: > > [VM] > > As long as we discuss customization of objects with builtin types, > > the "inconsistency" stays bound to classes and instances. 
Add modules > > if you wish, but they are just namespaces. This proposal expands > > the customization inconsistency to functions and methods. And I am > > reluctant to see this happening "under the hood", without a global > > vision of the problem, just because a couple of people have abused > > unprotected attributes and claim that they can't do what they want > > because Python doesn't let them to. > > Can you please explain how "consistency" is violated? > Yes, I can. To start with and to save me typing, please reread the 1st section of Demo/metaclasses/meta-vladimir.txt about Classes. ------- Now, whenever there are two instances 'a' and 'b' of the class A, the first inconsistency is that we're allowed to assign attributes to these instances dynamically, which are not declared in the class A. Strictly speaking, if I say: >>> a.author = "Guido" and if 'author' is not an attribute of 'a' after the instantiation of A (i.e. after a = A() completes), we should get a NameError. It's an inconsistency because whenever the above assignment succeeds, 'a' is no more an instance of A. It's an instance of some other class, because A prescribes what *all* instances of A have in *common*. So from here, we have to find our way in the object model and live with this 1st inconsistency. Problem: What is the class of the singleton 'a' then? Say, I need this class after the fact to build another society of objects, i.e. "clone" 'a' a hundred of times, because 'a' has dozens of attributes different than 'b'. To make a long story short, it turns out that we can build a Python class A1, having those attributes declared, then instantiate A1 hundreds of times and hopefully, let 'a' find its true identity with: >>> a.__class__ = A1 This is the key of the story. We *can* build, for a given singleton, its Python class, after the fact. And this is the only thing which still makes the Python class model 'relatively consistent'! If it weren't possible to build that class A1, it would have been better to stop talking about classes and a class model in Python. ("associations of typed structures with per-type binding rules" would have probably been a better term). Now to the question: how "consistency" is violated by the proposal? It is violated, because actually we *can't* build and restore the class, after the fact, of a builtin object (a funtion 'f') to which we add user attributes. We can't do it for 2 reasons, which we hope to solve in Py3K: 1) the class of 'f' is implemented in C 2) we still can't inherit from builtin classes (at least in CPython) As a consequence, we can't actually build hundreds of "clones" of 'f' by instantiating a class object. We can build them by adding manually the same attribute, but this is not OO, this is just 'binding to a namespace'. This is the true reason on why this fragile consistency is violated. Please, save me the trouble to expose the details you're missing, to each of you, where those details are omitted for simplicity. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From effbot at telia.com Fri Apr 14 21:17:23 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:17:23 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> Message-ID: <005401bfa646$123ef2a0$34aab5d4@hagrid> M.-A. 
Lemburg wrote: > > but they won't -- if you don't use an encoding directive, and > > don't use 8-bit characters in your string literals, everything > > works as before. > > > > (that's why the default is "none" and not "utf-8") > > > > if you use 8-bit characters in your source code and wish to > > add an encoding directive, you need to add the right encoding > > directive... > > Fair enough, but this would render all the auto-coercion > code currently in 1.6 useless -- all string to Unicode > conversions would have to raise an exception. I though it was rather clear by now that I think the auto- conversion stuff *is* useless... but no, that doesn't mean that all string to unicode conversions need to raise exceptions -- any 8-bit unicode character obviously fits into a 16-bit unicode character, just like any integer fits in a long integer. if you convert the other way, you might get an OverflowError, just like converting from a long integer to an integer may give you an exception if the long integer is too large to be represented as an ordinary integer. after all, i = int(long(v)) doesn't always raise an exception... > > > > why keep on pretending that strings and strings are two > > > > different things? it's an artificial distinction, and it only > > > > causes problems all over the place. > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > strings... not until Py3K at least (and as Fred already > > > said, all standard editors will have native Unicode support > > > by then). > > > > I discussed that in my original "all characters are unicode > > characters" proposal. in my proposal, the standard string > > type will have to roles: a string either contains unicode > > characters, or binary bytes. > > > > -- if it contains unicode characters, python guarantees that > > methods like strip, lower (etc), and regular expressions work > > as expected. > > > > -- if it contains binary data, you can still use indexing, slicing, > > find, split, etc. but they then work on bytes, not on chars. > > > > it's still up to the programmer to keep track of what a certain > > string object is (a real string, a chunk of binary data, an en- > > coded string, a jpeg image, etc). if the programmer wants > > to convert between a unicode string and an external encoding > > to use a certain unicode encoding, she needs to spell it out. > > the codecs are never called "under the hood". > > > > (note that if you encode a unicode string into some other > > encoding, the result is binary buffer. operations like strip, > > lower et al does *not* work on encoded strings). > > Huh ? If the programmer already knows that a certain > string uses a certain encoding, then he can just as well > convert it to Unicode by hand using the right encoding > name. I thought that was what I said, but the text was garbled. let's try again: if the programmer wants to convert between a unicode string and a buffer containing encoded text, she needs to spell it out. the codecs are never called "under the hood" > The whole point we are talking about here is that when > having the implementation convert a string to Unicode all > by itself it needs to know which encoding to use. This is > where we have decided long ago that UTF-8 should be > used. does "long ago" mean that the decision cannot be questioned? what's going on here? face it, I don't want to guess when and how the interpreter will convert strings for me. after all, this is Python, not Perl. 
if I want to convert from a "string of characters" to a byte buffer using a certain character encoding, let's make that explicit. Python doesn't convert between other data types for me, so why should strings be a special case? > The pragma discussion is about a totally different > issue: pragmas could make it possible for the programmer > to tell the *compiler* which encoding to use for literal > u"unicode" strings -- nothing more. Since "8-bit" strings > currently don't have an encoding attached to them we store > them as-is. what do I have to do to make you read my proposal? shout? okay, I'll try: THERE SHOULD BE JUST ONE INTERNAL CHARACTER SET IN PYTHON 1.6: UNICODE. for consistency, let this be true for both 8-bit and 16-bit strings (as well as Py3K's 31-bit strings ;-). there are many possible external string encodings, just like there are many possible external integer encodings. but for integers, that's not something that the core implementation cares much about. why are strings different? > I don't want to get into designing a completely new > character container type here... this can all be done for Py3K, > but not now -- it breaks things at too many ends (even though > it would solve the issues with strings being used in different > contexts). you don't need to -- you only need to define how the *existing* string type should be used. in my proposal, it can be used in two ways: -- as a string of unicode characters (restricted to the 0-255 subset, by obvious reasons). given a string 's', len(s) is always the number of characters, s[i] is the i'th character, etc. or -- as a buffer containing binary bytes. given a buffer 'b', len(b) is always the number of bytes, b[i] is the i'th byte, etc. this is one flavour less than in the 1.6 alphas -- where strings sometimes contain UTF-8 (and methods like upper etc doesn't work), sometimes an 8-bit character set (and upper works), and sometimes binary buffers (for which upper doesn't work). (hmm. I've said all this before, haven't I?) > > > > -- we still need an encoding marker for ascii supersets (how about > > > > ;-). however, it's up to > > > > the tokenizer to detect that one, not the parser. the parser only > > > > sees unicode strings. > > > > > > Hmm, the tokenizer doesn't do any string -> object conversion. > > > That's a task done by the parser. > > > > "unicode string" meant Py_UNICODE*, not PyUnicodeObject. > > > > if the tokenizer does the actual conversion doesn't really matter; > > the point is that once the code has passed through the tokenizer, > > it's unicode. > > The tokenizer would have to know which parts of the > input string to convert to Unicode and which not... plus there > are different encodings to be applied, e.g. UTF-8, Unicode-Escape, > Raw-Unicode-Escape, etc. sigh. why do you insist on taking a very simple thing and making it very very complicated? will anyone out there ever use an editor that supports different encodings for different parts of the file? why not just assume that the *ENTIRE SOURCE FILE* uses a single encoding, and let the tokenizer (or more likely, a conversion stage before the tokenizer) convert the whole thing to unicode. let the rest of the compiler work on Py_UNICODE* strings only, and all your design headaches will just disappear. ... frankly, I'm beginning to feel like John Skaller. do I have to write my own interpreter to get this done right? 
:-( From klm at digicool.com Fri Apr 14 21:18:18 2000 From: klm at digicool.com (Ken Manheimer) Date: Fri, 14 Apr 2000 15:18:18 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: > since when did Python grow full lexical scoping? > > does anyone that has learned about the LGB rule expect > the above to work? Not sure what LGB stands for. "Local / Global / Built-in"? > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x =3D x > > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Huh. ?? I'm assuming your hypothetical foo.x means the attribute 'x' of the function 'foo' in the global namespace for the function 'foo' - which, conveniently, is the module where foo is defined! 8<--- foo.py --->8 def foo(): # Return the object named 'foo'. return foo 8<--- end foo.py --->8 8<--- bar.py --->8 from foo import * print foo() 8<--- end bar.py --->8 % python bar.py % I must be misapprehending what you're suggesting - i know you know this stuff better than i do - but it seems to me that foo.x would work, were foo to have an x. (And that foo.x would, in my esteem, be a suboptimal way to get at x from within foo, but that's besides the fact.) Ken klm at digicool.com From jeremy at cnri.reston.va.us Fri Apr 14 21:18:53 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Fri, 14 Apr 2000 15:18:53 -0400 (EDT) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <200004141715.TAA02492@python.inrialpes.fr> References: <14582.22650.792191.474554@bitdiddle.cnri.reston.va.us> <200004141715.TAA02492@python.inrialpes.fr> Message-ID: <14583.28445.105079.446201@bitdiddle.cnri.reston.va.us> >>>>> "VM" == Vladimir Marangozov writes: VM> Jeremy Hylton wrote: >> Here it is contextified. One small difference from the previous >> patch is that NESTING_LIMIT is now only 1000. I think this is >> sufficient to cover commonly occuring nested containers. >> >> Jeremy >> >> [patch omitted] VM> Nice. VM> I think you don't need the _PyCompareState_flag. Like in VM> trashcan, _PyCompareState_nesting is enough to enter the VM> sections of the code that depend on _PyCompareState_flag. Right. Thanks for the suggestion, and thanks to Barry & Fred for theirs. I've checked in the changes. Jeremy From effbot at telia.com Fri Apr 14 21:28:09 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:28:09 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: Message-ID: <006201bfa647$92f3c000$34aab5d4@hagrid> Ken Manheimer wrote: > > does anyone that has learned about the LGB rule expect > > the above to work? > > Not sure what LGB stands for. "Local / Global / Built-in"? certain bestselling python books are known to use this acronym... > I'm assuming your hypothetical foo.x means the attribute 'x' of the > function 'foo' in the global namespace for the function 'foo' - which, > conveniently, is the module where foo is defined! did you run the eff() bot() example? 
> I must be misapprehending what you're suggesting - i know you know this > stuff better than i do - but it seems to me that foo.x would work, were > foo to have an x. sure, it seems to be working. but not for the right reason. > (And that foo.x would, in my esteem, be a suboptimal > way to get at x from within foo, but that's besides the fact.) fwiw, I'd love to see a good syntax for this. might even change my mind... From effbot at telia.com Fri Apr 14 21:32:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:32:54 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <000401bfa5e3$e8c5ce60$612d153f@tim> Message-ID: <007e01bfa648$3c701de0$34aab5d4@hagrid> TimBot wrote: > Greg gave the voting rule as: > > > -1 "Veto. And is my reasoning." sorry, I must have missed that post, since I've interpreted the whole thing as: if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): implement(feature) (probably because I've changed the eff-bot script to use 'sre' instead of 're'...) can you repost the full set of rules? From gmcm at hypernet.com Fri Apr 14 21:36:53 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 15:36:53 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <000d01bfa63f$688d3ba0$34aab5d4@hagrid> Message-ID: <1256381481-212228@hypernet.com> Fredrik Lundh wrote: > > To belabor the obvious (existing Python allows obsfuction), I > > present: > > > > class eff: > > "eff" > > def __call__(self): > > print "eff", eff.__doc__ > > > > class bot: > > "bot" > > def __call__(self): > > print "bot", bot.__doc__ > > > > e = eff() > > b = bot() > > e() > > b() > > > > eff, bot = bot, eff > > e = eff() > > b = bot() > > e() > > b() > > > > There's nothing new here. Why does allowing the ability to > > obsfucate suddenly warrant a -1? > > since when did Python grow full lexical scoping? I know that's not Swedish, but I haven't the foggiest what you're getting at. Where did lexical scoping enter? > does anyone that has learned about the LGB rule expect > the above to work? You're the one who did "eff, bot = bot, eff". The only intent I can infer is obsfuction. The above works the same as yours, for whatever your definition of "work". > in contrast, my example used a name which appears to be > defined in the same scope as the other names introduced > on the same line of source code -- but isn't. > > def foo(x): > foo.x = x I guess I'm missing something. -------snip------------ def eff(): "eff" print "eff", eff.__doc__ def bot(): "bot" print "bot", bot.__doc__ eff() bot() eff, bot = bot, eff eff() bot() -----------end----------- I guess we're not talking about the same example. > here, "foo" doesn't refer to the same namespace as the > argument "x", but to instead whatever happens to be in > an entirely different namespace at the time the function > is executed. > > in other words, this feature cannot really be used to store > statics -- it only looks that way... Again, I'm mystified. After "eff, bot = bot, eff", I don't see why 'bot() == "eff bot"' is a wrong result. Put it another way: are you reporting a bug in 1.5.2? If it's a bug, why is my example not a bug? If it's not a bug, why would the existence of other attributes besides __doc__ be a problem? - Gordon From akuchlin at mems-exchange.org Fri Apr 14 21:37:01 2000 From: akuchlin at mems-exchange.org (Andrew M. 
Kuchling) Date: Fri, 14 Apr 2000 15:37:01 -0400 (EDT) Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <005401bfa646$123ef2a0$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Fredrik Lundh writes: > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Watching the successive weekly Unicode patchsets, each one fixing some obscure corner case that turned out to be buggy -- '%s' % ustr, concatenating literals, int()/float()/long(), comparisons -- I'm beginning to agree with Fredrik. Automatically making Unicode strings and regular strings interoperate looks like it requires many changes all over the place, and I worry if it's possible to catch them all in time. Maybe we should consider being more conservative, and just having the Unicode built-in type, the unicode() built-in function, and the u"..." notation, and then leaving all responsibility for conversions up to the user. On the other hand, *some* default conversion seems needed, because it seems draconian to make open(u"abcfile") fail with a TypeError. (While I want to see Python 1.6 expedited, I'd also not like to see it saddled with a system that proves to have been a mistake, or one that's a maintenance burden. If forced to choose between delaying and getting it right, the latter wins.) >why not just assume that the *ENTIRE SOURCE FILE* uses a single >encoding, and let the tokenizer (or more likely, a conversion stage >before the tokenizer) convert the whole thing to unicode. To reinforce Fredrik's point here, note that XML only supports encodings at the level of an entire file (or external entity). You can't tell an XML parser that a file is in UTF-8, except for this one element whose contents are in Latin1. -- A.M. Kuchling http://starship.python.net/crew/amk/ Dream casts a human shadow, when it occurs to him to do so. -- From SANDMAN: "Season of Mists", episode 0 From effbot at telia.com Fri Apr 14 21:53:35 2000 From: effbot at telia.com (Fredrik Lundh) Date: Fri, 14 Apr 2000 21:53:35 +0200 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256381481-212228@hypernet.com> Message-ID: <009401bfa64b$202f6c00$34aab5d4@hagrid> > > > There's nothing new here. Why does allowing the ability to > > > obsfucate suddenly warrant a -1? > > > > since when did Python grow full lexical scoping? > > I know that's not Swedish, but I haven't the foggiest what > you're getting at. Where did lexical scoping enter? > > > does anyone that has learned about the LGB rule expect > > the above to work? > > You're the one who did "eff, bot = bot, eff". The only intent I > can infer is obsfuction. The above works the same as yours, > for whatever your definition of "work". okay, I'll try again: in your example, the __call__ function refers to a name that is defined several levels up. in my example, the "foo" function refers to a name that *looks* like it's in the same scope as the "x" argument (etc), but isn't. for the interpreter, the examples are identical. for the reader, they're not. > Put it another way: are you reporting a bug in 1.5.2? If it's a > bug, why is my example not a bug? 
If it's not a bug, why > would the existence of other attributes besides __doc__ be a > problem? because people isn't likely to use __doc__ to store static variables? From skip at mojam.com Fri Apr 14 22:03:41 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 15:03:41 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <006201bfa647$92f3c000$34aab5d4@hagrid> References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.31133.851143.570161@beluga.mojam.com> >> (And that foo.x would, in my esteem, be a suboptimal way to get at x >> from within foo, but that's besides the fact.) Fredrik> fwiw, I'd love to see a good syntax for this. might even Fredrik> change my mind... Could we overload "_"'s meaning yet again (assuming it doesn't already have a special meaning within functions)? That way def bar(): print _.x def foo(): print _.x foo.x = "public" bar.x = "private" bar, foo = foo, bar foo() would display private on stdout. *Note* - I would not advocate this use be extended to do a more general lookup of attributes - it should just refer to attributes of the function of which the executing code object is an attribute. (It may not even be possible.) (I've never used _ for anything, so I don't know all its current (ab)uses. This is just a thought that occurred to me...) Skip From gmcm at hypernet.com Fri Apr 14 22:18:56 2000 From: gmcm at hypernet.com (Gordon McMillan) Date: Fri, 14 Apr 2000 16:18:56 -0400 Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> References: <1256392532-57122808@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 12:32:42 PM Message-ID: <1256378958-363995@hypernet.com> Vladimir Marangozov wrote: > Gordon McMillan wrote: > > Can you please explain how "consistency" is violated? > > > > Yes, I can. > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. Ah. I see. Quite simply, you're arguing from First Principles in an area where I have none. I used to, but I found that all systems built from First Principles (Eiffel, Booch's methodology...) yielded 3 headed monsters. It can be entertaining (in the WWF sense). Just trick some poor sucker into saying "class method" in the C++ sense and then watch Jim Fulton deck him, the ref and half the front row. Personally, I regard (dynamic instance.attribute) as a handy feature, not as a flaw in the object model. - Gordon From moshez at math.huji.ac.il Fri Apr 14 22:19:50 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Fri, 14 Apr 2000 22:19:50 +0200 (IST) Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7) In-Reply-To: <000901bfa5b9$b7f6c980$182d153f@tim> Message-ID: On Thu, 13 Apr 2000, Tim Peters wrote: > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). Ordered? What about dictionaries? -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From bwarsaw at cnri.reston.va.us Fri Apr 14 22:49:41 2000 From: bwarsaw at cnri.reston.va.us (Barry A. 
Warsaw) Date: Fri, 14 Apr 2000 16:49:41 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> Message-ID: <14583.33893.192967.369037@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> fwiw, I'd love to see a good syntax for this. might even FL> change my mind... def foo(x): self.x = x ? :) -Barry From bwarsaw at cnri.reston.va.us Fri Apr 14 23:03:25 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:03:25 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <1256381481-212228@hypernet.com> <009401bfa64b$202f6c00$34aab5d4@hagrid> Message-ID: <14583.34717.128345.245459@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> because people isn't likely to use __doc__ to store FL> static variables? Okay, let's really see how much we can abuse __doc__ today. I'm surprised neither Zope nor SPARK are this evil. Why must I add the extra level of obfuscating indirection? Or are we talking about making __doc__ read-only in 1.6, or restricting it to strings only? -Barry -------------------- snip snip -------------------- import sys print sys.version def decorate(func): class D: pass doc = func.__doc__ func.__doc__ = D() func.__doc__.__doc__ = doc def eff(): "eff" print "eff", eff.__doc__.__doc__ decorate(eff) def bot(): "bot" print "bot", bot.__doc__.__doc__ decorate(bot) eff.__doc__.publish = 1 bot.__doc__.publish = 0 eff() bot() eff, bot = bot, eff eff() bot() for f in (eff, bot): print 'Can I publish %s? ... %s' % (f.__name__, f.__doc__.publish and 'yes' or 'no') -------------------- snip snip -------------------- % python /tmp/scary.py 1.5.2 (#7, Apr 16 1999, 18:24:22) [GCC 2.8.1] eff eff bot bot bot eff eff bot Can I publish bot? ... no Can I publish eff? ... yes From bwarsaw at cnri.reston.va.us Fri Apr 14 23:05:43 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Fri, 14 Apr 2000 17:05:43 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <006201bfa647$92f3c000$34aab5d4@hagrid> <14583.31133.851143.570161@beluga.mojam.com> Message-ID: <14583.34855.459510.161223@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> (I've never used _ for anything, so I don't know all its SM> current (ab)uses. This is just a thought that occurred to SM> me...) One place it's used is in localized applications. See Tools/i18n/pygettext.py. -Barry From gstein at lyra.org Fri Apr 14 23:20:27 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:20:27 -0700 (PDT) Subject: [Python-Dev] voting (was: Object customization) In-Reply-To: <007e01bfa648$3c701de0$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > TimBot wrote: > > Greg gave the voting rule as: > > > > > -1 "Veto. And is my reasoning." > > sorry, I must have missed that post, since I've > interpreted the whole thing as: > > if reduce(operator.add, list_of_votes) > 0 and guido_likes_it(): > implement(feature) As in all cases, that "and" should be an "or" :-) > (probably because I've changed the eff-bot script > to use 'sre' instead of 're'...) > > can you repost the full set of rules? 
http://www.python.org/pipermail/python-dev/2000-March/004312.html Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:23:50 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:23:50 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14583.33893.192967.369037@anthem.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> fwiw, I'd love to see a good syntax for this. might even > FL> change my mind... > > def foo(x): > self.x = x > > ? :) Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The above syntax creates too much of an expectation to look for "self". There would, of course, be problems that self.x doesn't work in a method while _.x could. Cheers, -g -- Greg Stein, http://www.lyra.org/ From fdrake at acm.org Fri Apr 14 23:18:48 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Fri, 14 Apr 2000 17:18:48 -0400 (EDT) Subject: [Python-Dev] Re: [Zope-dev] >2GB Data.fs files on FreeBSD In-Reply-To: References: <14581.60243.557955.192783@amarok.cnri.reston.va.us> Message-ID: <14583.35640.746399.601030@seahag.cnri.reston.va.us> R. David Murray writes: > So it looks to my uneducated eye like file.tell() is broken. The actual > on-disk size of the file, by the way, is indeed 2147718485, so it looks > like somebody's not using the right size data structure somewhere. > > So, can anyone tell me what to look for, or am I stuck for the moment? Hmm. What is off_t defined to be on your platform? In config.h, is HAVE_FTELLO or HAVE_FTELL64 defined? -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From Vladimir.Marangozov at inrialpes.fr Fri Apr 14 23:21:59 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Fri, 14 Apr 2000 23:21:59 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <1256378958-363995@hypernet.com> from "Gordon McMillan" at Apr 14, 2000 04:18:56 PM Message-ID: <200004142121.XAA03202@python.inrialpes.fr> Gordon McMillan wrote: > > Ah. I see. Quite simply, you're arguing from First Principles Exactly. I think that these principles play an important role in the area of computer programming, because they put the markers in the evolution of our thoughts when we're trying to transcript the real world through formal computer terms. No kidding :-) So we need to put some limits before loosing completely these driving markers. No kidding. > in an area where I have none. too bad for you > I used to, but I found that all systems built from First Principles > (Eiffel, Booch's methodology...) yielded 3 headed monsters. Yes. This is the state Python tends to reach, btw. I'd like to avoid this madness. Put simply, if we loose the meaning of the notion of a class of objects, there's no need to have a 'class' keyword, because it would do more harm than good. > Personally, I regard (dynamic instance.attribute) as a handy feature Gordon, I know that it's handy! > not as a flaw in the object model. if we still pretend there is one... -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Fri Apr 14 23:22:08 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Fri, 14 Apr 2000 23:22:08 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> Message-ID: <38F78C00.7BAE1C12@lemburg.com> Fredrik Lundh wrote: > > M.-A. Lemburg wrote: > > > but they won't -- if you don't use an encoding directive, and > > > don't use 8-bit characters in your string literals, everything > > > works as before. > > > > > > (that's why the default is "none" and not "utf-8") > > > > > > if you use 8-bit characters in your source code and wish to > > > add an encoding directive, you need to add the right encoding > > > directive... > > > > Fair enough, but this would render all the auto-coercion > > code currently in 1.6 useless -- all string to Unicode > > conversions would have to raise an exception. > > I though it was rather clear by now that I think the auto- > conversion stuff *is* useless... > > but no, that doesn't mean that all string to unicode conversions > need to raise exceptions -- any 8-bit unicode character obviously > fits into a 16-bit unicode character, just like any integer fits in a > long integer. > > if you convert the other way, you might get an OverflowError, just > like converting from a long integer to an integer may give you an > exception if the long integer is too large to be represented as an > ordinary integer. after all, > > i = int(long(v)) > > doesn't always raise an exception... This is exactly the same as proposing to change the default encoding to Latin-1. I don't have anything against that (being a native Latin-1 user :), but I would assume that other native language writer sure do: e.g. all programmers not using Latin-1 as native encoding (and there are lots of them). > > > > > why keep on pretending that strings and strings are two > > > > > different things? it's an artificial distinction, and it only > > > > > causes problems all over the place. > > > > > > > > Sure. The point is that we can't just drop the old 8-bit > > > > strings... not until Py3K at least (and as Fred already > > > > said, all standard editors will have native Unicode support > > > > by then). > > > > > > I discussed that in my original "all characters are unicode > > > characters" proposal. in my proposal, the standard string > > > type will have to roles: a string either contains unicode > > > characters, or binary bytes. > > > > > > -- if it contains unicode characters, python guarantees that > > > methods like strip, lower (etc), and regular expressions work > > > as expected. > > > > > > -- if it contains binary data, you can still use indexing, slicing, > > > find, split, etc. but they then work on bytes, not on chars. > > > > > > it's still up to the programmer to keep track of what a certain > > > string object is (a real string, a chunk of binary data, an en- > > > coded string, a jpeg image, etc). if the programmer wants > > > to convert between a unicode string and an external encoding > > > to use a certain unicode encoding, she needs to spell it out. > > > the codecs are never called "under the hood". > > > > > > (note that if you encode a unicode string into some other > > > encoding, the result is binary buffer. operations like strip, > > > lower et al does *not* work on encoded strings). > > > > Huh ? 
If the programmer already knows that a certain > > string uses a certain encoding, then he can just as well > > convert it to Unicode by hand using the right encoding > > name. > > I thought that was what I said, but the text was garbled. let's > try again: > > if the programmer wants to convert between a unicode > string and a buffer containing encoded text, she needs > to spell it out. the codecs are never called "under the > hood" Again and again... The orginal intent of the Unicode integration was trying to make Unicode and 8-bit strings interoperate without too much user intervention. At a cost (the UTF-8 encoding), but then if you do use this encoding (and this is not far fetched since there are input sources which do return UTF-8, e.g. TCL), the Unicode implementation will apply all its knowledge in order to get you satisfied. If you don't like this, you can always apply explicit conversion calls wherever needed. Latin-1 and UTF-8 are not compatible, the conversion is very likely to cause an exception, so the user will indeed be informed about this failure. > > The whole point we are talking about here is that when > > having the implementation convert a string to Unicode all > > by itself it needs to know which encoding to use. This is > > where we have decided long ago that UTF-8 should be > > used. > > does "long ago" mean that the decision cannot be > questioned? what's going on here? > > face it, I don't want to guess when and how the interpreter > will convert strings for me. after all, this is Python, not Perl. > > if I want to convert from a "string of characters" to a byte > buffer using a certain character encoding, let's make that > explicit. Hey, there's nothing which prevents you from doing so explicitly. > Python doesn't convert between other data types for me, so > why should strings be a special case? Sure it does: 1.5 + 2 == 3.5, 2L + 3 == 5L, etc... > > The pragma discussion is about a totally different > > issue: pragmas could make it possible for the programmer > > to tell the *compiler* which encoding to use for literal > > u"unicode" strings -- nothing more. Since "8-bit" strings > > currently don't have an encoding attached to them we store > > them as-is. > > what do I have to do to make you read my proposal? > > shout? > > okay, I'll try: > > THERE SHOULD BE JUST ONE INTERNAL CHARACTER > SET IN PYTHON 1.6: UNICODE. Please don't shout... simply read on... Note that you are again argueing for using Latin-1 as default encoding -- why don't you simply make this fact explicit ? > for consistency, let this be true for both 8-bit and 16-bit > strings (as well as Py3K's 31-bit strings ;-). > > there are many possible external string encodings, just like there > are many possible external integer encodings. but for integers, > that's not something that the core implementation cares much > about. why are strings different? > > > I don't want to get into designing a completely new > > character container type here... this can all be done for Py3K, > > but not now -- it breaks things at too many ends (even though > > it would solve the issues with strings being used in different > > contexts). > > you don't need to -- you only need to define how the *existing* > string type should be used. in my proposal, it can be used in two > ways: > > -- as a string of unicode characters (restricted to the > 0-255 subset, by obvious reasons). given a string 's', > len(s) is always the number of characters, s[i] is the > i'th character, etc. 
> > or > > -- as a buffer containing binary bytes. given a buffer 'b', > len(b) is always the number of bytes, b[i] is the i'th > byte, etc. > > this is one flavour less than in the 1.6 alphas -- where strings sometimes > contain UTF-8 (and methods like upper etc doesn't work), sometimes an > 8-bit character set (and upper works), and sometimes binary buffers (for > which upper doesn't work). Strings always contain data -- there's no encoding attached to them. If the user calls .upper() on a binary string the output will most probably no longer be usable... but that's the programmers fault, not the string type's fault. > (hmm. I've said all this before, haven't I?) You know as well as I do that the existing string type is used for both binary and text data. You cannot simply change this by introducing some new definition of what should be stored in buffers and what in strings... not until we officially redefined these things say in Py3K ;-) > frankly, I'm beginning to feel like John Skaller. do I have to write my > own interpreter to get this done right? :-( No, but you should have started this discussion in late November last year... not now, when everything has already been implemented and people are starting to the use the code that's there with great success. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Fri Apr 14 23:29:48 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 14 Apr 2000 23:29:48 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: <38F78DCC.C630F32@lemburg.com> "Andrew M. Kuchling" wrote: > > >why not just assume that the *ENTIRE SOURCE FILE* uses a single > >encoding, and let the tokenizer (or more likely, a conversion stage > >before the tokenizer) convert the whole thing to unicode. > > To reinforce Fredrik's point here, note that XML only supports > encodings at the level of an entire file (or external entity). You > can't tell an XML parser that a file is in UTF-8, except for this one > element whose contents are in Latin1. Hmm, this would mean that someone who writes: """ #pragma script-encoding utf-8 u = u"\u1234" print u """ would suddenly see "\u1234" as output. If that's ok, fine with me... it would make things easier on the compiler side (even though I'm pretty sure that people won't like this). BTW: I will be offline for the next week... I'm looking forward to where this dicussion will be heading. Have fun, -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Fri Apr 14 23:43:16 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:43:16 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <004501bfa5cf$7ec7cd60$34aab5d4@hagrid> Message-ID: On Fri, 14 Apr 2000, Fredrik Lundh wrote: > Tim Peters wrote: > > [Gordon McMillan] > > > ... 
> > > Or are you saying that if functions have attributes, people will > > > all of a sudden expect that function locals will have initialized > > > and maintained state? > > > > I expect that they'll expect exactly what happens in JavaScript, which > > supports function attributes too, and where it's often used as a > > nicer-than-globals way to get the effect of C-like mutable statics > > (conceptually) local to the function. > > so it's no longer an experimental feature, it's a "static variables" > thing? Don't be so argumentative. Tim suggested a possible use. Not what it really means or how it really works. I look at it as labelling a function with metadata about that function. I use globals or class attrs for "static" data. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:45:51 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:45:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: Message-ID: On Fri, 14 Apr 2000, Mark Hammond wrote: > > I think that we get 95% of the benefit without any of the > > "dangers" > > (though I don't agree with the arguments against) if we allow the > > attachment of properties only at compile time and > > disallow mutation of > > them at runtime. > > AFAIK, this would be a pretty serious change. The compiler just > generates (basically)PyObject_SetAttr() calls. There is no way in > the current runtime to differentiate between "compile time" and > "runtime" attribute references... If this was done, it would simply > be ugly hacks to support what can only be described as unpythonic in > the first place! > > [Unless of course Im missing something...] You aren't at all! Paul hit his head, or he is assuming some additional work to allow the compiler to know more. I agree with you: compilation in Python is just code execution; there is no way Python can disallow runtime changes. (from a later note, it appears he is referring to introducing "decl", which I don't think is on the table for 1.6) Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:48:27 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:48:27 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... > > or have a completely separate class/structure for access control > (which is what you would do it in C, btw, for existing objects > to which you can't add slots, ex: file descriptors, mem segments, etc). This is uglier than attaching the metadata directly to the target that you are describing! If you want to apply metadata to functions, then apply them to the function! Don't shove them off in a separate structure. You're the one talking about cleanliness, yet you suggest something that is very poor from a readability, maintainability, and semantic angle. Ick. 
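To make the contrast concrete (the second form assumes the proposed function attributes, so it is a sketch of the proposal, not working 1.5.2 code):

    # metadata shoved off into a separate structure
    class Spam:
        __zope_access__ = {'eggs': 'private', 'eats': 'public'}
        def eggs(self): pass
        def eats(self): pass

    # metadata attached to the thing it describes
    class Spam:
        def eggs(self): pass
        eggs.access = 'private'
        def eats(self): pass
        eats.access = 'public'

The second form keeps each function's metadata right next to the function it describes.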
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:52:22 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:52:22 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: On Fri, 14 Apr 2000, Fred L. Drake, Jr. wrote: > Ken Manheimer writes: > > (Oh, and i'd suggest up front that documentation for this feature > > recommend people not use "__*__" names for their own object attributes, to > > avoid collisions with eventual use of them by python.) > > Isn't that a standing recommendation for all names? Yup. Personally, I use "_*" for private variables or other "hidden" type things that shouldn't be part of an object's normal interface. For example, all the stuff that the Python/COM interface uses is prefixed by "_" to denote that it is metadata about the classes rather than part of its interface. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 14 23:56:37 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 14:56:37 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141907.VAA02670@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: >... > Now, whenever there are two instances 'a' and 'b' of the class A, > the first inconsistency is that we're allowed to assign attributes > to these instances dynamically, which are not declared in the class A. > > Strictly speaking, if I say: > > >>> a.author = "Guido" > > and if 'author' is not an attribute of 'a' after the instantiation > of A (i.e. after a = A() completes), we should get a NameError. I'll repeat what Gordon said: the current Python behavior is entirely correct, entirely desirable, and should not (can not) change. Your views on what an object model should be are not Python's views. If the person who writes "a.author =" wants to do that, then let them. Python does not put blocks in people's way, it simply presumes that people are intelligent and won't do Bad Things. There are enumerable times where I've done the following: class _blank() pass data = _blank() data.item = foo data.extra = bar func(data) It is a tremendously easy way to deal with arbitrary data on an attribute basis, rather than (say) dictionary's key-based basis. >... arguments about alternate classes and stuff ... Sorry. That just isn't Python. Not in practice, nor in intent. Applying metadata to the functions is an entirely valid, Pythonic idea. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Apr 15 00:01:51 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:01:51 -0700 (PDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004142121.XAA03202@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > Gordon McMillan wrote: > > > > Ah. I see. Quite simply, you're arguing from First Principles > > Exactly. > > I think that these principles play an important role in the area > of computer programming, because they put the markers in the > evolution of our thoughts when we're trying to transcript the > real world through formal computer terms. No kidding :-) > So we need to put some limits before loosing completely these > driving markers. No kidding. In YOUR opinion. In MY opinion, they're bunk. 
Python provides me with the capabilities that I want: objects when I need them, and procedural flow when that is appropriate. It avoids obstacles and gives me freedom of expression and ways to rapidly develop code. I don't have to worry about proper organization unless and until I need it. Formalisms be damned. I want something that works for ME. Give me code, make it work, and get out of my way. That's what Python is good for. I could care less about "proper programming principles". Pragmatism. That's what I seek. >... > > I used to, but I found that all systems built from First Principles > > (Eiffel, Booch's methodology...) yielded 3 headed monsters. > > Yes. This is the state Python tends to reach, btw. I'd like to avoid > this madness. Does not. There are many cases where huge systems have been built using Python, built well, and are quite successful. And yes, there have also been giant, monster-sized Bad Python Programs out there, too. But that can be done in ANY language. Python doesn't *tend* towards that at all. Certainly, Perl does, but we aren't talking about that (until now :-) > Put simply, if we loose the meaning of the notion of a class of objects, > there's no need to have a 'class' keyword, because it would do more harm > than good. Huh? What the heck do you mean by this? >... > > not as a flaw in the object model. > > if we still pretend there is one... It *DOES* have one. To argue there isn't one is simply insane and argumentative. Python just doesn't have YOUR object model. Live with it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 00:00:19 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:00:19 +0200 (CEST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:56:37 PM Message-ID: <200004142200.AAA03409@python.inrialpes.fr> Greg Stein wrote: > > Your views on what an object model should be are not Python's views. Ehm, could you explain to me what are Python's views? Sorry, I don't see any worthy argument in your posts that would make me switch from -1 to -0. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Sat Apr 15 00:00:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 17:00:30 -0500 (CDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> Message-ID: <14583.38142.520596.804466@beluga.mojam.com> Barry said "_" is effectively taken because it means something (at least when used a function?) to pygettext. How about "__" then? def bar(): print __.x def foo(): print __.x foo.x = "public" bar.x = "private" ... It has the added benefit that this usage adheres to the "Python gets to stomp on __-prefixed variables" convention. my-underscore-key-works-better-than-yours-ly y'rs, Skip From gstein at lyra.org Sat Apr 15 00:13:25 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:13:25 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? 
> Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. "We're all adults here." Python says that you can do what you want. It won't get in your way. Badness is not defined. If somebody wants to write "a.author='Guido'" then they can. There are a number of objects that can have arbitrary attributes. Classes, modules, and instances are a few (others?). Function objects are a proposed additional one. In all cases, attaching new attributes is fine and dandy -- no restriction. (well, you can implement __setattr__ on a class instance) Python's object model specifies a number of other behaviors, but nothing really material here. Of course, all these "views" are simply based on Guido's thoughts and the implementation. Implementation, doc, current practice, and Guido's discussions over the past eight years of Python's existence have also contributed to the notion of "The Python Way". Some of that may be very hard to write down, although I've attempted to write a bit of that above. After five years of working with Python, I'd like to think that I've absorbed and understand the Python Way. Can I state it? No. "We're all adults here" is a good one for this discussion. If you think that function attributes are bad for your programs, then don't use them. There are many others who find them tremendously handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Sat Apr 15 00:19:24 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 15:19:24 -0700 (PDT) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: <200004142200.AAA03409@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Your views on what an object model should be are not Python's views. > > Ehm, could you explain to me what are Python's views? > Sorry, I don't see any worthy argument in your posts > that would make me switch from -1 to -0. Note that all votes are important, but only a signal to Guido about our individual feelings on the matter. Every single person on this list could vote -1, and Guido can still implement the feature (at his peril :-). Conversely, we could all vote +1 and he can refuse to implement it. In this particular case, your -1 vote says that you really dislike this feature. Great. And you've provided a solid explanation why. Even better! Now, people can respond to your vote and attempt to get you to change it. This is goodness because maybe you voted -1 based on a misunderstanding or something unclear in the proposal (I'm talking general now; I don't believe that is the case here). After explanation and enlightenment, you could change the vote. The discussion about *why* you voted -1 is also instructive to Guido. It may raise an issue that he hadn't considered. In addition, people attempting to change your mind are also providing input to Guido. [ maybe too much input is flying around, but the principle is there :-) ] Basically, we can call them vetoes or votes. Either way, this is still Guido's choice :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 00:54:24 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 00:54:24 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:13:25 PM Message-ID: <200004142254.AAA03535@python.inrialpes.fr> Greg Stein wrote: > > Python says that you can do what you want. 'Python' says nothing. 
Or are you The Voice of Python? If custom object attributes are convenient for you, then I'd suggest to generalize the concept, because I perceived it as a limitation too, but not for functions and methods. I'll repeat myself: >>> wink >>> wink.fraction = 1e+-1 >>> wink.fraction.precision = 1e-+1 >>> wink.compute() 0.0 Has anybody noticed that 'fraction' is a float I wanted to qualify with a 'precision' attribute? Again: if we're about to go that road, let's do it in one shot. *This* is what would change my vote. I'll leave Guido to cut the butter, or to throw it all out the window. You're right Greg: I hardly can contribute more in this case, even if I wanted to. Okay, +53 :-) -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From skip at mojam.com Sat Apr 15 01:01:14 2000 From: skip at mojam.com (Skip Montanaro) Date: Fri, 14 Apr 2000 18:01:14 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> References: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: <14583.41786.144784.114440@beluga.mojam.com> Vladimir> I'll repeat myself: >>>> wink Vladimir> >>>> wink.fraction = 1e+-1 >>>> wink.fraction.precision = 1e-+1 >>>> wink.compute() Vladimir> 0.0 Vladimir> Has anybody noticed that 'fraction' is a float I wanted to Vladimir> qualify with a 'precision' attribute? Quick comment before I rush home... There is a significant cost to be had by adding attributes to numbers (ints at least). They can no longer be shared in the int cache. I think the runtime size increase would be pretty huge, as would the extra overhead in creating all those actual (small) IntObjects instead of sharing a single copy. On the other hand, functions are already pretty heavyweight objects and occur much less frequently than numbers in common Python programs. They aren't shared (except for instance methods, which Barry's patch already excludes), so there's no risk of stomping on attributes that are shared by more than one function. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From gstein at lyra.org Sat Apr 15 01:14:01 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 14 Apr 2000 16:14:01 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004142254.AAA03535@python.inrialpes.fr> Message-ID: On Sat, 15 Apr 2000, Vladimir Marangozov wrote: > Greg Stein wrote: > > > > Python says that you can do what you want. > > 'Python' says nothing. Or are you The Voice of Python? Well, yah. You're just discovering that?! :-) I meant "The Python Way" says that you can do what you want. It doesn't speak often, but if you know how to hear it... it is a revelation :-) > If custom object attributes are convenient for you, then I'd suggest to Custom *function* attributes. Functions are one of the few objects in Python that are "structural" in their intent and use, yet have no way to record data. Modules and classes have a way to, but not functions. [ by "structure", I mean something that contributes to the structure, organization, and mechanics of your program. as opposed to data, such as lists, dicts, instances. ] And ditto what Skip said about attaching attributes to ints and other immutables. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Sat Apr 15 03:45:27 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 15 Apr 2000 11:45:27 +1000 Subject: Re[Python-Dev] #pragmas in Python source code In-Reply-To: <14583.29533.608524.961284@amarok.cnri.reston.va.us> Message-ID: I can see the dilemma, but... > Maybe we should consider being more conservative, and > just having the > Unicode built-in type, the unicode() built-in function, > and the u"..." > notation, and then leaving all responsibility for > conversions up to > the user. Win32 and COM has been doing exactly this for the last couple of years. And it sucked. > On the other hand, *some* default conversion > seems needed, > because it seems draconian to make open(u"abcfile") fail with a > TypeError. For exactly this reason. The end result is that the first thing you ever do with a Unicode object is convert it to a string. > (While I want to see Python 1.6 expedited, I'd also not > like to see it > saddled with a system that proves to have been a mistake, or one > that's a maintenance burden. If forced to choose between > delaying and > getting it right, the latter wins.) Agreed. I thought this implementation stemmed from Guido's desire to do it this way in the 1.x family, and move towards Fredrik's proposal for Py3k. As a geneal comment: Im a little confused and dissapointed here. We are all bickering like children while our parents are away. All we are doing is creating a _huge_ pile of garbage for Guido to ignore when he returns. We are going to be presenting Guido with around 400 messages at my estimate. He can't possibly read them all. So the end result is that all the posturing and flapping going on here is for naught, and he is just going to do whatever he wants anyway - as he always has done, and as has worked so well for Python. Sheesh - we should all consider how we can be the most effective, not the most loud or aggressive! Mark. From moshez at math.huji.ac.il Sat Apr 15 07:06:00 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 15 Apr 2000 07:06:00 +0200 (IST) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) In-Reply-To: <200004141530.RAA02277@python.inrialpes.fr> Message-ID: On Fri, 14 Apr 2000, Vladimir Marangozov wrote: > If you prefer embedded definitions, among other things, you could do: > > __zope_access__ = { 'Spam' : 'public' } > > class Spam: > __zope_access__ = { 'eggs' : 'private', > 'eats' : 'public' } > def eggs(self, ...): ... > def eats(self, ...): ... This solution is close to what the eff-bot suggested. In this case it is horrible because of "editing effort": the meta-data and code of a function are better off together physically, so you would change it to class Spam: __zope_access__ = {} def eggs(self): pass __zope_access__['eggs'] = 'private' def eats(self): pass __zope_access__['eats'] = 'public' Which is way too verbose. Especially, if the method gets thrown around, you find yourself doing things like meth.im_class.__zope_access__[meth.im_func.func_name] Instead of meth.__zope_access__ And sometimes you write a function: def do_something(self): pass And the infrastructure adds the method to a class of its choice. Where would you stick the attribute then? -- Moshe Zadka . 
http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tim_one at email.msn.com Sat Apr 15 07:51:57 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 01:51:57 -0400 Subject: Comparison of cyclic objects (was RE: [Python-Dev] trashcan andPR#7) In-Reply-To: Message-ID: <000701bfa69e$b5d768e0$092d153f@tim> [Tim] > Well, while an instance of graph isomorphism, this one is a relatively > simple special case (because "the graphs" here are rooted, directed, and > have ordered children). [Moshe Zadka] > Ordered? What about dictionaries? An ordering of a dict's kids is forced in the context of comparison (see dict_compare in dictobject.c). From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 08:56:44 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 08:56:44 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: <14583.41786.144784.114440@beluga.mojam.com> from "Skip Montanaro" at Apr 14, 2000 06:01:14 PM Message-ID: <200004150656.IAA03994@python.inrialpes.fr> Skip Montanaro wrote: > > Vladimir> Has anybody noticed that 'fraction' is a float I wanted to > Vladimir> qualify with a 'precision' attribute? > > Quick comment before I rush home... There is a significant cost to be had > by adding attributes to numbers (ints at least). They can no longer be > shared in the int cache. I think the runtime size increase would be pretty > huge, as would the extra overhead in creating all those actual (small) > IntObjects instead of sharing a single copy. I know that. Believe it or not, I have a good image of the cost it would infer, better than yours. because I've thought about this problem (as well as other related problems yet to be 'discovered'), and have spent some time in the past trying to find a couple of solutions to them. However, I eventually decided to stop my time machine and wait for these issues to show up, then take a stance on them. And this is what I did in this case. I'm tired to lack good arguments and see incoming capitalized words. This makes no sense here. Go to c.l.py and repeat "we're all adults here" *there*, please. To close this chapter, I think that if this gets in, Python's user base will get more confused and would have to swallow yet another cheap gimmick. You won't be able to explain it well to them. They won't really understand it, because their brains are still young, inexperienced, looking for logical explanations where all notions coexist peacefully. In the long term, what you're pushing for to get your money quickly, isn't a favor. And that's why I maintain my vote. call-me-again-if-you-need-more-than-53'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From mal at lemburg.com Sat Apr 15 11:28:15 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 15 Apr 2000 11:28:15 +0200 Subject: Re[Python-Dev] #pragmas in Python source code References: Message-ID: <38F8362F.51C2F7EC@lemburg.com> Mark Hammond wrote: > > I thought this implementation stemmed from Guido's desire > to do it this way in the 1.x family, and move towards Fredrik's > proposal for Py3k. Right. Let's do this step by step and get some experience first. With that gained experience we can still polish up the design towards a compromise which best suits all our needs. 
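The parallel with mixed arithmetic is direct; with the current default (UTF-8) coercion you get something like:

    >>> 1.5 + 2
    3.5
    >>> u"abc" + "def"
    u'abcdef'

The 8-bit operand is converted to Unicode before the operation is applied, just as the integer is converted to a float.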
The integration of Unicode into Python is comparable to the addition of floats to an interpreter which previously only understood integers -- things are obviously going to be a little different than before. Our goal should be to make it as painless as possible and at least IMHO this can only be achieved by gaining practical experience in this new field first. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From effbot at telia.com Sat Apr 15 12:14:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 12:14:54 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <38F78C00.7BAE1C12@lemburg.com> Message-ID: <002101bfa6c3$73e6c3c0$34aab5d4@hagrid> > This is exactly the same as proposing to change the default > encoding to Latin-1. no, it isn't. here's what I'm proposing: -- the internal character set is unicode, and nothing but unicode. in 1.6, this applies to strings. in 1.7 or later, it applies to source code as well. -- the default source encoding is "unknown" -- the is no other default encoding. all strings use the unicode character set. to give you some background, let's look at section 3.2 of the existing language definition: [Sequences] represent finite ordered sets indexed by natural numbers. The built-in function len() returns the number of items of a sequence. When the length of a sequence is n, the index set contains the numbers 0, 1, ..., n-1. Item i of sequence a is selected by a[i]. An object of an immutable sequence type cannot change once it is created. The items of a string are characters. There is no separate character type; a character is represented by a string of one item. Characters represent (at least) 8-bit bytes. The built-in functions chr() and ord() convert between characters and nonnegative integers representing the byte values. Bytes with the values 0-127 usually represent the corre- sponding ASCII values, but the interpretation of values is up to the program. The string data type is also used to represent arrays of bytes, e.g., to hold data read from a file. (in other words, given a string s, len(s) is the number of characters in the string. s[i] is the i'th character. len(s[i]) is 1. etc. the existing string type doubles as byte arrays, where given an array b, len(b) is the number of bytes, b[i] is the i'th byte, etc). my proposal boils down to a few small changes to the last three sentences in the definition. basically, change "byte value" to "character code" and "ascii" to "unicode": The built-in functions chr() and ord() convert between characters and nonnegative integers representing the character codes. Character codes usually represent the corresponding unicode values. The 8-bit string data type is also used to represent arrays of bytes, e.g., to hold data read from a file. that's all. the rest follows from this. ... just a few quickies to sort out common misconceptions: > I don't have anything against that (being a native Latin-1 > user :), but I would assume that other native language > writer sure do: e.g. all programmers not using Latin-1 > as native encoding (and there are lots of them). the unicode folks have already made that decision. 
I find it very strange that we should use *another* model for the first 256 characters, just to "equally annoy everyone". (if people have a problem with the first 256 unicode characters having the same internal representation as the ISO 8859-1 set, tell them to complain to the unicode folks). > (and this is not far fetched since there are input sources > which do return UTF-8, e.g. TCL), the Unicode implementation > will apply all its knowledge in order to get you satisfied. there are all sorts of input sources. major platforms like windows and java use 16-bit unicode. and Tcl has an internal unicode string type, since they realized that storing UTF-8 in 8-bit strings was horridly inefficient (they tried to do it right, of course). the internal type looks like this: typedef unsigned short Tcl_UniChar; typedef struct String { int numChars; size_t allocated; size_t uallocated; Tcl_UniChar unicode[2]; } String; (Tcl uses dual-ported objects, where each object can have an UTF-8 string representation in addition to the internal representation. if you change one of them, the other is recalculated on demand) in fact, it's Tkinter that converts the return value to UTF-8, not Tcl. that can be fixed. > > Python doesn't convert between other data types for me, so > > why should strings be a special case? > > Sure it does: 1.5 + 2 == 3.5, 2L + 3 == 5L, etc... but that's the key point: 2L and 3 are both integers, from the same set of integers. if you convert a long integer to an integer, it still contains an integer from the same set. (maybe someone can fill me in here: what's the formally correct word here? set? domain? category? universe?) also, if you convert every item in a sequence of long integers to ordinary integers, all items are still members of the same integer set. in contrast, the UTF-8 design converts between strings of characters, and arrays of bytes. unless you change the 8-bit string type to know about UTF-8, that means that you change string items from one domain (characters) to another (bytes). > Note that you are again argueing for using Latin-1 as > default encoding -- why don't you simply make this fact > explicit ? nope. I'm standardizing on a character set, not an encoding. character sets are mapping between integers and characters. in this case, we use the unicode character set. encodings are ways to store strings of text as bytes in a byte array. > not now, when everything has already been implemented and > people are starting to the use the code that's there with great > success. the positive reports I've seen all rave about the codec frame- work. that's a great piece of work. without that, it would have been impossible to do what I'm proposing. (so what are you complaining about? it's all your fault -- if you hadn't done such a great job on that part of the code, I wouldn't have noticed the warts ;-) if you look at my proposal from a little distance, you'll realize that it doesn't really change much. all that needs to be done is to change some of the conversion stuff. if we decide to do this, I can do the work for you, free of charge. From effbot at telia.com Sat Apr 15 12:45:15 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 12:45:15 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F8362F.51C2F7EC@lemburg.com> Message-ID: <005801bfa6c7$b22783a0$34aab5d4@hagrid> M.-A. Lemburg wrote: > Right. Let's do this step by step and get some experience first. 
> With that gained experience we can still polish up the design > towards a compromise which best suits all our needs. so practical experience from other languages, other designs, and playing with the python alphas doesn't count? > The integration of Unicode into Python is comparable to the > addition of floats to an interpreter which previously only > understood integers. use "long integers" instead of "floats", and you'll get closer to the actual case. but where's the problem? python has solved this problem for numbers, and what's more important: the language reference tells us how strings are supposed to work: "The items of a string are characters." (see previous mail) "Strings are compared lexicographically using the numeric equivalents (the result of the built-in function ord()) of their characters." this solves most of the issues. to handle the rest, look at the language reference description of integer: [Integers] represent elements from the mathematical set of whole numbers. Borrowing the "elements from a single set" concept, define characters as Characters represent elements from the unicode character set. and let all mixed-string operations use string coercion, just like numbers. can it be much simpler? From effbot at telia.com Sat Apr 15 13:19:14 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 13:19:14 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <017b01bfa53e$748cc080$34aab5d4@hagrid> <38F5EDDC.731E6740@lemburg.com> <003a01bfa568$b190c560$34aab5d4@hagrid> <38F6DAD7.BBAF72E5@lemburg.com> <005401bfa646$123ef2a0$34aab5d4@hagrid> <14583.29533.608524.961284@amarok.cnri.reston.va.us> <38F78DCC.C630F32@lemburg.com> Message-ID: <00c901bfa6cd$62259580$34aab5d4@hagrid> M.-A. Lemburg wrote: > > To reinforce Fredrik's point here, note that XML only supports > > encodings at the level of an entire file (or external entity). You > > can't tell an XML parser that a file is in UTF-8, except for this one > > element whose contents are in Latin1. > > Hmm, this would mean that someone who writes: > > """ > #pragma script-encoding utf-8 > > u = u"\u1234" > print u > """ > > would suddenly see "\u1234" as output. not necessarily. consider this XML snippet: ሴ if I run this through an XML parser and write it out as UTF-8, I get: ?^? in other words, the parser processes "&#x" after decoding to unicode, not before. I see no reason why Python cannot do the same. From effbot at telia.com Sat Apr 15 14:02:02 2000 From: effbot at telia.com (Fredrik Lundh) Date: Sat, 15 Apr 2000 14:02:02 +0200 Subject: [Python-Dev] Object customization References: Message-ID: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Greg Stein wrote: > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > >>>>> "FL" == Fredrik Lundh writes: > > > > FL> fwiw, I'd love to see a good syntax for this. might even > > FL> change my mind... > > > > def foo(x): > > self.x = x > > > > ? :) > > Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The > above syntax creates too much of an expectation to look for "self". There > would, of course, be problems that self.x doesn't work in a method while > _.x could. how about the obvious one: adding the name of the function to the local namespace? def foo(x): foo.x = x (in theory, this might of course break some code. but is that a real problem?) after all, my concern is that the above appears to work, but mostly by accident: >>> def foo(x): >>> foo.x = x >>> foo(10) >>> foo.x 10 >>> # cool. 
now let's try this on a method >>> class Foo: >>> def bar(self, x): >>> bar.x = x >>> foo = Foo() >>> foo.bar(10) Traceback (most recent call first): NameError: bar >>> # huh? maybe making it work in both cases would help? ... but on third thought, maybe it's sufficient to just keep the "static variable" aspect out of the docs. I just browsed a number of javascript manuals, and I couldn't find a trace of this feature. so how about this? -0.90000000000000002 on documenting this as "this can be used to store static data in a function" +1 on the feature itself. From Vladimir.Marangozov at inrialpes.fr Sat Apr 15 16:33:51 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sat, 15 Apr 2000 16:33:51 +0200 (CEST) Subject: [Python-Dev] veto? (was: Object customization) In-Reply-To: from "Greg Stein" at Apr 14, 2000 03:19:24 PM Message-ID: <200004151433.QAA04376@python.inrialpes.fr> Greg Stein wrote: > > Note that all votes are important, but only a signal to Guido ... > [good stuff deleted] Very true. I think we've made good progress here. > Now, people can respond to your vote and attempt to get you to change it. > ... After explanation and enlightenment, you could change the vote. Or vice-versa :-) Fredrik has been very informative about the evolution of his opinions as the discussion evolved. As was I, but I don't count ;-) It would be nice if we adopt his example and send more signals to Guido, emitted with a fixed (positive or negative) or with a sinusoidal frequency. -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From bwarsaw at cnri.reston.va.us Sat Apr 15 18:45:27 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Sat, 15 Apr 2000 12:45:27 -0400 (EDT) Subject: [Python-Dev] Object customization (was: Arbitrary attributes on funcs and methods) References: <14583.25258.427604.293809@seahag.cnri.reston.va.us> <14583.38142.520596.804466@beluga.mojam.com> Message-ID: <14584.40103.511210.929864@anthem.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: SM> Barry said "_" is effectively taken because it means something SM> (at least when used a function?) to pygettext. How about "__" SM> then? oops, yes, only when used as a function. so _.x would be safe. From bwarsaw at cnri.reston.va.us Sat Apr 15 18:52:56 2000 From: bwarsaw at cnri.reston.va.us (bwarsaw at cnri.reston.va.us) Date: Sat, 15 Apr 2000 12:52:56 -0400 (EDT) Subject: [Python-Dev] Object customization References: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: <14584.40552.13918.707019@anthem.cnri.reston.va.us> >>>>> "FL" == Fredrik Lundh writes: FL> so how about this? FL> -0.90000000000000002 on documenting this as "this FL> can be used to store static data in a function" FL> +1 on the feature itself. Works for me! I think function attrs would be a lousy place to put statics anyway. -Barry From tim_one at email.msn.com Sat Apr 15 19:43:05 2000 From: tim_one at email.msn.com (Tim Peters) Date: Sat, 15 Apr 2000 13:43:05 -0400 Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: <000801bfa702$0e1b3320$8a2d153f@tim> [/F] > -0.90000000000000002 on documenting this as "this > can be used to store static data in a function" -1 on that part from me. I never recommended to do it, I merely predicted that people *will* do it. And, they will. > +1 on the feature itself. I remain +0. [Barry] > Works for me! 
I think function attrs would be a lousy place to put > statics anyway. Yes, but the alternatives are also lousy: a global, or abusing default args. def f(): f.n = f.n + 1 return 42 f.n = 0 ... print "f called", f.n, "times" vs _f_n = 0 def f(): global _f_n _f_n = _f_n + 1 return 42 ... print "f called", _f_n, "times" vs def f(n=[0]): n[0] = n[0] + 1 return 42 ... print "f called ??? times" As soon as s person bumps into the first way, they're likely to adopt it, simply because it's less lousy than the others on first sight. From moshez at math.huji.ac.il Sat Apr 15 19:44:25 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 15 Apr 2000 19:44:25 +0200 (IST) Subject: [Python-Dev] Object customization In-Reply-To: <000801bfa702$0e1b3320$8a2d153f@tim> Message-ID: On Sat, 15 Apr 2000, Tim Peters wrote: [Barry] > Works for me! I think function attrs would be a lousy place to put > statics anyway. [Tim Peters] > Yes, but the alternatives are also lousy: a global, or abusing default > args. Personally I kind of like the alternative of a class: class _Foo: def __init__(self): self.n = 0 def f(self): self.n = self.n+1 return 42 f = _Foo().f getting-n-out-of-f-is-left-as-an-exercise-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sun Apr 16 17:52:20 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sun, 16 Apr 2000 17:52:20 +0200 Subject: [Python-Dev] Python source code encoding Message-ID: <38F9E1B4.F664D173@lemburg.com> [Fredrik]: > [MAL]: > > > To reinforce Fredrik's point here, note that XML only supports > > > encodings at the level of an entire file (or external entity). You > > > can't tell an XML parser that a file is in UTF-8, except for this one > > > element whose contents are in Latin1. > > > > Hmm, this would mean that someone who writes: > > > > """ > > #pragma script-encoding utf-8 > > > > u = u"\u1234" > > print u > > """ > > > > would suddenly see "\u1234" as output. > > not necessarily. consider this XML snippet: > > > ሴ > > if I run this through an XML parser and write it > out as UTF-8, I get: > > ?^? > > in other words, the parser processes "&#x" after > decoding to unicode, not before. > > I see no reason why Python cannot do the same. Sure, and this is what I meant when I said that the compiler has to deal with several different encodings. Unicode escape sequences are currently handled by a special codec, the unicode-escape codec which reads all characters with ordinal < 256 as-is (meaning Latin-1, since the first 256 Unicode ordinals map to Latin-1 characters (*)) except a few escape sequences which it processes much like the Python parser does for 8-bit strings and the new \uXXXX escape. Perhaps we should make this processing use two levels... the escape codecs would need some rewriting to process Unicode-> Unicode instead of 8-bit->Unicode as they do now. -- To move along the method Fredrik is proposing I would suggest (for Python 1.7) to introduce a preprocessor step which gets executed even before the tokenizer. The preprocessor step would then translate char* input into Py_UNICODE* (using an encoding hint which would have to appear in the first few lines of input using some special format). The tokenizer could then work on Py_UNICODE* buffer and the parser would then take care of the conversion from Py_UNICODE* back to char* for Python's 8-bit strings. 
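To make the two conversions concrete, here is a rough sketch using only the codec machinery already in 1.6 -- the #pragma parsing and the literal extraction are invented for the example, and the real thing would of course happen inside the tokenizer/compiler, not in Python code:

    import string

    raw = '#pragma script-encoding utf-8\ns = "caf\xc3\xa9"\n'

    # step 1: pick up the declared encoding and decode the whole source
    first_line = string.split(raw, "\n")[0]
    declared = string.split(first_line)[-1]          # "utf-8"
    source = unicode(raw, declared)                  # the Py_UNICODE* buffer, in C

    # step 2: narrow an 8-bit string literal back down to Latin-1
    literal = source[source.find('"') + 1 : source.rfind('"')]
    try:
        print repr(literal.encode("latin-1"))        # 'caf\351'
    except UnicodeError:
        print "literal contains characters outside range(256)"
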
It should shout out loud in case it sees input data outside Unicode range(256) in what is supposed to be a 8-bit string. To make this fully functional we would have to change the 8-bit string to Unicode coercion mechanism, though. It would have to make a Latin-1 assumption instead of the current UTF-8 assumption. In contrast to the current scheme, this assumption would be correct for all constant strings appearing in source code given the above preprocessor logic. For strings constructed from file or user input the programmer would have to assure proper encoding or do the Unicode conversion himself. Sidenote: The UTF-8->Latin-1 change would probably also have to be propogated to all other Unicode in/output logic -- perhaps Latin-1 is the better default encoding after all... A programmer could then write a Python script completely in UTF-8, UTF-16 or Shift-JIS and the above logic would convert the input data to Unicode or Latin-1 (which is 8-bit Unicode) as appropriate and it would warn about impossible conversions to Latin-1 in the compile step. The programmer would still have to make sure that file and user input gets converted using the proper encoding, but this can easily be done using the stream wrappers in the standard codecs module. Note that in this discussion we need to be very careful not to mangle encodings used for source code and ones used when reading/writing to files or other streams (including stdin/stdout). BTW, to experiment with all this you can use the codecs.EncodedFile stream wrapper. It allows specifying both data and stream side encodings, e.g. you can redirect a UTF-8 stdin stream to Latin-1 returning file object which can then be used as source of data input. (*) The conversion from Unicode to Latin-1 is similar to converting a 2-byte unsigned short to an unsigned byte with some extra logic to catch data loss. Latin-1 is comparable to 8-bit Unicode... this is where all this talk about Latin-1 originates from :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From Vladimir.Marangozov at inrialpes.fr Sun Apr 16 22:28:41 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Sun, 16 Apr 2000 22:28:41 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Greg Stein" at Apr 14, 2000 02:48:27 PM Message-ID: <200004162028.WAA06045@python.inrialpes.fr> [Skip, on attaching access control rights to function objects] > [VM] > >... > > If you prefer embedded definitions, among other things, you could do: > > > > __zope_access__ = { 'Spam' : 'public' } > > > > class Spam: > > __zope_access__ = { 'eggs' : 'private', > > 'eats' : 'public' } > > def eggs(self, ...): ... > > def eats(self, ...): ... [Greg] > This is uglier than attaching the metadata directly to the target that you > are describing! If you want to apply metadata to functions, then apply > them to the function! Don't shove them off in a separate structure. > > You're the one talking about cleanliness, yet you suggest something that > is very poor from a readability, maintainability, and semantic angle. Ick. [Moshe] > This solution is close to what the eff-bot suggested. In this case it > is horrible because of "editing effort": the meta-data and code of a > function are better off together physically, so you would change it > to ... 
> [equivalent solution deleted] In this particular use case, we're discussing access control rights which are part of some protection policy. A protection policy is a matrix Objects/Rights. It can be impemented in 3 ways, depending on the system: 1. Attach the Rights to the Objects 2. Attach the Objects to the Rights 3. Have a separate structure which implements the matrix. I agree that in this particular case, it seems handy to attach the rights to the objects. But in other cases, it's more appropriate to attach the objects to the rights. However, the 3rd solution is the one to be used when the objects (respectively, the rights) are fixed from the start and cannot be modified, and solution 2 (resp, 3) is not desirable/optimal/plausible... That's what I meant with: [VM] > > or have a completely separate class/structure for access control > > (which is what you would do it in C, btw, for existing objects > > to which you can't add slots, ex: file descriptors, mem segments, etc). Which presents an advantage: the potential to change completely the protection policy of the system in future versions of the software, because the protection implementation is decoupled from the objects' and the rights' implementation. damned-but-persistent-first-principles-again-'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From klm at digicool.com Sun Apr 16 23:45:00 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 17:45:00 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: On Sun, 16 Apr 2000 Vladimir.Marangozov at inrialpes.fr wrote: > > [Skip, on attaching access control rights to function objects] > > > [VM] > > >... > > > If you prefer embedded definitions, among other things, you could do: > > > > > > __zope_access__ = { 'Spam' : 'public' } > > > > > > class Spam: > > > __zope_access__ = { 'eggs' : 'private', > > > 'eats' : 'public' } > > > def eggs(self, ...): ... > > > def eats(self, ...): ... > > [Greg] > > This is uglier than attaching the metadata directly to the target that you > > are describing! If you want to apply metadata to functions, then apply > > them to the function! Don't shove them off in a separate structure. > [...] > In this particular use case, we're discussing access control rights > which are part of some protection policy. > A protection policy is a matrix Objects/Rights. It can be impemented > in 3 ways, depending on the system: > 1. Attach the Rights to the Objects > 2. Attach the Objects to the Rights > 3. Have a separate structure which implements the matrix. > [...] > [VM] > > > or have a completely separate class/structure for access control > > > (which is what you would do it in C, btw, for existing objects > > > to which you can't add slots, ex: file descriptors, mem segments, etc). > > Which presents an advantage: the potential to change completely the > protection policy of the system in future versions of the software, > because the protection implementation is decoupled from the objects' > and the rights' implementation. It may well make sense to have the system *implement* the rights somewhere else. (Distributed system, permissions caches in an object system, etc.) However it seems to me to make exceeding sense to have the initial intrinsic settings specified as part of the object! 
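To put a little flesh on options 1 and 3 above, here is a deliberately tiny sketch in which everything (the attribute name, the table, the check) is invented for illustration, and the attribute assignment in option 1 assumes the function-attribute patch under discussion:

    def eggs():
        return "spam and eggs"

    # option 1: the right lives on the object itself
    # (this line needs the proposed function-attribute support)
    eggs.access = "private"

    # option 3: a separate structure implements the Objects x Rights matrix
    ACCESS = {eggs: "private"}

    def check(func, required):
        # consult the separate table first, fall back to the object itself
        granted = ACCESS.get(func, getattr(func, "access", "public"))
        return granted == required

    print check(eggs, "private")     # a true value

Option 3 is the one that survives when the objects can't grow new slots; option 1 is the one that keeps the declaration next to the code.
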
More generally, it is the ability to associate intrinsic metadata that is the issue, not the designs of systems that employ the metadata. Geez. And, in the case of functions, it seems to me to be outstandingly consistent with python's treatment of objects. I'm mystified about why you would reject that so adamantly! That said, i can entirely understand concerns about whether or how to express references to the metadata from within the function's body. We haven't even seen a satisfactory approach to referring to the function, itself, from within the function. Maybe it's not even desirable to be able to do that - that's an interesting question. (I happen to think it's a good idea, just requiring a suitable means of expression.) But being able to associate metadata with functions seems like a good idea, and i've seen no relevant clues in your "first principles" about why it would be bad. Ken klm at digicool.com Return-Path: Delivered-To: python-dev at python.org Received: from merlin.codesourcery.com (merlin.codesourcery.com [206.168.99.1]) by dinsdale.python.org (Postfix) with SMTP id 7312F1CD5A for ; Sat, 15 Apr 2000 12:50:20 -0400 (EDT) Received: (qmail 17758 invoked by uid 513); 15 Apr 2000 16:57:54 -0000 Mailing-List: contact sc-publicity-help at software-carpentry.com; run by ezmlm Precedence: bulk X-No-Archive: yes Delivered-To: mailing list sc-publicity at software-carpentry.com Delivered-To: moderator for sc-publicity at software-carpentry.com Received: (qmail 16214 invoked from network); 15 Apr 2000 16:19:12 -0000 Date: Sat, 15 Apr 2000 12:11:52 -0400 (EDT) From: To: sc-announce at software-carpentry.com, sc-publicity at software-carpentry.com Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Subject: [Python-Dev] Software Carpentry entries now on-line Sender: python-dev-admin at python.org Errors-To: python-dev-admin at python.org X-BeenThere: python-dev at python.org X-Mailman-Version: 2.0beta3 List-Id: Python core developers First-round entries in the Software Carpentry design competition are now available on-line at: http://www.software-carpentry.com/entries/index.html Our thanks to everyone who entered; we look forward to some lively discussion on the "sc-discuss" list. Best regards, Greg Wilson Software Carpentry Project Coordinator From skip at mojam.com Mon Apr 17 00:06:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Sun, 16 Apr 2000 17:06:08 -0500 (CDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14586.14672.949500.986951@beluga.mojam.com> Ken> We haven't even seen a satisfactory approach to referring to the Ken> function, itself, from within the function. Maybe it's not even Ken> desirable to be able to do that - that's an interesting question. I hereby propose that within a function the special name __ refer to the function. You could have def fact(n): if n <= 1: return 1 return __(n-1) * n You could also refer to function attributes through __ (presuming Barry's proposed patch gets adopted): def pub(*args): if __.access == "private": do_private_stuff(*args) else: do_public_stuff(*args) ... if validate_permissions(): pub.access = "private" else: pub.access = "public" When in a bound method, __ should refer to the bound method, not the unbound method, which is already accessible via the class name. As far as lexical scopes are concerned, this won't change anything. 
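As a point of comparison, the effect of the proposed __ can be approximated today, without new syntax, by a small wrapper object that passes itself to the wrapped function; arbitrary attributes then live on the wrapper instance. A sketch -- the class name and the extra-first-argument convention are invented for illustration:

    class SelfAware:
        # wraps a function and passes the wrapper itself in as an extra
        # first argument, standing in for the proposed "__" name
        def __init__(self, func):
            self.func = func
        def __call__(self, *args):
            return apply(self.func, (self,) + args)

    def _fact(__, n):
        if n <= 1:
            return 1
        return __(n - 1) * n

    fact = SelfAware(_fact)
    print fact(5)               # 120
    fact.access = "private"     # attributes live on the wrapper instance

It is clumsier than a real __, and a bound-method version would need more work, but it shows the semantics are reachable without touching the language.
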
I think it could be implemented by adding a reference to the function called __ in the local vars of each function. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From klm at digicool.com Mon Apr 17 00:09:18 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:09:18 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <010101bfa6d2$6de60f80$34aab5d4@hagrid> Message-ID: On Sat, 15 Apr 2000, Fredrik Lundh wrote: > Greg Stein wrote: > > On Fri, 14 Apr 2000, Barry A. Warsaw wrote: > > > >>>>> "FL" == Fredrik Lundh writes: > > > > > > FL> fwiw, I'd love to see a good syntax for this. might even > > > FL> change my mind... > > > > > > def foo(x): > > > self.x = x > > > > > > ? :) > > > > Hehe... actually, I'd take Skip's "_.x = x" over the above suggestion. The > > above syntax creates too much of an expectation to look for "self". There > > would, of course, be problems that self.x doesn't work in a method while > > _.x could. > > how about the obvious one: adding the name of the > function to the local namespace? > > def foo(x): > foo.x = x 'self.x' would collide profoundly with the convention of using 'self' for the instance-argument in bound methods. Here, foo.x assumes that 'foo' is not rebound in the context of the def - the class, module, function, or wherever it's defined. That seems like an unnecessarily too strong an assumption. Both of these things suggest to me that we don't want to use a magic variable name, but rather some kind of builtin function to get the object (lexically) containing the block. It's tempting to name it something like 'this()', but that would be much too easily confused in methods with 'self'. Since we're looking for the lexically containing object, i'd call it something like current_object(). class Something: """Something's cooking, i can feel it.""" def help(self, *args): """Spiritual and operational guidance for something or other. Instructions for using help: ...""" print self.__doc__ print current_object().__doc__ if args: self.do_mode_specific_help(args) I think i'd be pretty happy with the addition of __builtins__.current_object, and the allowance of arbitrary metadata with functions (and other funtion-like objects like methods). Ken klm at digicool.com From klm at digicool.com Mon Apr 17 00:12:29 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:12:29 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14584.40552.13918.707019@anthem.cnri.reston.va.us> Message-ID: On Sat, 15 Apr 2000 bwarsaw at cnri.reston.va.us wrote: > >>>>> "FL" == Fredrik Lundh writes: > > FL> so how about this? > > FL> -0.90000000000000002 on documenting this as "this > FL> can be used to store static data in a function" > > FL> +1 on the feature itself. > > Works for me! I think function attrs would be a lousy place to put > statics anyway. Huh? Why? (I don't have a problem with omitting mention of this use - seems like encouraging the use of globals, often a mistake.) Ken klm at digicool.com From klm at digicool.com Mon Apr 17 00:21:59 2000 From: klm at digicool.com (Ken Manheimer) Date: Sun, 16 Apr 2000 18:21:59 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> Message-ID: On Sun, 16 Apr 2000, Skip Montanaro wrote: > > Ken> We haven't even seen a satisfactory approach to referring to the > Ken> function, itself, from within the function. 
Maybe it's not even > Ken> desirable to be able to do that - that's an interesting question. > > I hereby propose that within a function the special name __ refer to the > function. You could have > > def fact(n): > if n <= 1: return 1 > return __(n-1) * n > > You could also refer to function attributes through __ (presuming Barry's > proposed patch gets adopted): At first i thought you were kidding about using '__' because '_' was taken - on lots of terminals that i use, there is no intervening whitespace separating the two '_'s, so it's pretty hard to tell the difference between it and '_'! Now, i wouldn't mind using '_' if it's available, but guido was pretty darned against using it in my initial designs for packages - i wanted to use it to refer to the package containing the current module, like unix '..'. I gathered that a serious part of the objection was in using a character to denote non-operation syntax - python just doesn't do that. I also like the idea of using a function instead of a magic variable - most of python's magic variables are in packages, like os.environ. Ken klm at digicool.com From Vladimir.Marangozov at inrialpes.fr Mon Apr 17 04:30:48 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Mon, 17 Apr 2000 04:30:48 +0200 (CEST) Subject: [Python-Dev] Object customization In-Reply-To: from "Ken Manheimer" at Apr 16, 2000 05:45:00 PM Message-ID: <200004170230.EAA06492@python.inrialpes.fr> Ken Manheimer wrote: > > However it seems to me to make exceeding sense to have the initial > intrinsic settings specified as part of the object! Indeed. It makes perfect sense to have _intial_, intrinsic attributes. The problem is that currently they can't be specified for builtin objects. Skip asked for existing solutions, so I've made a quick tour on the problem, pointing him to 3). > And, in the case of functions, it seems to me to be outstandingly > consistent with python's treatment of objects. Oustandingly consistent isn't my opinion, but that's fine with both of us. If functions win this cause, the next imminent wish of all (Zope) users will be to attach (protection, or other) attributes to *all* objects: class Spam(...): """ Spam product""" zope_product_version = "2.51" zope_persistency = 0 zope_cache_limit = 64 * 1024 def eggs(self): ... def eats(self): ... How would you qualify the zope_* attributes so that only the zope_product version is accessible? (without __getattr__ tricks, since we're talking about `metadata'). Note that I don't expect an answer :-). The issue is crystal clear already. Be prepared to answer cool questions like this one to your customers. > > I'm mystified about why you would reject that so adamantly! Oops, I'll demystify you instantly, here, by summing up my posts: I'm not rejecting anything adamantly! To the countrary, I've suggested more. Greg said it quite well: Barry's proposal made me sending you signals about different issues you've probably not thought about before, yet we'd better sort them out before adopting his patch. As a member of this list, I feel obliged to share with you my concerns whenever I have them. My concerns in this case are: a) consistency of the class model. Apparently this signal was lost in outerspace, because my interpretation isn't yours. Okay, fine by me. This one will come back in Py3K. I'm already curious to see what will be on the table at that time. :-) b) confusion about the namespaces associated with a function object. You've been more receptive to this one. It's currently being discussed. 
c) generalize user-attributes for all builtin objects. You'd like to, but it looks expensive. This one is a compromise: it's related with sharing, copy on write builtin objects with modified user-attr, etc. In short, it doesn't seem to be on the table, because this signal hasn't been emitted before, nor it was really decrypted on python-dev. Classifying objects as light and heavy, and attributing them specific functionality only because of their "weight" looks very hairy. That's all for now. Discussing these issues in prime time here is goodness for Python and its users! Adopting the proposal in a hurry, because of the tight schedule for 1.6, isn't. It needs more maturation. Witness the length of the thread. it's-vacation-time-for-me-so-see-you-all-after-Easter'ly y'rs -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From gstein at lyra.org Mon Apr 17 10:14:51 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 01:14:51 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading Message-ID: A couple months ago, I exchanged a few emails with Guido about doing the free-threading work. In particular, for the 1.6 release. At that point (and now), I said that I wouldn't be starting on it until this summer, which means it would miss the 1.6 release. However, there are some items that could go into 1.6 *today* that would make it easier down the road to add free-threading to Python. I said that I'd post those in the hope that somebody might want to look at developing the necessary patches. It fell off my plate, so I'm getting back to that now... Python needs a number of basic things to support free threading. None of these should impact its performance or reliability. For the most part, they just provide a platform for the later addition. 1) Create a portable abstraction for using the platform's per-thread state mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. This mechanism will be used to store PyThreadState structure pointers, rather than _PyThreadState_Current. The latter variable must go away. Rationale: two threads will be operating simultaneously. An inherent conflict arises if _PyThreadState_Current is used. The TLS-like mechanism is used by the threads to look up "their" state. There will be a ripple effect on PyThreadState_Swap(); dunno offhand what. It may become empty. 2) Python needs a lightweight, short-duration, internally-used critical section type. The current lock type is used at the Python level and internally. For internal operations, it is rather heavyweight, has unnecessary semantics, and is slower than a plain crit section. Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's mutex type. A spinlock mechanism would be coolness. Rationale: Python needs critical sections to protect data from being trashed by multiple, simultaneous access. These crit sections need to be as fast as possible since they'll execute at all key points where data is manipulated. 3) Python needs an atomic increment/decrement (internal) operation. Rationale: these are used in INCREF/DECREF to correctly increment or decrement the refcount in the face of multiple threads trying to do this. Win32: InterlockedIncrement/Decrement. pthreads would use the lightweight crit section above (on every INC/DEC!!). Some other platforms may have specific capabilities to keep this fast. Note that platforms (outside of their threading libraries) may have functions to do this. 
4) Python's configuration system needs to be updated to include a --with-free-thread option since this will not be enabled by default. Related changes to acconfig.h would be needed. Compiling in the above pieces based on the flag would be nice (although Python could switch to the crit section in some cases where it uses the heavy lock today) Rationale: duh 5) An analysis of Python's globals needs to be performed. Any global that can safely be made "const" should. If a global is write-once (such as classobject.c::getattrstr), then these are marginally okay (there is a race condition, with an acceptable outcome, but a mem leak occurs). Personally, I would prefer a general mechanism in Python for creating "constants" which can be tracked by the runtime and freed. I would also like to see a generalized "object pool" mechanism be built and used for tuples, ints, floats, frames, etc. Rationale: any globals which are mutable must be made thread-safe. The fewer non-const globals to examine, the fewer to analyze for race conditions and thread-safety requirements. Note: making some globals "const" has a ripple effect through Python. This is sometimes known as "const poisoning". Guido has stated an acceptance to adding "const" throughout the interpreter, but would prefer a complete (rather than ripple-based, partial) overhaul. I think that is all for now. Achieving these five steps within the 1.6 timeframe means that the free-threading patches will be *much* smaller. It also creates much more visibility and testing for these sections. Post 1.6, a patch set to add critical sections to lists and dicts would be built. In addition, a new analysis would be done to examine the globals that are available along with possible race conditions in other mutable types and structures. Not all structures will be made thread-safe; for example, frame objects are used by a single thread at a time (I'm sure somebody could find a way to have multiple threads use or look at them, but that person can take a leap, too :-) Depending upon Guido's desire, the various schedules, and how well the development goes, Python 1.6.1 could incorporate the free-threading option in the base distribution. Cheers, -g -- Greg Stein, http://www.lyra.org/ From ping at lfw.org Mon Apr 17 03:54:41 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:54:41 -0700 (PDT) Subject: [Python-Dev] OT: XML In-Reply-To: <38F6344F.25D344B5@prescod.net> Message-ID: I'll begin with my conclusion, so you can get the high-order bit and skip the rest of the message if you like: XML is useful, but it's not a language. On Thu, 13 Apr 2000, Paul Prescod wrote: > > What definition of "language" are you using? And while you're at it, > what definition of "semantics" are you using? > > As I recall, a string is an ordered list of symbols and a language is an > unordered set of strings. I use the word "language" to mean an expression medium that carries semantics at a usefully high level. The computer-science definition you gave would include, say, the set of all pairs of integers (not a language to most people), but not include classical music [1] (indeed a language to many people). I admit that the boundary of "enough" semantics to qualify as a language is fuzzy, but some things do seem quite clearly to fall on either side of the line for me. For example, saying that XML has semantics is roughly equivalent to saying that ASCII has semantics. 
Well, sure, i suppose 65 has the "semantics" of the uppercase letter A, but that's not semantics at any level high enough to be useful. That is why most people would probably not normally call ASCII a "language". It has to express something to be a language. Granted, you can pick nits and say that XML has semantics as you did, but to me that essentially amounts to calling the syntax the semantics. > I know that Ka-Ping, despite going to a great university was in > Engineering, not computer science Cute. (I'm glad i was in engineering; at least we got a little design and software engineering background there, and i didn't see much of that in CS, unfortunately.) > Most XML people will happily admit that XML has no "semantics" but I > think that's bullshit too. The mapping from the string to the abstract > tree data model *is the semantic content* of the XML specification. Okay, fine. Technically, it has semantics; they're just very minimal semantics (so minimal that i felt quite comfortable in saying that it has none). But that doesn't change my point -- for "it has no semantics and therefore doesn't qualify as a language" just read "it has far too minimal semantics to qualify as a language". > It makes as little sense to reject XML out of hand because it is a > buzzword but is not innovative as it does for people to embrace it > mystically because it is Microsoft's flavor of the week. Before you get the wrong impression, i don't intend to reject XML out of hand, or to suggest that people do. It has its uses, just as ASCII has its uses. As a way of serializing trees, it's quite acceptable. I am, however, reacting to the sudden onslaught of hype that gives people the impression that XML can do anything. It's this sort of attitude that "oh, all of our representation problems will go away if we throw XML at it" that makes me cringe; that's only avoiding the issue. (I'm not saying that you are this clueless, Paul! -- just that some people seem to be.) As long as we recognize XML as exactly what it is, no more and no less -- a generic mechanism for serializing trees, with associated technologies for manipulating those trees -- there's no problem. > By the way, what data model or text encoding is NOT isomorphic to Lisp > S-expressions? Isn't Python code isomorphic to Lisp s-expessions? No! You can run Python code. The code itself, of course, can be interpreted as a stream of bytes, or arranged into a tree of LISP s-expressions. But if s-expressions that were *all* that constituted Python, Python would be pretty useless indeed! The entity we call Python includes real content: the rules for deriving the expected behaviour of a Python program from its parse tree, as variously specified in the reference manual, the library manual, and in our heads. LISP itself is a great deal more than just s-expressions. The language system specifies the behaviour you expect from a given piece of LISP code, and *that* is the part i call semantics. "real" semantics: Python LISP English MIDI minimal or no semantics: ASCII lists alphabet bytes The things in the top row are generally referred to as "languages"; the things in the bottom row are not. Although each thing in the top row is constructed from its corresponding thing in the bottom row, the difference between the two is what i am calling "semantics". If the top row says A and the bottom row says B, you can look at the B-type things that constitute the A and say, "if you see this particular B, it means foo". XML belongs in the bottom row, not the top row. 
Python: "If you see 'a = 3' in a function, it means you take the integer object 3 and bind it to the name 'a' in the local namespace." XML: "If you see the tag , it means... well... uh, nothing. Sorry. But you do get to decide that 'spam' and 'eggs' and 'boiled' mean whatever you want." That is why i am unhappy with XML being referred to as a "language": it is a misleading label that encourages people to make the mistake of imagining that XML has more semantic power than it really does. Why is this a fatal mistake? Because using XML will no more solve your information interchange problems than writing Japanese using the Roman alphabet will suddenly cause English speakers to be able to read Japanese novels. It may *help*, but there's a lot more to it than serialization. Thus: XML is useful, but it's not a language. And, since that reasonably summarizes my views on the issue, i'll say no more on this topic on the python-dev list -- any further blabbing i'll do in private e-mail. -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton [1] I anticipate an objection such as "but you can encode a piece of classical music as accurately as you like as a sequence of symbols." But the music itself doesn't fit the Chomskian definition of "language" until you add that symbolic mapping and the rules to arrange those symbols in sequence. At that point the thing you've just added *is* the language: it's the mapping from symbols to the semantics of e.g. "and at time 5.36 seconds the first violinist will play an A-flat at medium volume". From ping at lfw.org Mon Apr 17 03:06:40 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Sun, 16 Apr 2000 18:06:40 -0700 (PDT) Subject: [Python-Dev] Re: Comparison of cyclic objects In-Reply-To: <14582.22018.284695.428029@bitdiddle.cnri.reston.va.us> Message-ID: On Thu, 13 Apr 2000, Jeremy Hylton wrote: > Looks like the proposed changed to PyObject_Compare matches E for your > example. The printed representation doesn't match, but I'm not sure > that is as important. > > >>> tight = [1, None, "x"] > >>> tight[1] = tight > >>> tight > [1, [...], 'x'] > >>> loose = [1, [1, None, "x"], "x"] > >>> loose[1][1] = loose > >>> loose > [1, [1, [...], 'x'], 'x'] > >>> tight > [1, [...], 'x'] > >>> tight == loose > 1 Actually, i thought about this a little more and realized that the above *is* exactly the correct behaviour. In E, [] makes an immutable list. To make it mutable you then have to "flex" it. A mutable empty list is written "[] flex" (juxtaposition means a method call). In the above, the identities of the inner and outer lists of "loose" are different, and so should be printed separately. They are equal but not identical: >>> loose == loose[1] 1 >>> loose is loose[1] 0 >>> loose is loose[1][1] 1 >>> loose.append(4) >>> loose [1, [1, [...], 'x'], 'x', 4] -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton From paul at prescod.net Tue Apr 18 16:58:08 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:58:08 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> Message-ID: <38FC7800.D88E14D7@prescod.net> "M.-A. Lemburg" wrote: > > ... > The current need for #pragmas is really very simple: to tell > the compiler which encoding to assume for the characters > in u"...strings..." (*not* "...8-bit strings..."). 
The idea > behind this is that programmers should be able to use other > encodings here than the default "unicode-escape" one. I'm totally confused about this. Are we going to allow UCS-2 sequences in the middle of Python programs that are otherwise ASCII? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself From paul at prescod.net Mon Apr 17 15:37:19 2000 From: paul at prescod.net (Paul Prescod) Date: Mon, 17 Apr 2000 08:37:19 -0500 Subject: [Python-Dev] Unicode and XML References: Message-ID: <38FB138F.1DAF3891@prescod.net> Let's presume that we agreed that XML is not a language because it doesn't have semantics. What does that have to do with the applicability of its Unicode-handling model? Here is a list of a hundred specifications which we can probably agree have "useful semantics" that are all based on XML and thus have the same Unicode model: http://www.xml.org/xmlorg_registry/index.shtml XML's unicode model seems mostly appropriate to me. I can only see one reason it might not apply: which comes first the #! line or the #encoding line? We could say that the #! line can only be used in encodings that are direct supersets of ASCII (e.g. UTF-8 but not UTF-16). That shouldnt' cause any problems with Unix because as far as I know, Unix can only read the first line if it is in an ASCII superset anyhow! Then the second line could describe the precise ASCII superset in use (8859-1, 8859-2, UTF-8, raw ASCII, etc.). -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself When George Bush entered office, a Washington Post-ABC News poll found that 62 percent of Americans "would be willing to give up a few of the freedoms we have" for the war effort. They have gotten their wish. - "This is your bill of rights...on drugs", Harpers, Dec. 1999 From jeremy at cnri.reston.va.us Mon Apr 17 17:41:26 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:41:26 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: References: <200004162028.WAA06045@python.inrialpes.fr> Message-ID: <14587.12454.7542.709571@goon.cnri.reston.va.us> >>>>> "KLM" == Ken Manheimer writes: KLM> It may well make sense to have the system *implement* the KLM> rights somewhere else. (Distributed system, permissions caches KLM> in an object system, etc.) However it seems to me to make KLM> exceeding sense to have the initial intrinsic settings KLM> specified as part of the object! It's not clear to me that the person writing the code is or should be the person specifying the security policy. I believe the CORBA security model separates policy definition into three parts -- security attributes, required rights, and policy domains. The developer would only be responsible for the first part -- the security attributes, which describe methods in a general way so that a security administrators can develop an effective policy for it. I suppose that function attributes would be a sensible way to do this, but it might also be accomplished with a separate wrapper object. I'm still not thrilled with the idea of using regular attribute access to describe static properties on code. To access the properties, yes, to define and set them, probably not. 
Jeremy From jeremy at cnri.reston.va.us Mon Apr 17 17:49:11 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Mon, 17 Apr 2000 11:49:11 -0400 (EDT) Subject: [Python-Dev] Object customization In-Reply-To: <14586.14672.949500.986951@beluga.mojam.com> References: <200004162028.WAA06045@python.inrialpes.fr> <14586.14672.949500.986951@beluga.mojam.com> Message-ID: <14587.12919.488508.522746@goon.cnri.reston.va.us> >>>>> "SM" == Skip Montanaro writes: Ken> We haven't even seen a satisfactory approach to referring to Ken> the function, itself, from within the function. Maybe it's not Ken> even desirable to be able to do that - that's an interesting Ken> question. SM> I hereby propose that within a function the special name __ SM> refer to the function. I think the syntax is fairly obscure. I'm neurtral on the whole idea of having a special way to get at the function object from within the body of the code. Also, the proposal to handle security policies using attributes attached to the function seems wrong. The access control decision depends on the security policy defined for the object *and* the authorization of the caller. You can't decide based solely on some attribute of the function, nor can you assume that every call of a function object will be made with the same authorization (from the same protection domain). Jeremy From gstein at lyra.org Mon Apr 17 22:28:18 2000 From: gstein at lyra.org (Greg Stein) Date: Mon, 17 Apr 2000 13:28:18 -0700 (PDT) Subject: [Python-Dev] Object customization In-Reply-To: <14587.12919.488508.522746@goon.cnri.reston.va.us> Message-ID: On Mon, 17 Apr 2000, Jeremy Hylton wrote: > >>>>> "SM" == Skip Montanaro writes: > > Ken> We haven't even seen a satisfactory approach to referring to > Ken> the function, itself, from within the function. Maybe it's not > Ken> even desirable to be able to do that - that's an interesting > Ken> question. > > SM> I hereby propose that within a function the special name __ > SM> refer to the function. > > I think the syntax is fairly obscure. I'm neurtral on the whole idea > of having a special way to get at the function object from within the > body of the code. I agree. > Also, the proposal to handle security policies using attributes > attached to the function seems wrong. This isn't the only application of function attributes. Can't throw them out because one use seems wrong :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Mon Apr 17 23:37:16 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 14:37:16 -0700 Subject: [Python-Dev] Encoding of code in XML Message-ID: Lots of projects embed scripting & other code in XML, typically as CDATA elements. For example, XBL in Mozilla. As far as I know, no one ever bothers to define how one should _encode_ code in a CDATA segment, and it appears that at least in the Mozilla world the 'encoding' used is 'cut & paste', and it's the XBL author's responsibility to make sure that ]]> is nowhere in the JavaScript code. That seems suboptimal to me, and likely to lead to disasters down the line. The only clean solution I can think of is to define a standard encoding/decoding process for storing program code (which may very well contain occurences of ]]> in CDATA, which effectively hides that triplet from the parser. While I'm dreaming, it would be nice if all of the relevant language communities (JS, Python, Perl, etc.) could agree on what that encoding is. 
I'd love to hear of a recommendation on the topic by the XML folks, but I haven't been able to find any such document. Any thoughts? --david ascher From ping at lfw.org Mon Apr 17 23:47:40 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 16:47:40 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts Message-ID: One gripe that i hear a lot is that it's really difficult to cut and paste chunks of Python code when you're working with the interpreter because the ">>> " and "... " prompts keep getting in the way. Does anyone else often have or hear of this problem? Here is a suggested solution: for interactive mode only, the console maintains a flag "dropdots", initially false. After line = raw_input(">>> "): if line[:4] in [">>> ", "... "]: dropdots = 1 line = line[4:] else: dropdots = 0 interpret(line) After line = raw_input("... "): if dropdots and line[:4] == "... ": line = line[4:] interpret(line) The above solution depends on the fact that ">>> " and "... " are always invalid at the beginning of a bit of Python. So, if sys.ps1 is not ">>> " or sys.ps2 is not "... ", all dropdots behaviour is disabled. I realize it's not going to handle all cases (in particular mixing pasted text with typed-in text), but at least it makes it *possible* to paste code, and it's quite a simple rule. I suppose it all depends on whether or not you guys often experience this particular little irritation. Any thoughts on this? -- ?!ng From skip at mojam.com Mon Apr 17 23:47:30 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 17 Apr 2000 16:47:30 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: References: Message-ID: <14587.34418.633570.133957@beluga.mojam.com> Ping> One gripe that i hear a lot is that it's really difficult to cut Ping> and paste chunks of Python code when you're working with the Ping> interpreter because the ">>> " and "... " prompts keep getting in Ping> the way. Does anyone else often have or hear of this problem? First time I encountered this and complained about it Guido responded with import sys sys.ps1 = sys.ps2 = "" Crude, but effective... -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Apr 18 00:07:47 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:07:47 -0500 (CDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <14587.34418.633570.133957@beluga.mojam.com> Message-ID: On Mon, 17 Apr 2000, Skip Montanaro wrote: > > First time I encountered this and complained about it Guido responded with > > import sys > sys.ps1 = sys.ps2 = "" > > Crude, but effective... Yeah, i tried that, but it's suboptimal (no feedback), not the default behaviour, and certainly non-obvious to the beginner. -- ?!ng From ping at lfw.org Tue Apr 18 00:34:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 17:34:54 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurences of ]]> in CDATA, which effectively hides that triplet > from the parser. Hmm. I think the way everybody does it is to use the language to get around the need for ever saying "]]>". 
For example, in Python, if that was outside of a string, you could insert some spaces without changing the meaning, or if it was inside a string, you could add two strings together etc. You're right that this seems a bit ugly, but i think it could be even harder to get all the language communities to swallow something like "replace all occurrences of ]]> with some ugly escape string" -- since the above (hackish) method has the advantage that you can just run code directly copied from a piece of CDATA, and now you're asking them all to run the CDATA through some unescaping mechanism beforehand. Although i'm less optimistic about the success of such a standard, i'd certainly be up for it, if we had a good answer to propose. Here is one possible answer (to pick "@@" as a string very unlikely to occur much in most scripting languages): @@ --> @@@ ]]> --> @@> def escape(text): cdata = replace(text, "@@", "@@@") cdata = replace(cdata, "]]>", "@@>") return cdata def unescape(cdata): text = replace(cdata, "@@>", "]]>") text = replace(text, "@@@", "@@") return text The string "@@" occurs nowhere in the Python standard library. Another possible solution: <] --> <]> ]]> --> <][ etc. Generating more solutions is left as an exercise to the reader. :) -- ?!ng From DavidA at ActiveState.com Tue Apr 18 00:51:21 2000 From: DavidA at ActiveState.com (David Ascher) Date: Mon, 17 Apr 2000 15:51:21 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: > Hmm. I think the way everybody does it is to use the language > to get around the need for ever saying "]]>". For example, in > Python, if that was outside of a string, you could insert some > spaces without changing the meaning, or if it was inside a string, > you could add two strings together etc. > You're right that this seems a bit ugly, but i think it could be > even harder to get all the language communities to swallow > something like "replace all occurrences of ]]> with some ugly > escape string" -- since the above (hackish) method has the > advantage that you can just run code directly copied from a piece > of CDATA, and now you're asking them all to run the CDATA through > some unescaping mechanism beforehand. But it has the bad disadvantages that it's language-specific and modifies code rather than encode it. It has the even worse disadvantage that it requires you to parse the code to encode/decode it, something much more expensive than is really necessary! > Although i'm less optimistic about the success of such a standard, > i'd certainly be up for it, if we had a good answer to propose. I'm thinking that if we had a good answer, we can probably get it into the core libraries for a few good languages, and document it as 'the standard', if we could get key people on board. > Here is one possible answer Right, that's the sort of thing I was looking for. > def escape(text): > cdata = replace(text, "@@", "@@@") > cdata = replace(cdata, "]]>", "@@>") > return cdata > > def unescape(cdata): > text = replace(cdata, "@@>", "]]>") > text = replace(text, "@@@", "@@") > return text (the above fails on @@>, but that's the general idea I had in mind). --david I know!: "]]>" <==> "Microsoft engineers are puerile weenies!" From ping at lfw.org Tue Apr 18 01:01:58 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:01:58 -0500 (CDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Message-ID: On Mon, 17 Apr 2000, David Ascher wrote: > > (the above fails on @@>, but that's the general idea I had in mind). 
Oh, that's stupid of me. I used the wrong test harness. Okay, well the latter example works (i tested it): <] --> <]> ]]> --> <][ And this also works: @@ --> @@] ]]> --> @@> -- ?!ng From ping at lfw.org Tue Apr 18 01:08:53 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Mon, 17 Apr 2000 18:08:53 -0500 (CDT) Subject: [Python-Dev] Escaping CDATA Message-ID: Here's what i'm playing with, if you want to mess with it too: import string def replace(text, old, new, join=string.join, split=string.split): return join(split(text, old), new) la, ra = "@@", "@@]" lb, rb = "]]>", "@@>" la, ra = "<]", "<]>" lb, rb = "]]>", "<][" def escape(text): cdata = replace(text, la, ra) cdata = replace(cdata, lb, rb) return cdata def unescape(cdata): text = replace(cdata, rb, lb) text = replace(text, ra, la) return text chars = "" for ch in la + ra + lb + rb: if ch not in chars: chars = chars + ch if __name__ == "__main__": class Tester: def __init__(self): self.failed = [] self.count = 0 def test(self, s, find=string.find): cdata = escape(s) text = unescape(cdata) print "%s -e-> %s -u-> %s" % (s, cdata, text) if find(cdata, "]]>") >= 0: print "EXPOSURE!" self.failed.append(s) elif s != text: print "MISMATCH!" self.failed.append(s) self.count = self.count + 1 tester = Tester() test = tester.test for a in chars: for b in chars: for c in chars: for d in chars: for e in chars: for f in chars: for g in chars: for h in chars: test(a+b+c+d+e+f+g+h) print if tester.failed == []: print "All tests succeeded." else: print "Failed %d of %d tests." % (len(tester.failed), tester.count) for t in tester.failed: tester.test(t) From moshez at math.huji.ac.il Tue Apr 18 08:55:20 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 18 Apr 2000 08:55:20 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Mon, 17 Apr 2000, Ka-Ping Yee wrote: > One gripe that i hear a lot is that it's really difficult > to cut and paste chunks of Python code when you're working > with the interpreter because the ">>> " and "... " prompts > keep getting in the way. Does anyone else often have or > hear of this problem? > > Here is a suggested solution: for interactive mode only, > the console maintains a flag "dropdots", initially false. > > After line = raw_input(">>> "): > if line[:4] in [">>> ", "... "]: > dropdots = 1 > line = line[4:] > else: > dropdots = 0 > interpret(line) Python 1.5.2 (#1, Feb 21 2000, 14:52:33) [GCC 2.95.2 19991024 (release)] on sunos5 Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam >>> a=[] >>> a[ ... ... ... ] Traceback (innermost last): File "", line 1, in ? TypeError: sequence index must be integer >>> Sorry. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From ping at lfw.org Tue Apr 18 10:01:54 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 01:01:54 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > >>> a=[] > >>> a[ > ... ... > ... ] > Traceback (innermost last): > File "", line 1, in ? > TypeError: sequence index must be integer > >>> > > Sorry. What was your point? 
-- ?!ng From effbot at telia.com Tue Apr 18 09:44:50 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 09:44:50 +0200 Subject: [Python-Dev] Pasting interpreter prompts References: Message-ID: <00e301bfa909$fbb053a0$34aab5d4@hagrid> Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > > > > >>> a=[] > > >>> a[ > > ... ... > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>> > > > > Sorry. > > What was your point? a[...] is valid syntax, and not the same thing as a[]. From moshez at math.huji.ac.il Tue Apr 18 11:29:04 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Tue, 18 Apr 2000 11:29:04 +0200 (IST) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Ka-Ping Yee wrote: > On Tue, 18 Apr 2000, Moshe Zadka wrote: > > > > >>> a=[] > > >>> a[ > > ... ... > > ... ] > > Traceback (innermost last): > > File "", line 1, in ? > > TypeError: sequence index must be integer > > >>> > > > > Sorry. > > What was your point? That "... " in the beginning of the line is not a syntax error. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Tue Apr 18 00:01:38 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:01:38 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> Message-ID: <38FB89C2.26817F97@lemburg.com> Paul Prescod wrote: > > "M.-A. Lemburg" wrote: > > > > ... > > The current need for #pragmas is really very simple: to tell > > the compiler which encoding to assume for the characters > > in u"...strings..." (*not* "...8-bit strings..."). The idea > > behind this is that programmers should be able to use other > > encodings here than the default "unicode-escape" one. > > I'm totally confused about this. Are we going to allow UCS-2 sequences > in the middle of Python programs that are otherwise ASCII? The idea is to make life a little easier for programmers who's native script is not easily writable using ASCII, e.g. the whole Asian world. While originally only the encoding used within the quotes of u"..." was targetted (on the i18n sig), there has now been some discussion on this list about whether to move forward in a whole new direction: that of allowing whole Python scripts to be encoded in many different encodings. The compiler will then convert the scripts first to Unicode and then to 8-bit strings as needed. Using this technique which was introduced by Fredrik Lundh we could in fact have Python scripts which are encoded in UTF-16 (two bytes per character) or other more obscure encodings. The Python interpreter would only see Unicode and Latin-1. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Tue Apr 18 00:10:12 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 18 Apr 2000 00:10:12 +0200 Subject: [Python-Dev] Unicode and XML References: <38FB138F.1DAF3891@prescod.net> Message-ID: <38FB8BC4.96678FD6@lemburg.com> Paul Prescod wrote: > > Let's presume that we agreed that XML is not a language because it > doesn't have semantics. What does that have to do with the applicability > of its Unicode-handling model? 
> > Here is a list of a hundred specifications which we can probably agree > have "useful semantics" that are all based on XML and thus have the same > Unicode model: > > http://www.xml.org/xmlorg_registry/index.shtml > > XML's unicode model seems mostly appropriate to me. I can only see one > reason it might not apply: which comes first the #! line or the > #encoding line? We could say that the #! line can only be used in > encodings that are direct supersets of ASCII (e.g. UTF-8 but not > UTF-16). That shouldnt' cause any problems with Unix because as far as I > know, Unix can only read the first line if it is in an ASCII superset > anyhow! > > Then the second line could describe the precise ASCII superset in use > (8859-1, 8859-2, UTF-8, raw ASCII, etc.). Sounds like a good idea... how would such a line look like ? #!/usr/bin/env python # version: 1.6, encoding: iso-8859-1 ... Meaning: the module script needs Python version >=1.6 and uses iso-8859-1 as source file encoding. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Tue Apr 18 12:35:33 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 06:35:33 -0400 Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: Your message of "Tue, 18 Apr 2000 00:01:38 +0200." <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <200004181035.GAA12526@eric.cnri.reston.va.us> > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts > to be encoded in many different encodings. The compiler will > then convert the scripts first to Unicode and then to 8-bit > strings as needed. > > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. Wouldn't it make more sense to have the Python compiler *always* see UTF-8 and to use a simple preprocessor to deal with encodings? (Disclaimer: there are about 300 unread python-dev messages in my inbox still.) --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 18 12:56:55 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 12:56:55 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <200004181035.GAA12526@eric.cnri.reston.va.us> Message-ID: <001001bfa925$021cf700$34aab5d4@hagrid> Guido van Rossum wrote: > > Using this technique which was introduced by Fredrik Lundh > > we could in fact have Python scripts which are encoded in > > UTF-16 (two bytes per character) or other more obscure > > encodings. The Python interpreter would only see Unicode > > and Latin-1. > > Wouldn't it make more sense to have the Python compiler *always* see > UTF-8 and to use a simple preprocessor to deal with encodings? 
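A rough sketch of what such a preprocessor could look like, using MAL's example comment line as the declaration format; everything here (the helper name, the plain-ASCII default, the exact regular expression) is invented for illustration and is not taken from any actual patch:

    import re

    # Hypothetical helper: look for an "encoding: xxx" declaration in a
    # comment on one of the first two lines, then hand the compiler the
    # source re-encoded as UTF-8.  The declaration format follows MAL's
    # example above.
    ENCODING_RE = re.compile(rb"#.*encoding:\s*([-\w.]+)")

    def source_to_utf8(raw_bytes):
        declared = "ascii"                      # assume a plain-ASCII default
        for line in raw_bytes.split(b"\n")[:2]:
            match = ENCODING_RE.match(line)
            if match:
                declared = match.group(1).decode("ascii")
                break
        return raw_bytes.decode(declared).encode("utf-8")

    # e.g. source_to_utf8(open("spam.py", "rb").read())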
to some extent, this depends on what the "everybody" in CP4E means -- if you were to do user-testing on non-americans, I suspect "why cannot I use my own name as a variable name" might be as common as "why are SPAM and spam two different variables?". and if you're willing to address both issues in Py3K, it's much easier to use a simple internal representation, and handle en- codings on the way in and out. and PY_UNICODE* strings are easier to process than UTF-8 encoded char* strings... From ping at lfw.org Tue Apr 18 13:59:34 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 04:59:34 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Message-ID: On Tue, 18 Apr 2000, Moshe Zadka wrote: > > What was your point? > > That "... " in the beginning of the line is not a syntax error. So? You can put "... " at the beginning of a line in a string, too: >>> a = """ ... ... spam spam""" >>> a '\012... spam spam' That isn't a problem with the suggested mechanism, since dropdots only comes into effect when the *first* line entered at a >>> begins with ">>> " or "... ". -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From guido at python.org Tue Apr 18 15:01:47 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 09:01:47 -0400 Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: Your message of "Tue, 18 Apr 2000 04:59:34 PDT." References: Message-ID: <200004181301.JAA12697@eric.cnri.reston.va.us> Has anybody noticed that this is NOT a problem in IDLE? It will eventually go away, especially for the vast masses. So I don't think a solution is necessary -- and as was shown, the simple hacks don't really work. --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Apr 18 15:38:36 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Apr 2000 09:38:36 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <38FB89C2.26817F97@lemburg.com> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <14588.25948.202273.502469@seahag.cnri.reston.va.us> M.-A. Lemburg writes: > The idea is to make life a little easier for programmers > who's native script is not easily writable using ASCII, e.g. > the whole Asian world. > > While originally only the encoding used within the quotes of > u"..." was targetted (on the i18n sig), there has now been > some discussion on this list about whether to move forward > in a whole new direction: that of allowing whole Python scripts I had thought this was still an issue for interpretation of string contents, and really only meaningful when converting the source representations of Unicode strings to the internal represtenation. I see no need to change the language definition in general. Unless we *really* want to impose those evil trigraph sequences from C! ;) -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From effbot at telia.com Tue Apr 18 16:27:53 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 16:27:53 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com><38FC7800.D88E14D7@prescod.net><38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> Message-ID: <004401bfa942$49bcaa20$34aab5d4@hagrid> Fred Drake wrote: > > While originally only the encoding used within the quotes of > > u"..." 
was targetted (on the i18n sig), there has now been > > some discussion on this list about whether to move forward > > in a whole new direction: that of allowing whole Python scripts > > I had thought this was still an issue for interpretation of string > contents, and really only meaningful when converting the source > representations of Unicode strings to the internal represtenation. why restrict the set of possible source encodings to ASCII compatible 8-bit encodings? (or are there really authoring systems out there that can use different encodings for different parts of the file?) > I see no need to change the language definition in general. Unless > we *really* want to impose those evil trigraph sequences from C! ;) sorry, but I don't see the connection. From fdrake at acm.org Tue Apr 18 16:35:37 2000 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Tue, 18 Apr 2000 10:35:37 -0400 (EDT) Subject: [Python-Dev] #pragmas in Python source code In-Reply-To: <004401bfa942$49bcaa20$34aab5d4@hagrid> References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <14588.25948.202273.502469@seahag.cnri.reston.va.us> <004401bfa942$49bcaa20$34aab5d4@hagrid> Message-ID: <14588.29369.39945.849489@seahag.cnri.reston.va.us> Fredrik Lundh writes: > why restrict the set of possible source encodings to ASCII > compatible 8-bit encodings? I'm not suggesting that. I just don't see any call to change the language definition (such as allowing additional characters in NAME tokens). I don't mind whatsoever if the source is stored in UCS-2, and the tokenizer does need to understand that to create the right value for Unicode strings specified as u'...' literals. > (or are there really authoring systems out there that can use > different encodings for different parts of the file?) Not that I know of, and I doubt I'd want to see the result! -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From paul at prescod.net Tue Apr 18 16:42:32 2000 From: paul at prescod.net (Paul Prescod) Date: Tue, 18 Apr 2000 09:42:32 -0500 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> Message-ID: <38FC7458.EA90F085@prescod.net> My vote is all or nothing. Either the whole file is in UCS-2 (for example) or none of it is. I'm not sure if we really need to allow multiple file encodings in version 1.6 but we do need to allow that ultimately. If we agree to allow the whole file to be in another encoding then we should use the XML trick of having a known start-sequence for encodings other than UTF-8. It doesn't matter much whether it is syntactically a comment or a pragma. I am still in favor of compile time pragmas but they can probably wait for Python 1.7. > Using this technique which was introduced by Fredrik Lundh > we could in fact have Python scripts which are encoded in > UTF-16 (two bytes per character) or other more obscure > encodings. The Python interpreter would only see Unicode > and Latin-1. In what sense is Latin-1 not Unicode? Isn't it just the first 256 characters of Unicode or something like that? -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself [In retrospect] the story of a Cold War that was the scene of history's only nuclear arms race will be very different from the story of a Cold War that turned out to be only the first of many interlocking nuclear arms races in many parts of the world. 
The nuclear, question, in sum, hangs like a giant question mark over our waning century. - The Unfinished Twentieth Century by Jonathan Schell Harper's Magazine, January 2000 From effbot at telia.com Tue Apr 18 16:56:28 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 16:56:28 +0200 Subject: [Python-Dev] #pragmas in Python source code References: <38F591D3.32CD3B2A@lemburg.com> <38FC7800.D88E14D7@prescod.net> <38FB89C2.26817F97@lemburg.com> <38FC7458.EA90F085@prescod.net> Message-ID: <00b301bfa946$47ebbde0$34aab5d4@hagrid> Paul Prescod wrote: > My vote is all or nothing. Either the whole file is in UCS-2 (for > example) or none of it is. agreed. > In what sense is Latin-1 not Unicode? Isn't it just the first 256 > characters of Unicode or something like that? yes. ISO Latin-1 is unicode. what MAL really meant was that the interpreter would only deal with 8-bit (traditional) or 16-bit (unicode) strings. (in my string type proposals, the same applies to text strings manipulated by the user. if it's not unicode, it's a byte array, and methods expecting text don't work) From jeremy at cnri.reston.va.us Tue Apr 18 17:40:10 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 11:40:10 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce Message-ID: <14588.33242.799629.934118@goon.cnri.reston.va.us> As many of you have probably noticed, the moderators of comp.lang.python.announce do not deal with pending messages in a timely manner. There have been no new posts since Mar 27, and delays of several weeks were common before then. I wanted to ask a smallish group of potential readers of this group what we should do about the problem. I have tried to contact the moderators several times, but haven't heard a peep from them since late February, when the response was: "Sorry. Temporary problem. It's all fixed now." Three possible solutions come to mind: - Get more moderators. It appears that Marcus Fleck is the only active moderator. I have never received a response to private email sent to Vladimir Ulogov. I suggested to Marcus that we get more moderators, but he appeared to reject the idea. Perhaps some peer pressure from other unsatisfied readers would help. - De-couple the moderation of comp.lang.python.announce and of python-annouce at python.org. We could keep the gateway between the lists going, but have different moderators for the mailing list. This would be less convenient for people who prefer to read news, but would at least get announcement out in a timely fashion. - Give up on comp.lang.python.announce. Since moderation has been so spotty, most people have reverted to making all anouncements to comp.lang.python anyway. This option is unfortunate, because it makes it harder for people who don't have time to read comp.lang.python to keep up with announcements. Any other ideas? Suggestions on how to proceed? Jeremy From skip at mojam.com Tue Apr 18 18:04:29 2000 From: skip at mojam.com (Skip Montanaro) Date: Tue, 18 Apr 2000 11:04:29 -0500 (CDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.33242.799629.934118@goon.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.34701.354301.740696@beluga.mojam.com> Jeremy> Any other ideas? Suggestions on how to proceed? How about decouple the python-announce mailing list from the newsgroup (at least partially), manage the mailing list from Mailman (it probably already is), then require moderator approval to post? 
With a handful of moderators (5-10), the individual effort should be fairly low. You can set up the default reject message to be strongly related to the aims of the list so that most of the time the moderator needs only to click the approve or drop buttons or make a slight edit to the response and click the reject button. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ From ping at lfw.org Tue Apr 18 19:03:35 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Tue, 18 Apr 2000 10:03:35 -0700 (PDT) Subject: [Python-Dev] Pasting interpreter prompts In-Reply-To: <200004181301.JAA12697@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: > Has anybody noticed that this is NOT a problem in IDLE? Certainly. This was one of the first problems i solved when writing my console script, too. (Speaking of that, i still can't find auto-completion in IDLE -- is it in there?) But: startup time, startup time. I'm not going to wait to start IDLE every time i want to ask Python a quick question. Hey, i just tried it and actually it doesn't work. I mean, yes, sys.ps2 is missing, but that still doesn't mean you can select a whole line and paste it. You have to aim very carefully to start dragging from the fourth column. > So I don't think a solution is necessary -- and as was shown, the > simple hacks don't really work. I don't think this was shown at all. -- ?!ng From guido at python.org Tue Apr 18 18:50:49 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 12:50:49 -0400 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: Your message of "Tue, 18 Apr 2000 11:04:29 CDT." <14588.34701.354301.740696@beluga.mojam.com> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> Message-ID: <200004181650.MAA12894@eric.cnri.reston.va.us> I vote to get more moderators for the newsgroup. If Marcus and Gandalf don't moderate quickly the community can oust them. --Guido van Rossum (home page: http://www.python.org/~guido/) From effbot at telia.com Tue Apr 18 18:54:40 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 18 Apr 2000 18:54:40 +0200 Subject: [Python-Dev] comp.lang.python.announce References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <007201bfa956$d1311680$34aab5d4@hagrid> Jeremy Hylton wrote: > As many of you have probably noticed, the moderators of > comp.lang.python.announce do not deal with pending messages in a > timely manner. There have been no new posts since Mar 27, and > delays of several weeks were common before then. and as noted on c.l.py, those posts didn't make it to many servers, since they use "00" instead of "2000". I haven't seen any announcements on any local news- server since last year. > Any other ideas? Suggestions on how to proceed. post to comp.lang.python, and tell people who don't want to read the newsgroup to watch the python.org news page and/or the daily python URL? 
<0.5 wink> From jeremy at cnri.reston.va.us Tue Apr 18 18:58:54 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 12:58:54 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> References: <14588.33242.799629.934118@goon.cnri.reston.va.us> <14588.34701.354301.740696@beluga.mojam.com> <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: <14588.37966.825565.8871@goon.cnri.reston.va.us> >>>>> "GvR" == Guido van Rossum writes: GvR> I vote to get more moderators for the newsgroup. That seems like the simplest mechanism. We just need volunteers (I am one), and we need to get Marcus to notify the Usenet powers-that-be of the new moderators. GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. A painful process. Vladimir/Gandalf seems to have disappeared completely. (The original message in this thread bounced when I sent it to him.) The only way to add new moderators without Marcus's help is to have a new RFD/CFV process. It would be like creating the newsgroup all over again, except we'd have to convince the moderator of news.announce.newsgroups that the current moderator was unfit first. Jeremy From bwarsaw at cnri.reston.va.us Tue Apr 18 20:24:30 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Tue, 18 Apr 2000 14:24:30 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <14588.33242.799629.934118@goon.cnri.reston.va.us> Message-ID: <14588.43102.864339.347892@anthem.cnri.reston.va.us> >>>>> "JH" == Jeremy Hylton writes: JH> - De-couple the moderation of comp.lang.python.announce and of JH> python-annouce at python.org. We could keep the gateway between JH> the lists going, but have different moderators for the mailing JH> list. This would be less convenient for people who prefer to JH> read news, but would at least get announcement out in a timely JH> fashion. We could do this -- and in fact, this was the effective set up until a couple of weeks ago. We'd set it up as a moderated group, so that /every/ message is held for approval. I'd have to investigate, but we probably don't want to hold messages that originate on Usenet. Of course, gating back to Usenet will still be held up for c.l.py.a's moderators. Still, I'd rather not do this. It would be best to get more moderators helping out with the c.l.py.a content. -Barry From guido at python.org Tue Apr 18 20:25:11 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Apr 2000 14:25:11 -0400 Subject: [Python-Dev] baby steps for free-threading In-Reply-To: Your message of "Mon, 17 Apr 2000 01:14:51 PDT." References: Message-ID: <200004181825.OAA13261@eric.cnri.reston.va.us> > A couple months ago, I exchanged a few emails with Guido about doing the > free-threading work. In particular, for the 1.6 release. At that point > (and now), I said that I wouldn't be starting on it until this summer, > which means it would miss the 1.6 release. However, there are some items > that could go into 1.6 *today* that would make it easier down the road to > add free-threading to Python. I said that I'd post those in the hope that > somebody might want to look at developing the necessary patches. It fell > off my plate, so I'm getting back to that now... > > Python needs a number of basic things to support free threading. None of > these should impact its performance or reliability. For the most part, > they just provide a platform for the later addition. 
I agree with the general design sketched below. > 1) Create a portable abstraction for using the platform's per-thread state > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. There are at least 7 other platform specific thread implementations -- probably an 8th for the Mac. These all need to support this. (One solution would be to have a portable implementation that uses the thread-ID to index an array.) > This mechanism will be used to store PyThreadState structure pointers, > rather than _PyThreadState_Current. The latter variable must go away. > > Rationale: two threads will be operating simultaneously. An inherent > conflict arises if _PyThreadState_Current is used. The TLS-like > mechanism is used by the threads to look up "their" state. > > There will be a ripple effect on PyThreadState_Swap(); dunno offhand > what. It may become empty. Cool. > 2) Python needs a lightweight, short-duration, internally-used critical > section type. The current lock type is used at the Python level and > internally. For internal operations, it is rather heavyweight, has > unnecessary semantics, and is slower than a plain crit section. > > Specifically, I'm looking at Win32's CRITICAL_SECTION and pthread's > mutex type. A spinlock mechanism would be coolness. > > Rationale: Python needs critical sections to protect data from being > trashed by multiple, simultaneous access. These crit sections need to > be as fast as possible since they'll execute at all key points where > data is manipulated. Agreed. > 3) Python needs an atomic increment/decrement (internal) operation. > > Rationale: these are used in INCREF/DECREF to correctly increment or > decrement the refcount in the face of multiple threads trying to do > this. > > Win32: InterlockedIncrement/Decrement. pthreads would use the > lightweight crit section above (on every INC/DEC!!). Some other > platforms may have specific capabilities to keep this fast. Note that > platforms (outside of their threading libraries) may have functions to > do this. I'm worried here that since INCREF/DECREF are used so much this will slow down significantly, especially on platforms that don't have safe hardware instructions for this. So it should only be enabled when free threading is turned on. > 4) Python's configuration system needs to be updated to include a > --with-free-thread option since this will not be enabled by default. > Related changes to acconfig.h would be needed. Compiling in the above > pieces based on the flag would be nice (although Python could switch to > the crit section in some cases where it uses the heavy lock today) > > Rationale: duh Maybe there should be more fine-grained choices? As you say, some stuff could be used without this flag. But in any case this is trivial to add. > 5) An analysis of Python's globals needs to be performed. Any global that > can safely be made "const" should. If a global is write-once (such as > classobject.c::getattrstr), then these are marginally okay (there is a > race condition, with an acceptable outcome, but a mem leak occurs). > Personally, I would prefer a general mechanism in Python for creating > "constants" which can be tracked by the runtime and freed. They are almost all string constants, right? How about a macro Py_CONSTSTROBJ("value", variable)? > I would also like to see a generalized "object pool" mechanism be built > and used for tuples, ints, floats, frames, etc. Careful though -- generalizing this will slow it down. 
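Purely for illustration, here is roughly the shape such a generalized pool takes when sketched at the Python level; the real mechanism would be C code guarded by the lightweight critical sections discussed above, and every name in this sketch is invented:

    import threading

    class FreeList:
        """A tiny generic pool: keep up to `limit` released objects around
        and hand them back out instead of building fresh ones every time."""

        def __init__(self, factory, limit=100):
            self.factory = factory           # callable that builds a new object
            self.limit = limit
            self.free = []                   # the free list itself
            self.lock = threading.Lock()     # stand-in for a C critical section

        def acquire(self):
            with self.lock:
                if self.free:
                    return self.free.pop()
            return self.factory()

        def release(self, obj):
            with self.lock:
                if len(self.free) < self.limit:
                    self.free.append(obj)

    # e.g. scratch_pool = FreeList(factory=list)   -- purely illustrative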
(Here I find myself almost wishing for C++ templates :-) > Rationale: any globals which are mutable must be made thread-safe. The > fewer non-const globals to examine, the fewer to analyze for race > conditions and thread-safety requirements. > > Note: making some globals "const" has a ripple effect through Python. > This is sometimes known as "const poisoning". Guido has stated an > acceptance to adding "const" throughout the interpreter, but would > prefer a complete (rather than ripple-based, partial) overhaul. Actually, it's okay to do this on an "as-neeed" basis. I'm also in favor of changing all the K&R code to ANSI, and getting rid of Py_PROTO and friends. Cleaner code! > I think that is all for now. Achieving these five steps within the 1.6 > timeframe means that the free-threading patches will be *much* smaller. It > also creates much more visibility and testing for these sections. Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough testing of some of these changes, the extensive nature of some of the changes, and my other obligations during those 6 weeks, I don't see how it can be done for 1.6. I would prefer to do an accellerated 1.7 or 1.6.1 release that incorporates all this. (It could be called 1.6.1 only if it'nearly identical to 1.6 for the Python user and not too different for the extension writer.) > Post 1.6, a patch set to add critical sections to lists and dicts would be > built. In addition, a new analysis would be done to examine the globals > that are available along with possible race conditions in other mutable > types and structures. Not all structures will be made thread-safe; for > example, frame objects are used by a single thread at a time (I'm sure > somebody could find a way to have multiple threads use or look at them, > but that person can take a leap, too :-) It is unacceptable to have thread-unsafe structures that can be accessed in a thread-unsafe way using pure Python code only. > Depending upon Guido's desire, the various schedules, and how well the > development goes, Python 1.6.1 could incorporate the free-threading option > in the base distribution. Indeed. --Guido van Rossum (home page: http://www.python.org/~guido/) From DavidA at ActiveState.com Tue Apr 18 23:03:32 2000 From: DavidA at ActiveState.com (David Ascher) Date: Tue, 18 Apr 2000 14:03:32 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004181650.MAA12894@eric.cnri.reston.va.us> Message-ID: > I vote to get more moderators for the newsgroup. If Marcus and > Gandalf don't moderate quickly the community can oust them. FWIW, I think they should step down now. They've not held up their end of the bargain, even though several folks have offered to help repeatedly throughout the 'problem period', which includes most of the life of c.l.p.a. As a compromise solution, and only if it's effective, we can add moderators. I'll volunteer, as long as someone gives me hints as to the mechanisms (it's been a while since I was doing usenet for real). --david PS: I think decoupling the mailing list from the newsgroup is a bad precedent and a political trouble zone. From gstein at lyra.org Tue Apr 18 23:16:44 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 14:16:44 -0700 (PDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Jeremy Hylton wrote: > >>>>> "GvR" == Guido van Rossum writes: >... 
> GvR> If Marcus and Gandalf don't moderate quickly the community can > GvR> oust them. > > A painful process. Vladimir/Gandalf seems to have disappeared > completely. (The original message in this thread bounced when I sent > it to him.) The only way to add new moderators without Marcus's help > is to have a new RFD/CFV process. It would be like creating the > newsgroup all over again, except we'd have to convince the moderator > of news.announce.newsgroups that the current moderator was unfit > first. Nevertheless, adding more moderators is the "proper" answer to the problem. Even if it is difficult to get more moderators into the system, there doesn't seem to be a better alternative. Altering the mailing list gateway will simply serve to create divergent announcement forums. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Tue Apr 18 23:30:01 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 18 Apr 2000 17:30:01 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: References: <14588.37966.825565.8871@goon.cnri.reston.va.us> Message-ID: <14588.54233.528093.55997@goon.cnri.reston.va.us> >>>>> "GS" == Greg Stein writes: GvR> If Marcus and Gandalf don't moderate quickly the community can GvR> oust them. JH> A painful process. Vladimir/Gandalf seems to have disappeared JH> completely. (The original message in this thread bounced when I JH> sent it to him.) The only way to add new moderators without JH> Marcus's help is to have a new RFD/CFV process. It would be like JH> creating the newsgroup all over again, except we'd have to JH> convince the moderator of news.announce.newsgroups that the JH> current moderator was unfit first. GS> Nevertheless, adding more moderators is the "proper" answer to GS> the problem. Even if it is difficult to get more moderators into GS> the system, there doesn't seem to be a better alternative. Proper is not necessarily the same as possible. We may fail in an attempt to add a moderator without cooperation from Marcus. GS> Altering the mailing list gateway will simply serve to create GS> divergent announcement forums. If only one of the forums works, this isn't a big problem. Jeremy From gstein at lyra.org Wed Apr 19 02:05:01 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 17:05:01 -0700 (PDT) Subject: [Python-Dev] switch to ANSI C (was: baby steps for free-threading) In-Reply-To: <14588.58817.746201.456992@anthem.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Barry A. Warsaw wrote: > >>>>> "GvR" == Guido van Rossum writes: > > GvR> Actually, it's okay to do this on an "as-neeed" basis. I'm > GvR> also in favor of changing all the K&R code to ANSI, and > GvR> getting rid of Py_PROTO and friends. Cleaner code! > > I agree, and here's yet another plea for moving to 4-space indents in > the C code. For justification, look at the extended call syntax hacks > in ceval.c. They essentially /use/ 4si because they have no choice! > > Let's clean it up in one fell swoop! Obviously not for 1.6. I > volunteer to do all three mutations. Why not for 1.6? These changes are pretty brain-dead ("does it compile?") and can easily be reviewed. If somebody out there happens to have the time to work up ANSI C patches, then why refuse them? 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 19 08:46:56 2000 From: gstein at lyra.org (Greg Stein) Date: Tue, 18 Apr 2000 23:46:56 -0700 (PDT) Subject: [Python-Dev] baby steps for free-threading In-Reply-To: <200004181825.OAA13261@eric.cnri.reston.va.us> Message-ID: On Tue, 18 Apr 2000, Guido van Rossum wrote: >... > > 1) Create a portable abstraction for using the platform's per-thread state > > mechanism. On Win32, this is TLS. On pthreads, this is pthread_key_*. > > There are at least 7 other platform specific thread implementations -- > probably an 8th for the Mac. These all need to support this. (One > solution would be to have a portable implementation that uses the > thread-ID to index an array.) Yes. As the platforms "come up to speed", they can replace the fallback, portable implementation. "Users" of the TLS mechanism would allocate indices into the per-thread arrays. Another alternative is to only manage a mapping of thread-ID to ThreadState structures. The TLS code can then get the ThreadState and access the per-thread dict. Of course, the initial impetus is to solve the lookup of the ThreadState rather than a general TLS mechanism :-) Hmm. I'd say that we stick with defining a Python TLS API (in terms of the platform when possible). The fallback code would be the per-thread arrays design. "thread dict" would still exist, but is deprecated. >... > > 3) Python needs an atomic increment/decrement (internal) operation. > > > > Rationale: these are used in INCREF/DECREF to correctly increment or > > decrement the refcount in the face of multiple threads trying to do > > this. > > > > Win32: InterlockedIncrement/Decrement. pthreads would use the > > lightweight crit section above (on every INC/DEC!!). Some other > > platforms may have specific capabilities to keep this fast. Note that > > platforms (outside of their threading libraries) may have functions to > > do this. > > I'm worried here that since INCREF/DECREF are used so much this will > slow down significantly, especially on platforms that don't have safe > hardware instructions for this. This definitely slows Python down. If an object is known to be visible to only one thread, then you can avoid the atomic inc/dec. But that leads to madness :-) > So it should only be enabled when free threading is turned on. Absolutely. No question. Note to readers: the different definitions of INCREF/DECREF has an impact on mixing modules in the same way Py_TRACE_REFS does. > > 4) Python's configuration system needs to be updated to include a > > --with-free-thread option since this will not be enabled by default. > > Related changes to acconfig.h would be needed. Compiling in the above > > pieces based on the flag would be nice (although Python could switch to > > the crit section in some cases where it uses the heavy lock today) > > > > Rationale: duh > > Maybe there should be more fine-grained choices? As you say, some > stuff could be used without this flag. But in any case this is > trivial to add. Sure. For example, something like the Python TLS API could be keyed off --with-threads. Replacing _PyThreadState_Current with a TLS-based mechanism should be keyed on free threads. The "critical section" stuff could be keyed on threading -- they would be nice for Python to use internally for its standard threading operation. > > 5) An analysis of Python's globals needs to be performed. Any global that > > can safely be made "const" should. 
If a global is write-once (such as > > classobject.c::getattrstr), then these are marginally okay (there is a > > race condition, with an acceptable outcome, but a mem leak occurs). > > Personally, I would prefer a general mechanism in Python for creating > > "constants" which can be tracked by the runtime and freed. > > They are almost all string constants, right? Yes, I believe so. (Analysis needed) > How about a macro Py_CONSTSTROBJ("value", variable)? Sure. Note that the variable name can usually be constructed from the value. > > I would also like to see a generalized "object pool" mechanism be built > > and used for tuples, ints, floats, frames, etc. > > Careful though -- generalizing this will slow it down. (Here I find > myself almost wishing for C++ templates :-) :-) This is a desire, but not a requirement. Same with the write-once stuff. A general pool mechanism would reduce code duplication for lock management, and possibly clarify some operation. >... > > Note: making some globals "const" has a ripple effect through Python. > > This is sometimes known as "const poisoning". Guido has stated an > > acceptance to adding "const" throughout the interpreter, but would > > prefer a complete (rather than ripple-based, partial) overhaul. > > Actually, it's okay to do this on an "as-neeed" basis. I'm also in > favor of changing all the K&R code to ANSI, and getting rid of > Py_PROTO and friends. Cleaner code! Yay! :-) > > I think that is all for now. Achieving these five steps within the 1.6 > > timeframe means that the free-threading patches will be *much* smaller. It > > also creates much more visibility and testing for these sections. > > Alas. Given the timeframe for 1.6 (6 weeks!), the need for thorough > testing of some of these changes, the extensive nature of some of the [ aside: most of these changes are specified with the intent of reducing the impact on Python. most are additional behavior rather than changing behavior. ] > changes, and my other obligations during those 6 weeks, I don't see > how it can be done for 1.6. I would prefer to do an accellerated 1.7 > or 1.6.1 release that incorporates all this. (It could be called > 1.6.1 only if it'nearly identical to 1.6 for the Python user and not > too different for the extension writer.) Ah. That would be nice. It also provides some focus on what would need to occur for the extension writer: *) Python TLS API *) critical sections *) WITH_FREE_THREAD from the configure process The INCREF/DECREF and const-ness is hidden from the extension writer. Adding integrity locks to list/dict/etc is also hidden. > > Post 1.6, a patch set to add critical sections to lists and dicts would be > > built. In addition, a new analysis would be done to examine the globals > > that are available along with possible race conditions in other mutable > > types and structures. Not all structures will be made thread-safe; for > > example, frame objects are used by a single thread at a time (I'm sure > > somebody could find a way to have multiple threads use or look at them, > > but that person can take a leap, too :-) > > It is unacceptable to have thread-unsafe structures that can be > accessed in a thread-unsafe way using pure Python code only. Hmm. I guess that I can grab a frame object reference via a traceback object. The frame and traceback objects can then be shared between threads. 
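To make that concrete, here is a small sketch (nothing beyond sys and threading is assumed) in which the main thread ends up holding a frame object that the worker thread is still executing in:

    import sys
    import threading

    def worker(shared):
        try:
            raise RuntimeError
        except RuntimeError:
            # Publish this thread's own frame by way of the traceback object.
            shared.append(sys.exc_info()[2].tb_frame)
        # Keep running for a while so the frame stays live.
        for i in range(1000000):
            i * i

    shared = []
    t = threading.Thread(target=worker, args=(shared,))
    t.start()
    while not shared:
        pass                                  # wait for the frame to show up
    frame = shared[0]
    print(frame.f_code.co_name, frame.f_lineno)   # peeked at from another thread
    t.join()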
Now the question arises: if the original thread resumes execution and starts modifying these objects (inside the interpreter since both are readonly to Python), then the passed-to thread might see invalid data. I'm not sure whether these objects have multi-field integrity constraints. Conversely: if they don't, then changing a single field will simply create a race condition with the passed-to thread. Oh, and assuming that we remove a value from the structure before DECREF'ing it. By your "pure Python" statement, I'm presuming that you aren't worried about PyTuple_SET_ITEM() and similar. However, do you really want to start locking up the frame and traceback objects? (and code objects and ...) Cheers, -g -- Greg Stein, http://www.lyra.org/ From sjoerd at oratrix.nl Wed Apr 19 11:51:53 2000 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 11:51:53 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Mon, 17 Apr 2000 14:37:16 -0700. References: Message-ID: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> What is wrong with encoding ]]> in the XML way by using an extra CDATA. In other words split up the CDATA section into two in the middle of the ]]> sequence: import string def encode_cdata(str): return ''), ']]]]>')) + \ ']]>' On Mon, Apr 17 2000 "David Ascher" wrote: > Lots of projects embed scripting & other code in XML, typically as CDATA > elements. For example, XBL in Mozilla. As far as I know, no one ever > bothers to define how one should _encode_ code in a CDATA segment, and it > appears that at least in the Mozilla world the 'encoding' used is 'cut & > paste', and it's the XBL author's responsibility to make sure that ]]> is > nowhere in the JavaScript code. > > That seems suboptimal to me, and likely to lead to disasters down the line. > > The only clean solution I can think of is to define a standard > encoding/decoding process for storing program code (which may very well > contain occurences of ]]> in CDATA, which effectively hides that triplet > from the parser. > > While I'm dreaming, it would be nice if all of the relevant language > communities (JS, Python, Perl, etc.) could agree on what that encoding is. > I'd love to hear of a recommendation on the topic by the XML folks, but I > haven't been able to find any such document. > > Any thoughts? > > --david ascher > > -- Sjoerd Mullender From SalzR at CertCo.com Wed Apr 19 16:57:16 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 10:57:16 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >This definitely slows Python down. If an object is known to be visible to >only one thread, then you can avoid the atomic inc/dec. But that leads to >madness :-) I would much rather see the language extended to indicate that a particular variable is "shared" across free-threaded interpreters. The hit of taking a mutex on every incref/decref is way bad. From gvwilson at nevex.com Wed Apr 19 17:03:17 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Wed, 19 Apr 2000 11:03:17 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: Message-ID: > Rich Salz wrote: > I would much rather see the language extended to indicate that a > particular variable is "shared" across free-threaded interpreters. The > hit of taking a mutex on every incref/decref is way bad. In my experience, allowing/requiring programmers to specify sharedness is a very rich source of hard-to-find bugs. 
(Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg From petrilli at amber.org Wed Apr 19 17:09:04 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Wed, 19 Apr 2000 11:09:04 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading In-Reply-To: ; from SalzR@CertCo.com on Wed, Apr 19, 2000 at 10:57:16AM -0400 References: Message-ID: <20000419110904.C6107@trump.amber.org> Salz, Rich [SalzR at CertCo.com] wrote: > >This definitely slows Python down. If an object is known to be visible to > >only one thread, then you can avoid the atomic inc/dec. But that leads to > >madness :-) > > I would much rather see the language extended to indicate that a particular > variable is "shared" across free-threaded interpreters. The hit of taking > a mutex on every incref/decref is way bad. I wonder if the energy is better spent in a truly highly-optimized implementation on the major platforms rather than trying to conditional this. This may mean writing x86 assembler, and a few others, but then again, once written, it shouldn't need much modification. I wonder if the conditional mutexing might be slower because of the check and lack of focus on bringing the core implementation up to speed. Chris -- | Christopher Petrilli | petrilli at amber.org From ping at lfw.org Wed Apr 19 17:40:08 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 08:40:08 -0700 (PDT) Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: On Wed, 19 Apr 2000, Sjoerd Mullender wrote: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. In other words split up the CDATA section into two in the > middle of the ]]> sequence: Brilliant. Now that i've seen it, this has to be the right answer. -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From SalzR at CertCo.com Wed Apr 19 18:04:48 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 12:04:48 -0400 Subject: [Thread-SIG] Re: [Python-Dev] baby steps for free-threading Message-ID: >In my experience, allowing/requiring programmers to specify sharedness is >a very rich source of hard-to-find bugs. My experience is the opposite, since most objects aren't shared. :) You could probably do something like add an "owning thread" to each object structure, and on refcount throw an exception if not shared and the current thread isn't the owner. Not sure if space is a concern, but since the object is either shared or needs its own mutex, you make them a union: bool shared; union { python_thread_id_type id; python_mutex_type m; }; (Not saying I have an answer to the performance hit of locking on incref/decref, just saying that the development cost of 'shared' is very high.) Greg _______________________________________________ Thread-SIG maillist - Thread-SIG at python.org http://www.python.org/mailman/listinfo/thread-sig From ping at lfw.org Wed Apr 19 19:07:36 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:07:36 -0700 (PDT) Subject: [Python-Dev] Generic notifier module Message-ID: I think it would be very nice for the Python standard library to provide a messaging mechanism (you may know it as signals/slots, publish/subscribe, listen/notify, etc.). This could be very useful, especially for interactive applications where various components need to keep each other up to date about things. 
I know of several Tkinter programs where i'd like to use this mechanism. The proposed interface is: To add notification ability, mix in class notifier.Notifier. object.notify(message, callback) - Set up notification for message. object.denotify(message[, callback]) - Turn off notification. object.send(message, **args) - Call all callbacks registered on object for message, in reverse order of registration, passing along message and **args as arguments to each callback. If a callback returns notifier.BREAK, no further callbacks are called. (Alternatively, we could use signals/slots terminology: connect/disconnect/emit. I'm not aware of anything the signals/slots mechanism has that the above lacks.) Two kinds of messages are supported: 1. The 'message' passed to notify/denotify may be a class, and the 'message' passed to send may be a class or instance of a message class. In this case callbacks registered on that class and all its bases are called. 2. The 'message' passed to all three methods may be any other hashable object, in which case it is looked up by its hash, and callbacks registered on a hash-equal object are called. Thoughts and opinions are solicited (especially from those who have worked with messaging-type things before, and know the gotchas!). I haven't run into many tricky problems with these things in general, and i figure that the predictable order of callbacks should reduce complication. (I chose reverse ordering so that you always have the ability to add a callback that overrides existing ones.) A straw-man implementation follows. The callback registry is maintained in the notifier module so you don't have to worry about it messing up the attributes of your objects. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial = serial + 1 def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. 
The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." -- Roople Unia From ping at lfw.org Wed Apr 19 19:25:12 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 10:25:12 -0700 (PDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. One revision to the above: callbacks should get the sender of the message passed in as well as the message. The tweaked module follows. -------- snip snip ---------------------------------- notifier.py -------- # If a callback returns BREAK, no more callbacks are called. BREAK = "break" # This number goes up every time a callback is added. serial = 0 # This dictionary maps callback functions to serial numbers. callbacks = {} def recipients(sender, message): """Return a list of (serial, callback) pairs for all the callbacks on this message and its base classes.""" key = (sender, message) if callbacks.has_key(key): list = map(lambda (k, v): (v, k), callbacks[key].items()) else: list = [] if hasattr(message, "__bases__"): for base in message.__bases__: list.extend(recipients(sender, base)) return list class Notifier: """Mix in this class to provide notifier functionality on your objects. On a notifier object, use the 'notify' and 'denotify' methods to register or unregister callbacks on messages, and use the 'send' method to send a message from the object.""" def send(self, message, **args): """Call any callbacks registered on this object for the given message. If message is a class or instance, callbacks registered on the class or any base class are called. Otherwise callbacks registered on a message of the same value (compared by hash) are called. The message and any extra keyword arguments are passed along to each callback.""" if hasattr(message, "__class__"): message = message.__class__ recip = recipients(self, message) recip.sort() recip.reverse() for serial, callback in recip: if callback(self, message, **args) == BREAK: return def notify(self, message, callback): """Register a callback on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if not callbacks.has_key(key): callbacks[key] = {} callbacks[key][callback] = serial def denotify(self, message, callback=None): """Unregister a particular callback or all existing callbacks on this object for a given message. The message should be a class (not an instance) or a hashable object.""" key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] elif callbacks[key].has_key(callback): del callbacks[key][callback] -------- snip snip ---------------------------------- notifier.py -------- -- ?!ng "Je n'aime pas les stupides gar?ons, m?me quand ils sont intelligents." 
-- Roople Unia From effbot at telia.com Wed Apr 19 19:15:28 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 19 Apr 2000 19:15:28 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <001901bfaa22$e202af60$34aab5d4@hagrid> Ka-Ping Yee wrote: > I think it would be very nice for the Python standard library to > provide a messaging mechanism (you may know it as signals/slots, > publish/subscribe, listen/notify, etc.). your notifier looks like a supercharged version of the "Observer" pattern [1]. here's a minimalistic observer mixin from "(the eff- bot guide to) Python Patterns and Idioms" [2]. class Observable: __observers = None def addobserver(self, observer): if not self.__observers: self.__observers = [] self.__observers.append(observer) def removeobserver(self, observer): self.__observers.remove(observer) def notify(self, event): for o in self.__observers or (): o(event) notes: -- in the GOF pattern, to "notify" is to tell observers that something happened, not to register an observer. -- GOF uses "attach" and "detach" to install and remove observers; the pattern book version uses slightly more descriptive names. -- the user is expected to use bound methods and event instances (or classes) to associate data with the notifier and events. earlier implementations were much more elaborate, but we found that the standard mechanisms was more than sufficient in real life... 1) "Design Patterns", by Gamma et al. 2) http://www.pythonware.com/people/fredrik/patternbook.htm From DavidA at ActiveState.com Wed Apr 19 19:43:26 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 10:43:26 -0700 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: <20000419095154.9FDDB301CF9@bireme.oratrix.nl> Message-ID: > What is wrong with encoding ]]> in the XML way by using an extra > CDATA. In other words split up the CDATA section into two in the > middle of the ]]> sequence: > > import string > def encode_cdata(str): > return ' string.join(string.split(str, ']]>'), ']]]]>')) + \ > ']]>' If I understand what you're proposing, you're splitting a single bit of Python code into N XML elements. This requires smarts not on the decode function (where they should be, IMO), but on the XML parsing stage (several leaves of the tree have to be merged). Seems like the wrong direction to push things. Also, I can imagine cases where the app puts several scripts in consecutive CDATA elements (assuming that's legal XML), and where a merge which inserted extra ]]> would be very surprising. Maybe I'm misunderstanding you, though.... --david ascher From effbot at telia.com Wed Apr 19 19:50:17 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 19 Apr 2000 19:50:17 +0200 Subject: [Python-Dev] Encoding of code in XML References: Message-ID: <000701bfaa27$c3546e00$34aab5d4@hagrid> David Ascher wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. nope. CDATA sections are used to encode data, they're not elements: XML 1.0, section 2.7: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as mark- up. 
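For reference, the encode_cdata function quoted above has lost its "<![CDATA[" markers somewhere along the way. A working sketch of the same idea looks like this (the exact separator string is a reconstruction, not necessarily Sjoerd's original):

import string

def encode_cdata(str):
    # wrap 'str' in a CDATA section, and split the section in two
    # wherever the data itself contains "]]>", so that no single
    # section ever contains the terminating sequence
    return '<![CDATA[' + \
           string.join(string.split(str, ']]>'), ']]]]><![CDATA[>') + \
           ']]>'

# "spam]]>eggs" becomes "<![CDATA[spam]]]]><![CDATA[>eggs]]>", which a
# conforming parser hands back to the application as "spam]]>eggs".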
you can put each data character in its own CDATA section, if you like. if the parser cannot handle that, it's broken. (if you've used xmllib, think handle_data, not start_cdata). From sjoerd at oratrix.nl Wed Apr 19 21:24:31 2000 From: sjoerd at oratrix.nl (Sjoerd Mullender) Date: Wed, 19 Apr 2000 21:24:31 +0200 Subject: [Python-Dev] Encoding of code in XML In-Reply-To: Your message of Wed, 19 Apr 2000 10:43:26 -0700. References: Message-ID: <20000419192432.F2A19301CF9@bireme.oratrix.nl> On Wed, Apr 19 2000 "David Ascher" wrote: > > What is wrong with encoding ]]> in the XML way by using an extra > > CDATA. In other words split up the CDATA section into two in the > > middle of the ]]> sequence: > > > > import string > > def encode_cdata(str): > > return ' > string.join(string.split(str, ']]>'), ']]]]>')) + \ > > ']]>' > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. This requires smarts not on the decode > function (where they should be, IMO), but on the XML parsing stage (several > leaves of the tree have to be merged). Seems like the wrong direction to > push things. Also, I can imagine cases where the app puts several scripts > in consecutive CDATA elements (assuming that's legal XML), and where a merge > which inserted extra ]]> would be very surprising. > > Maybe I'm misunderstanding you, though.... I think you're not misunderstanding me, but maybe you are misunderstanding XML. :-) [Of course, it is also conceivable that I misunderstand XML. :-] First of all, I don't propose to split up the single bit of Python into multiple XML elements. CDATA sections are not XML elements. The XML standard says this: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. [http://www.w3.org/TR/REC-xml#sec-cdata-sect] In other words, according to the XML standard wherever you are allowed to put character data (such as in this case Python code), you are allowed to use CDATA sections. Their purpose is to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA sections are not part of the markup, so the XML parser is allowed to coallese the multiple CDATA sections and other character data into one string before it gives it to the application. So, yes, this requires smarts on the XML parsing stage, but I think those smarts need to be there anyway. If an application put several pieces of Python code in one character data section, it is basically on its own. I don't think XML guarantees that those pieces aren't merged into one string by the XML parser before it gets to the application. As I said already, this is my interpretation of XML, and I could be misinterpreting things. -- Sjoerd Mullender From ping at lfw.org Wed Apr 19 22:14:39 2000 From: ping at lfw.org (Ka-Ping Yee) Date: Wed, 19 Apr 2000 15:14:39 -0500 (CDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <001901bfaa22$e202af60$34aab5d4@hagrid> Message-ID: On Wed, 19 Apr 2000, Fredrik Lundh wrote: > > your notifier looks like a supercharged version of the "Observer" > pattern [1]. here's a minimalistic observer mixin from "(the eff- > bot guide to) Python Patterns and Idioms" [2]. Oh, yeah, "observer". That was the other name for this mechanism that i forgot. > class Observable: I'm not picky about names... anything is fine. 
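For concreteness, client code for the proposed interface would look something like this, assuming the straw-man above is saved as notifier.py. The Stock/PriceChange names are invented purely for illustration, and the callback signature follows the revised module, which passes the sender first:

from notifier import Notifier

class Stock(Notifier):
    pass

# messages can form a class hierarchy; callbacks registered on a base
# class also fire for derived messages
class StockEvent: pass
class PriceChange(StockEvent): pass

def on_price(sender, message, **args):
    print "new price:", args["price"]

quote = Stock()
quote.notify(PriceChange, on_price)     # register on the message class
quote.send(PriceChange(), price=23.5)   # calls on_price(quote, PriceChange, price=23.5)
quote.denotify(PriceChange, on_price)   # unregister again

Nothing here depends on the ordering guarantee or on BREAK; it only exercises notify/send/denotify.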
> def notify(self, event): > for o in self.__observers or (): > o(event) *Some* sort of dispatch would be nice, i think, rather than having to check the kind of event you're getting in every callback. Here are the three sources of "more stuff" in Notifier as opposed to Observer: 1. Dispatch. You register callbacks for particular messages rather than on the whole object. 2. Ordering. The callbacks are always called in reverse order of registration, which makes BREAK meaningful. 3. Inheritance. You can use a class hierarchy of messages. I think #1 is fairly essential, and i know i've come across situations where #3 is useful. The need for #2 is only a conjecture on my part. Does anyone care about the order in which callbacks get called? If not (and no one needs to use BREAK), we can throw out #2 and make Notifier simpler: callbacks = {} def send(key, message, **args): if callbacks.has_key(key): for callback in callbacks[key]: callback(key[0], message, **args) if hasattr(key[1], "__bases__"): for base in key[1].__bases__: send((key[0], base), message, **args) class Notifier: def send(self, message, **args): if hasattr(message, "__class__"): send((self, message.__class__), message, **args) else: send((self, message), message, **args) def notify(self, message, callback): key = (self, message) if not callbacks.has_key(key): callbacks[key] = [] callbacks[key].append(callback) def denotify(self, message, callback=None): key = (self, message) if callbacks.has_key(key): if callback is None: del callbacks[key] else: callbacks[key].remove(callback) -- ?!ng From paul at prescod.net Wed Apr 19 22:19:31 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:19:31 -0500 Subject: [Python-Dev] Encoding of code in XML References: Message-ID: <38FE14D3.AC05DAE0@prescod.net> David Ascher wrote: > > ... > > If I understand what you're proposing, you're splitting a single bit of > Python code into N XML elements. No, a CDATA section is not an element. But the question of whether boundary placements are meaningful is sepearate. This comes back to the "semantics question". Most tools will not differentiate between two adjacent CDATA sections and one. The XML specification does not say whether they should or should not but in practice tools that consume XML and then throw it away typically do NOT care about CDATA section boundaries and tools that edit XML *do* care. This "break it into to two sections" solution is the typical one but it is god-awful ugly, even in XML editors. Many stream-based XML tools (e.g. mostSAX parsers, xmllib) *will* report two separate CDATA sections as two different character events. Application code must be able to handle this situation. It doesn't only occur with CDATA sections. XML parsers could equally break up long text chunks based on 1024-byte block boundaries or line breaks or whatever they feel like. In my opinion these variances in behvior stem from the myth that XML has no semantics but that's another off-topic topic. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. 
- http://www.nj.com/page1/ledger/e2efc7.html From gstein at lyra.org Wed Apr 19 22:27:11 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:27:11 -0700 (PDT) Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >In my experience, allowing/requiring programmers to specify sharedness is > >a very rich source of hard-to-find bugs. > > My experience is the opposite, since most objects aren't shared. :) > You could probably do something like add an "owning thread" to each object > structure, and on refcount throw an exception if not shared and the current > thread isn't the owner. Not sure if space is a concern, but since the object > is either shared or needs its own mutex, you make them a union: > bool shared; > union { > python_thread_id_type id; > python_mutex_type m; > }; > > > (Not saying I have an answer to > the performance hit of locking on incref/decref, just saying that the > development cost of 'shared' is very high.) Regardless of complexity or lack thereof, any kind of "specified sharedness" cannot be implemented. Consider the case where a programmer forgets to note the sharedness. He passes the object to another thread. At certain points: BAM! The interpreter dumps core. Guido has specifically stated that *nothing* should ever allow that (in terms of pure Python code; bad C extension coding is all right). Sharedness has merit, but it cannot be used :-( Cheers, -g -- Greg Stein, http://www.lyra.org/ From SalzR at CertCo.com Wed Apr 19 22:27:10 2000 From: SalzR at CertCo.com (Salz, Rich) Date: Wed, 19 Apr 2000 16:27:10 -0400 Subject: [Python-Dev] RE: [Thread-SIG] marking shared-ness (was: baby steps for free-th reading) Message-ID: >Consider the case where a programmer forgets to note the sharedness. He >passes the object to another thread. At certain points: BAM! The >interpreter dumps core. No. Using the "owning thread" idea prevents coredumps and allows the interpreter to throw an exception. Perhaps my note wasn't clear enough? /r$ From paul at prescod.net Wed Apr 19 22:25:42 2000 From: paul at prescod.net (Paul Prescod) Date: Wed, 19 Apr 2000 15:25:42 -0500 Subject: [Python-Dev] Encoding of code in XML References: <20000419192432.F2A19301CF9@bireme.oratrix.nl> Message-ID: <38FE1646.B29CAA8A@prescod.net> Sjoerd Mullender wrote: > > ... > > CDATA sections are not part of the markup, so the XML parser > is allowed to coallese the multiple CDATA sections and other character > data into one string before it gives it to the application. Allowed but not required. Most SAX parsers will not. Some DOM parsers will and some won't. :( > So, yes, this requires smarts on the XML parsing stage, but I think > those smarts need to be there anyway. I don't follow this part. Typically those "smarts" are not there. At the end of one CDATA section you get an event and at the beginning of the next you get a different event. It's the application's job to glue them together. :( Fixing this is one of the goals of EventDOM. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself Pop stars come and pop stars go, but amid all this change there is one eternal truth: Whenever Bob Dylan writes a song about a guy, the guy is guilty as sin. 
- http://www.nj.com/page1/ledger/e2efc7.html From tismer at tismer.com Wed Apr 19 22:38:31 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 22:38:31 +0200 Subject: [Python-Dev] marking shared-ness (was: baby steps for free-threading) References: Message-ID: <38FE1947.70FC6AEE@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Salz, Rich wrote: > > >In my experience, allowing/requiring programmers to specify sharedness is > > >a very rich source of hard-to-find bugs. > > > > My experience is the opposite, since most objects aren't shared. :) > > You could probably do something like add an "owning thread" to each object > > structure, and on refcount throw an exception if not shared and the current > > thread isn't the owner. Not sure if space is a concern, but since the object > > is either shared or needs its own mutex, you make them a union: > > bool shared; > > union { > > python_thread_id_type id; > > python_mutex_type m; > > }; > > > > > > (Not saying I have an answer to > > the performance hit of locking on incref/decref, just saying that the > > development cost of 'shared' is very high.) > > Regardless of complexity or lack thereof, any kind of "specified > sharedness" cannot be implemented. > > Consider the case where a programmer forgets to note the sharedness. He > passes the object to another thread. At certain points: BAM! The > interpreter dumps core. > > Guido has specifically stated that *nothing* should ever allow that (in > terms of pure Python code; bad C extension coding is all right). > > Sharedness has merit, but it cannot be used :-( Too bad that we don't have incref/decref as methods. The possible mutables which have to be protected could in fact carry a thread handle of their current "owner" (probably the one who creted them), and incref would check whether the owner is still same. If it is not same, then the owner field would be wiped, and that turns the (higher cost) shared refcounting on, and all necessary protection as well. (Maybe some extra care is needed to ensure that this info isn't changed while we are testing it). Without inc/dec-methods, something similar could be done, but every inc/decref will be a bit more expensive since we must figure out wether we have a mutable or not. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From gstein at lyra.org Wed Apr 19 22:52:12 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 13:52:12 -0700 (PDT) Subject: [Python-Dev] RE: marking shared-ness In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Salz, Rich wrote: > >Consider the case where a programmer forgets to note the sharedness. He > >passes the object to another thread. At certain points: BAM! The > >interpreter dumps core. > > No. Using the "owning thread" idea prevents coredumps and allows the > interpreter to throw an exception. Perhaps my note wasn't clear > enough? INCREF and DECREF cannot throw exceptions. Are there other points where you could safely detect erroneous sharing of objects? (in a guaranteed fashion) For example: what are all the ways that objects can be transported between threads. Can you erect tests at each of those points? I believe "no" since there are too many ways (func arg or an item in a shared ob). 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Wed Apr 19 23:15:39 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:15:39 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE1947.70FC6AEE@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: >... > Too bad that we don't have incref/decref as methods. This would probably impose more overhead than some of the atomic inc/dec mechanisms. > The possible mutables which have to be protected could Non-mutable objects must be protected, too. An integer can be shared just as easily as a list. > in fact carry a thread handle of their current "owner" > (probably the one who creted them), and incref would > check whether the owner is still same. > If it is not same, then the owner field would be wiped, > and that turns the (higher cost) shared refcounting on, > and all necessary protection as well. > (Maybe some extra care is needed to ensure that this info > isn't changed while we are testing it). Ah. Neat. "Automatic marking of shared-ness" Could work. That initial test for the thread id could be expensive, though. What is the overhead of getting the current thread id? [ ... thinking about the code ... ] Nope. Won't work at all. There is a race condition when an object "becomes shared". DECREF: if ( object is not shared ) /* whoops! it just became shared! */ --(op)->ob_refcnt; else atomic_decrement(op) To prevent the race, you'd need an interlock which is more expensive than an atomic decrement. Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Wed Apr 19 23:25:45 2000 From: tismer at tismer.com (Christian Tismer) Date: Wed, 19 Apr 2000 23:25:45 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FE2459.E0300B5@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > >... > > Too bad that we don't have incref/decref as methods. > > This would probably impose more overhead than some of the atomic inc/dec > mechanisms. > > > The possible mutables which have to be protected could > > Non-mutable objects must be protected, too. An integer can be shared just > as easily as a list. Uhh, right. Everything is mutable, since me mutate the refcount :-( ... > Ah. Neat. "Automatic marking of shared-ness" > > Could work. That initial test for the thread id could be expensive, > though. What is the overhead of getting the current thread id? Zero if we cache it in the thread state. > [ ... thinking about the code ... ] > > Nope. Won't work at all. @#$%?!!-| yes-you-are-right - gnnn! > There is a race condition when an object "becomes shared". > > DECREF: > if ( object is not shared ) > /* whoops! it just became shared! */ > --(op)->ob_refcnt; > else > atomic_decrement(op) > > To prevent the race, you'd need an interlock which is more expensive than > an atomic decrement. Really, sad but true. Are atomic decrements really so cheap, meaning "are they mapped to the atomic dec opcode"? Then this is all ok IMHO. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gstein at lyra.org Wed Apr 19 23:34:19 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:34:19 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FE2459.E0300B5@tismer.com> Message-ID: On Wed, 19 Apr 2000, Christian Tismer wrote: > Greg Stein wrote: >... > > Ah. Neat. "Automatic marking of shared-ness" > > > > Could work. That initial test for the thread id could be expensive, > > though. What is the overhead of getting the current thread id? > > Zero if we cache it in the thread state. You don't have the thread state at incref/decref time. And don't say "_PyThreadState_Current" or I'll fly to Germany and personally kick your ass :-) >... > > There is a race condition when an object "becomes shared". > > > > DECREF: > > if ( object is not shared ) > > /* whoops! it just became shared! */ > > --(op)->ob_refcnt; > > else > > atomic_decrement(op) > > > > To prevent the race, you'd need an interlock which is more expensive than > > an atomic decrement. > > Really, sad but true. > > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? On some platforms and architectures, they *might* be. On Win32, we call InterlockedIncrement(). No idea what that does, but I don't think that it is a macro or compiler-detected thingy to insert opcodes. I believe there is a function call involved. pthreads do not define atomic inc/dec, so we must use a critical section + normal inc/dec operators. Linux has a kernel macro for atomic inc/dec, but it is only valid if __SMP__ is defined in your compilation context. etc. Platforms that do have an API (as Donn stated: BeOS has one; Win32 has one), they will be cheaper than an interlock. Therefore, we want to take advantage of an "atomic inc/dec" semantic when possible (and fallback to slower stuff when not). Cheers, -g -- Greg Stein, http://www.lyra.org/ From fleck at triton.informatik.uni-bonn.de Wed Apr 19 23:32:42 2000 From: fleck at triton.informatik.uni-bonn.de (Markus Fleck) Date: Wed, 19 Apr 2000 23:32:42 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: from "Greg Stein" at Apr 18, 2000 02:16:44 PM Message-ID: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Greg Stein: > Nevertheless, adding more moderators is the "proper" answer to the > problem. Even if it is difficult to get more moderators into the system, > there doesn't seem to be a better alternative. I agree with this. What would be helpful would be (i) a web interface for multiple-moderator moderation (which I believe Mailman already provides), and (ii) some rather simple changes to the list-to-newsgroup gateway to do some header manipulations before posting each approved message to c.l.py.a. I've been more or less "off the Net" for almost two months now, while getting started at my new job, and I will try to do some (summary-style) retro-moderation of the ca. 50 c.l.py.a submissions that I missed during this time. Automating the submission process and getting additional moderators would make c.l.py.a less dependent on me and avoid such moderation lags in the future. (And yes, of course, I'm sorry for the lag. But now I'm back, and I'm willing to help change the process so that such lags won't happen again in the future. Getting additional moderators would likely help with this.) Yours, Markus. 
From gstein at lyra.org Wed Apr 19 23:42:34 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 14:42:34 -0700 (PDT) Subject: [Python-Dev] optimize atomic inc/dec? (was: baby steps for free-threading) In-Reply-To: <20000419110904.C6107@trump.amber.org> Message-ID: On Wed, 19 Apr 2000, Christopher Petrilli wrote: > Salz, Rich [SalzR at CertCo.com] wrote: > > >This definitely slows Python down. If an object is known to be visible to > > >only one thread, then you can avoid the atomic inc/dec. But that leads to > > >madness :-) > > > > I would much rather see the language extended to indicate that a particular > > variable is "shared" across free-threaded interpreters. The hit of taking > > a mutex on every incref/decref is way bad. > > I wonder if the energy is better spent in a truly highly-optimized > implementation on the major platforms rather than trying to > conditional this. This may mean writing x86 assembler, and a few > others, Bill Tutt had a good point -- we can get a bunch of assembler fragments from the Linux kernel for atomic inc/dec. On specific compiler and processor architecture combinations, we could drop to assembly to provide an atomic dec/inc. For example, when we see we're using GCC on an x86 processor (whether FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > but then again, once written, it shouldn't need much > modification. I wonder if the conditional mutexing might be slower > because of the check and lack of focus on bringing the core > implementation up to speed. Won't work anyhow. See previous email. Cheers, -g -- Greg Stein, http://www.lyra.org/ From jeremy at cnri.reston.va.us Wed Apr 19 23:32:17 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Wed, 19 Apr 2000 17:32:17 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.9697.366632.708503@goon.cnri.reston.va.us> Glad to hear from you, Marcus! I'm willing to help with both (a) and (b). I'll talk to Barry about the Mailman issues tomorrow. Jeremy From DavidA at ActiveState.com Wed Apr 19 23:46:49 2000 From: DavidA at ActiveState.com (David Ascher) Date: Wed, 19 Apr 2000 14:46:49 -0700 Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: I can help moderate as well. --david ascher From billtut at microsoft.com Wed Apr 19 23:28:12 2000 From: billtut at microsoft.com (Bill Tutt) Date: Wed, 19 Apr 2000 14:28:12 -0700 Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF71@RED-MSG-50> > From: Christian Tismer [mailto:tismer at tismer.com] > Are atomic decrements really so cheap, meaning "are they mapped > to the atomic dec opcode"? > Then this is all ok IMHO. > On x86en they are mapped to an "atomic assembly fragment" i.e. params to some registers and then stick in a "lock add" instruction or something. (please forgive me if I've botched the details). So in that respect they are cheap, its a hardware level feature. On the otherhand though, given the effect that these instructions have on the CPU (its caches, buses, and so forth) it is by for no means free. My recollection vaguely recalls someone saying that all the platforms NT has supported so far has had at the minimum an InterlockedInc/Dec. 
InterlockCompareExchange() is where I think not all of the Intel family (386) and some of the other platforms may not have had the appropriate instructions. InterlockCompareExchange() is useful for creating your own spinlocks. The GCC list might be a good place for enquiring about the feasability of InterlockedInc/Dec on various platforms. Bill From bwarsaw at cnri.reston.va.us Thu Apr 20 02:51:55 2000 From: bwarsaw at cnri.reston.va.us (Barry A. Warsaw) Date: Wed, 19 Apr 2000 20:51:55 -0400 (EDT) Subject: [Python-Dev] comp.lang.python.announce References: <200004192132.XAA14501@hera.informatik.uni-bonn.de> Message-ID: <14590.21675.154913.979353@anthem.cnri.reston.va.us> >>>>> "MF" == Markus Fleck writes: MF> I agree with this. What would be helpful would be (i) a web MF> interface for multiple-moderator moderation (which I believe MF> Mailman already provides), and (ii) some rather simple changes MF> to the list-to-newsgroup gateway to do some header MF> manipulations before posting each approved message to MF> c.l.py.a. This is doable in Mailman, but I'm not so sure how much it will help, unless we make a further refinement. I don't know enough about the Usenet moderation process to know if this will work, but let me outline things here. There's two ways a message can get announced, first via email or first via Usenet. Here's what happens in each case: - A message is sent to python-announce at python.org. This is the preferred email address to post to. These messages get forwarded to clpa at python.net, which I believe is just a simple exploder to Markus and Vladimir. Obviously with the Starship current dead, this is broken too. I don't know what happens to these messages once Markus and Vladimir get it, but I assume that Markus adds a magic approval header and forwards the message to Usenet. Perhaps Markus can explain this process in more detail. - A message is sent to python-announce-list at python.org. This is not the official place to send announcements, but this specific alias simply forwards to python-announce at python.org so see above. Note that the other standard Mailman python-announce-list-* aliases are in place, and python-announce-list is a functioning Mailman mailing list. This list gates from Usenet, but not /to/ Usenet because of the forwarding described above. When it sees a message on c.l.py.a, it sucks the messages off the newsgroup and forwards it to all list members. Obviously those messages must have already been approved by the Usenet moderators. - A message is sent directly to c.l.py.a. From what I understand, the Usenet software itself forwards to the moderators, who again, do their magic and forwards the message to Usenet. So, given this arrangement, the messages never arrive unapproved at a mailing list. What it sounds like Markus is proposing is that the official Usenet moderator address would be a mailing list. It would be a closed mailing list whose members are approved moderators, with a shared Mailman alias. Any message posted there would be held for approval, and once approved, it would be injected directly into Usenet, with the appropriate magic header. I think I know what I'd need to add to Mailman to support this, though it'll be a little tricky. I need to know exactly how approved messages should be posted to Usenet. Does someone have a URL reference to this procedure, or is it easy enough to explain? 
-Barry From gstein at lyra.org Thu Apr 20 06:09:13 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 19 Apr 2000 21:09:13 -0700 (PDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? Message-ID: Hey guys, You're the Smart Guys that I know, and it seems this is also the forum where I once heard a long while back that DB can occasionally corrupt its files. True? Was it someone here that mentioned that? (Skip?) Or maybe it was bsddb? (or is that the same as the Berkeley DB, now handled by Sleepycat) A question just came up elsewhere about DB and I seemed to recall somebody mentioning the occasional corruption. Oh, maybe it was related to multiple threads. Any help appreciated! Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Thu Apr 20 08:09:45 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 09:09:45 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: Message-ID: On Wed, 19 Apr 2000, Ka-Ping Yee wrote: > object.denotify(message[, callback]) - Turn off notification. You need to be a bit more careful here. What if callback is foo().function? It's unique, so I could never denotify it. A better way, and more popular (at least in the signal/slot terminology), is to return a cookie on connect, and have disconnect requests by a cookie. > object.send(message, **args) - Call all callbacks registered on > object for message, in reverse order of registration, passing > along message and **args as arguments to each callback. > If a callback returns notifier.BREAK, no further callbacks > are called. When I implemented that mechanism, I just used a special exception (StopCommandExecution). I prefer that, since it allows the programmer much more flexibility (which I used) > (Alternatively, we could use signals/slots terminology: > connect/disconnect/emit. I'm not aware of anything the signals/slots > mechanism has that the above lacks.) Me neither. Some offer a variety of connect-methods: connect after, connect-before (this actually has some uses). Have a short look at the Gtk+ signal mechanism -- it has all these. > 1. The 'message' passed to notify/denotify may be a class, and > the 'message' passed to send may be a class or instance of > a message class. In this case callbacks registered on that > class and all its bases are called. This seems a bit unneccessary, but YMMV. In all cases I've needed this, a simple string sufficed (i.e., method 2) Implementation nit: I usually use class _BREAK: pass BREAK=_BREAK() That way it is gurranteed that BREAK is unique. Again, I use this mostly with exceptions. All in all, great idea Ping! -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From fleck at triton.informatik.uni-bonn.de Thu Apr 20 09:02:33 2000 From: fleck at triton.informatik.uni-bonn.de (Markus Fleck) Date: Thu, 20 Apr 2000 09:02:33 +0200 (MET DST) Subject: [Python-Dev] comp.lang.python.announce In-Reply-To: <14590.21675.154913.979353@anthem.cnri.reston.va.us> from "Barry A. Warsaw" at Apr 19, 2000 08:51:55 PM Message-ID: <200004200702.JAA14939@hera.informatik.uni-bonn.de> Barry A. Warsaw: > What it sounds like Markus is proposing is that the official Usenet > moderator address would be a mailing list. It would be a closed > mailing list whose members are approved moderators, with a shared > Mailman alias. 
Any message posted there would be held for approval, > and once approved, it would be injected directly into Usenet, with the > appropriate magic header. Exactly. (In fact, each approved message could be both posted to Usenet and forwarded to the subscription-based shadow mailing list at the same time.) > I think I know what I'd need to add to Mailman to support this, though > it'll be a little tricky. I need to know exactly how approved messages > should be posted to Usenet. Does someone have a URL reference to this > procedure, or is it easy enough to explain? Basically, you need two headers: Newsgroups: comp.lang.python.announce Approved: python-announce at python.org The field contents of the "Approved:" header are in fact never checked for validity; it only has to be non-empty for the message to be successfully posted to a moderated newsgroup. (BTW, posting to the "alt.hackers" newsgroup actually relies on posters inserting "Approved: whatever" headers on their own, because "alt.hackers" is a moderated newsgroup without a moderator. You need to "hack" the Usenet moderation mechanism to be able to post there. :-) Because of the simplicity of this mechanism, no cross-posting to another moderated newsgroup should occur when posting an approved message to Usenet; e.g. if someone cross-posts to comp.lang.python, comp.lang.python.announce, comp.os.linux.misc and comp.os.linux.announce, the posting will go to the moderation e-mail address of the first moderated newsgroup in the "Newsgroups:" header supplied by the author's Usenet posting agent. (I.e., in this case, clpa at starship.skyport.net, if the header enumerates newsgroups in the above-mentioned order, "c.l.py,c.l.py.a,c.o.l.a,c.o.l.m".) Ideally, the moderators (or moderation software) of this first moderated newsgroup should split the posting up accordingly: a) remove names of newsgroups that we want to handle ourselves (e.g. c.l.py.a, possibly also c.l.py if cross-posted), and re-post the otherwise unchanged message to Usenet with only a changed "Newsgroups:" header (Headers: "Newsgroups: c.o.l.a,c.o.l.m" / no "Approved:" header added) -> this is necessary for the message to ever reach c.o.l.a and c.o.l.m -> the message will get forwarded by the Usenet server software to the moderation address of c.o.l.a, which is the first moderated newsgroup in the remaining list of newsgroups c) approve (or reject) posting to c.l.py.a and/or c.l.py (Headers: "Newsgroups: c.l.py.a" or "Newsgroups: c.l.py.a,c.l.py" or "Newsgroups: c.l.py" / an "Approved: python-announce at python.org" header may always be added, but is only necessary if also posting to c.l.py.a) According to the c.l.py.a posting guidelines, a "Followup-To:" header, will be added, if it doesn't exist yet, pointing to c.l.py for follow-up messages ("Follow-Up: c.l.py"). While a) may always happen automatically, prior to moderation, and needs to be custom-tailored for our c.l.py.a/c.l.py use case, the moderation software for b), i.e. Mailman, should allow moderators to adjust the "Newsgroups:" header while approving a message. It might also be nice to have an "X-Original-Newsgroups:" line in Mailman with a copy of the original "Newsgroups:" line. 
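To make that concrete, the header manipulation the gateway (or Mailman) would have to do before injecting an approved message is roughly the following. This is only a sketch: the function name and defaults are illustrative, it assumes an RFC 822 message with a blank line after the header block, and it does not handle folded header lines.

import string

def approve_message(msgtext,
                    newsgroups="comp.lang.python.announce",
                    approved="python-announce@python.org",
                    followup="comp.lang.python"):
    # split into header block and body at the first blank line
    pos = string.find(msgtext, "\n\n")
    headers, body = msgtext[:pos], msgtext[pos + 2:]

    # drop any Newsgroups:/Approved:/Followup-To: lines supplied by
    # the author; the moderation software sets them itself
    kept = []
    for line in string.split(headers, "\n"):
        field = string.lower(string.split(line, ":")[0])
        if field not in ("newsgroups", "approved", "followup-to"):
            kept.append(line)

    kept.append("Newsgroups: " + newsgroups)
    kept.append("Approved: " + approved)
    kept.append("Followup-To: " + followup)
    return string.join(kept, "\n") + "\n\n" + body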
Regarding headers, usually e-mail will allow and forward almost any non-standard header field (a feature that is used to preserve the "Newsgroups:" header even when forwarding a posting to an e-mail address), but the Usenet server software may not accept all kinds of headers, so that just before posting, only known "standard" header fields should be preserved; any "X-*:" headers, for example, might be candidates for removal prior to posting, because some Usenet servers return strange errors when a message is posted that contains certain special "X-*:" headers. OTOH, AFAIK, the posting agent should generate and add a unique "Message-ID:" header for each Usenet posting itself. But if you have a Usenet forwarding agent already running, much of this should be implemented there already. Okay, now some links to resources and FAQs on that subject: Moderated Newsgroups FAQ http://www.swcp.com/~dmckeon/mod-faq.html USENET Moderators Archive http://www.landfield.com/moderators/ NetNews Moderators Handbook - 5.2.1 Approved: Line http://www.landfield.com/usenet/moderators/handbook/mod05.html#5.2.1 Please e-mail me if you have any further questions. Yours, Markus. From harri.pasanen at trema.com Thu Apr 20 09:42:29 2000 From: harri.pasanen at trema.com (Harri Pasanen) Date: Thu, 20 Apr 2000 09:42:29 +0200 Subject: [Python-Dev] Re: [Thread-SIG] optimize atomic inc/dec? (was: baby steps for free-threading) References: Message-ID: <38FEB4E5.47DCD834@trema.com> Greg Stein wrote, talking about optimizing atomic inc/dec: > > For example, when we see we're using GCC on an x86 processor (whether > FreeBSD or Linux), we can define atomic_inc() as an __asm fragment. > The same applies for Sparc. In our C++ software we have currently the atomic increment as inlined assembly for x86, sparc and sparc-v9, using GCC. It is a function though, so there is a function call involved. -Harri From tismer at tismer.com Thu Apr 20 15:23:31 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 15:23:31 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <38FF04D3.4CE2067E@tismer.com> Greg Stein wrote: > > On Wed, 19 Apr 2000, Christian Tismer wrote: > > Greg Stein wrote: > >... > > > Ah. Neat. "Automatic marking of shared-ness" > > > > > > Could work. That initial test for the thread id could be expensive, > > > though. What is the overhead of getting the current thread id? > > > > Zero if we cache it in the thread state. > > You don't have the thread state at incref/decref time. > > And don't say "_PyThreadState_Current" or I'll fly to Germany and > personally kick your ass :-) A real temptation to see whether I can really get you to Germany :-)) ... Thanks for all the info. > Linux has a kernel macro for atomic inc/dec, but it is only valid if > __SMP__ is defined in your compilation context. Well, and while it looks cheap, it is for sure expensive since several caches are flushed, and the system is stalled until the modified value is written back into the memory bank. Could it be that we might want to use another thread design at all? I'm thinking of running different interpreters in the same process space, but with all objects really disjoint, invisible between the interpreters. This would perhaps need some internal changes, in order to make all the builtin free-lists disjoint as well. Now each such interpreter would be running in its own thread without any racing condition at all so far. 
To make this into threading and not just a flavor of multitasking, we now need of course shared objects, but only those objects which we really want to share. This could reduce the cost for free threading to nearly zero, except for the (hopefully) few shared objects. I think, instead of shared globals, it would make more sense to have some explicit shared resource pool, which controls every access via mutexes/semas/whateverweneed. Maybe also that we would prefer to copy objects into it over sharing, in order to minimize collisions. I hope the need for true sharing can be minimized to a few variables. Well, I hope. "freethreads" could even coexist with the current locking threads, we would not even need a special build for them, but to rethink threading. Like "the more free threading is, the more disjoint threads are". are-you-now-convinced-to-come-and-kick-my-ass-ly y'rs - chris :-) -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From skip at mojam.com Thu Apr 20 15:29:37 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 08:29:37 -0500 (CDT) Subject: [Python-Dev] [OT] [Q] corruption in DB files? In-Reply-To: References: Message-ID: <14591.1601.125779.714243@beluga.mojam.com> Greg> You're the Smart Guys that I know, and it seems this is also the Greg> forum where I once heard a long while back that DB can Greg> occasionally corrupt its files. ... Greg> A question just came up elsewhere about DB and I seemed to recall Greg> somebody mentioning the occasional corruption. Oh, maybe it was Greg> related to multiple threads. Yes, Berkeley DB 1.85 (exposed through the bsddb module in Python) has bugs in the hash implementation. They never fixed them (well maybe in 1.86?), but moved on to version 2.x. Of course, they changed the function call interface and the file format, so many people didn't follow. They do provide a 1.85-compatible API but you have to #include db_185.h instead of db.h. As far as I know, if you stick to the btree interface with 1.85 you should be okay. Unfortunately, both the anydbm and dbhash modules both use the hash interface, so if you're trying to be more or less portable and not modify your Python sources, you've also got buggy db files... Someone did create a libdb 2.x-compatible module that exposes more of the underlying functionality. Check the VoP for it. libdb == Berkeley DB == Sleepycat... Skip From skip at mojam.com Thu Apr 20 15:40:06 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 08:40:06 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> Message-ID: <14591.2230.630609.500780@beluga.mojam.com> Chris> I think, instead of shared globals, it would make more sense to Chris> have some explicit shared resource pool, which controls every Chris> access via mutexes/semas/whateverweneed. Tuple space, anyone? Check out http://www.snurgle.org/~pybrenda/ It's a Linda implementation for Python. Linda was developed at Yale by David Gelernter. Unfortunately, he's better known to the general public as being one of the Unabomber's targets. 
You can find out more about Linda at http://www.cs.yale.edu/Linda/linda.html Skip From fredrik at pythonware.com Thu Apr 20 15:55:52 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 20 Apr 2000 15:55:52 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Moshe Zadka" wrote: > > object.denotify(message[, callback]) - Turn off notification. > > You need to be a bit more careful here. What if callback is > foo().function? It's unique, so I could never denotify it. if you need a value later, the usual approach is to bind it to a name. works in all other situations, so why not use it here? > A better way, and more popular (at least in the signal/slot terminology), > is to return a cookie on connect, and have disconnect requests by a cookie. in which way is "harder to use in all common cases" better? ... as for the "break" functionality, I'm not sure it really belongs in a basic observer class (in GOF terms, that's a "chain of responsibility"). but if it does, I sure prefer an exception over a magic return value. From tismer at tismer.com Thu Apr 20 16:23:56 2000 From: tismer at tismer.com (Christian Tismer) Date: Thu, 20 Apr 2000 16:23:56 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> Message-ID: <38FF12FC.32052356@tismer.com> Skip Montanaro wrote: > > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > > Tuple space, anyone? Check out > > http://www.snurgle.org/~pybrenda/ Very interesting, indeed. > It's a Linda implementation for Python. Linda was developed at Yale by > David Gelernter. Unfortunately, he's better known to the general public as > being one of the Unabomber's targets. You can find out more about Linda at > > http://www.cs.yale.edu/Linda/linda.html Many broken links. The most activity appears to have stopped around 94/95, the project looks kinda dead. But this doesn't mean that we cannot learn from them. Will think more when the starship problem is over... ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From moshez at math.huji.ac.il Thu Apr 20 16:24:49 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 17:24:49 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <000c01bfaad0$f2d1d2e0$0500a8c0@secret.pythonware.com> Message-ID: On Thu, 20 Apr 2000, Fredrik Lundh wrote: > > A better way, and more popular (at least in the signal/slot terminology), > > is to return a cookie on connect, and have disconnect requests by a cookie. > > in which way is "harder to use in all common cases" > better? I'm not sure I agree this is harder to use in all common cases, but YMMV. Strings are prone to collisions, etc. And usually the code which connects the callback is pretty close (flow-control wise) to the code that would disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, and it might not be a bad idea, now that I think of it. 
> as for the "break" functionality, I'm not sure it really > belongs in a basic observer class (in GOF terms, that's ^^^ TLA overload! What's GOF? > a "chain of responsibility"). but if it does, I sure prefer > an exception over a magic return value. I don't know if it belongs or not, but I do know that it is sometimes needed, and is very hard and ugly to simulate otherwise. That's one FAQ I don't want to answer -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From skip at mojam.com Thu Apr 20 16:38:08 2000 From: skip at mojam.com (Skip Montanaro) Date: Thu, 20 Apr 2000 09:38:08 -0500 (CDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> References: <38FF04D3.4CE2067E@tismer.com> <14591.2230.630609.500780@beluga.mojam.com> <38FF12FC.32052356@tismer.com> Message-ID: <14591.5712.162339.740646@beluga.mojam.com> >> http://www.cs.yale.edu/Linda/linda.html Chris> Many broken links. The most activity appears to have stopped Chris> around 94/95, the project looks kinda dead. But this doesn't mean Chris> that we cannot learn from them. Yes, I think Linda mostly lurks under the covers these days. Their Piranha project, which aims to soak up spare CPU cycles to do parallel computing, uses Linda. I suspect Linda is probably hidden somewhere inside Lifestreams as well. As a correction to my original note, Nicholas Carriero was the other primary lead on Linda. I no longer recall the details, but he may have been on of Gelernter's grad students in the late 80's. Skip From gvwilson at nevex.com Thu Apr 20 16:40:48 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Thu, 20 Apr 2000 10:40:48 -0400 (EDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <14591.2230.630609.500780@beluga.mojam.com> Message-ID: > Chris> I think, instead of shared globals, it would make more sense to > Chris> have some explicit shared resource pool, which controls every > Chris> access via mutexes/semas/whateverweneed. > Skip wrote: > Tuple space, anyone? Check out > http://www.snurgle.org/~pybrenda/ > It's a Linda implementation for Python. You can find out more about > Linda at > http://www.cs.yale.edu/Linda/linda.html Linda is also the inspiration for Sun's JavaSpaces, an easier-to-use layer on top of Jini: http://java.sun.com/products/javaspaces/ http://cseng.aw.com/bookpage.taf?ISBN=0-201-30955-6 On the plus side: 1. It's much (much) easier to use than mutex, semaphore, or monitor models: students in my parallel programming course could start writing C-Linda programs after (literally) five minutes of instruction. 2. If you're willing/able to do global analysis of access patterns, its simplicity doesn't have a significant performance penalty. 3. (Bonus points) It integrates very well with persistence schemes. On the minus side: 1. Some things that "ought" to be simple (e.g. barrier synchronization) are surprisingly difficult to get right, efficiently, in vanilla Linda-like systems. Some VHLL derivates (based on SETL and Lisp dialects) solved this in interesting ways. 2. It's different enough from hardware-inspired shared-memory + mutex models to inspire the same "Huh, that looks weird" reaction as Scheme's parentheses, or Python's indentation. On the other hand, Bill Joy and company are now backing it... Personal opinion: I've felt for 15 years that something like Linda could be to threads and mutexes what structured loops and conditionals are to the "goto" statement. 
Were it not for the "Huh" effect, I'd recommend hanging "Danger!" signs over threads and mutexes, and making tuple spaces the "standard" concurrency mechanism in Python. I'd also recommend calling the system "Carol", after Monty Python regular Carol Cleveland. The story is that Linda itself was named after the 70s porn star Linda Lovelace, in response to the DoD naming its language "Ada" after the other Lovelace... Greg p.s. I talk a bit about Linda, and the limitations of the vanilla approach, in http://mitpress.mit.edu/book-home.tcl?isbn=0262231867. From mlh at swl.msd.ray.com Thu Apr 20 17:02:30 2000 From: mlh at swl.msd.ray.com (Milton L. Hankins) Date: Thu, 20 Apr 2000 11:02:30 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF12FC.32052356@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: > Skip Montanaro wrote: > > > > Tuple space, anyone? Check out > > > > http://www.snurgle.org/~pybrenda/ > > Very interesting, indeed. *Steps out of the woodwork and bows* PyBrenda doesn't have a thread implementation, but it could be adapted to do so. It might be prudent to eliminate the use of TCP/IP in that case as well. In case anyone is interested, I just created a mailing list for PyBrenda at egroups: http://www.egroups.com/group/pybrenda-users -- Milton L. Hankins \\ ><> Ephesians 5:2 ><> http://www.snurgle.org/~mhankins // These are my opinions, not Raytheon's. \\ W. W. J. D. ? From effbot at telia.com Thu Apr 20 19:14:08 2000 From: effbot at telia.com (Fredrik Lundh) Date: Thu, 20 Apr 2000 19:14:08 +0200 Subject: [Python-Dev] Generic notifier module References: Message-ID: <018101bfaaec$65e56740$34aab5d4@hagrid> Moshe Zadka wrote: > > in which way is "harder to use in all common cases" > > better? > > I'm not sure I agree this is harder to use in all common cases, but YMMV. > Strings are prone to collisions, etc. not sure what you're talking about here, so I suppose we're talking past each other. what I mean is that: model.addobserver(view.notify) model.removeobserver(view.notify) works just fine without any cookies. having to do: view.cookie = model.addobserver(view.notify) model.removeobserver(view.cookie) is definitely no improvement. and if you have an extraordinary case (like a function pointer extracted from an object returned from a factory function), you just have to assign the function pointer to a local variable: self.callback = strangefunction().notify model.addobserver(self.callback) model.removeobserver(self.callback) in this case, you would probably keep a pointer to the object returned by the function anyway: self.viewer = getviewer() model.addobserver(viewer.notify) model.removeobserver(viewer.notify) > And usually the code which connects > the callback is pretty close (flow-control wise) to the code that would > disconnect. FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. you really hate keeping things as simple as possible, don't you? ;-) what are these 3-4 "disconnects" doing? > > as for the "break" functionality, I'm not sure it really > > belongs in a basic observer class (in GOF terms, that's > ^^^ TLA overload! What's GOF? http://www.hillside.net/patterns/DPBook/GOF.html > > a "chain of responsibility"). but if it does, I sure prefer > > an exception over a magic return value. > > I don't know if it belongs or not, but I do know that it is sometimes > needed, and is very hard and ugly to simulate otherwise. 
That's one FAQ > I don't want to answer yeah, but the two patterns have different uses. From moshez at math.huji.ac.il Thu Apr 20 21:31:05 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Thu, 20 Apr 2000 22:31:05 +0300 (IDT) Subject: [Python-Dev] Generic notifier module In-Reply-To: <018101bfaaec$65e56740$34aab5d4@hagrid> Message-ID: [Fredrik Lundh] > not sure what you're talking about here, so I suppose > we're talking past each other. Nah, I guess it was a simple case of you being right and me being wrong. (In other words, you've convinced me) [Moshe] > FWIW, the Gtk+ signal mechanism has 3-4 different disconnects, > and it might not be a bad idea, now that I think of it. [Fredrik Lundh] > you really hate keeping things as simple as possible, > don't you? ;-) > > what are these 3-4 "disconnects" doing? gtk_signal_disconnect -- disconnect by cookie gtk_signal_disconnect_by_func -- disconnect by function pointer gtk_signal_disconnect_by_data -- disconnect by the void* pointer passed Hey, you asked just-preparing-for-my-lecture-next-friday-ly y'rs, Z. (see www.linux.org.il for more) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Thu Apr 20 22:43:24 2000 From: gstein at lyra.org (Greg Stein) Date: Thu, 20 Apr 2000 13:43:24 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <38FF04D3.4CE2067E@tismer.com> Message-ID: On Thu, 20 Apr 2000, Christian Tismer wrote: >... > > Linux has a kernel macro for atomic inc/dec, but it is only valid if > > __SMP__ is defined in your compilation context. > > Well, and while it looks cheap, it is for sure expensive > since several caches are flushed, and the system is stalled > until the modified value is written back into the memory bank. Yes, Bill mentioned that yesterday. Important fact, but there isn't much you can do -- they must be atomic. > Could it be that we might want to use another thread design > at all? I'm thinking of running different interpreters in > the same process space, but with all objects really disjoint, > invisible between the interpreters. This would perhaps need > some internal changes, in order to make all the builtin > free-lists disjoint as well. > Now each such interpreter would be running in its own thread > without any racing condition at all so far. > To make this into threading and not just a flavor of multitasking, > we now need of course shared objects, but only those objects > which we really want to share. This could reduce the cost for > free threading to nearly zero, except for the (hopefully) few > shared objects. > I think, instead of shared globals, it would make more sense > to have some explicit shared resource pool, which controls > every access via mutexes/semas/whateverweneed. Maybe also that > we would prefer to copy objects into it over sharing, in order > to minimize collisions. I hope the need for true sharing > can be minimized to a few variables. Well, I hope. > "freethreads" could even coexist with the current locking threads, > we would not even need a special build for them, but to rethink > threading. > Like "the more free threading is, the more disjoint threads are". No. Now you're just talking processes with IPC. Yes, they happen to run in threads, but you got none of the advantages of a threaded application. Threading is about sharing an address space. 
Cheers, -g -- Greg Stein, http://www.lyra.org/ From DavidA at ActiveState.com Thu Apr 20 22:40:54 2000 From: DavidA at ActiveState.com (David Ascher) Date: Thu, 20 Apr 2000 13:40:54 -0700 Subject: [Python-Dev] String issues -- see the JavaScript world Message-ID: Just an FYI to those discussing Unicode issues. There is currently a big debate over in Mozilla-land looking at how XPIDL (their interface definition language) should deal with the various kinds of string types. Someone who cares may want to follow up on that to see if some of their issues apply to Python as well. News server: news.mozilla.org Newsgroup: netscape.public.mozilla.xpcom Thread: Encoding wars -- more in the Big String Story Cheers, --david From tismer at tismer.com Fri Apr 21 14:38:27 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 14:38:27 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <39004BC3.1DD108D0@tismer.com> Greg Stein wrote: > > On Thu, 20 Apr 2000, Christian Tismer wrote: [me, about free threading with less sharing] > No. Now you're just talking processes with IPC. Yes, they happen to run in > threads, but you got none of the advantages of a threaded application. Are you sure that every thread user shares your opinion? I see many people using threads just in order to have multiple tasks in parallel, with none or quite few shared variables. > Threading is about sharing an address space. This is part of the truth. There are a number of other reasons to use threads, too. Since Python has nothing really private, this implies in fact to protect every single object for free threading, although nobody wants this in the first place to happen. Other languages have much fewer problems here (I mean C, C++, Delphi...), they are able to do the right thing in the right place. Python is not designed for that. Why do you want to enforce the impossible, letting every object pay a high penalty to become completely thread-safe? Sharing an address space should not mean to share everything, but something. If Python does not support this, we should think of a redesign of its threading model, instead of losing so much efficiency. You end up in a situation where all your C extensions can run free threaded at high speed, while Python itself is busy all the time fighting the threading. That is not Python. You know that I like to optimize things. For me, optimization must give an overall gain, not just in one area, where others get worse. If free threading cannot be optimized in a way that gives better overall performance, then it is a wrong optimization to me. Well, this is all speculative until we have done some measurements. Maybe I'm just complaining about 1-2 percent of performance loss, in which case I'd agree to move my complaining into /dev/null :-) ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mlh at swl.msd.ray.com Fri Apr 21 18:36:40 2000 From: mlh at swl.msd.ray.com (Milton L. Hankins) Date: Fri, 21 Apr 2000 12:36:40 -0400 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: > Are you sure that every thread user shares your opinion?
> I see many people using threads just in order to have > multiple tasks in parallel, with none or quite few shared > variables. About the only time I use threads is when 1) I'm doing something asynchronous in an event loop-driven paradigm (such as Tkinter) or 2) I'm trying to emulate fork() under win32 > Since Python has nothing really private, this implies in > fact to protect every single object for free threading, > although nobody wants this in the first place to happen. How does Java solve this problem? (Is this analagous to native vs. green threads?) > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Hmm, how about declaring only certain builtins as free-thread safe? Or is "the impossible" necessary because of the nature of incref/decref? -- Milton L. Hankins :: ><> Ephesians 5:2 ><> Software Engineer, Raytheon Systems Company :: http://amasts.msd.ray.com/~mlh :: RayComNet 7-225-4728 From billtut at microsoft.com Fri Apr 21 18:50:47 2000 From: billtut at microsoft.com (Bill Tutt) Date: Fri, 21 Apr 2000 09:50:47 -0700 Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness Message-ID: <4D0A23B3F74DD111ACCD00805F31D8101D8BCF9F@RED-MSG-50> > From: Milton L. Hankins [mailto:mlh at swl.msd.ray.com] > > On Fri, 21 Apr 2000, Christian Tismer wrote: > > > Are you shure that every thread user shares your opinion? > > I see many people using threads just in order to have > > multiple tasks in parallel, with none or quite few shared > > variables. > > About the only time I use threads is when > 1) I'm doing something asynchronous in an event loop-driven > paradigm > (such as Tkinter) or > 2) I'm trying to emulate fork() under win32 > 3) I'm doing something that would block in an asynchronous FSM. (e.g. Medusa, or an NT I/O completion port driven system) > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to > native vs. green > threads?) > Java allows you to specifically mention whether something should be seralized or not, and no, this doesn't have anything to do with native vs. green threads) > > Python is not designed for that. Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread > safe? incref/decref are not type object specific, they're global macros. Making them methods on the type object would be the sensible thing to do, but would definately be non-backward compatible. Bill From seanj at speakeasy.org Fri Apr 21 18:55:29 2000 From: seanj at speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 09:55:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > > Since Python has nothing really private, this implies in > > fact to protect every single object for free threading, > > although nobody wants this in the first place to happen. > > How does Java solve this problem? (Is this analagous to native vs. green > threads?) > > > Python is not designed for that. Why do you want to enforce > > the impossible, letting every object pay a high penalty > > to become completely thread-safe? > > Hmm, how about declaring only certain builtins as free-thread safe? 
Or is > "the impossible" necessary because of the nature of incref/decref? http://www.javacats.com/US/articles/MultiThreading.html I would like sync foo: block of code here maybe we could merge in some Occam while we're at it. B^) sync would be a most excellent operator in python. From seanj at speakeasy.org Fri Apr 21 19:16:29 2000 From: seanj at speakeasy.org (Sean Jensen_Grey) Date: Fri, 21 Apr 2000 10:16:29 -0700 (PDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: http://www.cs.bris.ac.uk/~alan/javapp.html Take a look at the above link. It merges the Occam model with Java and uses 'channel based' interfaces (not sure exactly what this is). But they seem pretty excited. I vote for using InterlockedInc/Dec as it is available as an assembly instruction on almost every platform. Could we then derive all other locking semantics from this? And our portability problem is solved if it comes in the box with gcc. On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > > > Since Python has nothing really private, this implies in > > > fact to protect every single object for free threading, > > > although nobody wants this in the first place to happen. > > > > How does Java solve this problem? (Is this analagous to native vs. green > > threads?) > > > > > Python is not designed for that. Why do you want to enforce > > > the impossible, letting every object pay a high penalty > > > to become completely thread-safe? > > > > Hmm, how about declaring only certain builtins as free-thread safe? Or is > > "the impossible" necessary because of the nature of incref/decref? > > http://www.javacats.com/US/articles/MultiThreading.html > > I would like > > sync foo: > block of code here > > maybe we could merge in some Occam while we're at it. B^) > > > sync would be a most excellent operator in python. > > > > > _______________________________________________ > Thread-SIG maillist - Thread-SIG at python.org > http://www.python.org/mailman/listinfo/thread-sig > From gvwilson at nevex.com Fri Apr 21 19:27:49 2000 From: gvwilson at nevex.com (gvwilson at nevex.com) Date: Fri, 21 Apr 2000 13:27:49 -0400 (EDT) Subject: [Thread-SIG] Re: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: > On Fri, 21 Apr 2000, Sean Jensen_Grey wrote: > http://www.cs.bris.ac.uk/~alan/javapp.html > Take a look at the above link. It merges the Occam model with Java and uses > 'channel based' interfaces (not sure exactly what this is). Channel-based programming has been called "the revenge of the goto", as in, "Where the hell does this channel go to?" Programmers must manage conversational continuity manually (i.e. keep track of the origins of messages, so that they can be replied to). It also doesn't really help with the sharing problem that started this thread: if you want a shared integer, you have to write a little server thread that knows how to act like a semaphore, and then send it read/write requests that are exactly equivalent to P and V operations (and subject to all the same abuses). Oh, and did I mention the joys of trying to draw a semi-accurate diagram of the plumbing in your program after three months of upgrade work?
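[Editorial sketch: the effect of the proposed "sync" block can already be spelled with the existing threading module, at the cost of some boilerplate. The shared counter below is just an illustrative example.]

import threading

counter_lock = threading.Lock()
counter = 0

def increment():
    # roughly what a hypothetical "sync counter_lock:" block would mean:
    # only one thread at a time may execute the guarded region
    global counter
    counter_lock.acquire()
    try:
        counter = counter + 1
    finally:
        counter_lock.release()

threads = []
for i in range(10):
    t = threading.Thread(target=increment)
    threads.append(t)
    t.start()
for t in threads:
    t.join()
print(counter)   # always 10, regardless of interleaving

Later Python releases added the "with lock:" statement, which is essentially this acquire/try/finally/release pattern with the boilerplate removed.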
*shudder* Greg From guido at python.org Fri Apr 21 19:29:06 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 13:29:06 -0400 Subject: [Python-Dev] Inspiration Message-ID: <200004211729.NAA16454@eric.cnri.reston.va.us> http://www.perl.com/pub/2000/04/whatsnew.html --Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Apr 21 21:52:06 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 12:52:06 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <39004BC3.1DD108D0@tismer.com> Message-ID: On Fri, 21 Apr 2000, Christian Tismer wrote: >... > > No. Now you're just talking processes with IPC. Yes, they happen to run in > > threads, but you got none of the advantages of a threaded application. > > Are you shure that every thread user shares your opinion? Now you're just being argumentative. I won't respond to this. >... > Other languages have much fewer problems here (I mean > C, C++, Delphi...), they are able to do the right thing > in the right place. > Python is not designed for that. Why do you want to enforce > the impossible, letting every object pay a high penalty > to become completely thread-safe? Existing Python semantics plus free-threading places us in this scenario. Many people have asked for free-threading, and the number of inquiries that I receive have grown over time. (nobody asked in 1996 when I first published my patches; I get a query every couple months now) >... > You know that I like to optimize things. For me, optimization > mut give an overall gain, not just in one area, where others > get worse. If free threading cannot be optimized in > a way that gives better overall performance, then > it is a wrong optimization to me. > > Well, this is all speculative until we did some measures. > Maybe I'm just complaining about 1-2 percent of performance > loss, then I'd agree to move my complaining into /dev/null :-) It is more than this. In my last shot at this, pystone ran about half as fast. There are a few things that will be different this time around, but it certainly won't in the "few percent" range. Presuming you can keep your lock contention low, then your overall performances *goes up* once you have a multiprocessor machine. Sure, each processor runs Python (say) 10% slower, but you have *two* of them going. That is 180% compared to a central-lock Python on an MP machine. Lock contention: my last patches had really high contention. It didn't scale across processors well. This round will have more fine-grained locks than the previous version. But it will be interesting to measure the contention. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Apr 21 21:49:09 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Apr 2000 15:49:09 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Your message of "Fri, 21 Apr 2000 12:52:06 PDT." References: Message-ID: <200004211949.PAA16911@eric.cnri.reston.va.us> > It is more than this. In my last shot at this, pystone ran about half as > fast. There are a few things that will be different this time around, but > it certainly won't in the "few percent" range. Interesting thought: according to patches recently posted to patches at python.org (but not yet vetted), "turning on" threads on Win32 in regular Python also slows down Pystone considerably. Maybe it's not so bad? Maybe those patches contain a hint of what we could do? 
--Guido van Rossum (home page: http://www.python.org/~guido/) From gstein at lyra.org Fri Apr 21 22:02:23 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:02:23 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches at python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I think that my tests were threaded vs. free-threaded. It has been so long ago, though... :-) Yes, we'll get those patches reviewed and installed. That will at least help the standard threading case. With more discrete locks (e.g. one per object or one per code section), then we will reduce lock contention. Working on improving the lock mechanism itself and the INCREF/DECREF system will help, too. But this initial thread was to seek people to assist with some coding to get stuff into 1.6. The heavy lifting will certainly be after 1.6, but we can get some good stuff in *today*. We'll examine performance later on, then start improving it. Cheers, -g -- Greg Stein, http://www.lyra.org/ From gstein at lyra.org Fri Apr 21 22:21:55 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 13:21:55 -0700 (PDT) Subject: [Python-Dev] RE: [Thread-SIG] Re: marking shared-ness In-Reply-To: Message-ID: On Fri, 21 Apr 2000, Brent Fulgham wrote: >... > The problem is that having to grab the global interpreter lock > every time I want to manipulate Python objects from C seems wasteful. > This is perhaps more of a "interpreter" issue, rather than a > thread issue perhaps, but it does seem that if each thread (and > therefore interpreter state from my perspective) kept internal > track of itself, there would be much less lock contention as one > interpreter drops out of Python into the C code for a moment, then > releases the lock and returns, etc. > > So I think it's possible that free-threading changes might provide > some benefit even on uniprocessor systems. This is true. Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS become null macros. Your C extensions operate within their thread of execution, but have no central lock to worry about releasing before they block on something. And from an embedding standpoint, the same is true. You do not need to acquire any locks to start manipulating Python objects. Each object maintains its own integrity. Note: embedding/extending *can* destroy integrity. For example, tuples have no integrity locking -- Python programs cannot change them, so you cannot have two Python threads breaking things. 
C code can certainly destroy things with something this simple: Py_DECREF(PyTuple_GET_ITEM(tuple, 3)); PyTuple_SET_ITEM(tuple, 3, ob); Exercise for the reader on why the above code is a disaster waiting to happen :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From tismer at tismer.com Fri Apr 21 22:29:06 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 21 Apr 2000 22:29:06 +0200 Subject: [Python-Dev] Re: marking shared-ness References: <200004211949.PAA16911@eric.cnri.reston.va.us> Message-ID: <3900BA12.DFE0A6EB@tismer.com> Guido van Rossum wrote: > > > It is more than this. In my last shot at this, pystone ran about half as > > fast. There are a few things that will be different this time around, but > > it certainly won't in the "few percent" range. > > Interesting thought: according to patches recently posted to > patches at python.org (but not yet vetted), "turning on" threads on Win32 > in regular Python also slows down Pystone considerably. Maybe it's > not so bad? Maybe those patches contain a hint of what we could do? I had a rough look at the patches but didn't understand enough yet. But I tried the sample scriptlet on python 1.5.2 and Stackless Python - see here: D:\python>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.96765 This machine benchmarks at 5082.2 pystones/second D:\python>python spc/threadstone.py Pystone(1.1) time for 10000 passes = 5.57609 This machine benchmarks at 1793.37 pystones/second This is even worse than Markovitch's observation. Now, let's try with Stackless Python: D:\python>cd spc D:\python\spc>python -c "import test.pystone;test.pystone.main()" Pystone(1.1) time for 10000 passes = 1.843 This machine benchmarks at 5425.94 pystones/second D:\python\spc>python threadstone.py Pystone(1.1) time for 10000 passes = 3.27625 This machine benchmarks at 3052.27 pystones/second Isn't that remarkable? Stackless performs nearly 1.8 as good under threads. Why? I've optimized the ticker code away for all those "fast" opcodes which never can cause another interpreter incarnation. Standard Python does a bit too much here, dealing the same way with extremely fast opcodes like POP_TOP, as with a function call. Responsiveness is still very good. Markovitch's example also tells us this story: Even with his patches, the threading stuff still costs 10 percent. This is the lock that we touch every ten opcodes. In other words: touching a lock costs about as much as an opcode costs on average. ciao - chris threadstone.py: import thread # Start empty thread to initialise thread mechanics (and global lock!) # This thread will finish immediately thus won't make much influence on # test results by itself, only by that fact that it initialises global lock thread.start_new_thread(lambda : 1, ()) import test.pystone test.pystone.main() -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? 
http://www.stackless.com From gstein at lyra.org Sat Apr 22 01:19:03 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 16:19:03 -0700 (PDT) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: <200004212126.RAA18041@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: >... > * Base address for all extension modules updated. PC\dllbase_nt.txt > also updated. Erroneous "libpath" directory removed for all > projects. Rather than specifying the base address in each DSP, the Apache project has used a text file for this stuff. Here is the text file used: --snip-- -- Begin New BaseAddr.ref -- ; os/win32/BaseAddr.ref contains the central repository ; of all module base addresses ; to avoid relocation ; WARNING: Update this file by reviewing the image size ; of the debug-generated dll files; release images ; should fit in the larger debug-sized space. ; module name base-address max-size aprlib 0x6FFA0000 0x00060000 ApacheCore 0x6FF00000 0x000A0000 mod_auth_anon 0x6FEF0000 0x00010000 mod_cern_meta 0x6FEE0000 0x00010000 mod_auth_digest 0x6FED0000 0x00010000 mod_expires 0x6FEC0000 0x00010000 mod_headers 0x6FEB0000 0x00010000 mod_info 0x6FEA0000 0x00010000 mod_rewrite 0x6FE80000 0x00020000 mod_speling 0x6FE70000 0x00010000 mod_status 0x6FE60000 0x00010000 mod_usertrack 0x6FE50000 0x00010000 mod_proxy 0x6FE30000 0x00020000 --snip-- And here is what one of the link lines looks like: # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" This mechanism could be quite helpful for Python. The .ref file replaces the dllbase_nt.txt file, centralizes the management, and directly integrates with the tools. Cheers, -g -- Greg Stein, http://www.lyra.org/ From mhammond at skippinet.com.au Sat Apr 22 02:44:31 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sat, 22 Apr 2000 10:44:31 +1000 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1_socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11python16.dsp In-Reply-To: Message-ID: [Greg writes] > Rather than specifying the base address in each DSP, the > Apache project > has used a text file for this stuff. Here is the text file used: Yes - I saw this in the docs for the linker when I was last playing here. I didnt bother with this, as it still seems to me the best longer term approach is to use the "rebind" tool. This would allow the tool to select the addresses (less chance of getting them wrong), but also would allow us to generate "debug info" for the release builds of Python... But I guess that in the meantime, having the linker process this file is an improvement... I will wait until Guido has got to my other build patches and look into this... Mark. From gstein at lyra.org Sat Apr 22 02:56:22 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 21 Apr 2000 17:56:22 -0700 (PDT) Subject: [Python-Dev] base addresses (was: [Python-checkins] CVS: ...) 
In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Mark Hammond wrote: > [Greg writes] > > Rather than specifying the base address in each DSP, the > > Apache project > > has used a text file for this stuff. Here is the text file used: > > Yes - I saw this in the docs for the linker when I was last playing > here. > > I didnt bother with this, as it still seems to me the best longer > term approach is to use the "rebind" tool. This would allow the > tool to select the addresses (less chance of getting them wrong), > but also would allow us to generate "debug info" for the release > builds of Python... Yes, although having specific addresses also means that every Python executable/DLL has the same set of addresses. You can glean information from the addresses without having symbols handy. Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Apr 22 05:53:39 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 06:53:39 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html Yeah, loads of cool stuff we should steal... And loads of stuff that we shouldn't steal, no matter how cool it looks (lvaluable subroutines, anyone?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From moshez at math.huji.ac.il Sat Apr 22 06:42:47 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 07:42:47 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: <200004211729.NAA16454@eric.cnri.reston.va.us> Message-ID: On Fri, 21 Apr 2000, Guido van Rossum wrote: > http://www.perl.com/pub/2000/04/whatsnew.html OK, here's my summary of the good things we should copy: (In that order:) -- Weak references (as weak dictionaries? would "w{}" to signify a weak dictionary is alright parser-wise?) -- Binary numbers -- way way cool (and doesn't seem to hard -- need to patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything I've missed?) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From gstein at lyra.org Sat Apr 22 09:07:09 2000 From: gstein at lyra.org (Greg Stein) Date: Sat, 22 Apr 2000 00:07:09 -0700 (PDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Moshe Zadka wrote: > On Fri, 21 Apr 2000, Guido van Rossum wrote: > > http://www.perl.com/pub/2000/04/whatsnew.html > > OK, here's my summary of the good things we should copy: > > (In that order:) > > -- Weak references (as weak dictionaries? would "w{}" to signify a weak > dictionary is alright parser-wise?) > -- Binary numbers -- way way cool (and doesn't seem to hard -- need to > patch the tokenizer, PyLong_FromString and PyOS_strtoul: anything > I've missed?) Yet another numeric format? eek. If anything, we should be dropping octal, rather than adding binary. You want binary? Just use int("10010", 2). No need for more syntax. I'd go for weak objects (proxies) rather than weak dictionaries. Duplicating the dict type just to deal with weak refs seems a bit much. But I'm not a big brain on this stuff -- I tend to skip all the discussions people have had on this stuff. I just avoid the need for circular refs and weak refs :-) Most of the need for weak refs would disappear with some simple form of GC installed. And it seems we'll have that by 1.7. 
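[Editorial sketch of the two points above: parsing binary needs no new syntax, and the weak-reference behaviour being discussed is roughly what the weakref module that later shipped with Python 2.1 provides. Node is just a placeholder class.]

import weakref

class Node:
    pass

n = Node()

r = weakref.ref(n)        # weak reference: does not keep n alive
p = weakref.proxy(n)      # proxy that behaves like n while n is alive

cache = weakref.WeakValueDictionary()
cache["node"] = n         # entry disappears when n does

print(r() is n)           # True while n is alive
del n                     # drop the last strong reference
print(r())                # None: the referent is gone
print("node" in cache)    # False: the entry vanished with its value
# touching p now raises ReferenceError -- the "deleted" error described above

print(int("10010", 2))    # 18 -- binary numbers without new literal syntax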
Cheers, -g -- Greg Stein, http://www.lyra.org/ From moshez at math.huji.ac.il Sat Apr 22 11:46:29 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 22 Apr 2000 12:46:29 +0300 (IDT) Subject: [Python-Dev] Inspiration In-Reply-To: Message-ID: On Sat, 22 Apr 2000, Greg Stein wrote: > Yet another numeric format? eek. If anything, we should be dropping octal, > rather than adding binary. > > You want binary? Just use int("10010", 2). No need for more syntax. Damn, but you're right. > Most of the need for weak refs would disappear with some simple form of GC > installed. And it seems we'll have that by 1.7. Disagree. Think "destructors": with weak references, there's no problems: the referant dies first, and if later, the referer needs the referant to die, well, he'll get a "DeletionError: this object does not exist anymore" in his face, which is alright, because a weak referant should not trust the reference to live. 90%-of-the-cyclic-__del__-full-trash-problem-would-go-away-ly y'rs, Z. -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From tismer at tismer.com Sat Apr 22 13:53:57 2000 From: tismer at tismer.com (Christian Tismer) Date: Sat, 22 Apr 2000 13:53:57 +0200 Subject: [Python-Dev] Re: marking shared-ness References: Message-ID: <390192D5.57443E99@tismer.com> Greg, Greg Stein wrote: Presuming you can keep your lock contention low, then your overall > performances *goes up* once you have a multiprocessor machine. Sure, each > processor runs Python (say) 10% slower, but you have *two* of them going. > That is 180% compared to a central-lock Python on an MP machine. Why didn't I think of this. MP is a very very good point. Makes now all much sense to me. sorry for being dumb - happy easter - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From amk1 at erols.com Sat Apr 22 21:51:47 2000 From: amk1 at erols.com (A.M. Kuchling) Date: Sat, 22 Apr 2000 15:51:47 -0400 Subject: [Python-Dev] 1.6 speed Message-ID: <200004221951.PAA09193@mira.erols.com> Python 1.6a2 is around 10% slower than 1.5 on pystone. Any idea why? [amk at mira Python-1.6a2]$ ./python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.59 This machine benchmarks at 2785.52 pystones/second [amk at mira Python-1.6a2]$ python1.5 Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 3.19 This machine benchmarks at 3134.8 pystones/second --amk From tismer at tismer.com Sun Apr 23 04:21:47 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 04:21:47 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39025E3B.35639080@tismer.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? 
> > [amk at mira Python-1.6a2]$ ./python Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.59 > This machine benchmarks at 2785.52 pystones/second > > [amk at mira Python-1.6a2]$ python1.5 Lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 3.19 > This machine benchmarks at 3134.8 pystones/second Hee hee :-) D:\python>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.92135 This machine benchmarks at 5204.66 pystones/second D:\python>cd \python16 D:\Python16>python Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.06234 This machine benchmarks at 4848.86 pystones/second D:\Python16>cd \python\spc D:\python\spc>python Lib/test/pystone.py python: can't open file 'Lib/test/pystone.py' D:\python\spc>python ../Lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.81034 This machine benchmarks at 5523.82 pystones/second More hee hee :-) Python has been at a critical size with its main loop. The recently added extra code exceeds this size. I had the same effect with Stackless Python, and I worked around it already. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From mhammond at skippinet.com.au Sun Apr 23 04:21:01 2000 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 23 Apr 2000 12:21:01 +1000 Subject: [Python-Dev] 1.6 speed In-Reply-To: <39025E3B.35639080@tismer.com> Message-ID: > Python has been at a critical size with its main loop. > The recently added extra code exceeds this size. > I had the same effect with Stackless Python, and > I worked around it already. OK - so let us in on your secret! :-) Were your work-arounds specific to the stackless work, or could they be applied here? Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Mark. From tismer at tismer.com Sun Apr 23 16:43:10 2000 From: tismer at tismer.com (Christian Tismer) Date: Sun, 23 Apr 2000 16:43:10 +0200 Subject: [Python-Dev] 1.6 speed References: Message-ID: <39030BFE.1675EE20@tismer.com> Mark Hammond wrote: > > > Python has been at a critical size with its main loop. > > The recently added extra code exceeds this size. > > I had the same effect with Stackless Python, and > > I worked around it already. > > OK - so let us in on your secret! :-) > > Were your work-arounds specific to the stackless work, or could they be applied here? My work-arounds originated from code from last January where I was on a speed trip, but with the (usual) low interest from Guido. Then, with Stackless I saw a minor speed loss and finally came to the conclusion that I would be good to apply my patches to my Python version. That was nothing special so far, and Stackless was still a bit slow. I though this came from the different way to call functions for quite a long time, until I finally found out this February: The central loop of the Python interpreter is at a critical size for caching. Speed depends very much on which code gets near which other code, and how big the whole interpreter loop is. What I did: - Un-inlined several code pieces again, back into functions in order to make the big switch smaller. 
- simplified error handling, especially I ensured that all local error variables have very short lifetime and are optimized away - simplified the big switch, tuned the why_code handling into special opcodes, therefore the whole things gets much simpler. This reduces code size and therefore the probability that we are in the cache, and due to short variable lifetime and a simpler loop structure, the compiler seems to do a better job of code ordering. > Only-2-more-years-of-beating-up-Guido-before-stackless-time-ly, Yup, and until then I will not apply my patches to Python, this is part of my license: Use it but only *with* Stackless. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at trixie.triqs.com Mon Apr 24 00:16:38 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:16:38 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> Message-ID: <39037646.DEF8A139@trixie.triqs.com> "A.M. Kuchling" wrote: > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > Any idea why? I submitted a comparison with Stackless Python. Now I actually applied the Stackless Python patches to the current CVS version. My version does again show up as faster than standard Python, with the same relative measures, but I too have this effect: Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. Claim: This is not related to ceval.c . Something else must have introduced a significant speed loss. Stackless Python, upon the pre-unicode tag version of CVS: D:\python\spc>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.80724 This machine benchmarks at 5533.29 pystones/second Stackless Python, upon the recent version of CVS: D:\python\spc\Python-cvs\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.94941 This machine benchmarks at 5129.75 pystones/second Less than 10 percent, but bad enough. I guess we have to use MAL's test suite and measure everything alone. ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at trixie.triqs.com Mon Apr 24 00:45:12 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 00:45:12 +0200 Subject: [Python-Dev] 1.6 speed References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> Message-ID: <39037CF8.24E1D1BD@trixie.triqs.com> Ack, sorry. Please drop the last message. This one was done with the correct dictionaries. :-() Christian Tismer wrote: > > "A.M. Kuchling" wrote: > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > Any idea why? > > I submitted a comparison with Stackless Python. > Now I actually applied the Stackless Python patches > to the current CVS version. > > My version does again show up as faster than standard Python, > with the same relative measures, but I too have this effect: > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > Claim: > This is not related to ceval.c . 
> Something else must have introduced a significant speed loss. > > Stackless Python, upon the pre-unicode tag version of CVS: > > D:\python\spc>python ../lib/test/pystone.py > Pystone(1.1) time for 10000 passes = 1.80724 > This machine benchmarks at 5533.29 pystones/second > > Stackless Python, upon the recent version of CVS: > this one corrected: D:\python\spc\Python-slp\PCbuild>python ../lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.98433 This machine benchmarks at 5039.49 pystones/second > Less than 10 percent, but bad enough. It is 10 percent, and bad enough. > > I guess we have to use MAL's test suite and measure everything > alone. > > ciao - chris > > -- > Christian Tismer :^) > Applied Biometrics GmbH : Have a break! Take a ride on Python's > Kaunstr. 26 : *Starship* http://starship.python.net > 14163 Berlin : PGP key -> http://wwwkeys.pgp.net > PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF > where do you want to jump today? http://www.stackless.com > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Mon Apr 24 15:03:56 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 09:03:56 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 14:49:11 +0200." <390442C7.F30179D9@trixie.triqs.com> References: <390442C7.F30179D9@trixie.triqs.com> Message-ID: <200004241303.JAA19894@eric.cnri.reston.va.us> [Moving this to python-dev because it's a musing > > The main point is to avoid string.*. > > Agreed. Also replacing map by a loop might not even be slower. > What remains as open question: Several modules need access > to string constants, and they therefore still have to import > string. > Is there an elegant solution to this? import string > That's why i asked for some way to access "".__class__ or > whatever, to get into some common namespace with the constants. I dunno. However, I've noticed that in many situations where map() could be used with a string.* function (*if* you care about the speed-up and you don't care about the readability issue), there's no equivalent that uses the new string methods. This stems from the fact that map() wants a function, not a method. Python 3000 solves this partly, assuming types and classes are unified there. Where in 1.5 we wrote map(string.strip, L) in Python 3K we will be able to write map("".__class__.strip, L) However, this is *still* not as powerful as map(lambda s: s.strip(), L) because the former requires that all items in L are in fact strings, while the latter works for anything with a strip() method (in particular Unicode objects and UserString instances). Maybe Python 3000 should recognize map(lambda) and generate more efficient code for it... --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at trixie.triqs.com Mon Apr 24 16:01:26 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 16:01:26 +0200 Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> Message-ID: <390453B6.745E852B@trixie.triqs.com> > Christian Tismer wrote: > > > > "A.M. Kuchling" wrote: > > > > > > Python 1.6a2 is around 10% slower than 1.5 on pystone. > > > Any idea why? ... > > Stackless 1.5.2+ is 10 percent faster than Stackless 1.6a2. > > > > Claim: > > This is not related to ceval.c . > > Something else must have introduced a significant speed loss. I guess I can explain now what's happening, at least for the Windows platform. Python 1.5.2's .dll was nearly about 512K, something more. I think to remember that 512K is a common size of the secondary cache. Now, linking with the MS linker does not give you any particularly useful order of modules. When I look into the map file, the modules appear sorted by name. This is for sure not providing optimum performance. As I read the docs, explicit ordering of the linkage would only make sense for C++ and wouldn't work out for C, since we could order the exported functions, but not the private ones, giving even more distance between releated code. My solution to see if I might be right was this: I ripped out almost all builtin extension modules and compiled/linked without them. This shrunk the dll size down from 647K to 557K, very close to the 1.5.2 size. Now I get the following figures: Python 1.6, with stackless patches: D:\python\spc\Python-slp\PCbuild>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 1.95468 This machine benchmarks at 5115.92 pystones/second Python 1.6, from the dist: D:\Python16>python /python/lib/test/pystone.py Pystone(1.1) time for 10000 passes = 2.09214 This machine benchmarks at 4779.8 pystones/second That means my optimizations are in charge again, after the overall code size went below about 512K. I think these 10 percent are quite valuable. These options come to my mind: a) try to do optimum code ordering in the too large .dll . This seems to be hard to achieve. b) Split the dll into two dll's in a way that all the necessary internal stuff sits closely in one of them. c) try to split the library like above, but use a static library layout for one of them, and link the static library into the final dll. This would hopefully keep related things together. I don't know if c) is possible, but it might be tried. Any thoughts? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From guido at python.org Mon Apr 24 17:11:14 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 11:11:14 -0400 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/PCbuild winsound.dsp,NONE,1.1 _socket.dsp,1.1,1.2 _sre.dsp,1.2,1.3 _tkinter.dsp,1.13,1.14 bsddb.dsp,1.9,1.10 mmap.dsp,1.2,1.3 parser.dsp,1.8,1.9 pyexpat.dsp,1.2,1.3 python.dsp,1.10,1.11 python16.dsp,1.2,1.3 pythonw.dsp,1.8,1.9 select.dsp,1.1,1.2 unicodedata.dsp,1.1,1.2 zlib.dsp,1.10,1.11 In-Reply-To: Your message of "Fri, 21 Apr 2000 16:19:03 PDT." 
References: Message-ID: <200004241511.LAA28854@eric.cnri.reston.va.us> > And here is what one of the link lines looks like: > > # ADD LINK32 ApacheCore.lib aprlib.lib kernel32.lib /nologo > /base:@BaseAddr.ref,mod_usertrack /subsystem:windows /dll /map /debug > /machine:I386 /libpath:"..\..\CoreD" /libpath:"..\..\lib\apr\Debug" > > This mechanism could be quite helpful for Python. The .ref file replaces > the dllbase_nt.txt file, centralizes the management, and directly > integrates with the tools. I agree. Just send me patches -- I'm *really* overwhelmed with patch management at the moment, I don't feel like coming up with new code right now... :-( --Guido van Rossum (home page: http://www.python.org/~guido/) From tismer at trixie.triqs.com Mon Apr 24 17:19:41 2000 From: tismer at trixie.triqs.com (Christian Tismer) Date: Mon, 24 Apr 2000 17:19:41 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> Message-ID: <3904660D.6F22F798@trixie.triqs.com> Sorry, it was not really found... Christian Tismer wrote: [thought he had found the speed leak] After re-inserting all the builtin modules, I got nearly the same result after a complete re-build, just marginally slower. There must something else be happening that I cannot understand. Stackless Python upon 1.5.2+ is still nearly 10 percent faster, regardless what I do to Python 1.6. Testing whether Unicode has some effect? I changed PyUnicode_Check to always return 0. This should optimize most related stuff away. Result: No change at all! Which changes were done after the pre-unicode tag, which might really count for performance? I'm quite desperate, any ideas? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tim_one at email.msn.com Tue Apr 25 02:56:18 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 20:56:18 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004241303.JAA19894@eric.cnri.reston.va.us> Message-ID: <000101bfae51$10f467a0$3ea0143f@tim> [Guido] > ... > However, this is *still* not as powerful as > > map(lambda s: s.strip(), L) > > because the former requires that all items in L are in fact strings, > while the latter works for anything with a strip() method (in > particular Unicode objects and UserString instances). > > Maybe Python 3000 should recognize map(lambda) and generate more > efficient code for it... [s.strip() for s in L] That is, list comprehensions solved the speed, generality and clarity problems here before they were discovered . From guido at python.org Tue Apr 25 03:21:42 2000 From: guido at python.org (Guido van Rossum) Date: Mon, 24 Apr 2000 21:21:42 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 20:56:18 EDT." <000101bfae51$10f467a0$3ea0143f@tim> References: <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <200004250121.VAA00320@eric.cnri.reston.va.us> > > Maybe Python 3000 should recognize map(lambda) and generate more > > efficient code for it... 
> > [s.strip() for s in L] > > That is, list comprehensions solved the speed, generality and clarity > problems here before they were discovered . Ah! I knew there had to be a solution without lambda! :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Apr 25 05:19:35 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 24 Apr 2000 22:19:35 -0500 (CDT) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <000101bfae51$10f467a0$3ea0143f@tim> References: <200004241303.JAA19894@eric.cnri.reston.va.us> <000101bfae51$10f467a0$3ea0143f@tim> Message-ID: <14597.3783.737317.226791@beluga.mojam.com> Tim> [s.strip() for s in L] Tim> That is, list comprehensions solved the speed, generality and Tim> clarity problems here before they were discovered . What is the status of list comprehensions in Python? I remember some work being done several months ago. They definitely don't appear to be in the 1.6a2. Was there some reason to defer them until later? Skip From tim_one at email.msn.com Tue Apr 25 05:26:24 2000 From: tim_one at email.msn.com (Tim Peters) Date: Mon, 24 Apr 2000 23:26:24 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <14597.3783.737317.226791@beluga.mojam.com> Message-ID: <000801bfae66$09191840$e72d153f@tim> [Skip Montanaro] > What is the status of list comprehensions in Python? I remember some work > being done several months ago. They definitely don't appear to be in the > 1.6a2. Was there some reason to defer them until later? Greg Ewing posted a patch to c.l.py that implemented a good start on the proposal. But nobody has pushed it. I had hoped to, but ran out of time; not sure Guido even knows about Greg's patch. Perhaps the 1.6 source distribution could contain a new "intriguing experimental patches" directory? Greg's list-comp and Christian's Stackless have enough fans that this would probably be appreciated. Perhaps some other things too, if we all run out of time (thinking mostly of Vladimir's malloc cleanup and NeilS's gc). From guido at python.org Tue Apr 25 06:13:51 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 00:13:51 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: Your message of "Mon, 24 Apr 2000 23:26:24 EDT." <000801bfae66$09191840$e72d153f@tim> References: <000801bfae66$09191840$e72d153f@tim> Message-ID: <200004250413.AAA00577@eric.cnri.reston.va.us> > Greg Ewing posted a patch to c.l.py that implemented a good start on the > proposal. But nobody has pushed it. I had hoped to, but ran out of time; > not sure Guido even knows about Greg's patch. I vaguely remember, but not really. We did use his f(*args, **kwargs) patches as a starting point for a 1.6 feature though -- if the list comprehensions are in a similar state, they'd be great to start but definitely need work. > Perhaps the 1.6 source distribution could contain a new "intriguing > experimental patches" directory? Greg's list-comp and Christian's Stackless > have enough fans that this would probably be appreciated. Perhaps some > other things too, if we all run out of time (thinking mostly of Vladimir's > malloc cleanup and NeilS's gc). Perhaps a webpage woule make more sense? There's no point in loading every download with this. And e.g. stackless evolves at a much faster page than core Python. 
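[Editorial sketch, in the 1.x spelling under discussion: the three spellings from the map()-versus-lambda subthread side by side, with L as an arbitrary example list and the last line using the proposed list-comprehension syntax from Greg Ewing's patch.]

import string

L = ["  spam ", "eggs\n", " ham"]

a = map(string.strip, L)          # only works if every item is a plain string
b = map(lambda s: s.strip(), L)   # works for anything with a strip() method
c = [s.strip() for s in L]        # the list-comprehension spelling

print(b == c)                     # the last two always agree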
I definitely want Vladimir's patches in -- I feel very guilty for not having reviewed his latest proposal yet. I expect that it's right on the mark, but I understand if Vladimir wants to wait with preparing yet another set of patches until I'm happy with the design... --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at mojam.com Tue Apr 25 06:37:42 2000 From: skip at mojam.com (Skip Montanaro) Date: Mon, 24 Apr 2000 23:37:42 -0500 (CDT) Subject: [Python-Dev] list comprehensions patch - updated for current CVS version Message-ID: <14597.8470.495090.799119@beluga.mojam.com> For those folks that might want to fiddle with list comprehensions I tweaked Greg Ewing's list comprehensions patch to work with the current CVS tree. The attached gzip'd patch contains diffs for Grammar/Grammar Include/graminit.h Lib/test/test_grammar.py Lib/test/output/test_grammar Python/compile.c Python/graminit.c I would have updated the corresponding section of the language reference, but the BNF there didn't match the contents of Grammar/Grammar, so I was a bit unclear what needed doing. If it gets that far perhaps someone else can contribute the necessary verbiage or at least point me in the right direction. -- Skip Montanaro | http://www.mojam.com/ skip at mojam.com | http://www.musi-cal.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: listcomp.patch.gz Type: application/octet-stream Size: 4814 bytes Desc: list comprehensions patch for Python URL: From Vladimir.Marangozov at inrialpes.fr Tue Apr 25 08:13:32 2000 From: Vladimir.Marangozov at inrialpes.fr (Vladimir Marangozov) Date: Tue, 25 Apr 2000 08:13:32 +0200 (CEST) Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250413.AAA00577@eric.cnri.reston.va.us> from "Guido van Rossum" at Apr 25, 2000 12:13:51 AM Message-ID: <200004250613.IAA10174@python.inrialpes.fr> Hi, I'm back on-line. > [Tim] > > Perhaps the 1.6 source distribution could contain a new "intriguing > > experimental patches" directory? Greg's list-comp and Christian's > > Stackless have enough fans that this would probably be appreciated. > > Perhaps some other things too, if we all run out of time (thinking > > mostly of Vladimir's malloc cleanup and NeilS's gc). I'd be in favor of including gc as an optional (experimental) feature. I'm quite confident that it will evolve into a standard feature, in its current or in an improved state. The overall strategy looks good, but there are some black spots w.r.t its cost, both in speed and space. Neil reported in private mail something like 5-10% mem increase, but I doubt that the picture is so optimistic. My understanding is that these numbers reflect the behavior of the Linux VMM in terms of effectively used pages. In terms of absolute, peak requested virtual memory, things are probably worse than that. We're still unclear on this... For 1.6, the gc option would be a handy tool for detecting cyclic trash. It will answer some expectations, and I believe we're ready to give some good feedback on its functioning, its purpose, its limitations, etc. By the time 1.6 is finalized, I expect that we'll know roughly its cost in terms of mem overhead. Overall, it would be nice to have it in the distrib as an experimental feature -- it would both bootstrap some useful feedback, and would encourage enthousiasts to look more closely at DSA/GC (DSA - dynamic storage allocation). 
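[Editorial sketch of what "detecting cyclic trash" looks like from Python code, using the gc module interface that eventually shipped with Neil's collector in Python 2.0. Node is a placeholder class.]

import gc

class Node:
    pass

# Build a reference cycle that plain reference counting can never reclaim.
a = Node()
b = Node()
a.partner = b
b.partner = a

del a, b                     # both objects are now unreachable, but their
                             # reference counts are still nonzero

unreachable = gc.collect()   # run the cycle detector explicitly
print(unreachable)           # > 0: the collector found and freed the cycle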
By 1.7 (with Py3K on the horizon), we would have a good understanding on what to do with gc and how to do it. If I go one step further, what I expect is that the garbage collector would be enabled together with a Python-specific memory allocator which will compensate the cost introduced by the collector. There will some some stable state again (in terms of speed and size) similar to what we have now, but with a bonus pack of additional memory services. > I definitely want Vladimir's patches in -- I feel very guilty for not > having reviewed his latest proposal yet. I expect that it's right on > the mark, but I understand if Vladimir wants to wait with preparing > yet another set of patches until I'm happy with the design... Yes, I'd prefer to wait and get it right. There's some basis, but it needs careful rethinking again. I'm willing to fit in the 1.6 timeline but I understand very well that it's a matter of time :-). -- Vladimir MARANGOZOV | Vladimir.Marangozov at inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252 From tim_one at email.msn.com Tue Apr 25 08:25:36 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:36 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000701bfae7f$1174a540$152d153f@tim> [Greg Stein] > ... > Many people have asked for free-threading, and the number of inquiries > that I receive have grown over time. (nobody asked in 1996 when I first > published my patches; I get a query every couple months now) Huh! That means people ask me about it more often than they ask you . I'll add, though, that you have to dig into the inquiry: almost everyone who asks me is running on a uniprocessor machine, and are really after one of two other things: 1. They expect threaded stuff to run faster if free-threaded. "Why?" is a question I can't answer <0.5 wink>. 2. Dealing with the global lock drives them insane, especially when trying to call back into Python from a "foreign" C thread. #2 may be fixable via less radical means (like a streamlined procedure enabled by some relatively minor core interpreter changes, and clearer docs). I'm still a fan of free-threading! It's just one of those things that may yield a "well, ya, that's what I asked for, but turns out it's not what I *wanted*" outcome as often as not. enthusiastically y'rs - tim From tim_one at email.msn.com Tue Apr 25 08:25:38 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:25:38 -0400 Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: Message-ID: <000801bfae7f$12c456c0$152d153f@tim> [Greg Wilson, on Linda and JavaSpaces] > ... > Personal opinion: I've felt for 15 years that something like Linda could > be to threads and mutexes what structured loops and conditionals are to > the "goto" statement. Were it not for the "Huh" effect, I'd recommend > hanging "Danger!" signs over threads and mutexes, and making tuple spaces > the "standard" concurrency mechanism in Python. There's no question about tuple spaces being easier to learn and to use, but Python slams into a conundrum here akin to the "floating-point versus *anything* sane " one: Python's major real-life use is as a glue language, and threaded apps (ditto IEEE-754 floating-point apps) are overwhelmingly what it needs to glue *to*. So Python has to have a good thread story. Free-threading would be a fine enhancement of it, Tuple spaces (spelled "PyBrenda" or otherwise) would be a fine alternative to it, but Python can't live without threads too. 
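[Editorial sketch for readers who have not met Linda: a toy, in-process version of the two core tuple-space operations, out() to post a tuple and in_() to block until a matching tuple can be withdrawn, with None acting as a wildcard. Real systems such as PyBrenda distribute this across processes; the names here are illustrative only.]

import threading

class TupleSpace:
    """Toy Linda-style tuple space."""
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        # post a tuple and wake up anyone waiting for a match
        self._cond.acquire()
        try:
            self._tuples.append(tup)
            self._cond.notifyAll()
        finally:
            self._cond.release()

    def in_(self, pattern):
        # block until a tuple matching the pattern exists, then remove it
        self._cond.acquire()
        try:
            while 1:
                for tup in self._tuples:
                    if self._matches(pattern, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()
        finally:
            self._cond.release()

    def _matches(self, pattern, tup):
        if len(pattern) != len(tup):
            return 0
        for i in range(len(pattern)):
            if pattern[i] is not None and pattern[i] != tup[i]:
                return 0
        return 1

ts = TupleSpace()
ts.out(("counter", 0))
tag, value = ts.in_(("counter", None))   # withdraw the shared counter...
ts.out((tag, value + 1))                 # ...and put back the incremented one

The withdraw-then-repost pair at the end is exactly the semaphore-like read/write protocol criticized earlier in the thread; the appeal of tuple spaces is that this is the only synchronization primitive you have to learn.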
And, yes, everyone who goes down Hoare's CSP road gets lost <0.7 wink>. From tim_one at email.msn.com Tue Apr 25 08:40:26 2000 From: tim_one at email.msn.com (Tim Peters) Date: Tue, 25 Apr 2000 02:40:26 -0400 Subject: [Python-Dev] map() methods (was: Re: [Patches] Review (was: Please review before applying)) In-Reply-To: <200004250613.IAA10174@python.inrialpes.fr> Message-ID: <000901bfae81$240b5580$152d153f@tim> [Vladimir Marangozov, on NeilS's gc patch] > ... > The overall strategy looks good, but there are some black spots > w.r.t its cost, both in speed and space. Neil reported in private > mail something like 5-10% mem increase, but I doubt that the picture > is so optimistic. My understanding is that these numbers reflect > the behavior of the Linux VMM in terms of effectively used pages. In > terms of absolute, peak requested virtual memory, things are probably > worse than that. We're still unclear on this... Luckily, that's what Open Source is all about: if we have to wait for you (or Neil, or Guido, or anyone else) to do a formal study of the issue, the patch will never go in. Put the code out there and let people try it, and 50 motivated users will run the only 50 tests that really matter: i.e., does their real code suffer or not? If so, a few of them may even figure out why. less-thought-more-eyeballs-ly y'rs - tim From mal at lemburg.com Tue Apr 25 11:43:46 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 11:43:46 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code Message-ID: <390568D2.2CC50766@lemburg.com> After the discussion about #pragmas two weeks ago and some interesting ideas in the direction of source code encodings and ways to implement them, I would like to restart the talk about encodings in source code and runtime auto-conversions. Fredrik recently posted patches to the patches list which loosen the currently hard-coded default encoding used throughout the Unicode design and add a layer of abstraction which would make it easily possible to change the default encoding at some later point. While making things more abstract is certainly a wise thing to do, I am not sure whether this particular case fits into the design decisions made a few months ago. Here's a short summary of what was discussed recently: 1. Fredrik posted the idea of changing the default encoding from UTF-8 to Latin-1 (he calls this 8-bit Unicode which points to the motivation behind this: 8-bit strings should behave like 8-bit Unicode). His recent patches work into this direction. 2. Fredrik also posted an interesting idea which enables writing Python source code in any supported encoding by having the Python tokenizer read Py_UNICODE data instead of char data. A preprocessor would take care of converting the input to Py_UNICODE; the parser would assure that 8-bit string data gets converted back to char data (using e.g. UTF-8 or Latin-1 for the encoding) 3. Regarding the addition of pragmas to allow specifying the used source code encoding several possibilities were mentioned: - addition of a keyword "pragma" to define pragma dictionaries - usage of a "global" as basis for this - adding a new keyword "decl" which also allows defining other things such as type information - XML like syntax embedded into Python comments Some comments: Ad 1. UTF-8 is used as basis in many other languages such as TCL or Perl. It is not an intuitive way of writing strings and causes problems due to one character spanning 1-6 bytes. 
Still, the world seems to be moving into this direction, so going the same way can't be all wrong... Note that stream IO can be recoded in a way which allows Python to print and read e.g. Latin-1 (see below). The general idea behind the fixed default encoding design was to give all the power to the user, since she eventually knows best which encoding to use or expect. Ad 2. I like this idea because it enables writing Unicode- aware programs *in* Unicode... the only problem which remains is again the encoding to use for the classic 8-bit strings. Ad 3. For 2. to work, the encoding would have to appear close to the top of the file. The preprocessor would have to be BOM-mark aware to tell whether UTF-16 or some ASCII extension is used by the file. Guido asked me for some code which demonstrates Latin-1 recoding using the existing mechanisms. I've attached a simple script to this mail. It is not much tested yet, so please give it a try. You can also change it to use any other encoding you like. Together with the Japanese codecs provided by Tamito Kajiyama (http://pseudo.grad.sccs.chukyo-u.ac.jp/~kajiyama/tmp/japanese-codecs.tar.gz) you should be able to type Shift-JIS at the raw_input() or interactive prompt, have it stored as UTF-8 and then printed back as Shift-JIS, provided you put add a recoder similar to the attached one for Latin-1 to your PYTHONSTARTUP or site.py script. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ -------------- next part -------------- A non-text attachment was scrubbed... Name: latin1io.py Type: text/python Size: 1740 bytes Desc: not available URL: From effbot at telia.com Tue Apr 25 17:16:25 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 25 Apr 2000 17:16:25 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> Message-ID: <00a401bfaec9$3aaae100$34aab5d4@hagrid> I'll follow up with a longer reply later; just one correction: M.-A. Lemburg wrote: > Ad 1. UTF-8 is used as basis in many other languages such > as TCL or Perl. It is not an intuitive way of > writing strings and causes problems due to one character > spanning 1-6 bytes. Still, the world seems to be moving > into this direction, so going the same way can't be all > wrong... the problem here is the current Python implementation doesn't use UTF-8 in the same way as Perl and Tcl. Perl and Tcl only exposes one string type, and that type be- haves exactly like it should: "The Tcl string functions properly handle multi- byte UTF-8 characters as single characters." "By default, Perl now thinks in terms of Unicode characters instead of simple bytes. /.../ All the relevant built-in functions (length, reverse, and so on) now work on a character-by-character basis instead of byte-by-byte, and strings are represented internally in Unicode." or in other words, both languages guarantee that given a string s: - s is a sequence of characters (not bytes) - len(s) is the number of characters in the string - s[i] is the i'th character - len(s[i]) is 1 and as I've pointed out a zillion times, Python 1.6a2 doesn't. this should be solved, and I see (at least) four ways to do that: -- the Tcl 8.1 way: make 8-bit strings UTF-8 aware. operations like len and getitem usually searches from the start of the string. to handle binary data, introduce a special ByteArray type. 
when mixing ByteArrays and strings, treat each byte in the array as an 8-bit unicode character (conversions from strings to byte arrays are lossy). [imho: lots of code, and seriously affects performance, even when unicode characters are never used. this approach was abandoned in Tcl 8.2] -- the Tcl 8.2 way: use a unified string type, which stores data as UTF-8 and/or 16-bit unicode: struct { char* bytes; /* 8-bit representation (utf-8) */ Tcl_UniChar* unicode; /* 16-bit representation */ } if one of the strings are modified, the other is regenerated on demand. operations like len, slice and getitem always convert to 16-bit first. still need a ByteArray type, similar to the one described above. [imho: faster than before, but still not as good as a pure 8-bit string type. and the need for a separate byte array type would break alot of existing Python code] -- the Perl 5.6 way? (haven't looked at the implementation, but I'm pretty sure someone told me it was done this way). essentially same as Tcl 8.2, but with an extra encoding field (to avoid con- versions if data is just passed through). struct { int encoding; char* bytes; /* 8-bit representation */ Tcl_UniChar* unicode; /* 16-bit representation */ } [imho: see Tcl 8.2] -- my proposal: expose both types, but let them contain characters from the same character set -- at least when used as strings. as before, 8-bit strings can be used to store binary data, so we don't need a separate ByteArray type. in an 8-bit string, there's always one character per byte. [imho: small changes to the existing code base, about as efficient as can be, no attempt to second-guess the user, fully backwards com- patible, fully compliant with the definition of strings in the language reference, patches are available, etc...] From jeremy at cnri.reston.va.us Tue Apr 25 19:20:44 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Tue, 25 Apr 2000 13:20:44 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) In-Reply-To: <3904660D.6F22F798@trixie.triqs.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> Message-ID: <14597.54252.185633.504968@goon.cnri.reston.va.us> The performance difference I see on my Sparc is smaller. The machine is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. The median pystone time taken from 10 measurements are: 1.5.2 4.87 1.6a2 5.035 For comparison, the numbers I see on my Linux box (dual PII 266) are: 1.5.2 3.18 1.6a2 3.53 That's about 10% faster under 1.5.2. I'm not sure how important this change is. Three percent isn't enough for me to worry about, but it's a minority platform. I suppose 10 percent is right on the cusp. If the performance difference is the cost of the many improvements of 1.6, I think it's worth the price. Jeremy From tismer at tismer.com Tue Apr 25 20:12:39 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:12:39 +0200 Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> Message-ID: <3905E017.1565757C@tismer.com> Jeremy Hylton wrote: > > The performance difference I see on my Sparc is smaller. The machine > is a 200MHz Ultra Sparc 2 with 256MB of RAM, built both versions with > GCC 2.8.1. It appears that 1.6a2 is about 3.3% slower. > > The median pystone time taken from 10 measurements are: > 1.5.2 4.87 > 1.6a2 5.035 > > For comparison, the numbers I see on my Linux box (dual PII 266) are: > > 1.5.2 3.18 > 1.6a2 3.53 > > That's about 10% faster under 1.5.2. Which GCC was it on the Linux box, and how much RAM does it have? > I'm not sure how important this change is. Three percent isn't enough > for me to worry about, but it's a minority platform. I suppose 10 > percent is right on the cusp. If the performance difference is the > cost of the many improvements of 1.6, I think it's worth the price. Yes, and I'm happy to pay the price if I can see where I pay. That's the problem, the changes between the pre-unicode tag and the current CVS are not enough to justify that speed loss. There must be something substantial. I also don't grasp why my optimizations are so much more powerful on 1.5.2+ as on 1.6 . Mark Hammond pointed me to the int/long unification. Was this done *after* the unicode patches? ciao - chris -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From tismer at tismer.com Tue Apr 25 20:27:20 2000 From: tismer at tismer.com (Christian Tismer) Date: Tue, 25 Apr 2000 20:27:20 +0200 Subject: [Python-Dev] Off-topic Message-ID: <3905E388.2C1911C1@tismer.com> Hey, don't blame me for posting a joke :-) Please read from the beginning, don't look at the end first. No, this is no offense... -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com -------------- next part -------------- An embedded message was scrubbed... From: "A.Bergmann bei BRAHMS" Subject: Moin..... Date: Tue, 25 Apr 2000 09:07:49 +0200 Size: 2723 URL: From mal at lemburg.com Tue Apr 25 22:13:39 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Tue, 25 Apr 2000 22:13:39 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <3905FC73.7D7D6B1D@lemburg.com> Fredrik Lundh wrote: > > I'll follow up with a longer reply later; just one correction: > > M.-A. Lemburg wrote: > > Ad 1. UTF-8 is used as basis in many other languages such > > as TCL or Perl. It is not an intuitive way of > > writing strings and causes problems due to one character > > spanning 1-6 bytes. Still, the world seems to be moving > > into this direction, so going the same way can't be all > > wrong... > > the problem here is the current Python implementation > doesn't use UTF-8 in the same way as Perl and Tcl. 
Perl > and Tcl only exposes one string type, and that type be- > haves exactly like it should: > > "The Tcl string functions properly handle multi- > byte UTF-8 characters as single characters." > > "By default, Perl now thinks in terms of Unicode > characters instead of simple bytes. /.../ All the > relevant built-in functions (length, reverse, and > so on) now work on a character-by-character > basis instead of byte-by-byte, and strings are > represented internally in Unicode." > > or in other words, both languages guarantee that given a > string s: > > - s is a sequence of characters (not bytes) > - len(s) is the number of characters in the string > - s[i] is the i'th character > - len(s[i]) is 1 > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. Just a side note: we never discussed turning the native 8-bit strings into any encoding aware type. > this > should be solved, and I see (at least) four ways to do that: > > ... > -- the Perl 5.6 way? (haven't looked at the implementation, but I'm > pretty sure someone told me it was done this way). essentially > same as Tcl 8.2, but with an extra encoding field (to avoid con- > versions if data is just passed through). > > struct { > int encoding; > char* bytes; /* 8-bit representation */ > Tcl_UniChar* unicode; /* 16-bit representation */ > } > > [imho: see Tcl 8.2] > > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. > > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Why not name the beast ?! In your proposal, the old 8-bit strings simply use Latin-1 as native encoding. The current version doesn't make any encoding assumption as long as the 8-bit strings do not get auto-converted. In that case they are interpreted as UTF-8 -- which will (usually) fail for Latin-1 encoded strings using the 8th bit, but hey, at least you get an error message telling you what is going wrong. The key to these problems is using explicit conversions where 8-bit strings meet Unicode objects. Some more ideas along the convenience path: Perhaps changing just the way 8-bit strings are coerced to Unicode would help: strings would then be interpreted as Latin-1. str(Unicode) and "t" would still return UTF-8 to assure loss-less conversion. Another way to tackle this would be to first try UTF-8 conversion during auto-conversion and then fallback to Latin-1 in case it fails. Has anyone tried this ? Guido mentioned that TCL does something along these lines... -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From akuchlin at mems-exchange.org Tue Apr 25 22:54:11 2000 From: akuchlin at mems-exchange.org (Andrew M. Kuchling) Date: Tue, 25 Apr 2000 16:54:11 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! 
(was: 1.6 speed) In-Reply-To: <3905E017.1565757C@tismer.com> References: <200004221951.PAA09193@mira.erols.com> <39037646.DEF8A139@trixie.triqs.com> <39037CF8.24E1D1BD@trixie.triqs.com> <390453B6.745E852B@trixie.triqs.com> <3904660D.6F22F798@trixie.triqs.com> <14597.54252.185633.504968@goon.cnri.reston.va.us> <3905E017.1565757C@tismer.com> Message-ID: <14598.1523.533352.759437@amarok.cnri.reston.va.us> Christian Tismer writes: >Mark Hammond pointed me to the int/long unification. >Was this done *after* the unicode patches? Before. It seems unlikely they're the cause (they just add a 'if (PyLong_Check(key)' branch to the slicing functions in abstract.c. OTOH, if pystone really exercises sequence multiplication, maybe they're related (but 10% worth?). -- A.M. Kuchling http://starship.python.net/crew/amk/ I know flattery when I hear it; but I do not often hear it. -- Robertson Davies, _Fifth Business_ From effbot at telia.com Tue Apr 25 23:51:45 2000 From: effbot at telia.com (Fredrik Lundh) Date: Tue, 25 Apr 2000 23:51:45 +0200 Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105 References: <200004252134.RAA02207@eric.cnri.reston.va.us> Message-ID: <002601bfaf00$74d462c0$34aab5d4@hagrid> > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); better make that > + insint(d, "MSG_DONTWAIT", MSG_DONTWAIT); right? From effbot at telia.com Wed Apr 26 00:05:54 2000 From: effbot at telia.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 00:05:54 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <3905FC73.7D7D6B1D@lemburg.com> Message-ID: <002701bfaf02$734a2e60$34aab5d4@hagrid> M.-A. Lemburg wrote: > > and as I've pointed out a zillion times, Python 1.6a2 doesn't. > > Just a side note: we never discussed turning the native > 8-bit strings into any encoding aware type. hey, you just argued that we should use UTF-8 because Tcl and Perl use it, didn't you? my point is that they don't use it the way Python 1.6a2 uses it, and that their design is correct, while our design is slightly broken. so let's fix it ! > Why not name the beast ?! In your proposal, the old 8-bit > strings simply use Latin-1 as native encoding. in my proposal, there's an important distinction between character sets and character encodings. unicode is a character set. latin 1 is one of many possible encodings of (portions of) that set. maybe it's easier to grok if we get rid of the term "character set"? http://www.hut.fi/u/jkorpela/chars.html suggests the following replacements: character repertoire A set of distinct characters. character code A mapping, often presented in tabular form, which defines one-to-one correspondence between characters in a character repertoire and a set of nonnegative integers. character encoding A method (algorithm) for presenting characters in digital form by mapping sequences of code numbers of characters into sequences of octets. now, in my proposal, the *repertoire* contains all characters described by the unicode standard. the *codes* are defined by the same standard. but strings are sequences of characters, not sequences of octets: strings have *no* encoding. (the encoding used for the internal string storage is an implementation detail). (but sure, given the current implementation, the internal storage for an 8-bit string happens use Latin-1. just as the internal storage for a 16-bit string happens to use UCS-2 stored in native byte order. 
but from the outside, they're just character sequences). > The current version doesn't make any encoding assumption as > long as the 8-bit strings do not get auto-converted. In that case > they are interpreted as UTF-8 -- which will (usually) fail > for Latin-1 encoded strings using the 8th bit, but hey, at least > you get an error message telling you what is going wrong. sure, but I don't think you get the right message, or that you get it at the right time. consider this: if you're going from 8-bit strings to unicode using implicit con- version, the current design can give you: "UnicodeError: UTF-8 decoding error: unexpected code byte" if you go from unicode to 8-bit strings, you'll never get an error. however, the result is not always a string -- if the unicode string happened to contain any characters larger than 127, the result is a binary buffer containing encoded data. you cannot use string methods on it, you cannot use regular expressions on it. indexing and slicing won't work. unlike earlier versions of Python, and unlike unicode-aware versions of Tcl and Perl, the fundamental assumption that a string is a sequence of characters no longer holds. in my proposal, going from 8-bit strings to unicode always works. a character is a character, no matter what string type you're using. however, going from unicode to an 8-bit string may given you an OverflowError, say: "OverflowError: unicode character too large to fit in a byte" the important thing here is that if you don't get an exception, the result is *always* a string. string methods always work. etc. [8. Special cases aren't special enough to break the rules.] > The key to these problems is using explicit conversions where > 8-bit strings meet Unicode objects. yeah, but the flaw in the current design is the implicit conversions, not the explicit ones. [2. Explicit is better than implicit.] (of course, the 8-bit string type also needs an "encode" method under my proposal, but that's just a detail ;-) > Some more ideas along the convenience path: > > Perhaps changing just the way 8-bit strings are coerced > to Unicode would help: strings would then be interpreted > as Latin-1. ok. > str(Unicode) and "t" would still return UTF-8 to assure loss- > less conversion. maybe. or maybe str(Unicode) should return a unicode string? think about it! (after all, I'm pretty sure that ord() and chr() should do the right thing, also for character codes above 127) > Another way to tackle this would be to first try UTF-8 > conversion during auto-conversion and then fallback to > Latin-1 in case it fails. Has anyone tried this ? Guido > mentioned that TCL does something along these lines... haven't found any traces of that in the source code. hmm, you're right -- it looks like it attempts to "fix" invalid UTF-8 data (on a character by character basis), instead of choking on it. scary. [12. In the face of ambiguity, refuse the temptation to guess.] more tomorrow. From guido at python.org Wed Apr 26 00:35:30 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 18:35:30 -0400 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code In-Reply-To: Your message of "Tue, 25 Apr 2000 17:16:25 +0200." <00a401bfaec9$3aaae100$34aab5d4@hagrid> References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> Message-ID: <200004252235.SAA02554@eric.cnri.reston.va.us> [Fredrik] > -- my proposal: expose both types, but let them contain characters > from the same character set -- at least when used as strings. 
> > as before, 8-bit strings can be used to store binary data, so we > don't need a separate ByteArray type. in an 8-bit string, there's > always one character per byte. > > [imho: small changes to the existing code base, about as efficient as > can be, no attempt to second-guess the user, fully backwards com- > patible, fully compliant with the definition of strings in the language > reference, patches are available, etc...] Sorry, all this proposal does is change the default encoding on conversions from UTF-8 to Latin-1. That's very western-culture-centric. You already have control over the encoding: use unicode(s, "latin-1"). If there are places where you don't have enough control (e.g. file I/O), let's add control there. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Apr 26 01:08:39 2000 From: guido at python.org (Guido van Rossum) Date: Tue, 25 Apr 2000 19:08:39 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) Message-ID: <200004252308.TAA05717@eric.cnri.reston.va.us> The email below is a serious bug report. A quick analysis shows that UserString.count() calls the count() method on a string object, which calls PyArg_ParseTuple() with the format string "O|ii". The 'i' format code truncates integers. It probably should raise an overflow exception instead. But that would still cause the test to fail -- just in a different way (more explicit). Then the string methods should be fixed to use long ints instead -- and then something else would probably break... --Guido van Rossum (home page: http://www.python.org/~guido/) ------- Forwarded Message Date: Mon, 24 Apr 2000 19:26:27 -0400 From: mark.favas at per.dem.csiro.au To: python-bugs-list at python.org cc: bugs-py at python.org Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg stringobject (PR#306) Full_Name: Mark Favas Version: 1.6a2 CVS of 25 April OS: DEC Alpha, Tru64 Unix 4.0F Submission from: wa107.dialup.csiro.au (130.116.4.107) There seems to be issues (and perhaps lurking cans of worms) on 64-bit platforms where sizeof(long) != sizeof(int). For example, the CVS version of 1.6a2 of 25 April fails the UserString regression test. The tests fail as follows (verbose set to 1): abcabcabc.count(('abc',)) no 'abcabcabc' 3 <> 2 abcabcabc.count(('abc', 1)) no 'abcabcabc' 2 <> 1 abcdefghiabc.find(('abc', 1)) no 'abcdefghiabc' 9 < > - -1 abcdefghiabc.rfind(('abc',)) no 'abcdefghiabc' 9 <> 0 abcabcabc.rindex(('abc',)) no 'abcabcabc' 6 <> 3 abcabcabc.rindex(('abc', 1)) no 'abcabcabc' 6 <> 3 These tests are failing because the calls from the UserString methods to the underlying string methods are setting the default value of the end-of-string parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), whereas the string methods in stringobject.c are using ints and expecting them to be no larger than INT_MAX (2147483647). Thus the end-of-string parameter becomes -1 in the default case. The size of an int on my platform is 4, and the size of a long is 8, so the "natural size of a Python integer" should be 8, by my understanding. The obvious fix is to change stringobject.c to use longs, rather than ints, but the problem might be more widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, stropmodule.c, and ceval.c as well as stringobject.c. 
Some of these look as though LONG_MAX should have been used (variables compared to INT_MAX are longs, but I am not confident enough to submit patches for them... Mark _______________________________________________ Python-bugs-list maillist - Python-bugs-list at python.org http://www.python.org/mailman/listinfo/python-bugs-list ------- End of Forwarded Message From pf at artcom-gmbh.de Wed Apr 26 09:34:09 2000 From: pf at artcom-gmbh.de (Peter Funk) Date: Wed, 26 Apr 2000 09:34:09 +0200 (MEST) Subject: [Python-Dev] Re: [Python-checkins] CVS: python/dist/src/Modules socketmodule.c,1.104,1.105 In-Reply-To: <200004252134.RAA02207@eric.cnri.reston.va.us> from Guido van Rossum at "Apr 25, 2000 5:34:56 pm" Message-ID: Guido van Rossum: > Modified Files: > socketmodule.c [...] > *** 2526,2529 **** > --- 2526,2532 ---- > #ifdef MSG_DONTROUTE > insint(d, "MSG_DONTROUTE", MSG_DONTROUTE); > + #endif > + #ifdef MSG_DONTWAIT > + insint(d, "MSG_DONWAIT", MSG_DONTWAIT); -------------------------^^? Shouldn't this read "MSG_DONTWAIT"? ----------------------------^! Nitpicking, Peter From fredrik at pythonware.com Wed Apr 26 11:00:03 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 26 Apr 2000 11:00:03 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> Message-ID: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com> > Sorry, all this proposal does is change the default encoding on > conversions from UTF-8 to Latin-1. That's very > western-culture-centric. That decision was made by ISO and the Unicode consortium, not me. I don't know why, and I don't really care -- I'm arguing that strings should contain characters, just like the language reference says, and that all characters should be from the same character repertoire and use the same character codes. From just at letterror.com Wed Apr 26 14:04:08 2000 From: just at letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 13:04:08 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <8e6bsl$f1a$1@nnrp1.deja.com> References: <1256565470-46720619@hypernet.com> <6M1J4.662$rc9.209708544@newsb.telia.net> <8daop0$8fk$1@slb6.atl.mindspring.net> Message-ID: Fredrik Lundh replied to himself in c.l.py: >> as far as I can tell, it's supposed to be a feature. >> >> if you mix 8-bit strings with unicode strings, python 1.6a2 >> attempts to interpret the 8-bit string as an utf-8 encoded >> unicode string. >> >> but yes, I also think it's a bug. but this far, my attempts >> to get someone else to fix it has failed. might have to do >> it myself... ;-) > >postscript: the powers-that-be has decided that this is not >a bug. if you thought that strings were just sequences of >characters, just as in Perl and Tcl, you're in for one big >surprise in Python 1.6... I just read the last few posts of the powers-that-be-list on this subject (Thanks to Christian for pointing out the archives in c.l.py ;-), and I must say I completely agree with Fredrik. The current situation sucks. A string should always be a sequence of characters. A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". An 8-bit string should never be assumed to be utf-8 because of that distinction. (The default encoding for the builtin unicode() function may be another story.) Just From mal at lemburg.com Wed Apr 26 14:03:36 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 26 Apr 2000 14:03:36 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: <200004252308.TAA05717@eric.cnri.reston.va.us> Message-ID: <3906DB18.CB76EEC0@lemburg.com> Guido van Rossum wrote: > > The email below is a serious bug report. A quick analysis shows that > UserString.count() calls the count() method on a string object, which > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > format code truncates integers. It probably should raise an overflow > exception instead. But that would still cause the test to fail -- > just in a different way (more explicit). Then the string methods > should be fixed to use long ints instead -- and then something else > would probably break... All uses in stringobject.c and unicodeobject.c use INT_MAX together with integers, so there's no problem on that side of the fence ;-) Since strings and Unicode objects use integers to describe the length of the object (as well as most if not all other builtin sequence types), the correct default value should thus be something like sys.maxlen which then gets set to INT_MAX. I'd suggest adding sys.maxlen and the modifying UserString.py, re.py and sre_parse.py accordingly. > --Guido van Rossum (home page: http://www.python.org/~guido/) > > ------- Forwarded Message > > Date: Mon, 24 Apr 2000 19:26:27 -0400 > From: mark.favas at per.dem.csiro.au > To: python-bugs-list at python.org > cc: bugs-py at python.org > Subject: [Python-bugs-list] 1.6a2 issues with int/long on 64bit platforms - eg > stringobject (PR#306) > > Full_Name: Mark Favas > Version: 1.6a2 CVS of 25 April > OS: DEC Alpha, Tru64 Unix 4.0F > Submission from: wa107.dialup.csiro.au (130.116.4.107) > > There seems to be issues (and perhaps lurking cans of worms) on 64-bit > platforms > where sizeof(long) != sizeof(int). > > For example, the CVS version of 1.6a2 of 25 April fails the UserString > regression test. The tests fail as follows (verbose set to 1): > > abcabcabc.count(('abc',)) no > 'abcabcabc' 3 <> > 2 > abcabcabc.count(('abc', 1)) no > 'abcabcabc' 2 <> > 1 > abcdefghiabc.find(('abc', 1)) no > 'abcdefghiabc' 9 < > > > - -1 > abcdefghiabc.rfind(('abc',)) no > 'abcdefghiabc' 9 > <> 0 > abcabcabc.rindex(('abc',)) no > 'abcabcabc' 6 <> > 3 > abcabcabc.rindex(('abc', 1)) no > 'abcabcabc' 6 <> > 3 > > These tests are failing because the calls from the UserString methods to the > underlying string methods are setting the default value of the end-of-string > parameter to sys.maxint, which is defined as LONG_MAX (9223372036854775807), > whereas the string methods in stringobject.c are using ints and expecting them > to be no larger than INT_MAX (2147483647). > Thus the end-of-string parameter becomes -1 in the default case. The size of an > int on my platform is 4, and the size of a long is 8, so the "natural size of > a Python integer" should be 8, by my understanding. The obvious fix is to > change > stringobject.c to use longs, rather than ints, but the problem might be more > widespread than that. INT_MAX is used in unicodeobject.c, pypcre.c, _sre.c, > stropmodule.c, and ceval.c as well as stringobject.c. Some of these look as > though LONG_MAX should have been used (variables compared to INT_MAX are longs, > but I am not confident enough to submit patches for them... 
> > Mark > > _______________________________________________ > Python-bugs-list maillist - Python-bugs-list at python.org > http://www.python.org/mailman/listinfo/python-bugs-list > > ------- End of Forwarded Message > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://www.python.org/mailman/listinfo/python-dev -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From gstein at lyra.org Wed Apr 26 15:00:21 2000 From: gstein at lyra.org (Greg Stein) Date: Wed, 26 Apr 2000 06:00:21 -0700 (PDT) Subject: [Python-Dev] Re: marking shared-ness In-Reply-To: <000701bfae7f$1174a540$152d153f@tim> Message-ID: On Tue, 25 Apr 2000, Tim Peters wrote: > [Greg Stein] > > ... > > Many people have asked for free-threading, and the number of inquiries > > that I receive have grown over time. (nobody asked in 1996 when I first > > published my patches; I get a query every couple months now) > > Huh! That means people ask me about it more often than they ask you . > > I'll add, though, that you have to dig into the inquiry: almost everyone > who asks me is running on a uniprocessor machine, and are really after one > of two other things: > > 1. They expect threaded stuff to run faster if free-threaded. "Why?" is > a question I can't answer <0.5 wink>. Heh. Yes, I definitely see this one. But there are some clueful people out there, too, so I'm not totally discouraged :-) > 2. Dealing with the global lock drives them insane, especially when trying > to call back into Python from a "foreign" C thread. > > #2 may be fixable via less radical means (like a streamlined procedure > enabled by some relatively minor core interpreter changes, and clearer > docs). No doubt. I was rather upset with Guido's "Swap" API for the thread state. Grr. I sent him a very nice (IMO) API that I used for my patches. The Swap was simply a poor choice on his part. It implies that you are swapping a thread state for another (specifically: the "current" thread state). Of course, that is wholly inappropriate in a free-threading environment. All those calls to _Swap() will be overhead in an FT world. I liked my "PyThreadState *PyThreadState_Ensure()" function. It would create the sucker if it didn't exist, then return *this* thread's state to you. Handy as hell. No monkeying around with "Get. oops. didn't exist. let's create one now." > I'm still a fan of free-threading! It's just one of those things that may > yield a "well, ya, that's what I asked for, but turns out it's not what I > *wanted*" outcome as often as not. hehe. Damn straight. :-) Cheers, -g -- Greg Stein, http://www.lyra.org/ From just at letterror.com Wed Apr 26 16:13:13 2000 From: just at letterror.com (Just van Rossum) Date: Wed, 26 Apr 2000 15:13:13 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: I wrote: >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". Another way of putting this is: - utf-8 in an 8-bit string is to a unicode string what a pickle is to an object. - defaulting to utf-8 upon coercing is like implicitly trying to unpickle an 8-bit string when comparing it to an instance. Bad idea. Defaulting to Latin-1 is the only logical choice, no matter how western-culture-centric this may seem. Just From mal at lemburg.com Wed Apr 26 20:01:48 2000 From: mal at lemburg.com (M.-A. 
Lemburg) Date: Wed, 26 Apr 2000 20:01:48 +0200 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) References: Message-ID: <39072F0C.5214E339@lemburg.com> Just van Rossum wrote: > > I wrote: > >A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". > > Another way of putting this is: > - utf-8 in an 8-bit string is to a unicode string what a pickle is to an > object. > - defaulting to utf-8 upon coercing is like implicitly trying to unpickle > an 8-bit string when comparing it to an instance. Bad idea. > > Defaulting to Latin-1 is the only logical choice, no matter how > western-culture-centric this may seem. Please note that the support for mixing strings and Unicode objects is really only there to aid porting applications to Unicode. New code should use Unicode directly and apply all needed conversions explicitly using one of the many ways to encode or decode Unicode data. The auto-conversions are only there to help out and provide some convenience. -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From guido at python.org Wed Apr 26 20:51:56 2000 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Apr 2000 14:51:56 -0400 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: Your message of "Wed, 26 Apr 2000 14:03:36 +0200." <3906DB18.CB76EEC0@lemburg.com> References: <200004252308.TAA05717@eric.cnri.reston.va.us> <3906DB18.CB76EEC0@lemburg.com> Message-ID: <200004261851.OAA06794@eric.cnri.reston.va.us> > Guido van Rossum wrote: > > > > The email below is a serious bug report. A quick analysis shows that > > UserString.count() calls the count() method on a string object, which > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > format code truncates integers. It probably should raise an overflow > > exception instead. But that would still cause the test to fail -- > > just in a different way (more explicit). Then the string methods > > should be fixed to use long ints instead -- and then something else > > would probably break... > > All uses in stringobject.c and unicodeobject.c use INT_MAX > together with integers, so there's no problem on that side > of the fence ;-) > > Since strings and Unicode objects use integers to describe the > length of the object (as well as most if not all other > builtin sequence types), the correct default value should > thus be something like sys.maxlen which then gets set to > INT_MAX. > > I'd suggest adding sys.maxlen and the modifying UserString.py, > re.py and sre_parse.py accordingly. Hm, I'm not so sure. It would be much better if passing sys.maxint would just WORK... Since that's what people have been doing so far. --Guido van Rossum (home page: http://www.python.org/~guido/) From nascheme at enme.ucalgary.ca Wed Apr 26 21:06:51 2000 From: nascheme at enme.ucalgary.ca (Neil Schemenauer) Date: Wed, 26 Apr 2000 13:06:51 -0600 Subject: [Python-Dev] L1 data cache profile for Python 1.5.2 and 1.6 Message-ID: <20000426130651.C23227@acs.ucalgary.ca> Using this tool: http://www.cacheprof.org/ I got this output: http://www.enme.ucalgary.ca/~nascheme/python/cache.out http://www.enme.ucalgary.ca/~nascheme/python/cache-152.out The cache miss rate for eval_code2 is about two times larger in 1.6. The overall miss rate is about the same. Is this significant? 
I suspect that the instruction cache is more important for eval_code2. Unfortunately cacheprof can only profile the L1 data cache.

Perhaps someone will find this data useful or interesting.

  Neil

From tismer at tismer.com Wed Apr 26 23:24:39 2000
From: tismer at tismer.com (Christian Tismer)
Date: Wed, 26 Apr 2000 23:24:39 +0200
Subject: [Fwd: [Python-Dev] Where the speed is lost! (was: 1.6 speed)]
Message-ID: <39075E97.23DBDD63@tismer.com>

I forgot to cc python-dev. This file is closed for me. The sun is shining again, life is so wonderful, and now for something completely different - chris

-------------- next part --------------
An embedded message was scrubbed...
From: Christian Tismer
Subject: Re: [Python-Dev] Where the speed is lost! (was: 1.6 speed)
Date: Wed, 26 Apr 2000 23:19:20 +0200
Size: 3299
URL:

From effbot at telia.com Wed Apr 26 23:29:10 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Wed, 26 Apr 2000 23:29:10 +0200
Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
References: <39072F0C.5214E339@lemburg.com>
Message-ID: <002f01bfafc6$779804a0$34aab5d4@hagrid>

(forwarded from c.l.py, on request)

> New code should use Unicode directly and apply all needed
> conversions explicitly using one of the many ways to
> encode or decode Unicode data. The auto-conversions are
> only there to help out and provide some convenience.

does this mean that the 8-bit string type is deprecated ???

From effbot at telia.com Wed Apr 26 23:45:40 2000
From: effbot at telia.com (Fredrik Lundh)
Date: Wed, 26 Apr 2000 23:45:40 +0200
Subject: [Python-Dev] fun with unicode, part 1
Message-ID: <004501bfafc8$c51d1240$34aab5d4@hagrid>

>>> filename = u"gröt"

>>> file = open(filename, "w")
>>> file.close()

>>> import glob
>>> print glob.glob("gr*")
['gr\303\266t']

>>> print glob.glob(u"gr*")
[u'gr\366t']

>>> import os
>>> os.system("dir gr*")
...
GR??T                0  01-02-03  12.34 gr??t
        1 fil(es)            0 byte
        0 dir     12 345 678 byte free

hmm.

From mhammond at skippinet.com.au Thu Apr 27 02:08:23 2000
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu, 27 Apr 2000 10:08:23 +1000
Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
In-Reply-To: <002f01bfafc6$779804a0$34aab5d4@hagrid>
Message-ID:

Is it necessary for us to also have this scrag-fight in public? Most of the thread on c.l.py is filled by people who are also py-dev members!

[MAL writes]
> Please note that the support for mixing strings and Unicode
> objects is really only there to aid porting applications
> to Unicode.
>
> New code should use Unicode directly and apply all needed
> conversions explicitly using one of the many ways to
> encode or decode Unicode data.

This will _never_ happen. The Python programmer should never need to be aware they have a Unicode string versus a standard string - just a "string"! The fact that there are two string types should be considered an implementation detail, and not a conceptual model for people to work within.

I think we will be mixing Unicode and strings forever! The only way to avoid it would be a unified type - possibly Py3k. Until then, people will still generally use strings as literals in their code, and should not even be aware they are mixing. I'm never going to prefix my ASCII-only strings with u"" just to avoid the possibility of mixing!

Listening to the arguments, I've got to say I'm coming down squarely on the side of Fredrik and Just. Strings must be sequences of characters, whose length is the number of characters.
A string holding an encoding should be considered logically a byte array, and conversions should be explicit.

> The auto-conversions are only there to help out and provide some convenience.

Doesn't sound like it is working :-(

Mark.

From akuchlin at mems-exchange.org Thu Apr 27 03:45:37 2000
From: akuchlin at mems-exchange.org (Andrew Kuchling)
Date: Wed, 26 Apr 2000 21:45:37 -0400 (EDT)
Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug
In-Reply-To:
References: <002f01bfafc6$779804a0$34aab5d4@hagrid>
Message-ID: <14599.39873.159386.778558@newcnri.cnri.reston.va.us>

Mark Hammond writes:
>Is it necessary for us to also have this scrag-fight in public?
>Most of the thread on c.l.py is filled by people who are also
>py-dev members!

Attempting to walk a delicate line here: my reading of the situation is that Fredrik's frustration level is increasing as he points out problems, but nothing much is done about them. Marc-Andre will usually respond, but there's been no indication from Guido about what to do. But GvR might be waiting to hear from more users about their experience with Unicode; so far I don't know if anyone has much experience with the new code.

But why not have it in public? The python-dev archives are publicly available anyway, so it's not as if this discussion was going on behind closed doors. The problem with discussing this on c.l.py is that not everyone reads c.l.py any more due to volume.

--amk

From paul at prescod.net Thu Apr 27 03:47:41 2000
From: paul at prescod.net (Paul Prescod)
Date: Wed, 26 Apr 2000 20:47:41 -0500
Subject: [Python-Dev] Python Unicode
References: <390568D2.2CC50766@lemburg.com> <00a401bfaec9$3aaae100$34aab5d4@hagrid> <200004252235.SAA02554@eric.cnri.reston.va.us> <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com>
Message-ID: <39079C3D.4000C74C@prescod.net>

Fredrik Lundh wrote:
>
> ...
>
> But alright, I give up. I've wasted way too much time on this, my
> patches were rejected, and nobody seems to care. Not exactly
> inspiring.

I can understand how frustrating this is. Sometimes something seems just so clean and mathematically obvious that you can't see why others don't see it that way.

A character is the "smallest unit of text." Strings are lists of characters. Characters in character sets have numbers. Python users should never know or care whether a string object is an 8-bit string or a Unicode string. There should be no distinction. u"" should be a syntactic shortcut.

The primary reason I have not been involved is that I have not had a chance to look at the implementation and figure out whether there is an overriding implementation-based reason to ignore the obvious right thing (e.g. the right thing will break too much code, or be too slow, or...).

"Unicode objects" should be an implementation detail (if they exist at all). Strings are strings are strings. The Python programmer shouldn't care whether one string was read from a Unicode file and another from an ASCII file, or whether one was typed in with "u" and one without. It's all the same thing!

If the programmer wants to do an explicit UTF-8 decode on a string (whether it is a Unicode or an 8-bit string... no difference), then that decode should proceed by looking at each character, deriving an integer, and then treating that integer as an octet according to the UTF-8 specification.

Char -> Integer -> Byte -> Char

The end result (and hopefully the performance) would be the same, but the model is much, much cleaner if there is only one kind of string.
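In today's 1.6a2 terms, the explicit operation I'm talking about is spelled something like this (just a sketch -- the file name is made up, and under a one-string-type model only the types would merge, not the spelling):

    raw = open("data.txt", "rb").read()   # whatever came off the disk
    text = unicode(raw, "utf-8")          # explicit decode: octets -> characters
    back = text.encode("utf-8")           # explicit encode: characters -> octets
    assert back == raw                    # a lossless round trip
    assert len(text) <= len(raw)          # len(text) counts characters, not bytes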
We should not ignore the example set by every other language (and yes, I'm including XML here :) ). I'm as desperate (if not as vocal) as Fredrik is here.

--
 Paul Prescod - ISOGEN Consulting Engineer speaking for himself
It's difficult to extract sense from strings, but they're the only
communication coin we can count on.
	- http://www.cs.yale.edu/~perlis-alan/quotes.html

From gmcm at hypernet.com Thu Apr 27 04:13:00 2000
From: gmcm at hypernet.com (Gordon McMillan)
Date: Wed, 26 Apr 2000 22:13:00 -0400
Subject: [Python-Dev] Python Unicode
In-Reply-To: <39079C3D.4000C74C@prescod.net>
Message-ID: <1255320912-64004084@hypernet.com>

I haven't weighed in on this one, mainly because I don't even need ISO-1, let alone Unicode (and damned proud of it, too!). But Fredrik's glob example was horrifying.

I do know that I am always conscious of whether a particular string is a sequence of characters or a sequence of bytes. Seems to me the Py3K answer is to make those separate types.

Until then, I guess I'll just remain completely xenophobic (and damned proud of it, too!).

- Gordon

From tim_one at email.msn.com Thu Apr 27 04:27:47 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Wed, 26 Apr 2000 22:27:47 -0400
Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints)
In-Reply-To:
Message-ID: <000101bfaff0$2d5f5ee0$272d153f@tim>

[Just van Rossum]
> ...
> Defaulting to Latin-1 is the only logical choice, no matter how
> western-culture-centric this may seem.

Indeed, if someone from an inferior culture wants to chime in, let them find Python-Dev with their own beady little eyes .

western-culture-is-better-than-none-&-at-least-*we*-understand-it-ly y'rs - tim

From tim_one at email.msn.com Thu Apr 27 06:39:21 2000
From: tim_one at email.msn.com (Tim Peters)
Date: Thu, 27 Apr 2000 00:39:21 -0400
Subject: [Python-Dev] Encoding of 8-bit strings and Python source code
In-Reply-To: <003f01bfaf5d$f58e3460$0500a8c0@secret.pythonware.com>
Message-ID: <000001bfb002$8f720260$0d2d153f@tim>

[/F]
> ...
> But alright, I give up. I've wasted way too much time on this, my
> patches were rejected, and nobody seems to care. Not exactly
> inspiring.

I lost track of this stuff months ago, and since I use only 7-bit ASCII in my own source code and file names and etc etc, UTF-8 and Latin-1 are identical to me <0.5 wink>.

[Guido]
> Sorry, all this proposal does is change the default encoding on
> conversions from UTF-8 to Latin-1. That's very
> western-culture-centric.

Well, if you talk with an Asian, they'll probably tell you that Unicode itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for non-Latin-1 Unicode characters). Most everyone likes their own national gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is that it annoys everyone.

I do expect that the vast bulk of users would be less surprised if Latin-1 *were* the default encoding. Then the default would be usable as-is for many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). The non-Euros are in for a world of pain no matter what.
just-because-some-groups-can't-win-doesn't-mean-everyone-must- lose-ly y'rs - tim From just at letterror.com Thu Apr 27 07:42:43 2000 From: just at letterror.com (Just van Rossum) Date: Thu, 27 Apr 2000 06:42:43 +0100 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparing strings and ints) In-Reply-To: <000101bfaff0$2d5f5ee0$272d153f@tim> References: Message-ID: At 10:27 PM -0400 26-04-2000, Tim Peters wrote: >Indeed, if someone from an inferior culture wants to chime in, let them find >Python-Dev with their own beady little eyes . All irony aside, I think you've nailed one of the problems spot on: - most core Python developers seem to be too busy to read *anything* at all in c.l.py - most people that care about the issues are not on python-dev Just From tim_one at email.msn.com Thu Apr 27 07:08:11 2000 From: tim_one at email.msn.com (Tim Peters) Date: Thu, 27 Apr 2000 01:08:11 -0400 Subject: [Python-Dev] Re: Python 1.6a2 Unicode bug (was Re: comparingstrings and ints) In-Reply-To: Message-ID: <000101bfb006$95962280$0d2d153f@tim> [Just van Rossum] > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read > *anything* at all in c.l.py > - most people that care about the issues are not on python-dev But they're not on c.l.py either, are they? I still read everything there, although that's gotten so time-consuming I rarely reply anymore. In any case, I've seen almost nothing useful about Unicode issues on c.l.py that wasn't also on Python-Dev; perhaps I missed something. ask-10-more-people-&-you'll-get-20-more-opinions-ly y'rs - tim From alisa at robanal.demon.co.uk Thu Apr 27 12:29:54 2000 From: alisa at robanal.demon.co.uk (Alisa Pasic Robinson) Date: Thu, 27 Apr 2000 10:29:54 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39080ddd.9837445@post.demon.co.uk> >I wrote: >>A utf-8-encoded 8-bit string in Python is *not* a string, but a "ByteArray". > >Another way of putting this is: >- utf-8 in an 8-bit string is to a unicode string what a pickle is to an >object. >- defaulting to utf-8 upon coercing is like implicitly trying to unpickle >an 8-bit string when comparing it to an instance. Bad idea. > >Defaulting to Latin-1 is the only logical choice, no matter how >western-culture-centric this may seem. > >Just The Van Rossum Common Sense gene strikes again! You guys owe it to the world to have lots of children. I agree 100%. Let me also add that if you want to do encoding work that goes beyond what the library gives you, you absolutely need a 'byte array' type which makes no assumptions and does nothing magic to its content. I have always thought of 8-bit strings as 'byte arrays' and not 'characer arrays', and doing anything magic to them in literals or standard input is going to cause lots of trouble. I think our proposal is BETTER than Java, Tcl, Visual Basic etc for the following reasons: - you can work with old fashioned strings, which are understood by everyone to be arrays of bytes, and there is no magic conversion going on. The bytes in literal strings in your script file are the bytes that end up in the program. 
- you can work with Unicode strings if you want - you are in explicit control of conversions between them - both types have similar methods so there isn't much to learn or remember The 'no magic' thing is very important with Japanese, where very often you need to roll your own codecs and look at the raw bytes; any auto-conversion might not go through the filter you want and you've already lost information before you started. Especially If your job is to repair possibly corrupt data. Any company with a few extra custom characters in the user-defined Shift-JIS range is going to suddenly find their Perl scripts are failing or trashing all their data as a result of the UTF-8 decision. I'm also convinced that the majority of Python scripts won't need to work in Unicode. Even working with exotic languages, there is always a native 8-bit encoding. I have only used Unicode when (a) working with data that is in several languages (b) doing conversions, which requires a 'central point' (b) wanting to do per-character operations safely on multi-byte data I still haven't sorted out in my head whether the default encoding thing is a big red herring or is important; I already have a safe way to construct Unicode literals in my source files if I want to using unicode('rawdata','myencoding'). But if there has to be one I'd say the following: - strict ASCII is an option - Latin-1 is the more generous option that is right for the most people, and has a 'special status' among 8-bit encodings - UTF-8 is not one byte per character and will confuse people Just my 2p worth, Andy From mal at lemburg.com Thu Apr 27 13:23:23 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 13:23:23 +0200 Subject: [Python-Dev] Encoding of 8-bit strings and Python source code References: <000001bfb002$8f720260$0d2d153f@tim> Message-ID: <3908232B.C2668122@lemburg.com> Tim Peters wrote: > > [Guido about going Latin-1] > > Sorry, all this proposal does is change the default encoding on > > conversions from UTF-8 to Latin-1. That's very > > western-culture-centric. > > Well, if you talk with an Asian, they'll probably tell you that Unicode > itself is Eurocentric, and especially UTF-8 (UTF-7 introduces less bloat for > non-Latin-1 Unicode characters). Most everyone likes their own national > gimmicks best. Or, as Andy once said (paraphrasing), the virtue of UTF-8 is > that it annoys everyone. > > I do expect that the vase bulk of users would be less surprised if Latin-1 > *were* the default encoding. Then the default would be usable as-is for > many more people; UTF-8 is usable as-is only for me (i.e., 7-bit Americans). > The non-Euros are in for a world of pain no matter what. > > just-because-some-groups-can't-win-doesn't-mean-everyone-must- > lose-ly y'rs - tim People tend to forget that UTF-8 is a loss-less Unicode encoding while Latin-1 reduces Unicode to its lower 8 bits: conversion from non-Latin-1 Unicode to strings would simply not work, conversion from non-Latin-1 strings to Unicode would only be possible via unicode(). Thus mixing Unicode and strings would then run perfectly in all western countries using Latin-1 while the rest of the world would need to convert all their strings to Unicode... 
giving them an advantage over the western world we couldn't possibly accept ;-) FYI, here's a summary of which conversions take place (going Latin-1 would disable most of the Unicode integration in favour of conversion errors): Python: ------- string + unicode: unicode(string,'utf-8') + unicode string.method(unicode): unicode(string,'utf-8').method(unicode) print unicode: print unicode.encode('utf-8'); with stdout redirection this can be changed to any other encoding str(unicode): unicode.encode('utf-8') repr(unicode): repr(unicode.encode('unicode-escape')) C (PyArg_ParserTuple): ---------------------- "s" + unicode: same as "s" + unicode.encode('utf-8') "s#" + unicode: same as "s#" + unicode.encode('unicode-internal') "t" + unicode: same as "t" + unicode.encode('utf-8') "t#" + unicode: same as "t#" + unicode.encode('utf-8') This effects all C modules and builtins. In case a C module wants to receive a certain predefined encoding, it can use the new "es" and "es#" parser markers. Ways to enter Unicode: ---------------------- u'' + string same as unicode(string,'utf-8') unicode(string,encname) any supported encoding u'...unicode-escape...' unicode-escape currently accepts Latin-1 chars as single-char input; using escape sequences any Unicode char can be entered (*) codecs.open(filename,mode,encname) opens an encoded file for reading and writing Unicode directly raw_input() + stdin redirection (see one of my earlier posts for code) returns UTF-8 strings based on the input encoding Hmm, perhaps a codecs.raw_input(encname) which returns Unicode directly wouldn't be a bad idea either ?! (*) This should probably be changed to be source code encoding dependent, so that u"...data..." matches "...data..." in appearance in the Python source code (see below). IO: --- open(file,'w').write(unicode) same as open(file,'w').write(unicode.encode('utf-8')) open(file,'wb').write(unicode) same as open(file,'wb').write(unicode.encode('unicode-internal')) codecs.open(file,'wb',encname).write(unicode) same as open(file,'wb').write(unicode.encode(encname)) codecs.open(file,'rb',encname).read() same as unicode(open(file,'rb').read(),encname) stdin + stdout can be redirected using StreamRecoders to handle any of the supported encodings The Python parser should probably also be extended to read encoded Python source code using some hint at the start of the source file (perhaps only allowing a small subset of the supported encodings, e.g. ASCII, Latin-1, UTF-8 and UTF-16). -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From mal at lemburg.com Thu Apr 27 12:27:18 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Thu, 27 Apr 2000 12:27:18 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <39081606.2932F5FD@lemburg.com> Fredrik Lundh wrote: > > >>> filename = u"gr?t" > > >>> file = open(filename, "w") > >>> file.close() > > >>> import glob > >>> print glob.glob("gr*") > ['gr\303\266t'] > > >>> print glob.glob(u"gr*") > [u'gr\366t'] > > >>> import os > >>> os.system("dir gr*") > ... > GR??T 0 01-02-03 12.34 gr??t > 1 fil(es) 0 byte > 0 dir 12 345 678 byte free > > hmm. Where is the problem ? If you pass the output of glob() to open() you'll get the same file in both cases... 
even better, you can now even use Chinese in your filenames without the OS having to support Unicode filenames :-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From fredrik at pythonware.com Thu Apr 27 13:49:07 2000 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 27 Apr 2000 13:49:07 +0200 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> Message-ID: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> > Fredrik Lundh wrote: > > > > >>> filename = u"gr?t" > > > > >>> file = open(filename, "w") > > >>> file.close() > > > > >>> import glob > > >>> print glob.glob("gr*") > > ['gr\303\266t'] > > > > >>> print glob.glob(u"gr*") > > [u'gr\366t'] > > > > >>> import os > > >>> os.system("dir gr*") > > ... > > GR??T 0 01-02-03 12.34 gr??t > > 1 fil(es) 0 byte > > 0 dir 12 345 678 byte free > > > > hmm. > > Where is the problem ? I'm speechless. From akuchlin at mems-exchange.org Thu Apr 27 14:00:18 2000 From: akuchlin at mems-exchange.org (Andrew Kuchling) Date: Thu, 27 Apr 2000 08:00:18 -0400 (EDT) Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> Message-ID: <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Fredrik Lundh writes: >M.A. Lemburg wrote: >> Where is the problem ? >I'm speechless. Ummm... since I'm not sure how open() currently reacts to being passed a Unicode file or if there's something special in open() for Windows, and don't know how you think it should react (an exception? fold to UTF-8? fold to Latin1?), I don't see what the particular problem is either. For the sake of people who haven't followed this debate closely, or who were busy during the earlier lengthy threads and simply deleted most of the messages, please try to be explicit. Ilya Zakharevich on the perl5-porters mailing list often employs the "This code is buggy and if you're too clueless to see how it's broken *I* certainly won't go explaining it to you" strategy, to devastatingly divisive effect, and with little effectiveness in getting the bugs fixed. Let's not go down that road. --amk From guido at python.org Thu Apr 27 17:01:48 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:01:48 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 06:42:43 BST." References: Message-ID: <200004271501.LAA13535@eric.cnri.reston.va.us> I'd like to reset this discussion. I don't think we need to involve c.l.py yet -- I haven't seen anyone with Asian language experience chime in there, and that's where this matters most. I am directing this to the Python i18n-sig mailing list, because that's where the debate belongs, and there interested parties can join the discussion without having to be vetted as "fit for python-dev" first. I apologize for having been less than responsive in the matter; unfortunately there's lots of other stuff on my mind right now that has recently had a tendency to distract me with higher priority crises. I've heard a few people claim that strings should always be considered to contain "characters" and that there should be one character per string element. I've also heard a clamoring that there should only be one string type. 
You folks have never used Asian encodings. In countries like Japan, China and Korea, encodings are a fact of life, and the most popular encodings are ASCII supersets that use a variable number of bytes per character, just like UTF-8. Each country or language uses different encodings, even though their characters look mostly the same to western eyes. UTF-8 and Unicode is having a hard time getting adopted in these countries because most software that people use deals only with the local encodings. (Sounds familiar?) These encodings are much less "pure" than UTF-8, because they only encode the local characters (and ASCII), and because of various problems with slicing: if you look "in the middle" of an encoded string or file, you may not know how to interpret the bytes you see. There are overlaps (in most of these encodings anyway) between the codes used for single-byte and double-byte encodings, and you may have to look back one or more characters to know what to make of the particular byte you see. To get an idea of the nightmares that non-UTF-8 multibyte encodings give C/C++ programmers, see the Multibyte Character Set (MBCS) Survival Guide (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). See also the home page of the i18n-sig for more background information on encoding (and other i18n) issues (http://www.python.org/sigs/i18n-sig/). UTF-8 attempts to solve some of these problems: the multi-byte encodings are chosen such that you can tell by the high bits of each byte whether it is (1) a single-byte (ASCII) character (top bit off), (2) the start of a multi-byte character (at least two top bits on; how many indicates the total number of bytes comprising the character), or (3) a continuation byte in a multi-byte character (top bit on, next bit off). Many of the problems with non-UTF-8 multibyte encodings are the same as for UTF-8 though: #bytes != #characters, a byte may not be a valid character, regular expression patterns using "." may give the wrong results, and so on. The truth of the matter is: the encoding of string objects is in the mind of the programmer. When I read a GIF file into a string object, the encoding is "binary goop". When I read a line of Japanese text from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to be an assumption built-in to my program, or perhaps information supplied separately (there's no easy way to guess based on the actual data). When I type a string literal using Latin-1 characters, the encoding is Latin-1. When I use octal escapes in a string literal, e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). When I type a 7-bit string literal, the encoding is ASCII. The moral of all this? 8-bit strings are not going away. They are not encoded in UTF-8 henceforth. Like before, and like 8-bit text files, they are encoded in whatever encoding you want. All you get is an extra mechanism to convert them to Unicode, and the Unicode conversion defaults to UTF-8 because it is the only conversion that is reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing Tim's paraphrase), UTF-8 annoys everyone equally. Where does the current approach require work? - We need a way to indicate the encoding of Python source code. (Probably a "magic comment".) - We need a way to indicate the encoding of input and output data files, and we need shortcuts to set the encoding of stdin, stdout and stderr (and maybe all files opened without an explicit encoding). 
Marc-Andre showed some sample code, but I believe it is still cumbersome. (I have to play with it more to see how it could be improved.) - We need to discuss whether there should be a way to change the default conversion between Unicode and 8-bit strings (currently hardcoded to UTF-8), in order to make life easier for people who want to continue to use their favorite 8-bit encoding (e.g. Latin-1, or shift-JIS) but who also want to make use of the new Unicode datatype. We're still in alpha, so we can still fix things. --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at prescod.net Thu Apr 27 17:01:00 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 10:01:00 -0500 Subject: [Python-Dev] fun with unicode, part 1 References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <39081606.2932F5FD@lemburg.com> <01eb01bfb03e$99267ac0$0500a8c0@secret.pythonware.com> <14600.11218.24960.705642@newcnri.cnri.reston.va.us> Message-ID: <3908562C.C2A2E1BC@prescod.net> You're asking the file system to "find you a filename". Depending on how you ask, you get two different file names for the same file. They are "==" equal (I think) but are of different length. I agree with /F that it's a little strange. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From guido at python.org Thu Apr 27 17:23:50 2000 From: guido at python.org (Guido van Rossum) Date: Thu, 27 Apr 2000 11:23:50 -0400 Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: Your message of "Wed, 26 Apr 2000 23:45:40 +0200." <004501bfafc8$c51d1240$34aab5d4@hagrid> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> Message-ID: <200004271523.LAA13614@eric.cnri.reston.va.us> > >>> filename = u"gr?t" > > >>> file = open(filename, "w") > >>> file.close() > > >>> import glob > >>> print glob.glob("gr*") > ['gr\303\266t'] > > >>> print glob.glob(u"gr*") > [u'gr\366t'] > > >>> import os > >>> os.system("dir gr*") > ... > GR??T 0 01-02-03 12.34 gr??t > 1 fil(es) 0 byte > 0 dir 12 345 678 byte free > > hmm. I presume that Fredrik's gripe is that the filename has been converted to UTF-8, while the encoding used by Windows to display his directory listing is Latin-1. (Not Microsoft's own 8-bit character set???) I'd like to solve this problem, but I have some questions: what *IS* the encoding used for filenames on Windows? This may differ per Windows version; perhaps it can differ drive letter? Or per application or per thread? On Windows NT, filenames are supposed to be Unicode. (I suppose also on Windowns 2000?) How do I open a file with a given Unicode string for its name, in a C program? I suppose there's a Win32 API call for that which has a Unicode variant. On Windows 95/98, the Unicode variants of the Win32 API calls don't exist. So what is the poor Python runtime to do there? Can Japanese people use Japanese characters in filenames on Windows 95/98? Let's assume they can. Since the filesystem isn't Unicode aware, the filenames must be encoded. Which encoding is used? Let's assume they use Microsoft's multibyte encoding. If they put such a file on a floppy and ship it to Link?ping, what will Fredrik see as the filename? (I.e., is the encoding fixed by the disk volume, or by the operating system?) Once we have a few answers here, we can solve the problem. 
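In the meantime, here is a rough sketch of the kind of dance the runtime (or the user) would have to do today; the filesystem encoding name is purely an assumption, since finding out what it really is happens to be exactly the open question:

    # Sketch only: encode a Unicode filename explicitly before handing it
    # to open().  "latin-1" is just a placeholder for the real (and still
    # unknown) filesystem encoding.
    fs_encoding = "latin-1"
    name = u"gr\u00f6t"
    try:
        f = open(name.encode(fs_encoding), "w")
    except UnicodeError:
        # the name contains characters with no mapping in the
        # filesystem encoding, so all we can do is refuse it
        raise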
Note that sometimes we'll have to refuse a Unicode filename because there's no mapping for some of the characters it contains in the filename encoding used. Question: how does Fredrik create a file with a Euro character (u'\u20ac') in its name? --Guido van Rossum (home page: http://www.python.org/~guido/) From bckfnn at worldonline.dk Thu Apr 27 18:21:20 2000 From: bckfnn at worldonline.dk (Finn Bock) Date: Thu, 27 Apr 2000 16:21:20 GMT Subject: [Python-Dev] fun with unicode, part 1 In-Reply-To: <200004271523.LAA13614@eric.cnri.reston.va.us> References: <004501bfafc8$c51d1240$34aab5d4@hagrid> <200004271523.LAA13614@eric.cnri.reston.va.us> Message-ID: <3908679a.16700013@smtp.worldonline.dk> On Thu, 27 Apr 2000 11:23:50 -0400, you wrote: >> >>> filename = u"gr?t" >> >> >>> file = open(filename, "w") >> >>> file.close() >> >> >>> import glob >> >>> print glob.glob("gr*") >> ['gr\303\266t'] >> >> >>> print glob.glob(u"gr*") >> [u'gr\366t'] >> >> >>> import os >> >>> os.system("dir gr*") >> ... >> GR??T 0 01-02-03 12.34 gr??t >> 1 fil(es) 0 byte >> 0 dir 12 345 678 byte free >> >> hmm. > >I presume that Fredrik's gripe is that the filename has been converted >to UTF-8, while the encoding used by Windows to display his directory >listing is Latin-1. (Not Microsoft's own 8-bit character set???) > >I'd like to solve this problem, but I have some questions: what *IS* >the encoding used for filenames on Windows? [This is just for inspiration] JDK "solves" this by running the filename through a CharToByteConverter (a codec) which is setup as the default encoding used for the platform. On my danish w2k this is encoding happens to be called 'Cp1252'. The codec name is chosen based on the users language and region with fall back to Cp1252. The mapping table is: "ar", "Cp1256", "be", "Cp1251", "bg", "Cp1251", "cs", "Cp1250", "el", "Cp1253", "et", "Cp1257", "iw", "Cp1255", "hu", "Cp1250", "ja", "MS932", "ko", "MS949", "lt", "Cp1257", "lv", "Cp1257", "mk", "Cp1251", "pl", "Cp1250", "ro", "Cp1250", "ru", "Cp1251", "sh", "Cp1250", "sk", "Cp1250", "sl", "Cp1250", "sq", "Cp1250", "sr", "Cp1251", "th", "MS874", "tr", "Cp1254", "uk", "Cp1251", "zh", "GBK", "zh_TW", "MS950", >This may differ per >Windows version; perhaps it can differ drive letter? Or per >application or per thread? On Windows NT, filenames are supposed to >be Unicode. (I suppose also on Windowns 2000?) JDK only uses GetThreadLocale() for the starting thread. It does not appears to check for windows versions at all. >How do I open a file >with a given Unicode string for its name, in a C program? I suppose >there's a Win32 API call for that which has a Unicode variant. The JDK does not make use the unicode API is it exists on the platform. >On Windows 95/98, the Unicode variants of the Win32 API calls don't >exist. So what is the poor Python runtime to do there? > >Can Japanese people use Japanese characters in filenames on Windows >95/98? Let's assume they can. Since the filesystem isn't Unicode >aware, the filenames must be encoded. Which encoding is used? Let's >assume they use Microsoft's multibyte encoding. If they put such a >file on a floppy and ship it to Link?ping, what will Fredrik see as >the filename? (I.e., is the encoding fixed by the disk volume, or by >the operating system?) > >Once we have a few answers here, we can solve the problem. Note that >sometimes we'll have to refuse a Unicode filename because there's no >mapping for some of the characters it contains in the filename >encoding used. 
JDK silently replaced the offending character with a '?' which cause an exception when attempting to open the file. The filename, directory name, or volume label syntax is incorrect >Question: how does Fredrik create a file with a Euro >character (u'\u20ac') in its name? import java.io.*; public class x { public static void main(String[] args) throws Exception { String filename = "An eurosign \u20ac"; System.out.println(filename); new FileOutputStream(filename).close(); } } The resulting file contains an euro sign when shown in FileExplorer. The output of the program also contains an euro sign when shown with notepad. But the filename/program output does *not* contain an euro when dir'ed/type'd in my DOS box. regards, finn From gresham at mediavisual.com Thu Apr 27 18:41:04 2000 From: gresham at mediavisual.com (Paul Gresham) Date: Fri, 28 Apr 2000 00:41:04 +0800 Subject: [Python-Dev] Re: [I18n-sig] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <010f01bfb067$64e43260$9a2b440a@miv01> Hi, I'm not sure how much value I can add, as I know little about the charsets etc. and a bit more about Python. As a user of these, and running a consultancy firm in Hong Kong, I can at least pass on some points and perhaps help you with testing later on. My first touch on international PCs was fixing a Japanese 8086 back in 1989, it didn't even have colour ! Hong Kong is quite an experience as there are two formats in common use, plus occasionally another gets thrown in. In HK they use the Traditional Chinese, whereas the mainland uses Simplified, as Guido says, there are a number of different types of these. Occasionally we see the Taiwanese charsets used. It seems to me that having each individual string variable encoded might just be too atomic, perhaps creating a cumbersome overhead in the system. For most applications I can settle for the entire app to be using a single charset, however from experience there are exceptions. We are normally working with prior knowledge of the charset being used, rather than having to deal with any charset which may come along (at an application level), and therefore generally work in a context, just as a European programmer would be working in say English or German. As you know, storage/retrieval is not a problem, but manipulation and comparison is. A nice way to handle this would be like operator overloading such that string operations would be perfomed in the context of the current charset, I could then change context as needed, removing the need for metadata surrounding the actual data. This should speed things up as each overloaded library could be optimised given the different quirks, and new ones could be added easily. My code could be easily re-used on different charsets by simply changing context externally to the code, rather than passing in lots of stuff and expecting Python to deal with it. Also I'd like very much to compile/load in only the International charsets that I need. I wouldn't want to see Java type bloat occurring to Python, and adding internationalisation for everything, is huge. I think what I am suggesting is a different approach which obviously places more onus on the programmer rather than Python. Perhaps this is not acceptable, I don't know as I've never developed a programming language. I hope this is a helpful point of view to get you thinking further, otherwise ... 
please ignore me and I'll keep quiet : ) Regards Paul ----- Original Message ----- From: "Guido van Rossum" To: ; Cc: "Just van Rossum" Sent: Thursday, April 27, 2000 11:01 PM Subject: [I18n-sig] Unicode debate > I'd like to reset this discussion. I don't think we need to involve > c.l.py yet -- I haven't seen anyone with Asian language experience > chime in there, and that's where this matters most. I am directing > this to the Python i18n-sig mailing list, because that's where the > debate belongs, and there interested parties can join the discussion > without having to be vetted as "fit for python-dev" first. > > I apologize for having been less than responsive in the matter; > unfortunately there's lots of other stuff on my mind right now that > has recently had a tendency to distract me with higher priority > crises. > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) > > These encodings are much less "pure" than UTF-8, because they only > encode the local characters (and ASCII), and because of various > problems with slicing: if you look "in the middle" of an encoded > string or file, you may not know how to interpret the bytes you see. > There are overlaps (in most of these encodings anyway) between the > codes used for single-byte and double-byte encodings, and you may have > to look back one or more characters to know what to make of the > particular byte you see. To get an idea of the nightmares that > non-UTF-8 multibyte encodings give C/C++ programmers, see the > Multibyte Character Set (MBCS) Survival Guide > (http://msdn.microsoft.com/library/backgrnd/html/msdn_mbcssg.htm). > See also the home page of the i18n-sig for more background information > on encoding (and other i18n) issues > (http://www.python.org/sigs/i18n-sig/). > > UTF-8 attempts to solve some of these problems: the multi-byte > encodings are chosen such that you can tell by the high bits of each > byte whether it is (1) a single-byte (ASCII) character (top bit off), > (2) the start of a multi-byte character (at least two top bits on; how > many indicates the total number of bytes comprising the character), or > (3) a continuation byte in a multi-byte character (top bit on, next > bit off). > > Many of the problems with non-UTF-8 multibyte encodings are the same > as for UTF-8 though: #bytes != #characters, a byte may not be a valid > character, regular expression patterns using "." may give the wrong > results, and so on. > > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". 
When I read a line of Japanese text > from a file, the encoding may be JIS, shift-JIS, or ENC -- this has to > be an assumption built-in to my program, or perhaps information > supplied separately (there's no easy way to guess based on the actual > data). When I type a string literal using Latin-1 characters, the > encoding is Latin-1. When I use octal escapes in a string literal, > e.g. '\303\247', the encoding could be UTF-8 (this is a cedilla). > When I type a 7-bit string literal, the encoding is ASCII. > > The moral of all this? 8-bit strings are not going away. They are > not encoded in UTF-8 henceforth. Like before, and like 8-bit text > files, they are encoded in whatever encoding you want. All you get is > an extra mechanism to convert them to Unicode, and the Unicode > conversion defaults to UTF-8 because it is the only conversion that is > reversible. And, as Tim Peters quoted Andy Robinson (paraphrasing > Tim's paraphrase), UTF-8 annoys everyone equally. > > Where does the current approach require work? > > - We need a way to indicate the encoding of Python source code. > (Probably a "magic comment".) > > - We need a way to indicate the encoding of input and output data > files, and we need shortcuts to set the encoding of stdin, stdout and > stderr (and maybe all files opened without an explicit encoding). > Marc-Andre showed some sample code, but I believe it is still > cumbersome. (I have to play with it more to see how it could be > improved.) > > - We need to discuss whether there should be a way to change the > default conversion between Unicode and 8-bit strings (currently > hardcoded to UTF-8), in order to make life easier for people who want > to continue to use their favorite 8-bit encoding (e.g. Latin-1, or > shift-JIS) but who also want to make use of the new Unicode datatype. > > We're still in alpha, so we can still fix things. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > From petrilli at amber.org Thu Apr 27 18:48:16 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Thu, 27 Apr 2000 12:48:16 -0400 Subject: [Python-Dev] Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us>; from guido@python.org on Thu, Apr 27, 2000 at 11:01:48AM -0400 References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <20000427124816.C1723@trump.amber.org> Guido van Rossum [guido at python.org] wrote: > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) Actually a bigger concern that we hear from our customers in Japan is that Unicode has *serious* problems in asian languages. Theey took the "unification" of Chinese and Japanese, rather than both, and therefore can not represent los of phrases quite right. 
I can have someone write up a better description, but I was told by several Japanese people that they wouldn't use Unicode come hell or high water, basically. Basically it's JIS, Shift-JIS or nothing for most Japanese companies. This was my experience working with Konica a few years ago as well. Chris -- | Christopher Petrilli | petrilli at amber.org From andy at reportlab.python.org Thu Apr 27 18:50:28 2000 From: andy at reportlab.python.org (Andy Robinson) Date: Thu, 27 Apr 2000 16:50:28 GMT Subject: [Python-Dev] Python 1.6a2 Unicode bug (was Re: comparing strings and ints) Message-ID: <39086e6a.34554266@post.demon.co.uk> >Alisa Pasic Robinson Drat! my wife's been hacking my email headers! Sorry... - Andy Robinson From jeremy at cnri.reston.va.us Fri Apr 28 00:12:15 2000 From: jeremy at cnri.reston.va.us (Jeremy Hylton) Date: Thu, 27 Apr 2000 18:12:15 -0400 (EDT) Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) In-Reply-To: <39075D58.C549938E@tismer.com> References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> Message-ID: <14600.47935.704157.565225@goon.cnri.reston.va.us> >>>>> "CT" == Christian Tismer writes: CT> Summary: We had two effects here. Effect 1: Wasting time with CT> extra errors in instance creation. Effect 2: Loss of locality CT> due to code size increase. CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a CT> little renaming of the one or the other module, in order to get CT> the default link order to support locality better. CT> Now everything is clear to me. My first attempts with reordering CT> could not reveal the loss with the instance stuff. CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to CT> get related code ordered better. I reach a different conclusion. The performance difference between 1.5.2 and 1.6, measured with pystone and pybench, is so small that effects like the order in which the compiler assembles the code make a difference. I don't think we should make any non-trivial effort to improve performance based on this kind of voodoo. I also question the claim that the two effects here explain the performance difference between 1.5.2 and 1.6. Rather, they explain the performance difference of pystone and pybench running on different versions of the interpreter. Saying that pystone is the same speed is a far cry from saying that Python is the same speed! Remember that performance on a benchmark is just that. (It's like the old joke about a person's IQ: It is a very good indicator of how well they did on the IQ test.) I think we could use better benchmarks of two sorts. The pybench microbenchmarks are quite helpful individually, though the overall number isn't particularly meaningful. However, these benchmarks are sometimes a little too big to be useful. For example, the instance creation effect was tracked down by running this code: class Foo: pass for i in range(big_num): Foo() The pybench test "CreateInstance" does all sorts of other stuff. It tests creation with and without an __init__ method. It tests instance deallocation (because all the created objects need to be dealloced, too). It also tests attribute assignment, since many of the __init__ methods make assignments.
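(Just for concreteness, a minimal harness along the following lines is enough to time that isolated loop; the helper name, iteration count, and use of time.clock() are illustrative assumptions, not anything that exists in the tree.)

    import time

    class Foo:
        pass

    def measure(label, func, n):
        # Call func() n times and report the elapsed CPU time.
        start = time.clock()
        for i in xrange(n):
            func()
        stop = time.clock()
        print "%-20s %d calls in %.3f seconds" % (label, n, stop - start)

    measure("instance creation", Foo, 100000)
    measure("empty function call", lambda: None, 100000)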
What would be better (and I'm not sure what priority should be placed on doing it) is a set of nano-benchmarks that try to limit themselves to a single feature or small set of features. Guido suggested having a hierarchy so that there are multiple nano-benchmarks for instance creation, each identifying a particular effect, and a micro-benchmark that is the aggregate of all these nano-benchmarks. We could also use some better large benchmarks. Using pystone is pretty crude, because it doesn't necessarily measure the performance of things we care about. It would be better to have a collection of 5-10 apps that each do something we care about -- munging text files or XML data, creating lots of objects, etc. For example, I used the compiler package (in nondist/src/Compiler) to compile itself. Based on that benchmark, an interpreter built from the current CVS tree is still 9-11% slower than 1.5. Jeremy From tismer at tismer.com Fri Apr 28 02:48:34 2000 From: tismer at tismer.com (Christian Tismer) Date: Fri, 28 Apr 2000 02:48:34 +0200 Subject: [Python-Dev] Where the speed is lost! (was: 1.6 speed) References: <3905EEB4.4153A845@tismer.com> <14598.9873.769055.198345@goon.cnri.reston.va.us> <39074295.FA136113@tismer.com> <14599.17827.23033.266024@goon.cnri.reston.va.us> <3907498B.C596C495@tismer.com> <14599.20985.493264.876095@goon.cnri.reston.va.us> <39075D58.C549938E@tismer.com> <14600.47935.704157.565225@goon.cnri.reston.va.us> Message-ID: <3908DFE1.F43A62EB@tismer.com> Jeremy Hylton wrote: > > >>>>> "CT" == Christian Tismer writes: > > CT> Summary: We had two effects here. Effect 1: Wasting time with > CT> extra errors in instance creation. Effect 2: Loss of locality > CT> due to code size increase. > > CT> Solution to 1 is Jeremy's patch. Solution to 2 could be a > CT> little renaming of the one or the other module, in order to get > CT> the default link order to support locality better. > > CT> Now everything is clear to me. My first attempts with reordering > CT> could not reveal the loss with the instance stuff. from here... > CT> All together, Python 1.6 is a bit faster than 1.5.2 if we try to > CT> get related code ordered better. ...to here I was not clear. The rest of it is at least 100% correct. > I reach a different conclusion. The performance difference 1.5.2 and > 1.6, measured with pystone and pybench, is so small that effects like > the order in which the compiler assembles the code make a difference. Sorry, it is 10 percent. Please do not shift the topic. I agree that there must be better measurements to be able to do my thoughtless claim ...from here to here..., but the question was raised in the py-dev thread "Python 1.6 speed" by Andrew, who was exactly asking why pystone gets 10 percent slower. I have been hunting that for a week now, and with your help, it is solved. > I don't think we should make any non-trivial effort to improve > performance based on this kind of voodoo. Thanks. I've already built it in - it was trivial, but I'll keep it for my version. > I also question the claim that the two effects here explain the > performance difference between 1.5.2 and 1.6. Rather, they explain > the performance difference of pystone and pybench running on different > versions of the interpreter. Exactly. I didn't want to claim anything else, it was all in the context of the inital thread. ciao - chris Oops, p.s: interesting: ... > For example, I used the compiler package (in nondist/src/Compiler) to > compile itself. 
Based on that benchmark, an interpreter built from > the current CVS tree is still 9-11% slower than 1.5. Did you adjust the string methods? I don't believe these are still fast. -- Christian Tismer :^) Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com From paul at prescod.net Fri Apr 28 04:20:22 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:20:22 -0500 Subject: [Python-Dev] Unicode debate References: <200004271501.LAA13535@eric.cnri.reston.va.us> Message-ID: <3908F566.8E5747C@prescod.net> Guido van Rossum wrote: > > ... > > I've heard a few people claim that strings should always be considered > to contain "characters" and that there should be one character per > string element. I've also heard a clamoring that there should only be > one string type. You folks have never used Asian encodings. In > countries like Japan, China and Korea, encodings are a fact of life, > and the most popular encodings are ASCII supersets that use a variable > number of bytes per character, just like UTF-8. Each country or > language uses different encodings, even though their characters look > mostly the same to western eyes. UTF-8 and Unicode is having a hard > time getting adopted in these countries because most software that > people use deals only with the local encodings. (Sounds familiar?) I think that maybe an important point is getting lost here. I could be wrong, but it seems that all of this emphasis on encodings is misplaced. The physical and logical makeup of character strings are entirely separate issues. Unicode is a character set. It works in the logical domain. Dozens of different physical encodings can be used for Unicode characters. There are XML users who work with XML (and thus Unicode) every day and never see UTF-8, UTF-16 or any other Unicode-consortium "sponsored" encoding. If you invent an encoding tomorrow, it can still be XML-compatible. There are many encodings older than Unicode that are XML (and Unicode) compatible. I have not heard complaints about the XML way of looking at the world and in fact it was explicitly endorsed by many of the world's leading experts on internationalization. I haven't followed the Java situation as closely but I have also not heard screams about its support for il8n. > The truth of the matter is: the encoding of string objects is in the > mind of the programmer. When I read a GIF file into a string object, > the encoding is "binary goop". IMHO, it's a mistake of history that you would even think it makes sense to read a GIF file into a "string" object and we should be trying to erase that mistake, as quickly as possible (which is admittedly not very quickly) not building more and more infrastructure around it. How can we make the transition to a "binary goops are not strings" world easiest? > The moral of all this? 8-bit strings are not going away. If that is a statement of your long term vision, then I think that it is very unfortunate. Treating string literals as if they were isomorphic with byte arrays was probably the right thing in 1991 but it won't be in 2005. It doesn't meet the definition of string used in the Unicode spec., nor in XML, nor in Java, nor at the W3C nor in most other up and coming specifications. 
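To make the distinction concrete, here is a tiny sketch (the file names and the Latin-1 assumption are made up) of how the two kinds of data already look different under 1.6a2, even though both currently come back as the same 8-bit string type:

    # Binary goop: bytes with no character interpretation at all.
    goop = open("logo.gif", "rb").read()

    # Character data: bytes plus knowledge of their encoding, made
    # explicit by decoding to a Unicode string.
    raw = open("greeting.txt", "rb").read()
    text = unicode(raw, "latin-1")        # whatever the file really uses

    # Going back to bytes is an explicit, encoding-specific step.
    utf8_bytes = text.encode("utf-8")

In the first case no encoding ever applies; in the second, the encoding is the whole point.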
From paul at prescod.net Fri Apr 28 04:21:44 2000 From: paul at prescod.net (Paul Prescod) Date: Thu, 27 Apr 2000 21:21:44 -0500 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> Message-ID: <3908F5B8.9F8D8A9A@prescod.net> Andy Robinson wrote: > > - you can work with old fashioned strings, which are understood > by everyone to be arrays of bytes, and there is no magic > conversion going on. The bytes in literal strings in your script file > are the bytes that end up in the program. Who is "everyone"? Are you saying that CP4E hordes are going to understand that the syntax "abcde" is constructing a *byte array*? It seems like you think that Python users are going to be more sophisticated in their understanding of these issues than Java programmers. In most other things, Python is simpler. > ... > > I'm also convinced that the majority of Python scripts won't need > to work in Unicode. Anything working with XML will need to be Unicode. Anything working with the Win32 API (especially COM) will want to do Unicode. Over time the entire Web infrastructure will move to Unicode. Anything written in JPython pretty much MUST use Unicode (doesn't it?). > Even working with exotic languages, there is always a native > 8-bit encoding. Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use 8-bit encodings of Unicode if you want. -- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From petrilli at amber.org Fri Apr 28 06:12:29 2000 From: petrilli at amber.org (Christopher Petrilli) Date: Fri, 28 Apr 2000 00:12:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <3908F5B8.9F8D8A9A@prescod.net>; from paul@prescod.net on Thu, Apr 27, 2000 at 09:21:44PM -0500 References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> Message-ID: <20000428001229.A4790@trump.amber.org> Paul Prescod [paul at prescod.net] wrote: > > I'm also convinced that the majority of Python scripts won't need > > to work in Unicode. > > Anything working with XML will need to be Unicode. Anything working with > the Win32 API (especially COM) will want to do Unicode. Over time the > entire Web infrastructure will move to Unicode. Anything written in > JPython pretty much MUST use Unicode (doesn't it?). I disagree with this. Unicode has been around a very long time, and it's not been adopted by a lot of people for a LOT of very valid reasons. > > Even working with exotic languages, there is always a native > > 8-bit encoding. > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > 8-bit encodings of Unicode if you want. Um, if you go: JIS -> Unicode -> JIS you don't get the same thing out that you put in (at least this is what I've been told by a lot of Japanese developers), and therefore it's not terribly popular because of the nature of the Japanese (and Chinese) language. My experience with Unicode is that a lot of Western people think it's the answer to every problem asked, while most Asian language people disagree vehemently. This says the problem isn't solved yet, even if people wish to deny it.
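If you want to test that claim on real data, a two-line round-trip check is enough; note that the codec name below is hypothetical, since the 1.6a2 core does not ship Japanese codecs and you would need a separately installed codec package:

    def roundtrips(data, encoding):
        # Returns 1 if decoding to Unicode and re-encoding reproduces
        # the original byte string exactly, 0 otherwise.
        try:
            return unicode(data, encoding).encode(encoding) == data
        except (UnicodeError, LookupError):
            return 0

    # Hypothetical usage, e.g.:
    # print roundtrips(open("report.sjis", "rb").read(), "shift-jis")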
Chris -- | Christopher Petrilli | petrilli at amber.org From just at letterror.com Fri Apr 28 10:33:16 2000 From: just at letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 09:33:16 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004271501.LAA13535@eric.cnri.reston.va.us> References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: At 11:01 AM -0400 27-04-2000, Guido van Rossum wrote: >Where does the current approach require work? > >- We need a way to indicate the encoding of Python source code. >(Probably a "magic comment".) How will other parts of a program know which encoding was used for non-unicode string literals? It seems to me that an encoding attribute for 8-bit strings solves this nicely. The attribute should only be set automatically if the encoding of the source file was specified or when the string has been encoded from a unicode string. The attribute should *only* be used when converting to unicode. (Hm, it could even be used when calling unicode() without the encoding argument.) It should *not* be used when comparing (or adding, etc.) 8-bit strings to each other, since they still may contain binary goop, even in a source file with a specified encoding! >- We need a way to indicate the encoding of input and output data >files, and we need shortcuts to set the encoding of stdin, stdout and >stderr (and maybe all files opened without an explicit encoding). Can you open a file *with* an explicit encoding? Just From mal at lemburg.com Fri Apr 28 11:39:37 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 11:39:37 +0200 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> Message-ID: <39095C59.A5916EEB@lemburg.com> [Note: These discussion should all move to 18n-sig... CCing there] Christopher Petrilli wrote: > > Paul Prescod [paul at prescod.net] wrote: > > > Even working with exotic languages, there is always a native > > > 8-bit encoding. > > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > 8-bit encodings of Unicode if you want. > > Um, if you go: > > JIS -> Unicode -> JIS > > you don't get the same thing out that you put in (at least this is > what I've been told by a lot of Japanese developers), and therefore > it's not terribly popular because of the nature of the Japanese (and > Chinese) langauge. > > My experience with Unicode is that a lot of Western people think it's > the answer to every problem asked, while most asian language people > disagree vehemently. This says the problem isn't solved yet, even if > people wish to deny it. Isn't this a problem of the translation rather than Unicode itself (Andy mentioned several times that you can use the private BMP areas to implement 1-1 round-trips) ? -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From tree at cymru.basistech.com Fri Apr 28 12:44:00 2000 From: tree at cymru.basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:44:00 -0400 (EDT) Subject: [Python-Dev] [I18n-sig] Re: Unicode debate In-Reply-To: References: Message-ID: <14601.27504.337569.201251@cymru.basistech.com> Just van Rossum writes: > How will other parts of a program know which encoding was used for > non-unicode string literals? 
This is the exact reason that Unicode should be used for all string literals: from a language design perspective I don't understand the rationale for providing "traditional" and "unicode" strings. > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! In Dylan there is an explicit split between 'characters' (which are always Unicode) and 'bytes'. What are the compelling reasons to not use UTF-8 as the (source) document encoding? In the past the usual response has been, "the tools aren't there for authoring UTF-8 documents". This argument becomes more specious as more OS's move towards Unicode. I firmly believe this can be done without Java's bloat. One off-the-cuff solution is this: All character strings are Unicode (utf-8 encoding). Language terminals and operators are restricted to US-ASCII, which are identical in UTF-8. The contents of comments are not interpreted in any way. > >- We need a way to indicate the encoding of input and output data > >files, and we need shortcuts to set the encoding of stdin, stdout and > >stderr (and maybe all files opened without an explicit encoding). > > Can you open a file *with* an explicit encoding? If you cannot, you lose. You absolutely must be able to specify the encoding of a file when opening it, so that the runtime can transcode into the native encoding as you read it. This should be otherwise transparent to the user. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From tree at cymru.basistech.com Fri Apr 28 12:56:50 2000 From: tree at cymru.basistech.com (Tom Emerson) Date: Fri, 28 Apr 2000 06:56:50 -0400 (EDT) Subject: [I18n-sig] Re: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <14601.28274.667733.660938@cymru.basistech.com> M.-A. Lemburg writes: > > > Unicode has many encodings: Shift-JIS, Big-5, EBCDIC ... You can use > > > 8-bit encodings of Unicode if you want. This is meaningless: legacy encodings of national character sets such as Shift-JIS, Big Five, GB2312, or TIS620 are not "encodings" of Unicode. TIS620 is a single-byte, 8-bit encoding: each character is represented by a single byte. The Japanese and Chinese encodings are multibyte, 8-bit, encodings. ISO-2022 is a multi-byte, 7-bit encoding for multiple character sets. Unicode has several possible encodings: UTF-8, UCS-2, UCS-4, UTF-16... You can view all of these as 8-bit encodings, if you like. Some are multibyte (such as UTF-8, where each character in Unicode is represented in 1 to 3 bytes) while others are fixed length, two or four bytes per character.
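A quick interpreter session shows the difference in byte counts (the particular characters are only an illustration):

    u = u"A\u00e9\u4e2d"            # an ASCII letter, e-acute, a CJK ideograph

    print len(u)                     # 3 characters
    print len(u.encode("utf-8"))     # 1 + 2 + 3 = 6 bytes
    print len(u.encode("utf-16"))    # 2 bytes per character, plus a byte order mark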
> > Um, if you go: > > > > JIS -> Unicode -> JIS > > > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. This is simply not true any more. The ability to round trip between Unicode and legacy encodings is dependent on the software: being able to use code points in the PUA for this is acceptable and commonly done. The big advantage is in using Unicode as a pivot when transcoding between different CJK encodings. It is very difficult to map between, say, Shift JIS and GB2312, directly. However, Unicode provides a good go-between. It isn't a panacea: transcoding between legacy encodings like GB2312 and Big Five is still difficult: Unicode or not. > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. This is a shame: it is an indication that they don't understand the technology. Unicode is a tool: nothing more. -tree -- Tom Emerson Basis Technology Corp. Language Hacker http://www.basistech.com "Beware the lollipop of mediocrity: lick it once and you suck forever" From gstein at lyra.org Fri Apr 28 14:41:11 2000 From: gstein at lyra.org (Greg Stein) Date: Fri, 28 Apr 2000 05:41:11 -0700 (PDT) Subject: [Python-Dev] c.l.py readership datapoint (was: Python 1.6a2 Unicode bug) In-Reply-To: Message-ID: On Thu, 27 Apr 2000, Just van Rossum wrote: > At 10:27 PM -0400 26-04-2000, Tim Peters wrote: > >Indeed, if someone from an inferior culture wants to chime in, let them find > >Python-Dev with their own beady little eyes . > > All irony aside, I think you've nailed one of the problems spot on: > - most core Python developers seem to be too busy to read *anything* at all > in c.l.py Datapoint: I stopped reading c.l.py almost two years ago. For a while, I would pop up a newsreader every month or so and skim what kinds of things were happening. That stopped at least a year or so ago. I get a couple hundred messages a day. Another 100+ from c.l.py would be way too much. Cheers, -g -- Greg Stein, http://www.lyra.org/ From guido at python.org Fri Apr 28 15:24:29 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 09:24:29 -0400 Subject: [Python-Dev] Re: [XML-SIG] Python 1.6a2 Unicode experiences? In-Reply-To: Your message of "Fri, 28 Apr 2000 11:39:37 +0200." <39095C59.A5916EEB@lemburg.com> References: <200004270208.WAA01413@newcnri.cnri.reston.va.us> <001c01bfb033$96bf66d0$01ac2ac0@boulder> <3908F5B8.9F8D8A9A@prescod.net> <20000428001229.A4790@trump.amber.org> <39095C59.A5916EEB@lemburg.com> Message-ID: <200004281324.JAA15642@eric.cnri.reston.va.us> > [Note: These discussion should all move to 18n-sig... CCing there] > > Christopher Petrilli wrote: > > you don't get the same thing out that you put in (at least this is > > what I've been told by a lot of Japanese developers), and therefore > > it's not terribly popular because of the nature of the Japanese (and > > Chinese) langauge. > > > > My experience with Unicode is that a lot of Western people think it's > > the answer to every problem asked, while most asian language people > > disagree vehemently. This says the problem isn't solved yet, even if > > people wish to deny it. 
[Marc-Andre Lenburg] > Isn't this a problem of the translation rather than Unicode > itself (Andy mentioned several times that you can use the private > BMP areas to implement 1-1 round-trips) ? Maybe, but apparently such high-quality translations are rare (note that Andy said "can"). Anyway, a word of caution here. Years ago I attended a number of IETF meetings on internationalization, in a time when Unicode wasn't as accepted as it is now. The one thing I took away from those meetings was that this is a *highly* emotional and controversial issue. As the Python community, I feel we have no need to discuss "why Unicode." Therein lies madness, controversy, and no progress. We know there's a clear demand for Unicode, and we've committed to support it. The question now at hand is "how Unicode." Let's please focus on that, e.g. in the other thread ("Unicode debate") in i18n-sig and python-dev. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:10:27 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:10:27 -0400 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 09:33:16 BST." References: Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281410.KAA16104@eric.cnri.reston.va.us> [GvR] > >- We need a way to indicate the encoding of Python source code. > >(Probably a "magic comment".) [JvR] > How will other parts of a program know which encoding was used for > non-unicode string literals? > > It seems to me that an encoding attribute for 8-bit strings solves this > nicely. The attribute should only be set automatically if the encoding of > the source file was specified or when the string has been encoded from a > unicode string. The attribute should *only* be used when converting to > unicode. (Hm, it could even be used when calling unicode() without the > encoding argument.) It should *not* be used when comparing (or adding, > etc.) 8-bit strings to each other, since they still may contain binary > goop, even in a source file with a specified encoding! Marc-Andre took this idea a bit further, but I think it's not practical given the current implementation: there are too many places where the C code would have to be changed in order to propagate the string encoding information, and there are too many sources of strings with unknown encodings to make it very useful. Plus, it would slow down 8-bit string ops. I have a better idea: rather than carrying around 8-bit strings with an encoding, use Unicode literals in your source code. If the source encoding is known, these will be converted using the appropriate codec. If you object to having to write u"..." all the time, we could say that "..." is a Unicode literal if it contains any characters with the top bit on (of course the source file encoding would be used just like for u"..."). But I think this should be enabled by a separate pragma -- people who want to write Unicode-unaware code manipulating 8-bit strings in their favorite encoding (e.g. shift-JIS or Latin-1) should not silently get Unicode strings. (I thought about an option to make *all strings* (not just literals) Unicode, but the current implementation would require too much hacking. This is what JPython does, and maybe it should be what Python 3000 does; I don't see it as a realistic option for the 1.x series.) --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Apr 28 16:27:18 2000 From: fdrake at acm.org (Fred L. 
Drake, Jr.) Date: Fri, 28 Apr 2000 10:27:18 -0400 (EDT) Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue Message-ID: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Brian Hooper submitted a patch to add U and U# to the format strings for Py_BuildValue(), and there were comments that indicated u and u# would be better. He's submitted a documentation update for this as well the implementation. If there are no objections, I'll incorporate these changes. -Fred -- Fred L. Drake, Jr. Corporation for National Research Initiatives From guido at python.org Fri Apr 28 16:32:28 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:32:28 -0400 Subject: [Python-Dev] Re: [I18n-sig] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 06:44:00 EDT." <14601.27504.337569.201251@cymru.basistech.com> References: <14601.27504.337569.201251@cymru.basistech.com> Message-ID: <200004281432.KAA16418@eric.cnri.reston.va.us> > This is the exact reason that Unicode should be used for all string > literals: from a language design perspective I don't understand the > rationale for providing "traditional" and "unicode" string. In Python 3000, you would have a point. In current Python, there simply are too many programs and extensions written in other languages that manipulating 8-bit strings to ignore their existence. We're trying to add Unicode support to Python 1.6 without breaking code that used to run under Python 1.5.x; practicalities just make it impossible to go with Unicode for everything. I think that if Python didn't have so many extension modules (many maintained by 3rd party modules) it would be a lot easier to switch to Unicode for all strings (I think JavaScript has done this). In Python 3000, we'll have to seriously consider having separate character string and byte array objects, along the lines of Java's model. Note that I say "seriously consider." We'll first have to see how well the current solution works *in practice*. There's time before we fix Py3k in stone. :-) --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:33:24 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:33:24 -0400 Subject: [Python-Dev] Brian Hooper's patch to add u & u# to Py_BuildValue In-Reply-To: Your message of "Fri, 28 Apr 2000 10:27:18 EDT." <14601.40902.531340.684389@seahag.cnri.reston.va.us> References: <14601.40902.531340.684389@seahag.cnri.reston.va.us> Message-ID: <200004281433.KAA16446@eric.cnri.reston.va.us> > Brian Hooper submitted a patch to add U and U# to the format strings > for Py_BuildValue(), and there were comments that indicated u and u# > would be better. He's submitted a documentation update for this as > well the implementation. > If there are no objections, I'll incorporate these changes. Please go ahead, changing U/U# to u/u#. --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Apr 28 16:50:05 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 10:50:05 -0400 Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: Your message of "Thu, 27 Apr 2000 21:20:22 CDT." <3908F566.8E5747C@prescod.net> References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> Message-ID: <200004281450.KAA16493@eric.cnri.reston.va.us> [Paul Prescod] > I think that maybe an important point is getting lost here. I could be > wrong, but it seems that all of this emphasis on encodings is misplaced. 
In practical applications that manipulate text, encodings creep up all the time. I remember a talk or message by Andy Robinson about the messiness of producing printed reports in Japanese for a large investment firm. Most off the issues that took his time had to do with encodings, if I recall correctly. (Andy, do you remember what I'm talking about? Do you have a URL?) > > The truth of the matter is: the encoding of string objects is in the > > mind of the programmer. When I read a GIF file into a string object, > > the encoding is "binary goop". > > IMHO, it's a mistake of history that you would even think it makes sense > to read a GIF file into a "string" object and we should be trying to > erase that mistake, as quickly as possible (which is admittedly not very > quickly) not building more and more infrastructure around it. How can we > make the transition to a "binary goops are not strings" world easiest? I'm afraid that's a bigger issue than we can solve for Python 1.6. We're committed to by and large backwards compatibility while supporting Unicode -- the backwards compatibility with tons of extension module (many 3rd party) requires that we deal with 8-bit strings in basically the same way as we did before. > > The moral of all this? 8-bit strings are not going away. > > If that is a statement of your long term vision, then I think that it is > very unfortunate. Treating string literals as if they were isomorphic > with byte arrays was probably the right thing in 1991 but it won't be in > 2005. I think you're a tad too optimistic about the evolution speed of software (Windows 2000 *still* has to support DOS programs), but I see your point. As I stated in another message, in Python 3000 we'll have to consider a more Java-esque solution: *character* strings are Unicode, and for bytes we have (mutable!) byte arras. Certainly 8-bit bytes as the smallest storage unit aren't going away. > It doesn't meet the definition of string used in the Unicode spec., nor > in XML, nor in Java, nor at the W3C nor in most other up and coming > specifications. OK, so that's a good indication of where you're coming from. Maybe you should spend a little more time in the trenches and a little less in standards bodies. Standards are good, but sometimes disconnected from reality (remember ISO networking? :-). > From the W3C site: > > ""While ISO-2022-JP is not sufficient for every ISO10646 document, it is > the case that ISO10646 is a sufficient document character set for any > entity encoded with ISO-2022-JP."" And this is exactly why encodings will remain important: entities encoded in ISO-2022-JP have no compelling reason to be recoded permanently into ISO10646, and there are lots of forces that make it convenient to keep it encoded in ISO-2022-JP (like existing tools). > http://www.w3.org/MarkUp/html-spec/charset-harmful.html I know that document well. --Guido van Rossum (home page: http://www.python.org/~guido/) From just at letterror.com Fri Apr 28 19:51:03 2000 From: just at letterror.com (Just van Rossum) Date: Fri, 28 Apr 2000 18:51:03 +0100 Subject: [Python-Dev] Re: Unicode debate In-Reply-To: <200004281410.KAA16104@eric.cnri.reston.va.us> References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." 
Message-ID: [GvR, on string.encoding ] >Marc-Andre took this idea a bit further, but I think it's not >practical given the current implementation: there are too many places >where the C code would have to be changed in order to propagate the >string encoding information, I may miss something, but the encoding attr just travels with the string object, no? Like I said in my reply to MAL, I think it's undesirable to do *anything* with the encoding attr if not in combination with a unicode string. >and there are too many sources of strings >with unknown encodings to make it very useful. That's why the default encoding must be settable as well, as Fredrik suggested. >Plus, it would slow down 8-bit string ops. Not if you ignore it most of the time, and just pass it along when concatenating. >I have a better idea: rather than carrying around 8-bit strings with >an encoding, use Unicode literals in your source code. Explain that to newbies... I guess is that they will want simple 8 bit strings in their native encoding. Dunno. >If the source >encoding is known, these will be converted using the appropriate >codec. > >If you object to having to write u"..." all the time, we could say >that "..." is a Unicode literal if it contains any characters with the >top bit on (of course the source file encoding would be used just like >for u"..."). Only if "\377" would still yield an 8-bit string, for binary goop... Just From guido at python.org Fri Apr 28 20:31:19 2000 From: guido at python.org (Guido van Rossum) Date: Fri, 28 Apr 2000 14:31:19 -0400 Subject: [I18n-sig] Re: [Python-Dev] Re: Unicode debate In-Reply-To: Your message of "Fri, 28 Apr 2000 18:51:03 BST." References: Your message of "Fri, 28 Apr 2000 09:33:16 BST." Your message of "Thu, 27 Apr 2000 06:42:43 BST." Message-ID: <200004281831.OAA17406@eric.cnri.reston.va.us> > [GvR, on string.encoding ] > >Marc-Andre took this idea a bit further, but I think it's not > >practical given the current implementation: there are too many places > >where the C code would have to be changed in order to propagate the > >string encoding information, [JvR] > I may miss something, but the encoding attr just travels with the string > object, no? Like I said in my reply to MAL, I think it's undesirable to do > *anything* with the encoding attr if not in combination with a unicode > string. But just propagating affects every string op -- s+s, s*n, s[i], s[:], s.strip(), s.split(), s.lower(), ... > >and there are too many sources of strings > >with unknown encodings to make it very useful. > > That's why the default encoding must be settable as well, as Fredrik > suggested. I'm open for debate about this. There's just something about a changeable global default encoding that worries me -- like any global property, it requires conventions and defensive programming to make things work in larger programs. For example, a module that deals with Latin-1 strings can't just set the default encoding to Latin-1: it might be imported by a program that needs it to be UTF-8. This model is currently used by the locale in C, where all locale properties are global, and it doesn't work well. For example, Python needs to go through a lot of hoops so that Python numeric literals use "." for the decimal indicator even if the user's locale specifies "," -- we can't change Python to swap the meaning of "." and "," in all contexts. So I think that a changeable default encoding is of limited value. 
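(To see that failure mode concretely: the sketch below is written in
present-day Python purely as an illustration, with an invented
module-level "default_encoding" variable standing in for a settable
process-wide default -- no such knob is being specified here. It only
shows why two components that disagree about a single global cannot both
work without defensive programming.)

    # Hypothetical global default; "module A" sets it for its Latin-1 data.
    default_encoding = "latin-1"

    def decode_with_default(raw):
        # Every caller silently depends on whatever the global happens to be.
        return raw.decode(default_encoding)

    # "Module B", imported later, needs UTF-8 and flips the same global...
    default_encoding = "utf-8"

    # ...and module A's perfectly valid Latin-1 data now fails to decode:
    data = b"caf\xe9"            # 'cafe' with e-acute, encoded as Latin-1
    try:
        print(decode_with_default(data))
    except UnicodeDecodeError:
        print("module A broke when module B changed the global default")
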
That's different from being able to set the *source file* encoding -- this only affects Unicode string literals. > >Plus, it would slow down 8-bit string ops. > > Not if you ignore it most of the time, and just pass it along when > concatenating. And slicing, and indexing, and... > >I have a better idea: rather than carrying around 8-bit strings with > >an encoding, use Unicode literals in your source code. > > Explain that to newbies... I guess is that they will want simple 8 bit > strings in their native encoding. Dunno. If they are hap-py with their native 8-bit encoding, there's no need for them to ever use Unicode objects in their program, so they should be fine. 8-bit strings aren't ever interpreted or encoded except when mixed with Unicode objects. > >If the source > >encoding is known, these will be converted using the appropriate > >codec. > > > >If you object to having to write u"..." all the time, we could say > >that "..." is a Unicode literal if it contains any characters with the > >top bit on (of course the source file encoding would be used just like > >for u"..."). > > Only if "\377" would still yield an 8-bit string, for binary goop... Correct. --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at lemburg.com Fri Apr 28 20:57:18 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Fri, 28 Apr 2000 20:57:18 +0200 Subject: [Python-Dev] Changing PYC-Magic Message-ID: <3909DF0E.1D886485@lemburg.com> I have just looked at the Python/import.c file and the hard coded PYC magic number... /* Magic word to reject .pyc files generated by other Python versions */ /* Change for each incompatible change */ /* The value of CR and LF is incorporated so if you ever read or write a .pyc file in text mode the magic number will be wrong; also, the Apple MPW compiler swaps their values, botching string constants */ /* XXX Perhaps the magic number should be frozen and a version field added to the .pyc file header? */ /* New way to come up with the magic number: (YEAR-1995), MONTH, DAY */ #define MAGIC (20121 | ((long)'\r'<<16) | ((long)'\n'<<24)) A bit outdated, I'd say. With the addition of Unicode, the PYC files will contain marshalled Unicode objects which are not readable by older versions. I'd suggest bumping the magic number to 50428 ;-) -- Marc-Andre Lemburg ______________________________________________________________________ Business: http://www.lemburg.com/ Python Pages: http://www.lemburg.com/python/ From trentm at ActiveState.com Sat Apr 29 00:08:57 2000 From: trentm at ActiveState.com (Trent Mick) Date: Fri, 28 Apr 2000 15:08:57 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <200004261851.OAA06794@eric.cnri.reston.va.us> Message-ID: > > Guido van Rossum wrote: > > > > > > The email below is a serious bug report. A quick analysis shows that > > > UserString.count() calls the count() method on a string object, which > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > format code truncates integers. It probably should raise an overflow > > > exception instead. But that would still cause the test to fail -- > > > just in a different way (more explicit). Then the string methods > > > should be fixed to use long ints instead -- and then something else > > > would probably break... 
> > MAL wrote: > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > together with integers, so there's no problem on that side > > of the fence ;-) > > > > Since strings and Unicode objects use integers to describe the > > length of the object (as well as most if not all other > > builtin sequence types), the correct default value should > > thus be something like sys.maxlen which then gets set to > > INT_MAX. > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > re.py and sre_parse.py accordingly. > Guido wrote: > Hm, I'm not so sure. It would be much better if passing sys.maxint > would just WORK... Since that's what people have been doing so far. > Possible solutions (I give 4 of them): 1. The 'i' format code could raise an overflow exception and the PyArg_ParseTuple() call in string_count() could catch it and truncate to INT_MAX (reasoning that any overflow of the end position of a string can be bound to INT_MAX because that is the limit for any string in Python). Pros: - This "would just WORK" for usage of sys.maxint. Cons: - This overflow exception catching should then reasonably be propagated to other similar functions (like string.endswith(), etc). - We have to assume that the exception raised in the PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is for the second integer (i.e. 'last'). This is subtle and ugly. Pro or Con: - Do we want to start raising overflow exceptions for other conversion formats (i.e. 'b' and 'h' and 'l', the latter *can* overflow on Win64 where sizeof(long) < size(void*))? I think this is a good idea in principle but may break code (even if it *does* identify bugs in that code). 2. Just change the definitions of the UserString methods to pass a variable length argument list instead of default value parameters. For example change UserString.count() from: def count(self, sub, start=0, end=sys.maxint): return self.data.count(sub, start, end) to: def count(self, *args)): return self.data.count(*args) The result is that the default value for 'end' is now set by string_count() rather than by the UserString implementation: >>> from UserString import UserString >>> s= 'abcabcabc' >>> u = UserString('abcabcabc') >>> s.count('abc') 3 >>> u.count('abc') 3 Pros: - Easy change. - Fixes the immediate bug. - This is a safer way to copy the string behaviour in UserString anyway (is it not?). Cons: - Does not fix the general problem of the (common?) usage of sys.maxint to mean INT_MAX rather than the actual LONG_MAX (this matters on 64-bit Unices). - The UserString code is no longer really self-documenting. 3. As MAL suggested: add something like sys.maxlen (set to INT_MAX) with breaks the logical difference with sys.maxint (set to LONG_MAX): - sys.maxint == "the largest value a Python integer can hold" - sys.maxlen == "the largest value for the length of an object in Python (e.g. length of a string, length of an array)" Pros: - More explicit in that it separates two distinct meanings for sys.maxint (which now makes a difference on 64-bit Unices). - The code changes should be fairly straightforward. Cons: - Places in the code that still use sys.maxint where they should use sys.maxlen will unknowingly be overflowing ints and bringing about this bug. - Something else for coders to know about. 4. Add something like sys.maxlen, but set it to SIZET_MAX (c.f. ANSI size_t type). It is probably not a biggie, but Python currently makes the assumption that string never exceed INT_MAX in length. 
While this assumption is not likely to be proven false it technically could be on 64-bit systems. As well, when you start compiling on Win64 (where sizeof(int) == sizeof(long) < sizeof(size_t)) then you are going to be annoyed by hundreds of warnings about implicit casts from size_t (64-bits) to int (32-bits) for every strlen, str*, fwrite, and sizeof call that you make. Pros: - IMHO logically more correct. - Might clean up some subtle bugs. - Cleans up annoying and disconcerting warnings. - Will probably mean less pain down the road as 64-bit systems (esp. Win64) become more prevalent. Cons: - Lot of coding changes. - As Guido said: "and then something else would probably break". (Though, on currently 32-bits system, there should be no effective change). Only 64-bit systems should be affected and, I would hope, the effect would be a clean up. I apologize for not being succinct. Note that I am volunteering here. Opinions and guidance please. Trent From moshez at math.huji.ac.il Sat Apr 29 04:08:48 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 29 Apr 2000 05:08:48 +0300 (IDT) Subject: [I18n-sig] Re: [Python-Dev] Unicode debate In-Reply-To: <200004281450.KAA16493@eric.cnri.reston.va.us> Message-ID: I agree with most of what you say, but... On Fri, 28 Apr 2000, Guido van Rossum wrote: > As I stated in another message, in Python 3000 we'll have > to consider a more Java-esque solution: *character* strings are > Unicode, and for bytes we have (mutable!) byte arras. I would prefer a different distinction: mutable immutable chars string string_buffer bytes bytes bytes_buffer Why not allow me the freedom to index a dictionary with goop? (Here's a sample application: UNIX "file" command) -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From mal at lemburg.com Sat Apr 29 14:50:07 2000 From: mal at lemburg.com (M.-A. Lemburg) Date: Sat, 29 Apr 2000 14:50:07 +0200 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) References: Message-ID: <390ADA7F.2C01C6C3@lemburg.com> Trent Mick wrote: > > > > Guido van Rossum wrote: > > > > > > > > The email below is a serious bug report. A quick analysis shows that > > > > UserString.count() calls the count() method on a string object, which > > > > calls PyArg_ParseTuple() with the format string "O|ii". The 'i' > > > > format code truncates integers. It probably should raise an overflow > > > > exception instead. But that would still cause the test to fail -- > > > > just in a different way (more explicit). Then the string methods > > > > should be fixed to use long ints instead -- and then something else > > > > would probably break... > > > > MAL wrote: > > > All uses in stringobject.c and unicodeobject.c use INT_MAX > > > together with integers, so there's no problem on that side > > > of the fence ;-) > > > > > > Since strings and Unicode objects use integers to describe the > > > length of the object (as well as most if not all other > > > builtin sequence types), the correct default value should > > > thus be something like sys.maxlen which then gets set to > > > INT_MAX. > > > > > > I'd suggest adding sys.maxlen and the modifying UserString.py, > > > re.py and sre_parse.py accordingly. > > > Guido wrote: > > Hm, I'm not so sure. It would be much better if passing sys.maxint > > would just WORK... Since that's what people have been doing so far. > > > > Possible solutions (I give 4 of them): > [...] Here is another one... 
I don't really like it because I think that silent truncations are a
bad idea, but to make things "just work" it would help:

* Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1)
  to INT_MAX, and the same for negative numbers, when passing a Python
  integer to an "i" marked variable. This would map range(INT_MAX+1,
  LONG_MAX+1) to INT_MAX and thus sys.maxint would turn out as INT_MAX
  in all those cases where "i" is used as parser marker. Ditto for
  negative values.

  With this truncation, passing sys.maxint as default argument for
  length parameters would "just work" :-).

The more radical alternative would be changing the Python object length
fields to long -- I don't think this is practical though (and probably
not really needed unless you intend to work with 3GB strings ;).

--
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/

From paul at prescod.net Sat Apr 29 16:18:05 2000
From: paul at prescod.net (Paul Prescod)
Date: Sat, 29 Apr 2000 09:18:05 -0500
Subject: [I18n-sig] Re: [Python-Dev] Unicode debate
References: <200004271501.LAA13535@eric.cnri.reston.va.us> <3908F566.8E5747C@prescod.net> <200004281450.KAA16493@eric.cnri.reston.va.us>
Message-ID: <390AEF1D.253B93EF@prescod.net>

Guido van Rossum wrote:
>
> [Paul Prescod]
> > I think that maybe an important point is getting lost here. I could be
> > wrong, but it seems that all of this emphasis on encodings is misplaced.
>
> In practical applications that manipulate text, encodings creep up all
> the time.

I'm not saying that encodings are unimportant. I'm saying that they are
*different* than what Fredrik was talking about. He was talking about a
coherent logical model for characters and character strings based on the
conventions of more modern languages and systems than C and Python.

> > How can we
> > make the transition to a "binary goops are not strings" world easiest?
>
> I'm afraid that's a bigger issue than we can solve for Python 1.6.

I understand that we can't fix the problem now. I just think that we
shouldn't go out of our way to make it worse. If we make byte-array
strings "magically" cast themselves into character-strings, people will
expect that behavior forever.

> > It doesn't meet the definition of string used in the Unicode spec., nor
> > in XML, nor in Java, nor at the W3C nor in most other up and coming
> > specifications.
>
> OK, so that's a good indication of where you're coming from. Maybe
> you should spend a little more time in the trenches and a little less
> in standards bodies. Standards are good, but sometimes disconnected
> from reality (remember ISO networking? :-).

As far as I know, XML and Java are used a fair bit in the real
world...even somewhat in Asia. In fact, there is a book titled "XML and
Java" written by three Japanese men.

> And this is exactly why encodings will remain important: entities
> encoded in ISO-2022-JP have no compelling reason to be recoded
> permanently into ISO10646, and there are lots of forces that make it
> convenient to keep it encoded in ISO-2022-JP (like existing tools).

You cannot recode an ISO-2022-JP document into ISO10646 because 10646 is
a character *set* and not an encoding. ISO-2022-JP says how you should
represent characters in terms of bits and bytes. ISO10646 defines a
mapping from integers to characters. They are both important, but
separate. I think that this automagical re-encoding conflates them.
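(The set-versus-encoding distinction is easy to show in a few lines. The
sketch below uses present-day Python str/bytes objects and the
iso2022_jp codec, all of which postdate this thread; it is an
illustration only, not code anyone proposed. The code points stay the
same in every case, while the byte-level representations differ.)

    text = "\u65e5\u672c\u8a9e"            # the three characters 'Nihongo'

    # Character-set view (ISO 10646 / Unicode): characters are integers.
    print([hex(ord(ch)) for ch in text])   # ['0x65e5', '0x672c', '0x8a9e']

    # Encoding view: the same characters as two different byte sequences.
    print(text.encode("iso2022_jp"))       # escape-sequence framed bytes (ESC $ B ...)
    print(text.encode("utf-8"))            # b'\xe6\x97\xa5\xe6\x9c\xac\xe8\xaa\x9e'
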
-- Paul Prescod - ISOGEN Consulting Engineer speaking for himself It's difficult to extract sense from strings, but they're the only communication coin we can count on. - http://www.cs.yale.edu/~perlis-alan/quotes.html From moshez at math.huji.ac.il Sat Apr 29 20:09:40 2000 From: moshez at math.huji.ac.il (Moshe Zadka) Date: Sat, 29 Apr 2000 21:09:40 +0300 (IDT) Subject: [Python-Dev] At the interactive port Message-ID: Continuing the recent debate about what is appropriate to the interactive prompt printing, and the wide agreement that whatever we decide, users might think otherwise, I've written up a patch to have the user control via a function in __builtin__ the way things are printed at the prompt. This is not patches at python level stuff for two reasons: 1. I'm not sure what to call this function. Currently, I call it __print_expr__, but I'm not sure it's a good name 2. I haven't yet supplied a default in __builtin__, so the user *must* override this. This is unacceptable, of course. I'd just like people to tell me if they think this is worth while, and if there is anything I missed. *** ../python/dist/src/Python/ceval.c Fri Mar 31 04:42:47 2000 --- Python/ceval.c Sat Apr 29 03:55:36 2000 *************** *** 1014,1047 **** case PRINT_EXPR: v = POP(); ! /* Print value except if None */ ! /* After printing, also assign to '_' */ ! /* Before, set '_' to None to avoid recursion */ ! if (v != Py_None && ! (err = PyDict_SetItemString( ! f->f_builtins, "_", Py_None)) == 0) { ! err = Py_FlushLine(); ! if (err == 0) { ! x = PySys_GetObject("stdout"); ! if (x == NULL) { ! PyErr_SetString( ! PyExc_RuntimeError, ! "lost sys.stdout"); ! err = -1; ! } ! } ! if (err == 0) ! err = PyFile_WriteObject(v, x, 0); ! if (err == 0) { ! PyFile_SoftSpace(x, 1); ! err = Py_FlushLine(); ! } ! if (err == 0) { ! err = PyDict_SetItemString( ! f->f_builtins, "_", v); ! } } ! Py_DECREF(v); break; case PRINT_ITEM: --- 1014,1035 ---- case PRINT_EXPR: v = POP(); ! x = PyDict_GetItemString(f->f_builtins, ! "__print_expr__"); ! if (x == NULL) { ! PyErr_SetString(PyExc_SystemError, ! "__print_expr__ not found"); ! Py_DECREF(v); ! break; ! } ! t = PyTuple_New(1); ! if (t != NULL) { ! PyTuple_SET_ITEM(t, 0, v); ! w = PyEval_CallObject(x, t); ! Py_XDECREF(w); } ! /*Py_DECREF(x);*/ ! Py_XDECREF(t); break; case PRINT_ITEM: -- Moshe Zadka . http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com From trentm at activestate.com Sat Apr 29 20:12:07 2000 From: trentm at activestate.com (Trent Mick) Date: Sat, 29 Apr 2000 11:12:07 -0700 Subject: [Python-Dev] issues with int/long on 64bit platforms - eg stringobject (PR#306) In-Reply-To: <390ADA7F.2C01C6C3@lemburg.com> References: <390ADA7F.2C01C6C3@lemburg.com> Message-ID: <20000429111207.A16414@activestate.com> On Sat, Apr 29, 2000 at 02:50:07PM +0200, M.-A. Lemburg wrote: > Here is another one... I don't really like it because I think that > silent truncations are a bad idea, but to make things "just work > it would help: > > * Change PyArg_ParseTuple() to truncate the range(INT_MAX+1, LONG_MAX+1) > to INT_MAX and the same for negative numbers when passing a > Python integer to a "i" marked variable. This would map > range(INT_MAX+1, LONG_MAX+1) to INT_MAX and thus sys.maxint > would turn out as INT_MAX in all those cases where "i" is > used as parser marker. Dito for negative values. > > With this truncation passing sys.maxint as default argument > for length parameters would "just work" :-). 
> The more radical alternative would be changing the Python
> object length fields to long -- I don't think this is

If we *do* make this change however, say "size_t" please, rather than
long because on Win64 sizeof(long) < sizeof(size_t) == sizeof(void*).

> practical though (and probably not really needed unless
> you intend to work with 3GB strings ;).

I know that 3GB+ strings are not likely to come along but if the length
fields were size_t it would clean up implicit downcasts that you
currently get from size_t to int on calls to strlen and the like on
64-bit systems.

Trent
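(For reference, a minimal sketch of option 2 from Trent's list above;
MiniUserString is a hypothetical stand-in, not the stdlib UserString
class. By delegating with *args, the underlying string method supplies
its own default end position, so no sys.maxint-style sentinel ever has
to squeeze through a C int.)

    class MiniUserString:
        def __init__(self, data):
            self.data = data

        def count(self, sub, *args):
            # Whatever the caller omits (start, end), str.count fills in
            # itself at the C level; no Python-side sys.maxint default.
            return self.data.count(sub, *args)

    u = MiniUserString("abcabcabc")
    print(u.count("abc"))        # 3
    print(u.count("abc", 3))     # 2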