From dalcinl at gmail.com Wed Aug 1 03:09:50 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 31 Jul 2007 22:09:50 -0300
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To: <46AFA27A.4090700@v.loewis.de>
References: <46AFA27A.4090700@v.loewis.de>
Message-ID:

On 7/31/07, "Martin v. Löwis" wrote:
> The Python Packaging Index (the software formerly known
> as Cheeseshop) is now available at
>
> http://pypi.python.org/pypi

Please, update 'DEFAULT_REPOSITORY' in Lib/distutils/command/upload.py
(py-2.6 and py3k-struni branches)

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From steve at holdenweb.com Wed Aug 1 03:19:19 2007
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 31 Jul 2007 21:19:19 -0400
Subject: [Python-Dev] Cygwin: Problem detecting subprocess termination after _spawn_posix in distutils?
In-Reply-To: <46AFAD41.9060203@v.loewis.de>
References: <46AFAD41.9060203@v.loewis.de>
Message-ID: <46AFDF97.3090909@holdenweb.com>

Martin v. Löwis wrote:
>> It would be really nice if test_distutils showed any failures, but it
>> doesn't so any assistance would be welcome. At this point I can't even
>> replicate the failure in a simpler test :-(
>
> My guess is that it's the environment; if not that, the working
> directory. Assuming you have already instrumented
> ccompiler.CCompiler.spawn, I suggest to dump os.environ and
> print os.getcwd(). Assuming you really meant that you run under
> Cygwin Python (instead of just using --compiler), you might
> want to instrument spawn._spawn_posix instead.
>
> When you say you extracted _spawn_all from distutils/spawn.py:
> what version of Python are you talking about? I can't find
> _spawn_all in the sources of 2.5.x, or 2.6.
>
Thanks for taking the time to have a look at this. Sorry, it *was*
_spawn_posix I extracted (and have instrumented in the live version) - I
have no idea where "_spawn_all" came from. I am indeed running under
Cygwin Python.

Here's a diff -u output against the original spawn.py so you can see what
I have changed.

$ diff -u /lib/python2.5/distutils/{spawn.py.org,spawn.py}
--- /lib/python2.5/distutils/spawn.py.org 2007-07-14 09:09:24.114921600 -0400
+++ /lib/python2.5/distutils/spawn.py 2007-07-31 20:53:33.325945600 -0400
@@ -118,7 +118,9 @@
                   search_path=1,
                   verbose=0,
                   dry_run=0):
-
+    for _k in sorted(os.environ.keys()):
+        print "%s=%s" % (_k, os.environ[_k])
+    print "SPAWN:", cmd, "PATH?", search_path, "V:", verbose, "D:", dry_run
     log.info(string.join(cmd, ' '))
     if dry_run:
         return
@@ -144,20 +146,25 @@
         # Loop until the child either exits or is terminated by a signal
         # (ie. keep waiting if it's merely stopped)
         while 1:
+            print "Are we done yet? Waiting on pid", pid
             try:
                 (pid, status) = os.waitpid(pid, 0)
+                print "Got pid, status", pid, status
             except OSError, exc:
                 import errno
+                print "Got OSError", exc.errno
                 if exc.errno == errno.EINTR:
                     continue
                 raise DistutilsExecError, \
                       "command '%s' failed: %s" % (cmd[0], exc[-1])
             if os.WIFSIGNALED(status):
+                print "Got WIFSIGNALED", status
                 raise DistutilsExecError, \
                       "command '%s' terminated by signal %d" % \
                       (cmd[0], os.WTERMSIG(status))
 
             elif os.WIFEXITED(status):
+                print "Got WIFEXITED", status
                 exit_status = os.WEXITSTATUS(status)
                 if exit_status == 0:
                     return              # hey, it succeeded!
@@ -167,9 +174,11 @@
                           (cmd[0], exit_status)
 
             elif os.WIFSTOPPED(status):
+                print "Got WIFSTOPPED", status
                 continue
 
             else:
+                print "Got unknown exception", status
                 raise DistutilsExecError, \
                       "unknown error executing '%s': termination status %d" % \
                       (cmd[0], status)

The output now includes the environment:

$ python setup.py install
running install
running build
running build_py
running build_ext
building '_imaging' extension
!::=::\
!C:=C:\cygwin\bin
ALLUSERSPROFILE=C:\Documents and Settings\All Users
APPDATA=C:\Documents and Settings\sholden\Application Data
APR_ICONV_PATH=C:\Program Files\Subversion\iconv
CDPATH=.:/c/Steve:/c/Steve/Projects:/usr/local
CLIENTNAME=Console
COMMONPROGRAMFILES=C:\Program Files\Common Files
COMPUTERNAME=BIGBOY
COMSPEC=C:\WINDOWS\system32\cmd.exe
CVSROOT=/usr/local/repository/
CVS_RSH=/bin/ssh
FP_NO_HOST_CHECK=NO
HOME=/c/Steve
HOMEDRIVE=C:
HOMEPATH=\Documents and Settings\sholden
HOSTNAME=bigboy
INFOPATH=/usr/local/info:/usr/share/info:/usr/info:
LANG=C
LOGONSERVER=\\BIGBOY
MAKE_MODE=unix
MANPATH=/usr/local/man:/usr/share/man:/usr/man::/usr/ssl/man
NUMBER_OF_PROCESSORS=1
OLDPWD=/c/Steve
OS=Windows_NT
PATH=/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/c/WINDOWS/system32:/c/WINDOWS:
/c/WINDOWS/System32/Wbem:/c/Program Files/ATI Technologies/ATI Control Panel:/c/
Program Files/Common Files/GTK/2.0/bin:/c/Program Files/Subversion/bin:/c/Python
25:/c/Steve/bin
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH
PLAT=cygwin-1.5.24-i686
PRINTER=HP Photosmart C6100 series
PROCESSOR_ARCHITECTURE=x86
PROCESSOR_IDENTIFIER=x86 Family 6 Model 13 Stepping 6, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=0d06
PROGRAMFILES=C:\Program Files
PROMPT=$P$G
PS1=\[\e]0;\w\a\]\n\[\e[32m\]\u@\h \[\e[33m\]\w\[\e[0m\]\n\$
PWD=/c/Steve/Imaging-1.1.6
PYSVN=svn+ssh://pythondev at svn.python.org/
PYTHONSTARTUP=/c/Steve/.pythonrc
SESSIONNAME=Console
SHLVL=1
SYSTEMDRIVE=C:
SYSTEMROOT=C:\WINDOWS
TEMP=/c/DOCUME~1/sholden/LOCALS~1/Temp
TERM=cygwin
TMP=/c/DOCUME~1/sholden/LOCALS~1/Temp
USER=sholden
USER1=u35582809 at s90820416.onlinehome.us
USERDOMAIN=BIGBOY
USERNAME=sholden
USERPROFILE=C:\Documents and Settings\sholden
VISUAL=vi
VS80COMNTOOLS=C:\Program Files\Microsoft Visual Studio 8\Common7\Tools\
WINDIR=C:\WINDOWS
_=/usr/bin/python
SPAWN: ['gcc', '-fno-strict-aliasing', '-DNDEBUG', '-g', '-O3', '-Wall', '-Wstri
ct-prototypes', '-DHAVE_LIBZ', '-IlibImaging', '-I/usr/include', '-I/usr/include
/python2.5', '-c', 'libImaging/Dib.c', '-o', 'build/temp.cygwin-1.5.24-i686-2.5/
libImaging/Dib.o'] PATH? 1 V: 0 D: 0
gcc -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -DHAVE_LIBZ -
IlibImaging -I/usr/include -I/usr/include/python2.5 -c libImaging/Dib.c -o build
/temp.cygwin-1.5.24-i686-2.5/libImaging/Dib.o
Are we done yet? Waiting on pid 416

The only environment variables that don't appear in the shell output
from the env command are INFOPATH, MAKE_MODE and PLAT. I am still flummoxed.

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

From guido at python.org Wed Aug 1 03:48:50 2007
From: guido at python.org (Guido van Rossum)
Date: Tue, 31 Jul 2007 18:48:50 -0700
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To:
References: <46AFA27A.4090700@v.loewis.de>
Message-ID:

And why not in the upcoming 2.5 release as well?

On 7/31/07, Lisandro Dalcin wrote:
> On 7/31/07, "Martin v. Löwis" wrote:
> > The Python Packaging Index (the software formerly known
> > as Cheeseshop) is now available at
> >
> > http://pypi.python.org/pypi
>
> Please, update 'DEFAULT_REPOSITORY' in Lib/distutils/command/upload.py
> (py-2.6 and py3k-struni branches)
>
>
> --
> Lisandro Dalcín
> ---------------
> Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
> Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
> PTLC - Güemes 3450, (3000) Santa Fe, Argentina
> Tel/Fax: +54-(0)342-451.1594
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de Wed Aug 1 07:15:26 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 07:15:26 +0200
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To:
References: <46AFA27A.4090700@v.loewis.de>
Message-ID: <46B016EE.9030309@v.loewis.de>

> Please, update 'DEFAULT_REPOSITORY' in Lib/distutils/command/upload.py
> (py-2.6 and py3k-struni branches)

I did already, for 2.6, in r56543. For the other branches, this change
will propagate through merging.

Regards,
Martin

From martin at v.loewis.de Wed Aug 1 07:16:39 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 07:16:39 +0200
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To:
References: <46AFA27A.4090700@v.loewis.de>
Message-ID: <46B01737.60608@v.loewis.de>

Guido van Rossum wrote:
> And why not in the upcoming 2.5 release as well?

It's changed there as well.
Regards,
Martin

From martin at v.loewis.de Wed Aug 1 07:48:00 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 07:48:00 +0200
Subject: [Python-Dev] Cygwin: Problem detecting subprocess termination after _spawn_posix in distutils?
In-Reply-To: <46AFDF97.3090909@holdenweb.com>
References: <46AFAD41.9060203@v.loewis.de> <46AFDF97.3090909@holdenweb.com>
Message-ID: <46B01E90.8090402@v.loewis.de>

> The only environment variables that don't appear in the shell output
> from the env command are INFOPATH, MAKE_MODE and PLAT. I am still flummoxed.

At this point, I'd recommend to perform a cygwin update; with Cygwin,
these problems often go away with an update.

If that doesn't help, you can ask on the Cygwin list also; to analyse
this further, ISTM one will need to debug the internals of cygwin.

One thing you could try is to add -v to the list of gcc options;
you can then see whether gcc is progressing correctly.

Regards,
Martin

From steve at holdenweb.com Wed Aug 1 12:49:56 2007
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 01 Aug 2007 06:49:56 -0400
Subject: [Python-Dev] Cygwin: Problem detecting subprocess termination after _spawn_posix in distutils?
In-Reply-To: <46B01E90.8090402@v.loewis.de>
References: <46AFAD41.9060203@v.loewis.de> <46AFDF97.3090909@holdenweb.com> <46B01E90.8090402@v.loewis.de>
Message-ID: <46B06554.6020501@holdenweb.com>

Martin v. Löwis wrote:
>> The only environment variables that don't appear in the shell output
>> from the env command are INFOPATH, MAKE_MODE and PLAT. I am still flummoxed.
>
> At this point, I'd recommend to perform a cygwin update; with Cygwin,
> these problems often go away with an update.
>
I updated Cygwin and did a rebaseall before posting.

> If that doesn't help, you can ask on the Cygwin list also; to analyse
> this further, ISTM one will need to debug the internals of cygwin.
>
I posted on Cygwin before asking here.
> One thing you could try is to add -v to the list of gcc options;
> you can then see whether gcc is progressing correctly.
>
I'll do that, though I have reason to believe the gcc *is* terminating
and _spawn_posix isn't detecting the end of the process. At the very
least we should get another test out of this dreadfully irritating bug.

Thanks again for looking at this.

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC/Ltd           http://www.holdenweb.com
Skype: holdenweb      http://del.icio.us/steve.holden
--------------- Asciimercial ------------------
Get on the web: Blog, lens and tag the Internet
Many services currently offer free registration
----------- Thank You for Reading -------------

From dalcinl at gmail.com Wed Aug 1 17:17:46 2007
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 1 Aug 2007 12:17:46 -0300
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To: <46B01737.60608@v.loewis.de>
References: <46AFA27A.4090700@v.loewis.de> <46B01737.60608@v.loewis.de>
Message-ID:

Martin, could you please verify if this change did not break the
download counter?

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From martin at v.loewis.de Wed Aug 1 21:45:21 2007
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 01 Aug 2007 21:45:21 +0200
Subject: [Python-Dev] Python Package Index hostname change
In-Reply-To:
References: <46AFA27A.4090700@v.loewis.de> <46B01737.60608@v.loewis.de>
Message-ID: <46B0E2D1.5010301@v.loewis.de>

> Martin, could you please verify if this change did not break the
> download counter?

Do you have reason to believe that it did? It should not have.
Regards,
Martin

From facundobatista at gmail.com Thu Aug 2 20:11:47 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 2 Aug 2007 15:11:47 -0300
Subject: [Python-Dev] NotImplemented comparisons
Message-ID:

People:

Pablo Hoffman opened this bug: "[1764761] Decimal comparison with None
fails in Windows".

It's not a Decimal problem, see the different behaviour of this basic
test in Linux and Windows:

Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu
4.1.2-0ubuntu4)] on linux2
>>> class C(object):
...     def __cmp__(self, other):
...         return NotImplemented
...
>>> c = C()
>>> print c < None
False
>>> print NotImplemented < None
False


Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
(Intel)] on win32
>>> class C(object):
        def __cmp__(self, other):
                return NotImplemented

>>> c = C()
>>> print c < None
True
>>> print NotImplemented < None
False


Here's where I stop: don't know where to keep looking... Does somebody
know why there is a difference here?

Furthermore, we can check that it is a problem regarding __cmp__:

>>> class C(object):
        def __cmp__(self, other):
                return NotImplemented
        def m(self):
                return NotImplemented

>>> c = C()
>>> print c < None
True
>>> print c.m() < None
False


This is not the first time I find an issue through Decimal regarding
NotImplemented, there was this thread:

http://mail.python.org/pipermail/python-dev/2005-December/059046.html

, but I don't know if that's a separate issue or not.

Thanks for your help!

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From guido at python.org Thu Aug 2 20:35:44 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 11:35:44 -0700
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

NotImplemented isn't treated as special when returned by __cmp__();
__cmp__ is not considered a binary operator like __add__.
(__lt__ and friends *do* get treated as such -- but instead of __rlt__ we
use __gt__, etc.)

--Guido

On 8/2/07, Facundo Batista wrote:
> People:
>
> Pablo Hoffman opened this bug: "[1764761] Decimal comparison with None
> fails in Windows".
>
> It's not a Decimal problem, see the different behaviour of this basic
> test in Linux and Windows:
>
> Python 2.5.1 (r251:54863, May 2 2007, 16:56:35) [GCC 4.1.2 (Ubuntu
> 4.1.2-0ubuntu4)] on linux2
> >>> class C(object):
> ...     def __cmp__(self, other):
> ...         return NotImplemented
> ...
> >>> c = C()
> >>> print c < None
> False
> >>> print NotImplemented < None
> False
>
>
> Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
> (Intel)] on win32
> >>> class C(object):
>         def __cmp__(self, other):
>                 return NotImplemented
>
> >>> c = C()
> >>> print c < None
> True
> >>> print NotImplemented < None
> False
>
>
> Here's where I stop: don't know where to keep looking... Does somebody
> know why there is a difference here?
>
> Furthermore, we can check that it is a problem regarding __cmp__:
>
> >>> class C(object):
>         def __cmp__(self, other):
>                 return NotImplemented
>         def m(self):
>                 return NotImplemented
>
> >>> c = C()
> >>> print c < None
> True
> >>> print c.m() < None
> False
>
>
> This is not the first time I find an issue through Decimal regarding
> NotImplemented, there was this thread:
>
> http://mail.python.org/pipermail/python-dev/2005-December/059046.html
>
> , but I don't know if that's a separate issue or not.
>
> Thanks for your help!
>
> --
> .    Facundo
>
> Blog: http://www.taniquetil.com.ar/plog/
> PyAr: http://www.python.org/ar/
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From facundobatista at gmail.com Thu Aug 2 21:14:37 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 2 Aug 2007 16:14:37 -0300
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

2007/8/2, Guido van Rossum:

> NotImplemented isn't treated as special when returned by __cmp__();
> __cmp__ is not considered a binary operator like __add__. (__lt__ and
> friends *do* get treated as such -- but instead of __rlt__ we use
> __gt__, etc.)

I understand that it is tricky how NotImplemented and comparisons interact.

But how do you explain the difference in behaviour between Linux and Windows?

Thanks!

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From p.f.moore at gmail.com Thu Aug 2 21:45:00 2007
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 2 Aug 2007 20:45:00 +0100
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID: <79990c6b0708021245t3ed790fdwf12f5bca227a0b5d@mail.gmail.com>

On 02/08/07, Facundo Batista wrote:
> I understand that it is tricky how NotImplemented and comparisons interact.
>
> But how do you explain the difference in behaviour between Linux and Windows?

A wild guess: c < None falls back to checking c.__cmp__(None) < 0.
This translates to NotImplemented < 0, and as the ordering of built in
types is implementation dependent, maybe that explains the difference
between Windows and Linux?

Paul.
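
As a hedged illustration of the distinction Guido draws earlier in this
thread (rich comparison methods such as __lt__ treat NotImplemented
specially and fall back to the reflected __gt__ rather than to an __rlt__),
here is a minimal sketch. It assumes Python 3, where __cmp__ no longer
exists and comparisons that neither operand can answer raise TypeError;
the thread itself concerns Python 2.5. The names Left, Right and
compare_with_none are hypothetical, chosen only for this example.

```python
class Left:
    """Declines every '<' comparison it is asked about."""

    def __lt__(self, other):
        # Returning NotImplemented tells the interpreter to try the
        # reflected operation on the other operand.
        return NotImplemented


class Right:
    """Answers the reflected operation on the left operand's behalf."""

    def __gt__(self, other):
        return True


def compare_with_none(obj):
    """Return the name of the exception raised by ``obj < None``, if any."""
    try:
        obj < None
    except TypeError as exc:
        return type(exc).__name__
    return "no exception"


# Left.__lt__ declines, so the interpreter falls back to Right.__gt__.
reflected_result = Left() < Right()

# Neither Left nor NoneType can answer, so Python 3 raises TypeError
# instead of producing an arbitrary ordering.
none_result = compare_with_none(Left())

print(reflected_result)
print(none_result)
```

Under the Python 3 assumption this prints True and then TypeError, so the
arbitrary True/False answers reported for Python 2.5 in this thread do not
arise there.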
From g.brandl at gmx.net Thu Aug 2 22:06:14 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 02 Aug 2007 22:06:14 +0200
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

Guido van Rossum wrote:
> NotImplemented isn't treated as special when returned by __cmp__();
> __cmp__ is not considered a binary operator like __add__. (__lt__ and
> friends *do* get treated as such -- but instead of __rlt__ we use
> __gt__, etc.)

But if it's not treated as special, why doesn't the comparison raise an
exception, like when __cmp__ returns "foo", for example?

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From tjreedy at udel.edu Thu Aug 2 22:11:35 2007
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 2 Aug 2007 16:11:35 -0400
Subject: [Python-Dev] NotImplemented comparisons
References:
Message-ID:

"Facundo Batista" wrote in message
news:e04bdf310708021111g2870662bo5c6fdb3c1c68a9c2 at mail.gmail.com...
| >>> class C(object):
| ...     def __cmp__(self, other):
| ...         return NotImplemented
| ...

Given that you 'should' return an int, doing elsewise has undefined
results.

| >>> c = C()
| >>> print c < None

I presume that this translates into c.__cmp__(None) < 0 which becomes
NotImplemented < 0. The result of that is undefined and interpreter
dependent.

| >>> print NotImplemented < None

As is this. There is no reason to expect the two comparisons
(NotImplemented to 0 and None) to give the same or different results.

| Does somebody know why there is a difference here?

Different interpreters, different arbitrary results. I believe checking
the ids of the right objects (the type objects, I have read) would explain.
| Furthermore, we can check that it is a problem regarding __cmp__:
|
| >>> class C(object):
|         def __cmp__(self, other):
|                 return NotImplemented
|         def m(self):
|                 return NotImplemented
|
| >>> c = C()
| >>> print c < None
| True
| >>> print c.m() < None
| False

This is still NotImplemented < 0 versus NotImplemented < None. As I
understand, such nonsense comparisons will raise exceptions in 3.0.

tjr

From g.brandl at gmx.net Thu Aug 2 22:15:46 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 02 Aug 2007 22:15:46 +0200
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

Facundo Batista wrote:
> 2007/8/2, Guido van Rossum:
>
>> NotImplemented isn't treated as special when returned by __cmp__();
>> __cmp__ is not considered a binary operator like __add__. (__lt__ and
>> friends *do* get treated as such -- but instead of __rlt__ we use
>> __gt__, etc.)
>
> I understand that it is tricky how NotImplemented and comparisons interact.
>
> But how do you explain the difference in behaviour between Linux and Windows?

I now investigated that, and it seems that if you return NotImplemented
from a __cmp__() function, and the other's __cmp__() isn't helpful either,
you end up comparing the addresses of the objects (in your case c and None)
-- the outcome of which is not consistent across machines or sessions.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From facundobatista at gmail.com Thu Aug 2 22:17:18 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 2 Aug 2007 17:17:18 -0300
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

2007/8/2, Paul Moore:
> A wild guess: c < None falls back to checking c.__cmp__(None) < 0.
> This translates to NotImplemented < 0, and as the ordering of built in
> types is implementation dependent, maybe that explains the difference
> between Windows and Linux?

"NotImplemented < 0" returns False, which is ok, but different from "c < None"

2007/8/2, Guido van Rossum:
> > But how do you explain the difference in behaviour between Linux and Windows?
>
> Perhaps the comparison compares the objects' address.

No, because NotImplemented and None are always the same: if this is the
problem Linux and Windows could be different but they would be
consistent with themselves (and Windows is not coherent with itself).

Bottom line: I can easily fix Decimal to handle this special case, the
point is that maybe we have a lower level bug here...

Regards,

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From g.brandl at gmx.net Thu Aug 2 23:25:27 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 02 Aug 2007 23:25:27 +0200
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

Terry Reedy wrote:
> "Facundo Batista" wrote in message
> news:e04bdf310708021111g2870662bo5c6fdb3c1c68a9c2 at mail.gmail.com...
> | >>> class C(object):
> | ...     def __cmp__(self, other):
> | ...         return NotImplemented
> | ...
>
> Given that you 'should' return an int, doing elsewise has undefined
> results.

Returning anything other than an int or NotImplemented raises an exception.
NotImplemented seems to be special cased so that the other object's
__cmp__ can be tried too.

> | >>> c = C()
> | >>> print c < None
>
> I presume that this translates into c.__cmp__(None) < 0 which becomes
> NotImplemented < 0. The result of that is undefined and interpreter
> dependent.

No, it becomes id(c) < id(None). See half_compare in Objects/typeobject.c.

> This is still NotImplemented < 0 versus NotImplemented < None. As I
> understand, such nonsense comparisons will raise exceptions in 3.0.

Yes, fortunately.
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From facundobatista at gmail.com Fri Aug 3 00:15:13 2007
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 2 Aug 2007 19:15:13 -0300
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

2007/8/2, Terry Reedy:

> Given that you 'should' return an int, doing elsewise has undefined
> results.

I'll fix decimal to always return sane values from __cmp__, :)

Thank you all!

Regards,

--
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From guido at python.org Fri Aug 3 00:27:18 2007
From: guido at python.org (Guido van Rossum)
Date: Thu, 2 Aug 2007 15:27:18 -0700
Subject: [Python-Dev] NotImplemented comparisons
In-Reply-To:
References:
Message-ID:

On 8/2/07, Georg Brandl wrote:
> Returning anything other than an int or NotImplemented raises an exception.
> NotImplemented seems to be special cased so that the other object's
> __cmp__ can be tried too.

Oops, sorry for the misinformation. :-(

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

From nnorwitz at gmail.com Fri Aug 3 08:07:29 2007
From: nnorwitz at gmail.com (Neal Norwitz)
Date: Thu, 2 Aug 2007 23:07:29 -0700
Subject: [Python-Dev] T_PYSSIZET in Include/structmember.h can be hidden
Message-ID:

Martin,

Do you know why T_PYSSIZET is inside a #ifdef HAVE_LONG_LONG? That
seems like a mistake. Here's the code:

#ifdef HAVE_LONG_LONG
#define T_LONGLONG 17
#define T_ULONGLONG 18
#define T_PYSSIZET 19 /* Py_ssize_t */
#endif /* HAVE_LONG_LONG */

ISTM, that T_PYSSIZET should be after the #endif. Was this a mistake
or intentional?
n

From walter at livinglogic.de Fri Aug 3 08:34:46 2007
From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=)
Date: Fri, 03 Aug 2007 08:34:46 +0200
Subject: [Python-Dev] T_PYSSIZET in Include/structmember.h can be hidden
In-Reply-To:
References:
Message-ID: <46B2CC86.1020802@livinglogic.de>

Neal Norwitz wrote:
> Martin,
>
> Do you know why T_PYSSIZET is inside a #ifdef HAVE_LONG_LONG? That
> seems like a mistake. Here's the code:
>
> #ifdef HAVE_LONG_LONG
> #define T_LONGLONG 17
> #define T_ULONGLONG 18
> #define T_PYSSIZET 19 /* Py_ssize_t */
> #endif /* HAVE_LONG_LONG */
>
> ISTM, that T_PYSSIZET should be after the #endif. Was this a mistake
> or intentional?

That was my mistake. It should be outside of the #ifdef.

Servus,
 Walter

From g.brandl at gmx.net Fri Aug 3 10:23:55 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 03 Aug 2007 10:23:55 +0200
Subject: [Python-Dev] make iter() return an empty iterator?
Message-ID:

Sure, you could use ``iter(())`` or ``iter([])``, but for consistency's sake
wouldn't it make sense for ``iter()`` to return an empty iterator, as ``str()``
returns an empty string etc.?

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

From andrew-pythondev at puzzling.org Fri Aug 3 11:10:20 2007
From: andrew-pythondev at puzzling.org (Andrew Bennetts)
Date: Fri, 3 Aug 2007 19:10:20 +1000
Subject: [Python-Dev] make iter() return an empty iterator?
In-Reply-To:
References:
Message-ID: <20070803091020.GA25389@steerpike.home.puzzling.org>

Georg Brandl wrote:
> Sure, you could use ``iter(())`` or ``iter([])``, but for consistency's sake
> wouldn't it make sense for ``iter()`` to return an empty iterator, as ``str()``
> returns an empty string etc.?
I had no idea that "str()" or "int()" would do that. "file()" certainly doesn't! :) I don't really think there's much reason to make "iter()" work. As you say, "iter([])" works just fine. For those rare times you want an empty iterator, I don't think the two extra characters is much of a price to pay. -Andrew. From rrr at ronadam.com Fri Aug 3 12:11:25 2007 From: rrr at ronadam.com (Ron Adam) Date: Fri, 03 Aug 2007 05:11:25 -0500 Subject: [Python-Dev] make iter() return an empty iterator? In-Reply-To: References: Message-ID: <46B2FF4D.8060207@ronadam.com> Georg Brandl wrote: > Sure, you could use ``iter(())`` or ``iter([])``, but for consistency's sake > wouldn't it make sense for ``iter()`` to return an empty iterator, as ``str()`` > returns an empty string etc.? > > Georg There is a difference.
>>> type(iter)
<type 'builtin_function_or_method'>
>>> type(str)
<type 'type'>
>>> type(int)
<type 'type'>
>>> type(list)
<type 'type'>
Cheers, Ron From jon+python-dev at unequivocal.co.uk Fri Aug 3 14:29:47 2007 From: jon+python-dev at unequivocal.co.uk (Jon Ribbens) Date: Fri, 3 Aug 2007 13:29:47 +0100 Subject: [Python-Dev] Pythreads and BSD descendants In-Reply-To: <20070726160837.GA24583@lairds.us> References: <20070726160837.GA24583@lairds.us> Message-ID: <20070803122947.GM11696@snowy.squish.net> On Thu, Jul 26, 2007 at 04:08:37PM +0000, Cameron Laird wrote: > Folklore that I remember so unreliably I avoid trying to repeat it here > held that Python threading had problems on BSD and allied Unixes. What's > the status of this? I suspect the answer is, "Everything works, and the > only real problem ever was that *signals* have different semantics under > Linux and *BSD." Anyone who can answer explicitly, though, would > represent a help to me. This is just my personal opinion, but I suspect that this is perhaps because people have *tried* threading more in Python than in many other languages, because Python makes it particularly easy. 
I've certainly had the experience that multithreaded stuff I have tried has sometimes had problems under various OSes (Linux, Solaris, OpenBSD, etc) due to operating system bugs with threading in general rather than Python problems per se. From facundobatista at gmail.com Fri Aug 3 15:01:47 2007 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 3 Aug 2007 10:01:47 -0300 Subject: [Python-Dev] make iter() return an empty iterator? In-Reply-To: References: <20070803091020.GA25389@steerpike.home.puzzling.org> Message-ID: 2007/8/3, Andrew Bennetts : > I don't really think there's much reason to make "iter()" work. As you say, What bad thing could happen if we make iter() work? If nothing, we should ask ourselves: which is the more intuitive behaviour to expect of iter()? To raise an exception or to return an empty iterator? I'm +0 for the latter. -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From facundobatista at gmail.com Fri Aug 3 15:04:24 2007 From: facundobatista at gmail.com (Facundo Batista) Date: Fri, 3 Aug 2007 10:04:24 -0300 Subject: [Python-Dev] NotImplemented comparisons In-Reply-To: References: Message-ID: 2007/8/2, Facundo Batista : > > Given that you 'should' return an int, doing elsewise has undefined > > results. > > I'll fix decimal to always return sane values from __cmp__, :) Done, thanks again everybody! -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From bioinformed at gmail.com Fri Aug 3 15:14:20 2007 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Fri, 3 Aug 2007 09:14:20 -0400 Subject: [Python-Dev] make iter() return an empty iterator? In-Reply-To: References: <20070803091020.GA25389@steerpike.home.puzzling.org> Message-ID: <2e1434c10708030614t5a188a81w7f338483f151ae96@mail.gmail.com> On 8/3/07, Facundo Batista wrote: > > 2007/8/3, Andrew Bennetts : > > > I don't really think there's much reason to make "iter()" work. 
As you > say, > > What bad thing could happen if we make iter() work? If nothing, we > should ask ourselves: which is the more intuitive behaviour to expect > of iter()? To raise an exception or to return an empty iterator? > > I'm +0 for the latter. > -1. I'm a heavy user of iterators on finite and infinite streams and, for me, iter() is an error that I do not want to paper over. The alternate logic implies, e.g., len() should return 0. -Kevin From steve at holdenweb.com Fri Aug 3 15:33:25 2007 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Aug 2007 09:33:25 -0400 Subject: [Python-Dev] make iter() return an empty iterator? In-Reply-To: <2e1434c10708030614t5a188a81w7f338483f151ae96@mail.gmail.com> References: <20070803091020.GA25389@steerpike.home.puzzling.org> <2e1434c10708030614t5a188a81w7f338483f151ae96@mail.gmail.com> Message-ID: Kevin Jacobs wrote: > On 8/3/07, *Facundo Batista* > wrote: > > 2007/8/3, Andrew Bennetts

>: > > > I don't really think there's much reason to make "iter()" > work. As you say, > > What bad thing could happen if we make iter() work? If nothing, we > should ask ourselves: which is the more intuitive behaviour to expect > of iter()? To raise an exception or to return an empty iterator? > > I'm +0 for the latter. > > > -1. I'm a heavy user of iterators on finite and infinite streams and, > for me, iter() is an error that I do not want to paper over. The > alternate logic implies, e.g., len() should return 0. > -1 here too. iter() should have an argument just like sum() and len(). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://del.icio.us/steve.holden --------------- Asciimercial ------------------ Get on the web: Blog, lens and tag the Internet Many services currently offer free registration ----------- Thank You for Reading ------------- From pc at gafol.net Fri Aug 3 17:21:43 2007 From: pc at gafol.net (Paul Colomiets) Date: Fri, 03 Aug 2007 18:21:43 +0300 Subject: [Python-Dev] Pythreads and BSD descendants In-Reply-To: <20070726160837.GA24583@lairds.us> References: <20070726160837.GA24583@lairds.us> Message-ID: <46B34807.9020307@gafol.net> Cameron Laird wrote: > Folklore that I remember so unreliably I avoid trying to repeat it here > held that Python threading had problems on BSD and allied Unixes. What's > the status of this? I suspect the answer is, "Everything works, and the > only real problem ever was that *signals* have different semantics under > Linux and *BSD." Anyone who can answer explicitly, though, would > represent a help to me. > I have used Python with multithreaded applications on FreeBSD for several years, and the only real problem I've discovered is that the default stack size for new threads is too small for the default recursion limit. It can be easily fixed in Python 2.5. Apart from that, everything works OK for me. 
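For readers wondering what the Python 2.5 fix mentioned above might look like in practice: the stack size used for newly created threads can be set with threading.stack_size() before the threads are started. What follows is only a minimal sketch under stated assumptions, not code from this thread. The helper names recursion_depth and run_with_stack, the 16 MiB figure and the depth of 500 are invented for the illustration, and the call can raise an error on platforms that do not allow the stack size to be changed.

```python
import threading


def recursion_depth(n):
    """Recurse n levels deep and return the depth reached."""
    return 0 if n == 0 else 1 + recursion_depth(n - 1)


def run_with_stack(stack_bytes, depth):
    """Run recursion_depth(depth) in a new thread created with the given stack size.

    Hypothetical helper for illustration only. threading.stack_size() affects
    threads created after the call, so the previous setting is restored afterwards.
    """
    result = []
    previous = threading.stack_size()      # 0 means "platform default"
    threading.stack_size(stack_bytes)      # may raise ValueError or RuntimeError
    try:
        worker = threading.Thread(
            target=lambda: result.append(recursion_depth(depth)))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)     # restore the earlier setting
    return result[0]


if __name__ == "__main__":
    # 16 MiB is an arbitrary example value, chosen as a multiple of 4096.
    print(run_with_stack(16 * 1024 * 1024, 500))
```

The size has to be chosen before the thread is created; it does not change the stack of threads that are already running, nor that of the primary thread.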
From guido at python.org Fri Aug 3 19:07:24 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 3 Aug 2007 10:07:24 -0700 Subject: [Python-Dev] make iter() return an empty iterator? In-Reply-To: References: <20070803091020.GA25389@steerpike.home.puzzling.org> <2e1434c10708030614t5a188a81w7f338483f151ae96@mail.gmail.com> Message-ID: On 8/3/07, Steve Holden wrote: > Kevin Jacobs wrote: > > On 8/3/07, *Facundo Batista* > > wrote: > > > > 2007/8/3, Andrew Bennetts > >: > > > > > I don't really think there's much reason to make "iter()" > > work. As you say, > > > > What bad thing could happen if we make iter() work? If nothing, we > > should ask ourselves: which is the more intuitive behaviour to expect > > of iter()? To raise an exception or to return an empty iterator? > > > > I'm +0 for the latter. > > > > > > -1. I'm a heavy user of iterators on finite and infinite streams and, > > for me, iter() is an error that I do not want to paper over. The > > alternate logic implies, e.g ., len() should return 0. > > > -1 here too. iter() should have an argument just like sum() and len(). Amen. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alex.neundorf at kitware.com Fri Aug 3 19:34:42 2007 From: alex.neundorf at kitware.com (Alexander Neundorf) Date: Fri, 3 Aug 2007 13:34:42 -0400 Subject: [Python-Dev] Building Python with CMake In-Reply-To: References: <200707131359.17030.alex.neundorf@kitware.com> Message-ID: <200708031334.42813.alex.neundorf@kitware.com> On Friday 13 July 2007 16:11, Giovanni Bajo wrote: ... > Because it would be a single unified build system instead of having two > build systems like we have one (UNIX and Windows). > > Also, it would be much easier to maintain because Visual Studio projects > are generated from a simple description, while right now if you want to > change something you need to go through the hassle of defining it within > the Visual Studio GUI. 
> > Consider for instance if you want to change the Windows build so that a > builtin module is compiled as an external .pyd instead. Right now, you need > to go through the hassle of manually defining a new project, setting all > the include/libraries dependencies correctly, ecc. ecc. With CMake or a > similar tool, it would be a matter of a couple of textual line changes. > > [ I'll also remember that "ease of maintanance for developers" is the #1 > reason for having a 2.1Mb python25.dll under Windows, which I would really > love to reduce. ] I thought I'll keep you updated, so: attached you can find the current cmake files I use for Python 2.5.1. They work for eCos, Linux, BlueGene and Windows (which doesn't mean everything is supported or installed, but they create working python interpreters and libs and install the most required files). Compared to the first version they now contain more install rules, the platform path is not hardcoded anymore and it contains a basic setup for creating source and binary packages (tgz, Debian, Nullsoft installer, PackageMaker depending on your cmake version). Bye Alex -------------- next part -------------- A non-text attachment was scrubbed... Name: Python-2.5.1-cmakefiles.tar.gz Type: application/x-tgz Size: 16101 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20070803/3e04c813/attachment.bin From martin at v.loewis.de Sat Aug 4 00:46:39 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 00:46:39 +0200 Subject: [Python-Dev] Pythreads and BSD descendants In-Reply-To: <20070726160837.GA24583@lairds.us> References: <20070726160837.GA24583@lairds.us> Message-ID: <46B3B04F.7060100@v.loewis.de> Cameron Laird schrieb: > Folklore that I remember so unreliably I avoid trying to repeat it here > held that Python threading had problems on BSD and allied Unixes. What's > the status of this? 
The problem that people run into again and again is the stack size. The BSDs allow for so little stack that even the quite conservative estimates of Python as to how many recursions you can have are incorrect, and you get an interpreter crash rather than a RuntimeError (as you should). Furthermore, every time we decrease that number, the next system release somehow manages to make the limit even smaller. This was never properly analyzed; I suspect that the stack usage of Python increases, either due to compiler changes or due to changes to Python itself. Another annoyance is the ongoing battle with Posix; the BSDs have not been very accepting towards Posix for many years. This resulted in an interpretation of Posix where defining _XOPEN_SOURCE hides many system interfaces, resulting in these system interfaces either not being present, or compilation failing. I consider this a bug in the system: compilation should *never* fail if you define _XOPEN_SOURCE, and additional interfaces should be available if requested (that requires a way to request them). The work-around was to not define _XOPEN_SOURCE for those buggy system releases, hoping that the next release would fix the bug. Over the years, the maintainers of these systems seem to have come to a better understanding, so they offer various custom _SOURCE macros (_NETBSD_SOURCE, __BSD_VISIBLE). The latest addition here was OpenBSD, which now supports _BSD_SOURCE (apparently following a tradition set by GNU libc, and perhaps others). So I hope this is fixed for good now (except that FreeBSD may decide to break __BSD_VISIBLE the same way it got "broken" in OpenBSD, so we need to add their "official" feature selection macro once we find out what that is). The same problem exists of course on many other systems, but those solved the problem long ago (e.g. 
_GNU_SOURCE - glibc, _BSD_TYPES - Irix) Regards, Martin From dalcinl at gmail.com Sat Aug 4 01:44:23 2007 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Fri, 3 Aug 2007 20:44:23 -0300 Subject: [Python-Dev] [Python-3000] optimizing [x]range In-Reply-To: References: <1d85506f0707280806n1764151cx4961a0573dda435e@mail.gmail.com> Message-ID: On 8/2/07, Stargaming wrote: > >> made into an O(1) operation. here's a demo code (it should be trivial > >> to implement this in CPython) > [snipped algorithm] Did you take into account that your patch is not backward compatible with py2.5?? Just try to do this with your patch,
$ python
Python 2.5.1 (r251:54863, Jun 1 2007, 12:15:26)
>>> class A:
...     def __eq__(self, other):
...         return other == 3
...
>>> A() in xrange(3)
False
>>> A() in xrange(4)
True
>>>
I know, my example is biased, but I have to insist. With this patch, 'a in xrange' will in general not be the same as 'a in range(...)'. I am fine with this for py3k, but not sure if all people will agree on this for python 2.6. -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From kbk at shore.net Sat Aug 4 07:26:32 2007 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Sat, 4 Aug 2007 01:26:32 -0400 (EDT) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200708040526.l745QWEW014541@hampton.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 404 open ( +5) / 3847 closed (+11) / 4251 total (+16) Bugs : 1059 open ( +3) / 6784 closed ( +8) / 7843 total (+11) RFE : 263 open ( +0) / 295 closed ( +1) / 558 total ( +1) New / Reopened Patches ______________________ struni: Fix test_aepack by converting 4cc's to bytes (2007-07-26) CLOSED http://python.org/sf/1761465 reopened by gvanrossum distutils.util.get_platform() return value on 64bit Windows (2007-07-27) http://python.org/sf/1761786 opened by Mark Hammond Some fix abount _WIN32_WINNT (2007-07-27) CLOSED http://python.org/sf/1761803 opened by Hirokazu Yamamoto struni: test_xml_etree.py (2007-07-27) CLOSED http://python.org/sf/1762412 opened by Joe Gregorio unable to serialize Infinity or NaN on ARM using marshal (2007-07-28) http://python.org/sf/1762561 opened by Matthias Klose struni: test_urllib2, test_cookielib (2007-07-29) CLOSED http://python.org/sf/1762940 opened by Joe Gregorio socket close fixed (2007-07-29) CLOSED http://python.org/sf/1763387 opened by Hasan Diwan tiny addition to peephole optimizer (2007-07-31) http://python.org/sf/1764087 opened by Paul Pogonyshev Fix for test_socketserver for Py3k (2007-07-31) CLOSED http://python.org/sf/1764815 opened by ??PC?? 
generic and more efficient removal of unreachable code (2007-08-01) http://python.org/sf/1764986 opened by Paul Pogonyshev logging: delay_fh option and configuration kwargs (2007-08-01) http://python.org/sf/1765140 opened by Chris Leary small improvement for peephole conditional jump optimizer (2007-08-01) http://python.org/sf/1765558 opened by Paul Pogonyshev urllib2-howto - correction (2007-08-02) http://python.org/sf/1765839 opened by O.R.Senthil Kumaran improve xrange.__contains__ (2007-08-02) http://python.org/sf/1766304 opened by Stargaming Fix for test_zipimport (2007-08-03) CLOSED http://python.org/sf/1766592 opened by ??PC?? Make xmlrpc use HTTP/1.1 and keepalive (2007-08-04) http://python.org/sf/1767370 opened by Donovan Baarda test_csv struni fixes + unicode support in _csv (2007-08-03) http://python.org/sf/1767398 opened by Adam Hupp Patches Closed ______________ struni: Fix test_aepack by converting 4cc's to bytes (2007-07-26) http://python.org/sf/1761465 closed by gvanrossum struni: Fix test_aepack by converting 4cc's to bytes (2007-07-26) http://python.org/sf/1761465 closed by gvanrossum Some fix abount _WIN32_WINNT (2007-07-27) http://python.org/sf/1761803 closed by mhammond struni pulldom: Don't use 'types' to check strings (2007-07-24) http://python.org/sf/1759922 closed by gvanrossum struni: test_xml_etree.py (2007-07-27) http://python.org/sf/1762412 closed by loewis struni: test_urllib2, test_cookielib (2007-07-28) http://python.org/sf/1762940 closed by gvanrossum socket close fixed (2007-07-30) http://python.org/sf/1763387 closed by facundobatista itertools.getitem() (2007-07-08) http://python.org/sf/1749857 closed by rhettinger Fix for test_socketserver for Py3k (2007-07-31) http://python.org/sf/1764815 closed by gvanrossum Fix Decimal.sqrt bugs described in #1725899 (2007-06-22) http://python.org/sf/1741308 closed by facundobatista Fix for test_zipimport (2007-08-03) http://python.org/sf/1766592 closed by gvanrossum urllib2 tests pass 
(2007-07-16) http://python.org/sf/1755133 closed by gvanrossum New / Reopened Bugs ___________________ 'exec' does not accept what 'open' returns (2007-07-28) http://python.org/sf/1762972 opened by Brett Cannon S.find documentation uses s[start, end] vs. s[start:end] (2007-07-29) CLOSED http://python.org/sf/1763149 opened by Rob copy 2 (2007-07-30) http://python.org/sf/1764044 opened by robs pythonid _RLock.__repr__ throws exception (2007-07-30) CLOSED http://python.org/sf/1764059 opened by Greg Kochanski The -m switch does not use the builtin __main__ module (2007-07-31) http://python.org/sf/1764407 opened by Nick Coghlan Decimal comparison with None fails in Windows (2007-07-31) CLOSED http://python.org/sf/1764761 opened by pablohoffman.com setup.py trashes LDFLAGS (2007-08-01) http://python.org/sf/1765375 opened by Harald Koenig poll() returns "status code", not "return code" (2007-08-02) http://python.org/sf/1766421 opened by sjbrown os.chmod failure (2007-08-03) http://python.org/sf/1767242 reopened by rgheck os.chmod failure (2007-08-03) http://python.org/sf/1767242 opened by Richard Heck String.capwords() does not capitalize first word (2007-08-03) http://python.org/sf/1767363 opened by Saatvik Agarwal Bugs Closed ___________ No docs for list comparison (2007-07-25) http://python.org/sf/1760423 closed by gbrandl SSL-ed sockets don't close correct? (2004-06-24) http://python.org/sf/978833 closed by loewis incorrect return value of unicodedata.lookup() - beoynd BMP (2007-04-21) http://python.org/sf/1704793 closed by loewis Thread.join() fails to release Lock on KeyboardInterrupt (2005-03-26) http://python.org/sf/1171023 closed by phansen S.find documentation uses s[start, end] vs. 
s[start:end] (2007-07-29) http://python.org/sf/1763149 closed by gbrandl _RLock.__repr__ throws exception (2007-07-31) http://python.org/sf/1764059 closed by ncoghlan Decimal comparison with None fails in Windows (2007-07-31) http://python.org/sf/1764761 closed by facundobatista decimal sqrt method doesn't use round-half-even (2007-05-25) http://python.org/sf/1725899 closed by facundobatista New / Reopened RFE __________________ add new bytecodes: JUMP_IF_{FALSE|TRUE}_AND_POP (2007-07-31) http://python.org/sf/1764638 opened by Paul Pogonyshev RFE Closed __________ add operator.fst and snd functions (2007-05-27) http://python.org/sf/1726697 closed by rhettinger From andymac at bullseye.apana.org.au Sat Aug 4 11:20:45 2007 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 04 Aug 2007 20:20:45 +1100 Subject: [Python-Dev] Pythreads and BSD descendants In-Reply-To: <46B3B04F.7060100@v.loewis.de> References: <20070726160837.GA24583@lairds.us> <46B3B04F.7060100@v.loewis.de> Message-ID: <46B444ED.8090802@bullseye.andymac.org> Martin v. Löwis wrote: > Cameron Laird wrote: >> Folklore that I remember so unreliably I avoid trying to repeat it here >> held that Python threading had problems on BSD and allied Unixes. What's >> the status of this? > > The problem that people run into again and again is the stack size. The > BSDs allow for so little stack so that even the quite conservative > estimates of Python as to how many recursions you can have are > incorrect, and you get an interpreter crash rather than a RuntimeError > (as you should). There are 2 aspects to the thread stack size issue:
- the stack size for the primary thread;
- the stack size for created threads.
I haven't done any investigating for FreeBSD 6.x and later, but I know that FreeBSD 4.x had a hard coded stack size of 1MB for the primary thread in a thread enabled application, which is what Martin's comment above particularly applies to. 
This affects code that doesn't use threads at all, and was particularly painful with REs prior to SRE being made non-recursive. If you build the interpreter without thread support, stack space is instead controlled by session limits which are usually generous (typically 64MB). I don't recall exactly FreeBSD's default stack size for threads created via pthread_create() but it is fairly small (32kB or 64kB comes to mind). Zope is one application known to be affected by this lack of stack size in created threads. At least the stack size for new threads can be adjusted at runtime, and a mechanism for doing this was added to Python 2.5. > Furthermore, every time we decrease the that number, the next system > release somehow manages to make the limit even smaller. This was > never properly analyzed; I suspect that the stack usage of Python > increases, either due to compiler changes or due to change to Python > itself. I have seen examples of stack consumption increasing with increasing gcc version number, sometimes depending on optimisation choices. Regards, Andrew. -- ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac at pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From jerker.back at telia.com Sat Aug 4 16:54:33 2007 From: jerker.back at telia.com (=?iso-8859-1?Q?Jerker_B=E4ck?=) Date: Sat, 4 Aug 2007 16:54:33 +0200 Subject: [Python-Dev] x86_64 Interix - Advise needed on size of long Message-ID: <000801c7d6a7$5ec0e1d0$1c42a570$@back@telia.com> Hello all, I'm in need of an advise how to handle sizeof long in python. I wanted a x86_64 compile of python for Interix (that is NT POSIX subsystem with x86_64 Interix 6 SDK). My first attempt to build failed due to the makefile insisted on linking as shared libraries (works only in x86 with GNU ld). 
Tried autoreconf to get rid of libtool - no luck. Q1: Is the static build broken? Q2: Anyone have a useable Makefile.am? My second attempt was based on the VS2005 project and the previous Makefile. Not to tire you with details, but for this to work I need to explicitly assign the sizeof long (replace all long types with explicitly sized ones, int32_t, ssize_t etc). There are 2 choices: All longs to 64bit (LP64 model) or all to 32bit (LLP64 model). Since Interix uses LP64, the first alternative would be logical, but considering compatibility with the Windows DLL, performance(?) and whatever, I chose the latter, a choice which would later get me into trouble. Here's how I am reasoning:
x64 Windows DLL = LLP64 model => sizeof(long) = 4
x86_64 Interix = LP64 model => sizeof(long) = 8
So, since the Windows build works, basically all long types in the code are 32bit (or at least work if they are 32bit). 64bit dependent variables like pointers have already been taken care of. Right? While it sounds reasonable as long as one is consistent, it's actually quite difficult to get it right (and a lot of work). To be precise, would this be OK?
long PyInt_AsLong(PyObject *);
change to:
int32_t PyInt_AsLong(PyObject *);
or
unsigned long PyOS_strtoul(char*, char**, int);
to:
uint32_t PyOS_strtoul(char*, char**, int);
Thanks, Erik From martin at v.loewis.de Sat Aug 4 22:33:14 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Aug 2007 22:33:14 +0200 Subject: [Python-Dev] x86_64 Interix - Advise needed on size of long In-Reply-To: <000801c7d6a7$5ec0e1d0$1c42a570$@back@telia.com> References: <000801c7d6a7$5ec0e1d0$1c42a570$@back@telia.com> Message-ID: <46B4E28A.6090201@v.loewis.de> > My first attempt to build failed due to the makefile insisted on linking as > shared libraries (works only in x86 with GNU ld). Tried autoreconf to get > rid of libtool - no luck. > Q1: Is the static build broken? > Q2: Anyone have a useable Makefile.am? 
Are you sure you are talking about Python as released? It uses neither automake nor libtool (IMO, fortunately so). As for the static vs. shared libpython: On Unix, Python is typically built as a single executable (only linked shared with the system libraries). The challenge is then with extension modules, which are shared libraries. In particular, it is a challenge that those want to find symbols defined in the executable, without being linked with it. So you have three options: 1. If you use a sane binary format (such as ELF), symbol resolution considers symbols defined by the executable for use in shared libraries. This is necessary to support standard C, as you want to be able to redefined malloc(3) in the executable, and then all libraries should use your malloc implementation; it comes handy for Python's extensions. By this definition, Portable Executable (PE) is insane. 2. Don't use extension modules. Edit Modules/Setup to statically link all extension modules into the interpreter binary. 3. Arrange to make the interpreter a shared library (libpythonxy.so), then link all extension modules against it. > There are 2 choices: All longs to 64bit (LP64 model) or all to 32bit (LLP64 > model). Since Interix use LP64 the first alternative would be logic, but > considering compatibility with the Windows DLL, performance(?) and whatever, > I choosed the latter. A choice which later would turn me into trouble. I don't see the compatibility issue. You aren't going to use Win32 extensions in the Interix interpreter, are you? So why care about Win32? > Here's how I am reasoning: > > x64 Windows DLL = LLP64 model => sizeof(long) = 4 > x86_64 Interix = LP64 model => sizeof(long) = 8 I think we agree that the Windows model is insane, also. A good 64-bit platform has sizeof(long)==8. > So, since the Windows build works, basically all long types in the code are > 32bit (or at least works if they are 32bit). Right. 
However, LP64 also works with Python, and has been for many more years. > 64bit dependent variables like > pointers have already been taken care of. Right? While it sounds reasonable > as long as one are consistent, it's actually quite difficult to get it right > (and a lot of work). > > To be precise, would this be OK? > long PyInt_AsLong(PyObject *); > change to: > int32_t PyInt_AsLong(PyObject *); > or > unsigned long PyOS_strtoul(char*, char**, int); > to: > uint32_t PyOS_strtoul(char*, char**, int); OK in what sense? You making these changes locally? You can make whatever changes you please; this is free software. I can't see *why* you want to make all these changes, but if you so desire... This becoming part of Python? No way. It is intentional that PyInt_AsLong returns long (why else would the function be called this way?), and it is also intentional that the int type has its internal representation as a long. Likewise for strtoul: this function is defined to return long, for whatever definition long has on a platform. Regards, Martin From status at bugs.python.org Sun Aug 5 02:01:00 2007 From: status at bugs.python.org (Tracker) Date: Sun, 5 Aug 2007 00:01:00 +0000 (UTC) Subject: [Python-Dev] Summary of Tracker Issues Message-ID: <20070805000100.3E765781B4@psf.upfronthosting.co.za> ACTIVITY SUMMARY (07/29/07 - 08/05/07) Tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 1276 open ( +0) / 11101 closed ( +0) / 12377 total ( +0) Average duration of open issues: 695 days. Median duration of open issues: 561 days. Open Issues Breakdown open 1274 ( +0) pending 2 ( +0) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20070805/94ba0f84/attachment.html From jerker.back at telia.com Sun Aug 5 14:41:09 2007 From: jerker.back at telia.com (=?iso-8859-1?Q?Jerker_B=E4ck?=) Date: Sun, 5 Aug 2007 14:41:09 +0200 Subject: [Python-Dev] x86_64 Interix - Advise needed on size of long In-Reply-To: <46B4E28A.6090201@v.loewis.de> References: <000801c7d6a7$5ec0e1d0$1c42a570$@back@telia.com> <46B4E28A.6090201@v.loewis.de> Message-ID: <000901c7d75d$e68d78b0$b3a86a10$@back@telia.com> Hello Martin, Thanks very much for answering. > As for the static vs. shared libpython: On Unix, Python is typically > built as a single executable (only linked shared with the system > libraries). The challenge is then with extension modules, which are > shared libraries. In particular, it is a challenge that those want > to find symbols defined in the executable, without being linked with > it. So you have three options: Aha, now it lightens a bit. As I understand, I will need a x86_64 PE GNU ld to get this to work as intended - however, there is no such thing at this moment. > 2. Don't use extension modules. Edit Modules/Setup to statically link > all extension modules into the interpreter binary. This is the way. But how to do that? Shell output: ../configure --disable-shared ... ar cr libpython2.5.a Objects/ ar cr libpython2.5.a Python/ ar cr libpython2.5.a Modules/ ar cr libpython2.5.a Modules/ cc -o python \ Modules/python.o \ libpython2.5.a -lsocket -lm CC='cc' LDSHARED='ld' OPT='-DNDEBUG -O' ./python -E ../setup.py build;; Memory fault (core dumped) make: *** [sharedmods] Error 139 I assume the "Modules/" are the extension modules. To get them statically linked, the functions must be called somewhere. Statically linked = "Builtin modules"? You mean I should list all of these in "Modules/Setup"? FYI I got the "dynload_stub.c" compiled in. BTW, shouldn't "--disable-shared" take care of this? > OK in what sense? You making these changes locally? 
You can make > whatever changes you please; this is free software. I can't > see *why* you want to make all these changes, but if you so > desire... It's really very simple - I got LP64 libraries (Interix SDK). To get them to work with a LLP64 compiler I need explicit sized types in case of long. FYI: cc is a shell script wrapper of a x64 PE compiler, which in this case is the MS x64 compiler v 14.00.50727.762 - POSIX mode. It automatically translates all longs to long long in case of a 64bit compile. Thus, cc cannot easily be used in e.g. Visual Studio. > This becoming part of Python? No way. It is intentional that > PyInt_AsLong returns long (why else would the function be called > this way?), and it is also intentional that the int type has > its internal representation as a long. Oh, it was never my intention to propose a change to the LLP64 model. And your right: All exports should be according to the LP64 model in case of a POSIX compile. One must follow some rules! But you must admit it's tempting with all these: #if SIZEOF_LONG > 4 < get rid of the 64bit crap > #endif In my case the different paradigms are a real pain. I must take it into account all the time when porting. I can only hope people stop using in favour of explicit sized types or types like size_t, intptr_t etc. I would love to see the damn thing obsolete. Cheers, Erik From martin at v.loewis.de Sun Aug 5 15:33:55 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Aug 2007 15:33:55 +0200 Subject: [Python-Dev] x86_64 Interix - Advise needed on size of long In-Reply-To: <000901c7d75d$e68d78b0$b3a86a10$@back@telia.com> References: <000801c7d6a7$5ec0e1d0$1c42a570$@back@telia.com> <46B4E28A.6090201@v.loewis.de> <000901c7d75d$e68d78b0$b3a86a10$@back@telia.com> Message-ID: <46B5D1C3.2010304@v.loewis.de> >> 2. Don't use extension modules. Edit Modules/Setup to statically link >> all extension modules into the interpreter binary. > This is the way. 
But how to do that? > > Shell output: > ../configure --disable-shared --disable-shared should be the default, so you don't need to specify it explicitly. It works fine for me on Linux, so if it crashes for you, it is likely an Interix problem. Can you debug this? Dynamic loading of extension modules is automatically detected; it is active if the system is AIX, BeOS, HPUX, Darwin, atheos, or if dlopen(3) has been found. If you explicitly want to disable it, you can set DYNLOADFILE to dynload_stub.o in configure. Not sure whether Interix has dlopen, but even if does, it should do no harm to use it provided there aren't any modules to load. > I assume the "Modules/" are the extension modules. To get > them statically linked, the functions must be called somewhere. Statically > linked = "Builtin modules"? You mean I should list all of these in > "Modules/Setup"? Exactly so. They are already listed - just uncomment them all (with proper command line flags and libraries where necessary). > FYI I got the "dynload_stub.c" compiled in. Ok, so you don't have dynamic loading. > BTW, shouldn't "--disable-shared" take care of this? No, this talks only about libpythonxy.so. > It's really very simple - I got LP64 libraries (Interix SDK). To get them to > work with a LLP64 compiler I need explicit sized types in case of long. I still don't understand. Are you *certain* that these are LP64 libraries? Can you kindly refer to some official document that says Interix uses LP64 on AMD64? And if so, how did Microsoft manage to build them, if their compiler does not support LP64? (I see you kind of answer that below - although I'm unsure what "translate all longs to long long means - you mean literal text replacement?) Methinks you should just activate the LP64 mode of VC 2005, and be done (and no, I don't know how to do that :-) > FYI: cc is a shell script wrapper of a x64 PE compiler, which in this case > is the MS x64 compiler v 14.00.50727.762 - POSIX mode. 
It automatically > translates all longs to long long in case of a 64bit compile. Thus, cc > cannot easily be used in e.g. Visual Studio. So don't use Visual Studio, then. What's wrong with the Makefile? > Oh, it was never my intention to propose a change to the LLP64 model. And > your right: All exports should be according to the LP64 model in case of a > POSIX compile. One must follow some rules! But you must admit it's tempting > with all these: > #if SIZEOF_LONG > 4 > < get rid of the 64bit crap > > #endif My view would be different - I find it tempting not to use Interix, let alone on AMD64... > In my case the different paradigms are a real pain. I must take it into > account all the time when porting. I can only hope people stop using > in favour of explicit sized types or types like size_t, intptr_t etc. I > would love to see the damn thing obsolete. You mean the long type? I see nothing wrong with it. The real fault here is with Microsoft, who managed to provide a system for which they don't provide a C compiler, but just a hack that looks like a C compiler from remote. Regards, Martin From alan.mcintyre at gmail.com Sun Aug 5 16:02:58 2007 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Sun, 5 Aug 2007 10:02:58 -0400 Subject: [Python-Dev] [Python-3000] test_asyncore fails intermittently on Darwin In-Reply-To: <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> References: <2cda2fc90707261505tdd9a0f1t861b5801c37ad11e@mail.gmail.com> <1d36917a0707261618oac94f20l98f464a2ab1edc4e@mail.gmail.com> <2cda2fc90707292338pff060c1i810737dcf6d5df54@mail.gmail.com> <2cda2fc90707292340k7eb11f2w82003e6f705438c3@mail.gmail.com> <46AE943D.1040105@canterbury.ac.nz> <5d44f72f0708041853m1bb0d005h9f1ff77103b9ebbe@mail.gmail.com> Message-ID: <1d36917a0708050702n6b48594bn824bd97ea6622421@mail.gmail.com> On 8/4/07, Jeffrey Yasskin wrote: > Well, regardless of the brokenness of the patch, I do get two > different failures from this test on OSX. 
The first is caused by > trying to socket.bind() a port that's already been bound recently: > That looks pretty easy to fix. It was fixed in the trunk on July 28 as part of rev 56604, by letting the OS assign the port (binding to port 0). I apologize if everybody was expecting me to fix this in Python 3000; I thought the initial complaint was in reference to 2.6. I'm working on test improvements for 2.6, so I'm sort of fixated on the trunk at the moment. :) I wouldn't mind trying to roll my changes forward into Py3k after GSoC is done if I have the time, though. Alan From g.brandl at gmx.net Sun Aug 5 16:51:25 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 05 Aug 2007 16:51:25 +0200 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior Message-ID: See bugs #1548891 and #1730114. In the former, it was reported that cStringIO works differently from StringIO when handling unicode strings; it used GetReadBuffer which returned the raw internal UCS-2 or UCS-4 encoded string. I changed it to use GetCharBuffer, which converts to a string using the default encoding first. This fix was also in 2.5.1. The latter bug now complains that this excludes things like array.array()s from being used as an argument to cStringIO.StringIO(), which worked before with GetReadBuffer. What's the preferred solution here? Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Sun Aug 5 17:00:27 2007 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Aug 2007 08:00:27 -0700 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: Message-ID: Methinks that this was a fundamental limitation of cStringIO, not a bug. Certainly not something to be "fixed" in a bugfix release. 
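A note for readers on current Python, where cStringIO no longer exists: the sketch below is a stand-in, not the cStringIO code under discussion. It assumes the io module's BytesIO and StringIO classes and only illustrates the general split at issue here, namely that a byte-oriented buffer can take buffer objects such as array.array, while text has to be handled separately.

```python
import array
import io

# A byte-oriented buffer accepts objects that expose the buffer protocol,
# for example an array of unsigned bytes.
data = array.array("B", [104, 105])  # the two bytes of b"hi"
byte_stream = io.BytesIO(data)

# Text is rejected instead of being silently encoded.
try:
    io.BytesIO("hi")
    bytes_rejects_text = False
except TypeError:
    bytes_rejects_text = True

# A text-oriented buffer accepts str only.
text_stream = io.StringIO("hi")
try:
    io.StringIO(b"hi")
    text_rejects_bytes = False
except TypeError:
    text_rejects_bytes = True

print(byte_stream.getvalue(), bytes_rejects_text, text_rejects_bytes)
```

Keeping the two cases in separate classes avoids the question this thread is stuck on, which is what a single class should do when handed a Unicode string.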
On 8/5/07, Georg Brandl wrote: > See bugs #1548891 and #1730114. > > In the former, it was reported that cStringIO works differently from StringIO > when handling unicode strings; it used GetReadBuffer which returned the raw > internal UCS-2 or UCS-4 encoded string. > > I changed it to use GetCharBuffer, which converts to a string using the > default encoding first. This fix was also in 2.5.1. > > The latter bug now complains that this excludes things like array.array()s > from being used as an argument to cStringIO.StringIO(), which worked before > with GetReadBuffer. > > What's the preferred solution here? > > Georg > > -- > Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. > Four shall be the number of spaces thou shalt indent, and the number of thy > indenting shall be four. Eight shalt thou not indent, nor either indent thou > two, excepting that thou then proceed to four. Tabs are right out. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From jerker.back at telia.com Sun Aug 5 17:45:52 2007 From: jerker.back at telia.com (=?iso-8859-1?Q?Jerker_B=E4ck?=) Date: Sun, 5 Aug 2007 17:45:52 +0200 Subject: [Python-Dev] x86_64 Interix - Advise needed on size of long Message-ID: <000b01c7d777$b42fcca0$1c8f65e0$@back@telia.com> Hello Martin, > > You mean I should list all of these in "Modules/Setup"? > Exactly so. They are already listed - just uncomment them all > (with proper command line flags and libraries where necessary). OK, I will try to get it compiled and tested. Meanwhile, you asked so: > I still don't understand. Are you *certain* that these are LP64 > libraries? Can you kindly refer to some official document that says > Interix uses LP64 on AMD64? 
MS is surprisingly quiet about the POSIX subsystem and the Interix BSD implementation, so it's hard to find any official info on the net. But here is one the developers: In Interix SDK releasenotes.htm (SDK download): "64-bit compilation supports the LP64 data model." Interix general: To find details on how it all really works, one will have to look in the headers and try different features oneself. (Which actually is pretty fun because it's really fast and usually works well). The SDK comes with support for x86, x86_64 (EM64T or AMD64) and IA64. > And if so, how did Microsoft manage to build them, if their compiler > does not support LP64? (I see you kind of answer that below - although > I'm unsure what "translate all longs to long long means - you mean > literal text replacement?) Sure, cc precompiles the source file to a temporary file, flips it, runs a conversion tool - "l2ll" => all longs are converted to long long, and finally compiles the converted file. The compile is done via a call from POSIX to the Windows subsystem and the compiler found in the POSIX path environment. To understand the details one has to know that the POSIX environment runs directly on top of the NT kernel and knows nothing of Windows, Windows paths etc. This is kind of a compile on the fly. The libraries are also of two kinds: 1 The core POSIX libraries - part of the OS, uses DDK tools 2 Interix SDK - BSD libc and utils, uses cc and Interix gcc (x86 only) The DDK tools are switched to LP64 support via special defines in the headers. But there are some unclear issues with functions directly exported from the OS native LLP64 libraries (ntdll.dll) - I don't know how this is solved. Somewhere here lies the reason why cc is hard to use with Visual Studio and why the long type is such a nuisance. I also tried the Intel x64 PE compiler (for better C99 support), but it produces applications which rely on Windows API functions (e.g. VirtualAlloc, LoadLibrary) and thus cannot be used in POSIX. 
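For readers who want to see which of the two data models a given build ended up with, here is a minimal sketch. It assumes a working Python interpreter on the platform in question and uses the struct module's native sizes; the helper name data_model is made up for illustration, and the result describes the compiler settings Python itself was built with, not the Interix SDK libraries.

```python
import struct


def data_model():
    """Guess the C data model from the native sizes of long and void *."""
    long_size = struct.calcsize("l")     # native C long
    pointer_size = struct.calcsize("P")  # native void *
    if pointer_size == 8:
        # 64-bit pointers: LP64 has a 64-bit long, LLP64 keeps long at 32 bits.
        return "LP64" if long_size == 8 else "LLP64"
    if pointer_size == 4:
        return "ILP32"
    return "unknown"


print(struct.calcsize("l"), struct.calcsize("P"), data_model())
```

On a build where long stays at four bytes while pointers are eight, this reports LLP64, which is the mismatch with LP64 libraries described above.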
Cheers, Erik From martin at v.loewis.de Sun Aug 5 18:37:48 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Aug 2007 18:37:48 +0200 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: Message-ID: <46B5FCDC.6050807@v.loewis.de> > See bugs #1548891 and #1730114. > > In the former, it was reported that cStringIO works differently from StringIO > when handling unicode strings; it used GetReadBuffer which returned the raw > internal UCS-2 or UCS-4 encoded string. > > I changed it to use GetCharBuffer, which converts to a string using the > default encoding first. This fix was also in 2.5.1. > > The latter bug now complains that this excludes things like array.array()s > from being used as an argument to cStringIO.StringIO(), which worked before > with GetReadBuffer. > > What's the preferred solution here? I think the 2.5.0 behavior to accept array.array should be restored (and a test case be added). What to do about Unicode strings, I don't know. I agree with Guido that they are officially not supported in cStringIO, so it would be best to reject them. OTOH, since 2.5.1 already supports them, another choice would be continue supporting them, in the same way as they are supported in 2.5.1. Either solution would special-case Unicode strings. Regards, Martin From alexandre at peadrop.com Sun Aug 5 20:48:11 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 5 Aug 2007 14:48:11 -0400 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: Message-ID: On 8/5/07, Georg Brandl wrote: > See bugs #1548891 and #1730114. > > In the former, it was reported that cStringIO works differently from StringIO > when handling unicode strings; it used GetReadBuffer which returned the raw > internal UCS-2 or UCS-4 encoded string. > > I changed it to use GetCharBuffer, which converts to a string using the > default encoding first. This fix was also in 2.5.1. 
> > The latter bug now complains that this excludes things like array.array()s > from being used as an argument to cStringIO.StringIO(), which worked before > with GetReadBuffer. > > What's the preferred solution here? > The best thing would be to add a special case for ascii-only unicode objects, and keep the old behavior. However, I believe this will be ugly, especially in O_write. So, it would perhaps be better to simply stop supporting unicode objects. -- Alexandre From g.brandl at gmx.net Mon Aug 6 09:48:25 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 06 Aug 2007 09:48:25 +0200 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: <46B5FCDC.6050807@v.loewis.de> References: <46B5FCDC.6050807@v.loewis.de> Message-ID: Guido van Rossum schrieb: > Methinks that this was a fundamental limitation of cStringIO, not a > bug. Certainly not something to be "fixed" in a bugfix release. I'm sorry. Martin v. Löwis schrieb: >> See bugs #1548891 and #1730114. >> >> In the former, it was reported that cStringIO works differently from StringIO >> when handling unicode strings; it used GetReadBuffer which returned the raw >> internal UCS-2 or UCS-4 encoded string. >> >> I changed it to use GetCharBuffer, which converts to a string using the >> default encoding first. This fix was also in 2.5.1. >> >> The latter bug now complains that this excludes things like array.array()s >> from being used as an argument to cStringIO.StringIO(), which worked before >> with GetReadBuffer. >> >> What's the preferred solution here? > > I think the 2.5.0 behavior to accept array.array should be restored (and > a test case be added). What to do about Unicode strings, I don't know. > I agree with Guido that they are officially not supported in cStringIO, > so it would be best to reject them. OTOH, since 2.5.1 already supports > them, another choice would be continue supporting them, in the same way > as they are supported in 2.5.1. 
Either solution would special-case > Unicode strings. Okay, I propose the following patch: Index: Modules/cStringIO.c =================================================================== --- Modules/cStringIO.c (Revision 56763) +++ Modules/cStringIO.c (Arbeitskopie) @@ -673,12 +673,26 @@ char *buf; Py_ssize_t size; - if (PyObject_AsCharBuffer(s, (const char **)&buf, &size) != 0) + /* special-case Unicode objects: encode them in the default encoding */ + if (PyUnicode_Check(s)) { + s = PyUnicode_AsEncodedString(s, NULL, NULL); + if (s == NULL) return NULL; + } else { + Py_INCREF(s); + } + if (PyObject_AsReadBuffer(s, (const char **)&buf, &size)) { + PyErr_Format(PyExc_TypeError, "expected read buffer, %.200s found", + s->ob_type->tp_name); + return NULL; + } + self = PyObject_New(Iobject, &Itype); - if (!self) return NULL; - Py_INCREF(s); + if (!self) { + Py_DECREF(s); + return NULL; + } self->buf=buf; self->string_size=size; self->pbuf=s; Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From guido at python.org Mon Aug 6 20:22:31 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 11:22:31 -0700 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: <46B5FCDC.6050807@v.loewis.de> Message-ID: On 8/6/07, Georg Brandl wrote: > Guido van Rossum schrieb: > > Methinks that this was a fundamental limitation of cStringIO, not a > > bug. Certainly not something to be "fixed" in a bugfix release. > > I'm sorry. No problem. Somebody else should have flagged this, so it's our collective responsibility. > Okay, I propose the following patch: > > Index: Modules/cStringIO.c [...] My proposal is much more radical -- get rid of cStringIO altogether. (And also of StringIO.py.) 
There aren't that many places using it any more, and almost all of these are easily replaced with io.StringIO (or io.BytesIO!). There's already a fixer in 2to3 to do this. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Aug 6 20:24:15 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 11:24:15 -0700 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: <46B5FCDC.6050807@v.loewis.de> Message-ID: Oops, never mind. This was in the context of 2.5 and 2.6, but my reply was in the context of 3.0. Still, in the light of cStringIO disappearing, it would be good to keep cStringIO as stable as possible (probably restoring 2.5.0 behavior) so as to avoid breaking 3rd party code more than once. On 8/6/07, Guido van Rossum wrote: > On 8/6/07, Georg Brandl wrote: > > Guido van Rossum schrieb: > > > Methinks that this was a fundamental limitation of cStringIO, not a > > > bug. Certainly not something to be "fixed" in a bugfix release. > > > > I'm sorry. > > No problem. Somebody else should have flagged this, so it's our > collective responsibility. > > > Okay, I propose the following patch: > > > > Index: Modules/cStringIO.c > [...] > > My proposal is much more radical -- get rid of cStringIO altogether. > (And also of StringIO.py.) There aren't that many places using it any > more, and almost all of these are easily replaced with io.StringIO (or > io.BytesIO!). There's already a fixer in 2to3 to do this. > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From alexandre at peadrop.com Mon Aug 6 21:43:23 2007 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 6 Aug 2007 15:43:23 -0400 Subject: [Python-Dev] cStringIO.StringIO() buffer behavior In-Reply-To: References: <46B5FCDC.6050807@v.loewis.de> Message-ID: On 8/6/07, Georg Brandl wrote: > Okay, I propose the following patch: > [...] 
I think your patch is complicated for nothing. It would be much more straightforward to use PyString_AsStringAndSize to encode the Unicode string with the default encoding. I think it would be necessary to port the fix to O_write and O_writelines. -- Alexandre Index: Modules/cStringIO.c =================================================================== --- Modules/cStringIO.c (revision 56754) +++ Modules/cStringIO.c (working copy) @@ -665,8 +674,15 @@ char *buf; Py_ssize_t size; - if (PyObject_AsCharBuffer(s, (const char **)&buf, &size) != 0) - return NULL; + /* Special case for unicode objects. */ + if (PyUnicode_Check(s)) { + if (PyString_AsStringAndSize(s, &buf, &size) == -1) + return NULL; + } + else { + if (PyObject_AsReadBuffer(s, (const void **)&buf, &size) == -1) + return NULL; + } self = PyObject_New(Iobject, &Itype); if (!self) return NULL; From guido at python.org Tue Aug 7 01:55:18 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 6 Aug 2007 16:55:18 -0700 Subject: [Python-Dev] Pleaswe help with the countdown to zero failing tests in the struni branch! Message-ID: We're down to 11 failing test in the struni branch. I'd like to get this down to zero ASAP so that we can retire the old p3yk (yes, with typo!) branch and rename py3k-struni to py3k. Please help! Here's the list of failing tests: test_ctypes Recently one test started failing again, after Martin changed PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. test_email test_email_codecs test_email_renamed Can someone contact the email-sig and ask for help with these? test_minidom Recently started failing again; probably shallow. test_sqlite Virgin territory, probably best done by whoever wrote the code or at least someone with time to spare. test_tarfile Virgin territory again (but different owner :-). test_urllib2_localnet test_urllib2net I think Jeremy Hylton may be close to fixing these, he's done a lot of work on urllib and httplib. test_xml_etree_c Virgin territory again. 
There are also a few tests that only fail on CYGWIN or OSX; I won't bother listing these. If you want to help, please refer to this wiki page: http://wiki.python.org/moin/Py3kStrUniTests There are also other tasks; see http://wiki.python.org/moin/Py3kToDo -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fijall at gmail.com Tue Aug 7 23:23:44 2007 From: fijall at gmail.com (Maciej Fijalkowski) Date: Tue, 7 Aug 2007 23:23:44 +0200 Subject: [Python-Dev] os.tmpfile() problem Message-ID: <693bc9ab0708071423t4027f99cs8aef222942b42e59@mail.gmail.com> I've got a slight problem with os.tmpfile(). What I would like to do is to get the filedesc of the tmpfile. First approach: os.tmpfile().fileno() of course does not work out, because fileno() does not keep the file object alive. The solution is to keep the os.tmpfile() result somewhere for an arbitrary amount of time, which is quite obscure. This is a problem with all file operations, but fortunately if I want a filedesc, I can just use os.open(), which will not close the file for me. I've got several obscure solutions, none of which really satisfies me: * If I use .fileno() then I'm on my own and I need to close the file myself * .fileno() returns an int-like object which keeps the file alive (well, this will explode when keeping this as an index in a list, which does not keep the object alive and so on) * have os._tmpfile() or whatever which returns a filedesc What do you think? Cheers, fijal -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20070807/eeaa0959/attachment.htm From guido at python.org Tue Aug 7 23:28:23 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 14:28:23 -0700 Subject: [Python-Dev] os.tmpfile() problem In-Reply-To: <693bc9ab0708071423t4027f99cs8aef222942b42e59@mail.gmail.com> References: <693bc9ab0708071423t4027f99cs8aef222942b42e59@mail.gmail.com> Message-ID: This seems like a question for comp.lang.python or help at python.org (does that still exist?). Also, you might consider the APIs available in the tempfile module rather than os.tmpfile(). On 8/7/07, Maciej Fijalkowski wrote: > I've got slight problem with os.tmpfile(). What I would like to do is to get > the filedesc of tmpfile. > > First approach: > > os.tmpfile().fileno() of course does not work out, because fileno() does not > keep object alive. The solution is to keep os.tmpfile() result somewhere for > an arbitrary amount of time, which is quite obscure. This is problem with > all file operations, but fortunately if I want a filedesc, I can do just > os.open() which will not close the file for me. > > I've got several obscure solutions, noone satisfies me really: > > * If I use .fileno() than I'm on my own and I need to close file myself > > * .fileno() returns a int-like object which keeps alive file (well, this > will explode when keeping this as an index in a list, which does not keep > the object alive and so on) > > * have os._tmpfile() or whatever which returns filedesc > > What do you think? 
> > Cheers, > fijal > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Wed Aug 8 00:41:40 2007 From: guido at python.org (Guido van Rossum) Date: Tue, 7 Aug 2007 15:41:40 -0700 Subject: [Python-Dev] Pleaswe help with the countdown to zero failing tests in the struni branch! In-Reply-To: References: Message-ID: Here's a followup. We need help from someone with a 64-bit Linux box; these tests are failing on 64-bit only: test_io, test_largefile, test_ossaudiodev, test_poll, test_shelve, test_socket_ssl. I suspect that the _fileio.c module probably is one of the culprits. Other news: On 8/6/07, Guido van Rossum wrote: > We're down to 11 failing test in the struni branch. I'd like to get > this down to zero ASAP so that we can retire the old p3yk (yes, with > typo!) branch and rename py3k-struni to py3k. > > Please help! Here's the list of failing tests: > > test_ctypes > Recently one test started failing again, after Martin changed > PyUnicode_FromStringAndSize() to use UTF8 instead of Latin1. > > test_email > test_email_codecs > test_email_renamed > Can someone contact the email-sig and ask for help with these? > > test_minidom > Recently started failing again; probably shallow. > > test_sqlite > Virgin territory, probably best done by whoever wrote the code or at > least someone with time to spare. > > test_tarfile > Virgin territory again (but different owner :-). Lars Gustaebel fixed this except for a few bz2-related tests. > test_urllib2_localnet > test_urllib2net > I think Jeremy Hylton may be close to fixing these, he's done a lot of > work on urllib and httplib. > > test_xml_etree_c > Virgin territory again. 
> > There are also a few tests that only fail on CYGWIN or OSX; I won't > bother listing these. The two OSX tests listed at the time were fixed, thanks to those volunteers! We now only have an OSX-specific failure in test_csv. > If you want to help, please refer to this wiki page: > http://wiki.python.org/moin/Py3kStrUniTests > > There are also other tasks; see http://wiki.python.org/moin/Py3kToDo -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nmm1 at cus.cam.ac.uk Wed Aug 8 11:28:16 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 10:28:16 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference to the efficiency of recognising things like HTML tags in a morass of mixed text. The other approach, which is to stick to true regular expressions, and wholly or partially convert to DFAs, has already been rendered impossible by even the limited Perl/PCRE extensions that Python has adopted. My first question is whether this would clash with any ongoing work, including being superseded by any changes in Python 3000. Note that I am NOT proposing to do a fixed task, but will produce a proper proposal only when I know what I can achieve for a small amount of work. If the SRE engine turns out to be unsuitable to extend in these ways, I shall quietly abandon the project. My second one is about Unicode. I really, but REALLY regard it as a serious defect that there is no escape for printing characters. Any code that checks arbitrary text is likely to need them - yes, I know why Perl and hence PCRE doesn't have that, but let's skip that. That is easy to add, though choosing a letter is tricky. 
Currently \c and \C, for 'character' (I would prefer 'text' or 'printable', but \t is obviously insane and \P is asking for incompatibility with Perl and Java). But attempting to rebuild the Unicode database hasn't worked. Tools/unicode is, er, a trifle incomplete and out of date. The only file I need to change is Objects/unicodetype_db.h, but the init attempts to run Tools/unicode/makeunicodedata.py have not been successful. I may be able to reverse engineer the mechanism enough to get the files off the Unicode site and run it, but I don't want to spend forever on it. Any clues? Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Wed Aug 8 13:15:40 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 12:15:40 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: Further to the above, I found the Unicode sources, have rebuilt the files, but it involved some fairly serious hacking to the building mechanism and I have had to disable the Unicode 3.2 support. And, of course, that means that 4 of the tests fail. This area needs addressing, not least because Python should clearly be upgraded to Unicode 5.0.0 (which is what I am using) at some stage. I am not sure how best to report a bug that essentially says "The build mechanisms for Unicode have suffered bit-rot, no longer work and need redesigning." I could certainly do that, but it's not helpful - people already know that, from the comments :-( Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. 
Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From g.brandl at gmx.net Wed Aug 8 14:52:47 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 08 Aug 2007 14:52:47 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: Nick Maclaren schrieb: > Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area needs addressing, not least because Python should > clearly be upgraded to Unicode 5.0.0 (which is what I am using) > at some stage. > > I am not sure how best to report a bug that essentially says > "The build mechanisms for Unicode have suffered bit-rot, no longer > work and need redesigning." I could certainly do that, but it's > not helpful - people already know that, from the comments :-( FWIW, there is a patch on the tracker at python.org/sf/1571184 that may be helpful to you. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From martin at v.loewis.de Wed Aug 8 20:41:33 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Aug 2007 20:41:33 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BA0E5D.60109@v.loewis.de> > My second one is about Unicode. I really, but REALLY regard it as > a serious defect that there is no escape for printing characters. > Any code that checks arbitrary text is likely to need them - yes, > I know why Perl and hence PCRE doesn't have that, but let's skip > that. That is easy to add, though choosing a letter is tricky. 
> Currently \c and \C, for 'character' (I would prefer 'text' or > 'printable', but \t is obviously insane and \P is asking for > incompatibility with Perl and Java). Before discussing the escape, I'd like to see a specification of it first - what characters precisely would classify as "printing"? > But attempting to rebuild the Unicode database hasn't worked. > Tools/unicode is, er, a trifle incomplete and out of date. The > only file I need to change is Objects/unicodetype_db.h, but the > init attempts to run Tools/unicode/makeunicodedata.py have not > been successful. > > I may be able to reverse engineer the mechanism enough to get > the files off the Unicode site and run it, but I don't want to > spend forever on it. Any clues? I see that you managed to do something here, so I'm not sure what kind of help you still need. Regards, Martin From martin at v.loewis.de Wed Aug 8 20:48:46 2007 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 08 Aug 2007 20:48:46 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BA100E.4020006@v.loewis.de> > Further to the above, I found the Unicode sources, have rebuilt > the files, but it involved some fairly serious hacking to the > building mechanism and I have had to disable the Unicode 3.2 > support. And, of course, that means that 4 of the tests fail. > > This area needs addressing, not least because Python should > clearly be upgraded to Unicode 5.0.0 (which is what I am using) > at some stage. I recommend you use the 4.1 version of the database; this should work out of the box, with no change to the build environment at all. As for updating it - that has to wait until the next release of Python. At that point, 5.1 might be released, so 5.0 might get skipped altogether. > I am not sure how best to report a bug that essentially says > "The build mechanisms for Unicode have suffered bit-rot, no longer > work and need redesigning." 
I could certainly do that, but it's > not helpful - people already know that, from the comments :-( I would likely close such a report as "works for me" (after testing it does - it did when I last ran it, which was before the release of Python 2.5). It did not suffer from bit-rot - it still works just fine for the version of the database that is supported. As for the need for redesigning - I don't see that need. What specific aspect do you think needs redesigning? If you merely meant to say "I don't understand the code" - this is not enough reason, I remember it took me some time to understand it as well, but now I see that it does precisely what it needs to do, and precisely in the way it needs to do that. Regards, Martin From mike.klaas at gmail.com Wed Aug 8 20:56:58 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 8 Aug 2007 11:56:58 -0700 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <9CBC9283-52BF-48AB-A39F-0DAE0E4EAFAE@gmail.com> On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote: > I have needed to push my stack to teach REs (don't ask), and am > taking a look at the RE code. I may be able to extend it to support > RFE 694374 and (more importantly) atomic groups and possessive > quantifiers. While I regard such things as revolting beyond belief, > they make a HELL of a difference to the efficiency of recognising > things like HTML tags in a morass of mixed text. +1. I would use such a feature. > The other approach, which is to stick to true regular expressions, > and wholly or partially convert to DFAs, has already been rendered > impossible by even the limited Perl/PCRE extensions that Python > has adopted. Impossible? Surely, a sufficiently-competent re engine could detect when a DFA is possible to construct? -Mike From nmm1 at cus.cam.ac.uk Wed Aug 8 21:29:50 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 20:29:50 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. 
In-Reply-To: Your message of "Wed, 08 Aug 2007 20:41:33 +0200." <46BA0E5D.60109@v.loewis.de> Message-ID: [ I would appreciate not getting private copies as well. ] =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > Before discussing the escape, I'd like to see a specification of > it first - what characters precisely would classify as "printing"? For basic ASCII and locale-based testing, whatever isprint() says. Just as for isalpha(). For Unicode, whatever people agree! I use the criterion that it has a defined category that doesn't start with 'C' - which is what I think that most people will accept. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Wed Aug 8 21:31:49 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 20:31:49 +0100 Subject: [Python-Dev] =?iso-8859-1?q?cc=3A_=22Martin_v=2E_L=F6wis=22_=3Cma?= =?iso-8859-1?q?rtin=40v=2Eloewis=2Ede=3E?= Message-ID: Re: [Python-Dev] Regular expressions, Unicode etc. =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > I recommend you use the 4.1 version of the database; this should > work out of the box, with no change to the build environment at > all. I tried that, of course. See below. > As for updating it - that has to wait until the next release > of Python. At that point, 5.1 might be releasesd, so 5.0 might > get skipped altogether. Very true. > I would likely close such a report as "works for me" (after testing > it does - it did when I last ran it, which was before the release > of Python 2.5). I think that you will find that you are using a non-standard environment and set of Python sources. I started off with the standard distribution. > It did not suffer from bit-rot - it still works just fine for > the version of the database that is supported. Really? I have just checked 2.5.1, and the same defects are there. 
> As for the need for redesigning - I don't see that need. What specific > aspect do you think needs redesigning? If you merely meant to say > "I don't understand the code" - this is not enough reason, I > remember it took me some time to understand it as well, but now > I see that it does precisely what it needs to do, and precisely > in the way it needs to do that. Well, here are a selection of the issues that I found: The Makefile includes the command: ncftpget -R ftp.unicode.org . Public/MAPPINGS Not merely is ncftpget not a standard utility, the current mappings are no longer at that location. Indeed, I can see nothing useful in that directory at present, though I haven't searched it in depth! Looking through www.unicode.org, I could find the relevant files for 5.0.0, but for no other version. No, I am NOT going to type in over a megabyte of data from the PDF! makeunicodedata.py has a reference to the Unicode 3.2 files, but they are not present in the standard distribution, the Makefile doesn't fetch them, and I can't find them. makeunicodedata.py refers to (for example) UnicodeData.txt and Modules/unicodedata_db.h as such, which rather requires it to be run in a particular directory. I can find nothing in any file even referring to this. Having run it, running 'make all' does not rebuild Python correctly. I couldn't be bothered to work out why, so I hit it with the usual trick, 'make distclean'. And, of course, it SHOULD be possible to upgrade the Unicode data without having to change version of Python! Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Wed Aug 8 21:47:11 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 20:47:11 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: I am not on "Python 3000", so am restricting. 
Mike Klaas wrote: > > > I have needed to push my stack to teach REs (don't ask), and am > > taking a look at the RE code. I may be able to extend it to support > > RFE 694374 and (more importantly) atomic groups and possessive > > quantifiers. While I regard such things as revolting beyond belief, > > they make a HELL of a difference to the efficiency of recognising > > things like HTML tags in a morass of mixed text. > > +1. I would use such a feature. I think that I am getting somewhere, but I really dislike the style of _sre.c. It has a very complex semi-stack, semi-finite-state design and no comments on how it is supposed to work. And its memory management looks like a recipe for leaks, so I may well introduce some of them. > > The other approach, which is to stick to true regular expressions, > > and wholly or partially convert to DFAs, has already been rendered > > impossible by even the limited Perl/PCRE extensions that Python > > has adopted. > > Impossible? Surely, a sufficiently-competent re engine could detect > when a DFA is possible to construct? I doubt it. While it isn't equivalent to the halting problem, it IS an intractable one! There are two problems: Firstly, things like backreferences are an absolute no-no. They are not regular, and REs with them in cannot be converted to DFAs. That could be 'solved' by a parser that kicked out such constructions, but it would get screams from many users. Secondly, anything involving explicit or implicit negation can lead to (if I recall) a super-exponential explosion in the size of the DFA. That could be 'solved' by imposing a limit, but few people would be able to predict when it would bite. Thirdly, I would require notice of the question of whether capturing parentheses could be supported, and what semantic changes would be to which were set and how. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. 
Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From brett at python.org Wed Aug 8 22:28:12 2007 From: brett at python.org (Brett Cannon) Date: Wed, 8 Aug 2007 13:28:12 -0700 Subject: [Python-Dev] Please help verify SF data dump imported into (future) new tracker Message-ID: We are getting very close to moving over to the new tracker (hopefully by the end of the month; no firm date yet, though, as we are still planning things out)! Part of the transition is taking a data dump provided by SourceForge and loading it into our Roundup instance. But we need to make some effort to make sure SF's data dump is accurate and that our import is good. If you can, please go to SourceForge and choose some issue (bug, patch, whatever), and then look up the corresponding issue at http://bugs.python.org/ . If there is any discrepancy, please report it at http://psf.upfronthosting.co.za/roundup/meta (the link is also listed at the new tracker as where to report tracker problems) or to this email. -Brett P.S.: If you want to help with the transition in other ways, you can also help with the tracker docs at http://wiki.python.org/moin/TrackerDocs. From mike.klaas at gmail.com Wed Aug 8 22:29:50 2007 From: mike.klaas at gmail.com (Mike Klaas) Date: Wed, 8 Aug 2007 13:29:50 -0700 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <04F52551-D458-490C-9945-6DAD1E46F7D1@gmail.com> On 8-Aug-07, at 12:47 PM, Nick Maclaren wrote: > >>> The other approach, which is to stick to true regular expressions, >>> and wholly or partially convert to DFAs, has already been rendered >>> impossible by even the limited Perl/PCRE extensions that Python >>> has adopted. >> >> Impossible? Surely, a sufficiently-competent re engine could detect >> when a DFA is possible to construct? > > I doubt it. While it isn't equivalent to the halting problem, it IS > an intractable one!
There are two problems: > > Firstly, things like backreferences are an absolute no-no. They > are not regular, and REs with them in cannot be converted to DFAs. > That could be 'solved' by a parser that kicked out such constructions, > but it would get screams from many users. > > Secondly, anything involving explicit or implicit negation can lead > to (if I recall) a super-exponential explosion in the size of the > DFA. That could be 'solved' by imposing a limit, but few people > would be able to predict when it would bite. Right. The analysis I envisioned would be more along the lines of "if troublesome RE extensions are used, do not attempt to construct a DFA". It could even be exposed via an alternate api (re.compile_dfa ()) that admitted a subset of the usual grammar. > Thirdly, I would require notice of the question of whether capturing > parentheses could be supported, and what semantic changes would be > to which were set and how. Capturing groups are rather integral to the python regex api and, as you say, a major difficulty for DFA-based implementations. Sounds like a task best left to a thirdparty package. -Mike From martin at v.loewis.de Wed Aug 8 22:38:03 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Aug 2007 22:38:03 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BA29AB.40804@v.loewis.de> >> Before discussing the escape, I'd like to see a specification of >> it first - what characters precisely would classify as "printing"? > > For basic ASCII and locale-based testing, whatever isprint() says. > Just as for isalpha(). In the mediate term, locale-based testing will go away/be not implementable (in particular, Py3k won't have a byte-oriented character string type, so we can't use isprint). In general, isprint is unsuitable since it doesn't support multi-byte character sets. > For Unicode, whatever people agree! 
I use the criterion that it > has a defined category that doesn't start with 'C' - which is what > I think that most people will accept. -1. There must be a better specification than that. Can you please explain the concept of "printing character"? If you have a Unicode code point, how do you determine whether it is printing? If rendering it would generate black pixels on white background? Regards, Martin From nmm1 at cus.cam.ac.uk Wed Aug 8 23:16:37 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Wed, 08 Aug 2007 22:16:37 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > >> Before discussing the escape, I'd like to see a specification of > >> it first - what characters precisely would classify as "printing"? > > > > For basic ASCII and locale-based testing, whatever isprint() says. > > Just as for isalpha(). > > In the mediate term, locale-based testing will go away/be not > implementable (in particular, Py3k won't have a byte-oriented > character string type, so we can't use isprint). In general, > isprint is unsuitable since it doesn't support multi-byte > character sets. Well, iswprint isn't so restricted :-) I don't see the relevance of this, as EXACTLY the same problem applies to isalnum and \w. If you can solve one problem (and you have to solve the latter), you can solve the other. > > For Unicode, whatever people agree! I use the criterion that it > > has a defined category that doesn't start with 'C' - which is what > > I think that most people will accept. > > -1. There must be a better specification than that. > > Can you please explain the concept of "printing character"? If > you have a Unicode code point, how do you determine whether it > is printing? If rendering it would generate black pixels on white > background? Eh? This is a character set we are talking about. 
The proposed extensions to include font and colour are an aberration that I shall thankfully be long retired before they hit. Unicode has a two-letter classification of each character, with the main category being in upper case and the subsidiary one in lower. Let's ignore the latter, as it is irrelevant. The main categories are 'Z' (spaces), 'L' (letters), 'N' (numbers), 'S' (symbols), 'P' (punctuation), 'M' (marks) and 'C' (control characters). There are some pretty weird entries in 'L' and 'N' and the difference between 'S', 'P' and 'M' is arcane, to a degree. But all of the categories except 'C' are things that display, and 'C' is mainly the ASCII controls we know and, er, love - with some similar extras. Obviously, unclassified characters should not be called printing, and equally obviously controls shouldn't. There is no clear reason why the others should not be - especially as the difference between a modifying accent and a free-standing one is something so obscure that most people don't even know that there IS one. The point about an escape for printing characters is to check for bad characters in text input, and the rule I mentioned is fine for that. What's the problem with it? Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From martin at v.loewis.de Wed Aug 8 23:54:06 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Aug 2007 23:54:06 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BA3B7E.2090208@v.loewis.de> >> In the mediate term, locale-based testing will go away/be not >> implementable (in particular, Py3k won't have a byte-oriented >> character string type, so we can't use isprint). In general, >> isprint is unsuitable since it doesn't support multi-byte >> character sets.
> > Well, iswprint isn't so restricted :-) Yes. However, it is even more difficult to convert from Py_UNICODE to wchar_t in general. > I don't see the relevance > of this, as EXACTLY the same problem applies to isalnum and \w. There is no problem for isalnum: it will just go away if byte-oriented characters go away. Fortunately, we have a replacement for the Unicode case. The relevance is that your specification of "printing character" as "isprint returns true" is nearly useless, as it only applies to byte-oriented characters. > If you can solve one problem (and you have to solve the latter), > you can solve the other. Unicode-isalnum is defined as isalpha|isdecimal|isdigit|isnumeric. isalpha means categories Ll, Lu, Lt, Lo, Lm. isdecimal means the character has the decimal property. isdigit means the character has the digit property. isnumeric means the character has the numeric property. >> Can you please explain the concept of "printing character"? If >> you have a Unicode code point, how do you determine whether it >> is printing? If rendering it would generate black pixels on white >> background? > > Eh? This is a character set we are talking about. The proposed > extensions to include font and colour are an aberration that I shall > thankfully be long retired before they hit. It was a proposal for a definition. English is not my native language, and "printing character" means nothing to me. So I kindly asked for a definition, and suggested one possibility. I would not have guessed that you consider white-space characters as "printing", as they don't actually print anything. > The point about an escape for printing characters is to check > for bad characters in text input, and the rule I mentioned is > fine for that. What's the problem with it? The problem is that you did not quite mention a rule, or else I missed it. You seem to be asking for being able to express "not a control character".
I propose that this is best done with UTS#18, in which you would write [\P{C}] # or \P{Other} If this is what you want, I'm all in favor of having it implemented. Regards, Martin From martin at v.loewis.de Thu Aug 9 00:03:51 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Aug 2007 00:03:51 +0200 Subject: [Python-Dev] =?iso-8859-1?q?cc=3A_=22Martin_v=2E_L=F6wis=22_=3Cma?= =?iso-8859-1?q?rtin=40v=2Eloewis=2Ede=3E?= In-Reply-To: References: Message-ID: <46BA3DC7.4020709@v.loewis.de> >> I would likely close such a report as "works for me" (after testing >> it does - it did when I last ran it, which was before the release >> of Python 2.5). > > I think that you will find that you are using a non-standard > environment and set of Python sources. Please trust me that I didn't. See below. > Well, here are a selection of the issues that I found: > > The Makefile includes the command: > ncftpget -R ftp.unicode.org . Public/MAPPINGS > Not merely is ncftpget not a standard utility, the current mappings > are no longer at that location. Indeed, I can see nothing useful in > that directory at present, though I haven't searched it in depth! Ah, the makefile. I don't think you use it to create the Unicode database. It's only good for generating the codecs (Lib/encodings). AFAICT, the mappings are still where they always were: at the location given in the Makefile. (e.g. ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT ) For generating the Unicode database, you need to download the files manually. > Looking through www.unicode.org, I could find the relevant files > for 5.0.0, but for no other version. No, I am NOT going to type > in over a megabyte of data from the PDF! And nobody asks you to. Just use http://www.unicode.org/Public/4.1.0/ucd/ (also available through ftp) Did you really believe the Unicode consortium doesn't have the old versions of the character database online? Do you think they are complete fools?
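For readers following the version discussion above, one quick way to see which Unicode Character Database a given interpreter was built against is the unidata_version attribute of the standard unicodedata module. This is only a small sketch: the helper name ucd_versions is made up for illustration, and the first value it returns depends on the Python build in use.

```python
import unicodedata


def ucd_versions():
    """Return (current UCD version, legacy UCD version) as strings.

    ucd_versions is a hypothetical helper name; unidata_version and
    ucd_3_2_0 are attributes of the standard unicodedata module.
    """
    current = unicodedata.unidata_version
    # ucd_3_2_0 is the frozen Unicode 3.2 database kept alongside the
    # current one (the "Unicode 3.2 support" mentioned in the thread).
    legacy = unicodedata.ucd_3_2_0.unidata_version
    return current, legacy


if __name__ == "__main__":
    current, legacy = ucd_versions()
    print("built-in UCD:", current)
    print("legacy UCD:", legacy)
```

The second value is useful as a sanity check after regenerating the database: if the 3.2 tables were dropped from the build, this lookup is one of the things that would be expected to break.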
> makeunicodedata.py has a reference to the Unicode 3.2 files, but > they are not present in the standard distribution, the Makefile > doesn't fetch them, and I can't find them. Googling for "unicode 3.2 ucd" gives me http://unicode.org/Public/3.2-Update/ as the top hit (of course, you have to know that they call the character database "ucd" to invoke that query). > makeunicodedata.py refers to (for example) UnicodeData.txt and > Modules/unicodedata_db.h as such, which rather requires it to be > run in a particular directory. I can find nothing in any file > even referring to this. Yes, that's something you have to know. Put the files into the root directory of the source tree, then run makeunicodedata.py > And, of course, it SHOULD be possible to upgrade the Unicode data > without having to change version of Python! Well. Regards, Martin From nmm1 at cus.cam.ac.uk Thu Aug 9 10:27:46 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Thu, 09 Aug 2007 09:27:46 +0100 Subject: [Python-Dev] Unicode database Message-ID: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > > I think that you will find that you are using a non-standard > > environment and set of Python sources. > > Please trust me that I didn't. See below. I always trust people as much as I trust myself, but I do tend to check up. See below. > Ah, the makefile. I don't think you use it create the Unicode database. > > It's only good for generating the codecs (Lib/encodings) Yes, but it DOES attempt to download the mappings, and is the ONLY script which attempts to do so. beelzebub$find Python-2.5.1 -type f | wc 3458 3460 135981 beelzebub$find Python-2.5.1 -type f | xargs grep ftp.unicode.org Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}. 
grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory grep: Image.icns: No such file or directory grep: Python-2.5.1/Mac/Icons/Python: No such file or directory grep: Folder.icns: No such file or directory Python-2.5.1/Misc/NEWS: at ftp.unicode.org and contain a few updates (e.g. the Mac OS Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/ Python-2.5.1/Tools/unicode/Makefile: ncftpget -R ftp.unicode.org . Public/MAPPINGS Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n > AFAICT, the mappings are still where they always were: at the > location given in the Makefile. (e.g. > ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT > ) Then you DEFINITELY are using a non-standard set of files. That above was from the source of Python 2.5.1 that I have just downloaded. > Did you really believe the Unicode consortium doesn't have the > old versions of the character database online? Do you think > they are complete fools? Please don't be offensive. I said that I had failed to find them, after searching the Unicode Web site. Now that you have given me the actual file name, I can find them, but searching on the version and request for that database leads to unhelpful files.
> Googling for "unicode 3.2 ucd" gives me > > http://unicode.org/Public/3.2-Update/ > > as the top hit (of course, you have to know that they call > the character database "ucd" to invoke that query). Generally, I distrust Google for such things, as it is as likely to lead you to the wrong information as the right one. For example, that hit you found was on a different logical server, and could well be an incorrect version of the database. It is VERY common for such things to 'escape' into Google. Have you checked whether or not that file is correct with the Unicode consortium? Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Thu Aug 9 11:10:10 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Thu, 09 Aug 2007 10:10:10 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > There is no problem for isalnum: it will just go away if > byte-oriented characters go away. Fortunately, we have a > replacement for the Unicode case. As we do for isprint. > The relevance is that your specification of "printing character" > as "isprint returns true" is nearly useless, as it only applies > to byte-oriented characters. Eh? That's ALL I used it to specify! I used a Unicode-based specification for Unicode. > Unicode-isalnum is defined as isalpha|isdecimal|isdigit|isnumeric. > isalpha means categories Ll, Lu, Lt, Lo, Lm. isdecimal means > character has the decimal property. isdigit means the character has > the digit property. isnumeric means the character has the numeric > property. I sincerely hope it isn't! Using a mixture of categories and properties is truly horrible, because it isn't unlikely that some future version of Unicode will introduce anomalies, even if there aren't any there already.
And the character aliases file doesn't include any properties called 'digit' or 'decimal' or anything much like them, so they need a painful amount of reverse engineering to determine what characters they bind to. It LOOKS as if they are the subcategories, which would be OK. A much cleaner and more future-proof specification would be any category beginning with 'L' or 'N'. For example, Unicode doesn't CURRENTLY have a category for indeterminate numbers or sacred case, such as are used in some languages, but it isn't implausible that it would add them :-) > It was a proposal for a definition. English is not my native > language, and "printing character" means nothing to me. So > I kindly asked for a definition, and suggested one possibility. > I would not have guessed that you consider white-space characters > as "printing", as they don't actually print anything. Ah. It's not an ordinary English term. It's a computer language one, so I assumed that you would know it. It is older than C, but C standardised its use to mean any of the characters which are intended to display (or leave a blank) with standard, single positioning semantics. Almost all languages derived from C use it in the same sense, and Python has a fair amount of C ancestry. > The problem is that you did not quite mention a rule, or else > I missed it. I did, and you did! I said that it should be any character with a defined category that is not 'control'. > You seem to be asking for being able to express "not a control > character". I propose that this is best done with UTS#18, > in which you would write > > [\P{C}] # or \P{Other} > > If this is what you want, I'm all in favor of having it > implemented. Excellent! We are agreed. Yes, that is equivalent. I am NOT volunteering to add the support of that to the parser, especially now I have discovered the format of the intermediate data :-( It would be a foul task, and it isn't clear what syntax to use, anyway. 
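The rule agreed on above (a character counts as "printing" when its general category does not begin with 'C', which is what the UTS#18 class \P{C} expresses) can be sketched with the standard unicodedata module. The names is_printing and all_printing are invented here for illustration; this is not something the re module provides, and the answers depend on the Unicode version of the interpreter in use. Note that unicodedata.category reports unassigned code points as 'Cn', so "no defined category" and "control" are both rejected by the same test.

```python
import unicodedata


def is_printing(ch):
    """True if ch has a general category outside 'C' (Cc, Cf, Cs, Co, Cn).

    Hypothetical helper: spaces ('Z'), letters, numbers, symbols,
    punctuation and marks all count as printing under this rule.
    """
    return not unicodedata.category(ch).startswith("C")


def all_printing(text):
    """True if every character of text passes is_printing."""
    return all(is_printing(ch) for ch in text)


if __name__ == "__main__":
    for sample in ["plain text", "caf\u00e9", "tab\there", "bell\x07"]:
        print(repr(sample), all_printing(sample))
```

Under this definition a space is printing and a tab is not, which matches the C sense of the term described earlier in the thread.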
There is the horrible POSIX syntax, which I blame (perhaps wrongly) on HP-UX, and the Java one, which I believe is a modified subset of the example in UTS#18. But that says: All syntax and API presented in this document is only for the purpose of illustration; there is absolutely no requirement to follow such syntax or API. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From mal at egenix.com Thu Aug 9 11:23:50 2007 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 09 Aug 2007 11:23:50 +0200 Subject: [Python-Dev] Unicode database In-Reply-To: References: Message-ID: <46BADD26.9030200@egenix.com> Nick Maclaren wrote: >> Ah, the makefile. I don't think you use it create the Unicode database. >> >> It's only good for generating the codecs (Lib/encodings) > > Yes, but it DOES attempt to download the mappings, and is the ONLY > script which attempts to do so. Of course it does. The Tools/unicode/Makefile is meant to simplify recreating the codecs from the (possibly updated) mapping on the Unicode site. If it doesn't work for you, that may well be possible, since I wrote the Makefile and the other related stuff in that directory to help me with updating the codecs from the mappings. It's only checked in for convenience. > beelzebub$find Python-2.5.1 -type f | wc > 3458 3460 135981 > beelzebub$find Python-2.5.1 -type f | xargs grep ftp.unicode.org > Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}. > grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory > grep: Image.icns: No such file or directory > grep: Python-2.5.1/Mac/Icons/Python: No such file or directory > grep: Folder.icns: No such file or directory > Python-2.5.1/Misc/NEWS: at ftp.unicode.org and contain a few updates (e.g.
the Mac OS > Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/ > Python-2.5.1/Tools/unicode/Makefile: ncftpget -R ftp.unicode.org . Public/MAPPINGS > Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec > Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the > Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT > Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT > Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT > Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n > >> AFAICT, the mappings are still where they always were: at the >> location given in the Makefile. (e.g. >> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT >> ) > > Then you DEFINITELY are using a non-standard set of files. That > above was from the source of Python 2.5.1 that I have just downloaded. No idea where you get that impression from, but then I'm not really sure what you're after anyway ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 09 2007) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From guido at python.org Thu Aug 9 17:11:52 2007 From: guido at python.org (Guido van Rossum) Date: Thu, 9 Aug 2007 08:11:52 -0700 Subject: [Python-Dev] Move to a "py3k" branch *DONE* In-Reply-To: References: Message-ID: Please spread the word. The py3k-struni branch is dead! Don't use it any more. --Guido ---------- Forwarded message ---------- From: Guido van Rossum Date: Aug 9, 2007 7:43 AM Subject: Move to a "py3k" branch *DONE* To: Python 3000 Cc: Neal Norwitz This is done. The new py3k branch is ready for business. If you currently have the py3k-struni branch checked out (at its top level), *don't update*, but issue the following commands: svn switch svn+ssh://pythondev at svn.python.org/python/branches/py3k svn update Only a small amount of activity should result (unless you didn't svn update for a long time). For the p3yk branch, the same instructions will work, but the svn update will update most of your tree. A "make clean" is recommended in this case. Left to do: - update the wikis - clean out the old branches - switch the buildbot and the doc builder to use the new branch (Neal) There are currently about 7 failing unit tests left: test_bsddb test_bsddb3 test_email test_email_codecs test_email_renamed test_sqlite test_urllib2_localnet See http://wiki.python.org/moin/Py3kStrUniTests for detailed status regarding these. --Guido On 8/9/07, Guido van Rossum wrote: > I am starting now. Please, no more checkins to either p3yk or py3k-struni. > > On 8/8/07, Guido van Rossum wrote: > > I would like to move to a new branch soon for all Py3k development. > > > > I plan to name the branch "py3k". It will be branched from > > py3k-struni. I will do one last set of merges from the trunk via p3yk > > (note typo!) and py3k-struni, and then I will *delete* the old py3k > > and py3k-struni branches (you will still be able to access their last > > known good status by syncing back to a previous revision).
I will > > temporarily shut up some unit tests to avoid getting endless spam from > > Neal's buildbot. > > > > After the switch, you should be able to switch your workspaces to the > > new branch using the "svn switch" command. > > > > If anyone is in the middle of something that would become painful due > > to this changeover, let me know ASAP and I'll delay. > > > > I will send out another message when I start the move, and another > > when I finish it. > > > > -- > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- --Guido van Rossum (home page: http://www.python.org/~guido/) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Aug 10 00:51:47 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 00:51:47 +0200 Subject: [Python-Dev] Unicode database In-Reply-To: References: Message-ID: <46BB9A83.4030901@v.loewis.de> >> Ah, the makefile. I don't think you use it create the Unicode database. >> >> It's only good for generating the codecs (Lib/encodings) > > Yes, but it DOES attempt to download the mappings, and is the ONLY > script which attempts to do so. Sure. But (again): you don't need to have the mappings at all for what you want to achieve. So there is no point in downloading them > beelzebub$find Python-2.5.1 -type f | xargs grep ftp.unicode.org > Python-2.5.1/Doc/lib/libunicodedata.tex:4.1.0 which is publicly available from \url{ftp://ftp.unicode.org/}. > grep: Python-2.5.1/Mac/Icons/Disk: No such file or directory > grep: Image.icns: No such file or directory > grep: Python-2.5.1/Mac/Icons/Python: No such file or directory > grep: Folder.icns: No such file or directory > Python-2.5.1/Misc/NEWS: at ftp.unicode.org and contain a few updates (e.g. 
the Mac OS > Python-2.5.1/Tools/unicode/Makefile:# files available at ftp://ftp.unicode.org/ > Python-2.5.1/Tools/unicode/Makefile: ncftpget -R ftp.unicode.org . Public/MAPPINGS > Python-2.5.1/Tools/unicode/gencodec.py:site (ftp://ftp.unicode.org/Public/MAPPINGS/) and creates Python codec > Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT the > Python-2.5.1/Tools/unicode/python-mappings/TIS-620.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT > Python-2.5.1/Tools/unicode/python-mappings/KOI8-U.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT > Python-2.5.1/Tools/unicode/python-mappings/CP1140.TXT:# ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/EBCDIC/CP037.TXT > Python-2.5.1/Modules/unicodedata.c:4.1.0 which is publically available from ftp://ftp.unicode.org/.\n > >> AFAICT, the mappings are still where they always were: at the >> location given in the Makefile. (e.g. >> ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT >> ) > > Then you DEFINITELY are using a non-standard set of files. That > above was from the source of Python 2.5.1 that I have just downloaded. I don't understand. Why does this follow? What should I read out of the grep lines above, and why does my citing of a URL prove that I did something to my build environment? Regards, Martin From martin at v.loewis.de Fri Aug 10 00:59:44 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 10 Aug 2007 00:59:44 +0200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BB9C60.2030405@v.loewis.de> Nick Maclaren schrieb: >> The relevance is that your specification of "printing character" >> as "isprint returns true" is nearly useless, as it only applies >> to byte-oriented characters. > > Eh? That's ALL I used it to specify! I used a Unicode-based > specification for Unicode. 
Your specification was "For Unicode, whatever people agree!" I would not call that "Unicode-based". >> Unicode-isalnum is defined as isalpha|isdecimal|isdigit|isnumeric. >> isalpha means categories Ll, Lu, Lt, Lo, Lm. isdecimal means >> character has the decimal property. isdigit means the character has >> the digit property. isnumeric means the character has the numeric >> property. > > I sincerely hope it isn't! Please read the code. >> It was a proposal for a definition. English is not my native >> language, and "printing character" means nothing to me. So >> I kindly asked for a definition, and suggested one possibility. >> I would not have guessed that you consider white-space characters >> as "printing", as they don't actually print anything. > > Ah. It's not an ordinary English term. It's a computer language > one, so I assumed that you would know it. I know the term "printable character", which is what I read in definitions of the isprint() routine. "printing character" I never heard before. Regards, Martin From greg.ewing at canterbury.ac.nz Fri Aug 10 02:28:49 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 12:28:49 +1200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: <46BB9C60.2030405@v.loewis.de> References: <46BB9C60.2030405@v.loewis.de> Message-ID: <46BBB141.700@canterbury.ac.nz> Martin v. Löwis wrote: > I know the term "printable character", which is what I read > in definitions of the isprint() routine. "printing character" > I never heard before. Hmmm... I guess this means your brain is using a part-of-speech-sensitive word->technical_meaning mapping. Perhaps this will be fixed in English 3.0... -- Greg Ewing, Computer Science Dept, +--------------------------------------+ University of Canterbury, | Carpe post meridiem! | Christchurch, New Zealand | (I'm not a morning person.)
| greg.ewing at canterbury.ac.nz +--------------------------------------+ From foom at fuhm.net Fri Aug 10 07:02:16 2007 From: foom at fuhm.net (James Y Knight) Date: Fri, 10 Aug 2007 01:02:16 -0400 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > Firstly, things like backreferences are an absolute no-no. They > are not regular, and REs with them in cannot be converted to DFAs. > That could be 'solved' by a parser that kicked out such constructions, > but it would get screams from many users. People keep saying things like this as if GNU grep and tcl's regular expression matchers didn't exist. See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example. time python -c 'import re; print re.match("("+"a?"*26+"a"*26+")b\\1", "a"*26+"b"+"a"*26).group(0)' aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa real 0m5.913s user 0m5.905s sys 0m0.006s time echo 'aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa' | grep -E '(a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a? aaaaaaaaaaaaaaaaaaaaaaaaaa)b\1' aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa real 0m0.002s user 0m0.002s sys 0m0.000s James From greg.ewing at canterbury.ac.nz Fri Aug 10 09:40:28 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 10 Aug 2007 19:40:28 +1200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References:

Message-ID: <46BC166C.7000300@canterbury.ac.nz> James Y Knight wrote: > On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote: > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > People keep saying things like this as if GNU grep and tcl's regular > expression matchers didn't exist. But do these work by conversion to a DFA? -- Greg From nmm1 at cus.cam.ac.uk Fri Aug 10 10:12:33 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Fri, 10 Aug 2007 09:12:33 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > Your specification was "For Unicode, whatever people agree!" > > I would not call that "Unicode-based". Can we drop this, please? I am happy to agree that I was being unclear (it is a common failing of mine), but I did provide the specification I coded. Specifically, and in full, I said: For Unicode, whatever people agree! I use the criterion that it has a defined category that doesn't start with 'C' - which is what I think that most people will accept. That is equivalent to the definition you gave. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Fri Aug 10 10:23:42 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Fri, 10 Aug 2007 09:23:42 +0100 Subject: [Python-Dev] Unicode database Message-ID: =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?= wrote: > > Sure. But (again): you don't need to have the mappings at all for > what you want to achieve. So there is no point in downloading them Sigh. No, I don't. But, if I want to be able to merge anything back into the main Python source, it is a VERY good idea to use the existing mechanisms and not invent new ones. 
The easiest thing would have been to hack re.py to create a Unicode table using unicodedata.py directly, and that would indeed be a rather cleaner solution in the long term. But it would have meant that there were now multiple different ways of generating the Unicode data for _sre.c, and that would have led to inconsistencies. As I pointed out, there is already a problem where upgrading the data needs a complete rebuild to get all of the Unicode data back in step; 'make all' in itself does not work. That is precisely the sort of problem that is caused by having duplicate update mechanisms. Now, IF I can work out how the _sre.c engine works enough to put atomic/possessive quantifiers in, this problem will return. My question would be how best to make a suitable proposal that, inter alia, includes changes that can't be made by the normal building mechanisms. And I still don't have a clue about that one. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From nmm1 at cus.cam.ac.uk Fri Aug 10 10:28:58 2007 From: nmm1 at cus.cam.ac.uk (Nick Maclaren) Date: Fri, 10 Aug 2007 09:28:58 +0100 Subject: [Python-Dev] Regular expressions, Unicode etc. Message-ID: James Y Knight wrote: > > > Firstly, things like backreferences are an absolute no-no. They > > are not regular, and REs with them in cannot be converted to DFAs. > > That could be 'solved' by a parser that kicked out such constructions, > > but it would get screams from many users. > > People keep saying things like this as if GNU grep and tcl's regular > expression matchers didn't exist. > See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example. PCRE also has a breadth-first engine, but it does not convert the NFA to a DFA (its author is a close colleague of mine). 
Those engines won't do the conversion, either, and I am prepared to bet that I could produce a pattern that would either run very slowly or expose the semantic differences in most of them. I did NOT say that there were no alternative approaches. What I said was correct - you cannot convert such extended expressions to DFAs. You can convert them to things that are sort of NFA/DFA hybrids, which might or might not be a good way to proceed. Regards, Nick Maclaren, University of Cambridge Computing Service, New Museums Site, Pembroke Street, Cambridge CB2 3QH, England. Email: nmm1 at cam.ac.uk Tel.: +44 1223 334761 Fax: +44 1223 334679 From guido at python.org Fri Aug 10 20:23:45 2007 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Aug 2007 11:23:45 -0700 Subject: [Python-Dev] Universal newlines support in Python 3.0 Message-ID: Python 3.0 currently has limited universal newlines support: by default, \r\n is translated into \n for text files, but this can be controlled by the newline= keyword parameter. For details on how, see PEP 3116. The PEP prescribes that a lone \r must also be translated, though this hasn't been implemented yet (any volunteers?). However, the old universal newlines feature also set an attribute named 'newlines' on the file object to a tuple of up to three elements giving the actual line endings that were observed on the file so far (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not implemented. I'm tempted to kill it. Does anyone have a use case for this? Has anyone even ever used this? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Fri Aug 10 21:15:45 2007 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 11 Aug 2007 04:15:45 +0900 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > However, the old universal newlines feature also set an attibute named > 'newlines' on the file object to a tuple of up to three elements > giving the actual line endings that were observed on the file so far > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > implemented. I'm tempted to kill it. Does anyone have a use case for > this? I have run into files that intentionally have more than one newline convention used (mbox and Babyl mail folders, with messages received from various platforms). However, most of the time multiple newline conventions is a sign that the file is either corrupt or isn't text. If so, then saving the file may corrupt it. The newlines attribute could be used to check for this condition. > Has anyone even ever used this? Not I. When I care about such issues I prefer that the codec raise an exception at the time of detection. From martin at v.loewis.de Sat Aug 11 01:01:19 2007 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 11 Aug 2007 01:01:19 +0200 Subject: [Python-Dev] Unicode database In-Reply-To: References: Message-ID: <46BCEE3F.7020109@v.loewis.de> >> Sure. But (again): you don't need to have the mappings at all for >> what you want to achieve. So there is no point in downloading them > > Sigh. No, I don't. But, if I want to be able to merge anything > back into the main Python source, it is a VERY good idea to use the > existing mechanisms and not invent new ones. I think you still don't understand. Why I keep calling "mappings" is *unrelated* to unicodedata. unicodedata is a different database, and not related at all to the makefile. It never was. 
> As I pointed out, there is already a problem where upgrading the data > needs a complete rebuild to get all of the Unicode data back in step; > 'make all' in itself does not work. That is precisely the sort of > problem that is caused by having duplicate update mechanisms. Right. Downloading the necessary files is a completely manual process, not supported at all by "make all", which is designed to do something entirely different. > Now, IF I can work out how the _sre.c engine works enough to put > atomic/possessive quantifiers in, this problem will return. My > question would be how best to make a suitable proposal that, inter > alia, includes changes that can't be made by the normal building > mechanisms. > > And I still don't have a clue about that one. You lost me somewhere. What are "changes that can't be made by the normal building process", and what is "this problem" that will return? Regards, Martin From greg.ewing at canterbury.ac.nz Sat Aug 11 03:28:07 2007 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 11 Aug 2007 13:28:07 +1200 Subject: [Python-Dev] Regular expressions, Unicode etc. In-Reply-To: References: Message-ID: <46BD10A7.4030708@canterbury.ac.nz> Nick Maclaren wrote: > You can convert them to things that are sort of NFA/DFA > hybrids, If you could express it as an NFA, then you could (in principle) convert it to a DFA. So whatever it's using can't be an NFA either. -- Greg From kbk at shore.net Sat Aug 11 04:08:34 2007 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Fri, 10 Aug 2007 22:08:34 -0400 (EDT) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200708110208.l7B28YEW028649@hampton.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 404 open ( +0) / 3855 closed ( +8) / 4259 total ( +8) Bugs : 1065 open ( +6) / 6790 closed ( +6) / 7855 total (+12) RFE : 263 open ( +0) / 295 closed ( +0) / 558 total ( +0) New / Reopened Patches ______________________ MSVC++8 x86 tkinter build patch for trunk (2007-08-05) http://python.org/sf/1767787 opened by brotchie test_asyncore fix (2007-08-05) CLOSED http://python.org/sf/1767834 opened by Hasan Diwan Fix for failing test_scriptpackages in py3k-struni (2007-08-07) CLOSED http://python.org/sf/1768976 opened by Antti Rasinen Fix for failing test_plistlib in py3k-struni (2007-08-07) CLOSED http://python.org/sf/1769016 opened by brotchie struni: test_xml_etree_c (2007-08-08) CLOSED http://python.org/sf/1769767 opened by Joe Gregorio Remove cStringIO usage (2007-08-08) CLOSED http://python.org/sf/1770008 reopened by tiran Remove cStringIO usage (2007-08-08) CLOSED http://python.org/sf/1770008 opened by Christian Heimes ctypes: c_char now uses bytes and not str (unicode) (2007-08-08) CLOSED http://python.org/sf/1770355 opened by STINNER Victor Misc improvements for the io module (2007-08-10) http://python.org/sf/1771364 opened by Christian Heimes Patches Closed ______________ test_asyncore fix (2007-08-05) http://python.org/sf/1767834 closed by gvanrossum test_csv struni fixes + unicode support in _csv (2007-08-03) http://python.org/sf/1767398 closed by gvanrossum urllib2-howto - correction (2007-08-02) http://python.org/sf/1765839 closed by gbrandl Fix for failing test_scriptpackages in py3k-struni (2007-08-06) http://python.org/sf/1768976 closed by nnorwitz Fix for failing test_plistlib in py3k-struni (2007-08-07) http://python.org/sf/1769016 closed by gvanrossum struni: test_xml_etree_c (2007-08-07) http://python.org/sf/1769767 closed by 
nnorwitz Remove cStringIO usage (2007-08-08) http://python.org/sf/1770008 closed by gvanrossum Remove cStringIO usage (2007-08-08) http://python.org/sf/1770008 closed by gvanrossum ctypes: c_char now uses bytes and not str (unicode) (2007-08-08) http://python.org/sf/1770355 closed by haypo New / Reopened Bugs ___________________ SocketServer.DatagramRequestHandler (2007-08-04) http://python.org/sf/1767511 opened by Alzheimer Badly formed XML using etree and utf-16 (2007-08-05) http://python.org/sf/1767933 opened by BugoK Byte code WITH_CLEANUP missing, MAKE_CLOSURE wrong (2007-08-05) http://python.org/sf/1768121 opened by L. Peter Deutsch tutorial (2007-08-06) CLOSED http://python.org/sf/1768767 opened by Michael R Bax Python - Operation time out problem (2007-08-06) http://python.org/sf/1768858 opened by MASK A paragraph about packages should be updated. (2007-08-07) CLOSED http://python.org/sf/1769002 opened by Noam Raphael decimal.Decimal("trash") produces informationless exception (2007-08-08) http://python.org/sf/1770009 opened by John Machin platform.mac_ver() returning incorrect patch version (2007-08-08) http://python.org/sf/1770190 opened by Gus Tabares Decimal.__int__ overflows for large values (2007-08-08) http://python.org/sf/1770416 opened by Jason G words able to decode but unable to encode in GB18030 (2007-08-09) http://python.org/sf/1770551 opened by Z-flagship Errors in site.py not reported properly (2007-08-09) http://python.org/sf/1771260 opened by Adam Olsen bsddb can't use unicode keys (2007-08-10) http://python.org/sf/1771381 opened by Erol Aktay another 'nothing to repeat' (2007-08-10) CLOSED http://python.org/sf/1771483 opened by viciousdog minor bug in turtle (2007-08-10) CLOSED http://python.org/sf/1771558 opened by Jeremy Sanders Bugs Closed ___________ String.capwords() does not capitalize first word (2007-08-03) http://python.org/sf/1767363 closed by gbrandl subprocess.Popen.wait fails sporadically with threads (2007-07-16) 
http://python.org/sf/1754642 closed by gbrandl subprocess raising "No Child Process" OSError (2007-07-14) http://python.org/sf/1753891 closed by gbrandl tutorial (2007-08-06) http://python.org/sf/1768767 deleted by mrbax A paragraph about packages should be updated. (2007-08-07) http://python.org/sf/1769002 closed by gbrandl cStringIO no longer accepts array.array objects (2007-06-03) http://python.org/sf/1730114 closed by gbrandl another 'nothing to repeat' (2007-08-10) http://python.org/sf/1771483 deleted by viciousdog minor bug in turtle (2007-08-10) http://python.org/sf/1771558 closed by gbrandl From g.brandl at gmx.net Sat Aug 11 11:08:24 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Aug 2007 11:08:24 +0200 Subject: [Python-Dev] Exception pickling patch Message-ID: Can somebody please review this patch: https://sourceforge.net/support/tracker.php?aid=1692335 It aims to fix the pickling of exceptions whose __init__ methods don't call Exception.__init__ at all, or with a different number of arguments. This should be fixed before 2.5.2. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From tony at PageDNA.com Sat Aug 11 18:45:37 2007 From: tony at PageDNA.com (Tony Lownds) Date: Sat, 11 Aug 2007 09:45:37 -0700 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References: Message-ID: On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote: > Python 3.0 currently has limited universal newlines support: by > default, \r\n is translated into \n for text files, but this can be > controlled by the newline= keyword parameter. For details on how, see > PEP 3116. 
The PEP prescribes that a lone \r must also be translated, > though this hasn't been implemented yet (any volunteers?). > I'm working on this, but now I'm not sure how the file is supposed to be read when the newline parameter is \r or \r\n. Here's the PEP language: buffer is a reference to the BufferedIOBase object to be wrapped with the TextIOWrapper. encoding refers to an encoding to be used for translating between the byte-representation and character-representation. If it is None, then the system's locale setting will be used as the default. newline can be None, '\n', '\r', or '\r\n' (all other values are illegal); it indicates the translation for '\n' characters written. If None, a system-specific default is chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting newline='\n' on input means that no CRLF translation is done; lines ending in '\r\n' will be returned as '\r\n'. ('\r' support is still needed for some OSX applications that produce files using '\r' line endings; Excel (when exporting to text) and Adobe Illustrator EPS files are the most common examples. Is this ok: when newline='\r\n' or newline='\r' is passed, only that string is used to determine the end of lines. No translation to '\n' is done. > However, the old universal newlines feature also set an attibute named > 'newlines' on the file object to a tuple of up to three elements > giving the actual line endings that were observed on the file so far > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > implemented. I'm tempted to kill it. Does anyone have a use case for > this? Has anyone even ever used this? > This strikes me as a pragmatic feature, making it easy to read a file and write back the same line ending. I can include in patch. 
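For illustration only, the read-then-write-back use case might look roughly like the sketch below. It is written against the io module as it later shipped in Python 3, which is an assumption relative to this thread (here the attribute's fate was still open). The helper name rewrite_preserving_newlines is hypothetical, and falling back to '\n' when no single line ending was observed is a choice made for the sketch, not something the PEP specifies.

```python
import os
import tempfile


def rewrite_preserving_newlines(path, transform):
    """Hypothetical helper: rewrite a text file, keeping its line-ending style."""
    # newline=None turns on universal newlines: the text we get uses "\n" only.
    with open(path, "r", encoding="utf-8", newline=None) as f:
        text = f.read()
        seen = f.newlines  # None, a single string, or a tuple of strings
    if not isinstance(seen, str):
        # Assumption of this sketch: with no newline seen, or a mixture,
        # there is no single style to restore, so write plain "\n".
        seen = "\n"
    # On output, newline=seen translates each "\n" written into that string.
    with open(path, "w", encoding="utf-8", newline=seen) as f:
        f.write(transform(text))


if __name__ == "__main__":
    fd, demo_path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"a\r\nb\r\n")
    rewrite_preserving_newlines(demo_path, str.upper)
    with open(demo_path, "rb") as f:
        print(f.read())
    os.remove(demo_path)
```

A file with mixed line endings is not restored faithfully by this approach; the attribute only reports which endings were seen, not where.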
http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f tp.gnome.org/pub/gnome/sources/meld/1.0/ meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0 http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/ dynamic_module.py#a0 Thanks -Tony From guido at python.org Sat Aug 11 19:29:38 2007 From: guido at python.org (Guido van Rossum) Date: Sat, 11 Aug 2007 10:29:38 -0700 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References:

Message-ID: On 8/11/07, Tony Lownds wrote: > > On Aug 10, 2007, at 11:23 AM, Guido van Rossum wrote: > > > Python 3.0 currently has limited universal newlines support: by > > default, \r\n is translated into \n for text files, but this can be > > controlled by the newline= keyword parameter. For details on how, see > > PEP 3116. The PEP prescribes that a lone \r must also be translated, > > though this hasn't been implemented yet (any volunteers?). > > > > I'm working on this, but now I'm not sure how the file is supposed to > be read when > the newline parameter is \r or \r\n. Here's the PEP language: > > buffer is a reference to the BufferedIOBase object to be wrapped > with the TextIOWrapper. > encoding refers to an encoding to be used for translating between > the byte-representation > and character-representation. If it is None, then the system's > locale setting will be used > as the default. newline can be None, '\n', '\r', or '\r\n' (all > other values are illegal); > it indicates the translation for '\n' characters written. If None, > a system-specific default > is chosen, i.e., '\r\n' on Windows and '\n' on Unix/Linux. Setting > newline='\n' on input > means that no CRLF translation is done; lines ending in '\r\n' > will be returned as '\r\n'. > ('\r' support is still needed for some OSX applications that > produce files using '\r' line > endings; Excel (when exporting to text) and Adobe Illustrator EPS > files are the most common examples. > > Is this ok: when newline='\r\n' or newline='\r' is passed, only that > string is used to determine > the end of lines. No translation to '\n' is done. I *think* it would be more useful if it always returned lines ending in \n (not \r\n or \r). Wouldn't it? Although this is not how it currently behaves; when you set newline='\r\n', it returns the \r\n unchanged, so it would make sense to do this too when newline='\r'. Caveat user I guess. 
> > However, the old universal newlines feature also set an attibute named > > 'newlines' on the file object to a tuple of up to three elements > > giving the actual line endings that were observed on the file so far > > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not > > implemented. I'm tempted to kill it. Does anyone have a use case for > > this? Has anyone even ever used this? > > > > This strikes me as a pragmatic feature, making it easy to read a file > and write back the same line ending. I can include in patch. OK, if you think you can, that's good. It's not always sufficient (not if there was a mix of line endings) but it's a start. > http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 > +show:cz2Fhijwr3s:yutdXigOmYY:YDns9IyEkLQ&sa=N&cd=12&ct=rc&cs_p=http://f > tp.gnome.org/pub/gnome/sources/meld/1.0/ > meld-1.0.0.tar.bz2&cs_f=meld-1.0.0/filediff.py#a0 > > http://www.google.com/codesearch?hl=en&q=+lang:python+%22.newlines%22 > +show:SLyZnjuFadw:kOTmKU8aU2I:VX_dFr3mrWw&sa=N&cd=37&ct=rc&cs_p=http://s > vn.python.org/projects/ctypes/trunk&cs_f=ctypeslib/ctypeslib/ > dynamic_module.py#a0 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From tony at pagedna.com Sat Aug 11 20:41:08 2007 From: tony at pagedna.com (Tony Lownds) Date: Sat, 11 Aug 2007 11:41:08 -0700 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References:

Message-ID: On Aug 11, 2007, at 10:29 AM, Guido van Rossum wrote: >> Is this ok: when newline='\r\n' or newline='\r' is passed, only that >> string is used to determine >> the end of lines. No translation to '\n' is done. > > I *think* it would be more useful if it always returned lines ending > in \n (not \r\n or \r). Wouldn't it? Although this is not how it > currently behaves; when you set newline='\r\n', it returns the \r\n > unchanged, so it would make sense to do this too when newline='\r'. > Caveat user I guess. Because there's an easy way to translate, having the option to not translate apply to all valid newline values is probably more useful. I do think it's easier to define the behavior this way. > OK, if you think you can, that's good. It's not always sufficient (not > if there was a mix of line endings) but it's a start. Right -Tony From status at bugs.python.org Sun Aug 12 02:00:55 2007 From: status at bugs.python.org (Tracker) Date: Sun, 12 Aug 2007 00:00:55 +0000 (UTC) Subject: [Python-Dev] Summary of Tracker Issues Message-ID: <20070812000055.9ED9C781B4@psf.upfronthosting.co.za> ACTIVITY SUMMARY (08/05/07 - 08/12/07) Tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 1295 open ( +8) / 11130 closed ( +2) / 12425 total (+10) Average duration of open issues: 690 days. Median duration of open issues: 553 days. 
Open Issues Breakdown open 1295 ( +8) pending 0 ( +0) Issues Created Or Reopened (10) _______________________________ x 08/08/07 http://bugs.python.org/issue1000 created gbrandl MSVC++8 x86 tkinter build patch for trunk 08/05/07 http://bugs.python.org/issue1767787 created brotch test_asyncore fix 08/05/07 http://bugs.python.org/issue1767834 created hdiwan650 Badly formed XML using etree and utf-16 08/05/07 http://bugs.python.org/issue1767933 created bugok Byte code WITH_CLEANUP missing, MAKE_CLOSURE wrong 08/06/07 http://bugs.python.org/issue1768121 created lpd tutorial 08/06/07 CLOSED http://bugs.python.org/issue1768767 created mrbax Python - Operation time out problem 08/06/07 http://bugs.python.org/issue1768858 created mohammedsk Fix for failing test_scriptpackages in py3k-struni 08/07/07 CLOSED http://bugs.python.org/issue1768976 created arsatiki A paragraph about packages should be updated. 08/07/07 http://bugs.python.org/issue1769002 created noamr Fix for failing test_plistlib in py3k-struni 08/07/07 http://bugs.python.org/issue1769016 created brotch Issues Now Closed (7) _____________________ fix 1668596: copy datafiles properly when package_dir is ' ' 83 days http://bugs.python.org/issue1720897 loewis unicode(None,charset) raise TypeError 15 days http://bugs.python.org/issue1758804 sf-robot socket close fixed 6 days http://bugs.python.org/issue1763387 jyasskin urllib2-howto - correction 4 days http://bugs.python.org/issue1765839 gbrandl test_csv struni fixes + unicode support in _csv 3 days http://bugs.python.org/issue1767398 gvanrossum tutorial 0 days http://bugs.python.org/issue1768767 mrbax Fix for failing test_scriptpackages in py3k-struni 0 days http://bugs.python.org/issue1768976 nnorwitz -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20070812/d9278580/attachment.htm From martin at v.loewis.de Sun Aug 12 12:29:07 2007 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Aug 2007 12:29:07 +0200 Subject: [Python-Dev] Dropping support for Win9x Message-ID: <46BEE0F3.7070904@v.loewis.de> I'd like to remove support for Windows 9x (95, 98(SE), ME) soon from the Python trunk. This would primarily affect all wide-string APIs (which would be considered present unconditionally), as well as certain "new" Win32 functions; in this cleanup, I would also drop support for NT+ before Windows 2000 (i.e. NT 3.1, 3.5(1), 4.0). I'm not sure whether w9xpopen should be left in place; Tim suggested it should, however, I don't see why (something with alternative command line interpreters IIRC). The 2.5 installer already gives a warning on W9x that this will be the last release. If you object to this plan, please speak up. Regards, Martin From p.f.moore at gmail.com Sun Aug 12 18:58:44 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Aug 2007 17:58:44 +0100 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References:

Message-ID: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> On 11/08/07, Guido van Rossum wrote: > On 8/11/07, Tony Lownds wrote: > > Is this ok: when newline='\r\n' or newline='\r' is passed, only that > > string is used to determine > > the end of lines. No translation to '\n' is done. > > I *think* it would be more useful if it always returned lines ending > in \n (not \r\n or \r). Wouldn't it? Although this is not how it > currently behaves; when you set newline='\r\n', it returns the \r\n > unchanged, so it would make sense to do this too when newline='\r'. > Caveat user I guess. Neither this wording, nor the PEP are clear to me, but I'm assuming/hoping that there will be a way to spell the current behaviour for universal newlines on input[1], namely that files can have *either* bare \n, *or* the combination \r\n, to delimit lines. Whichever is used (I have no need for mixed-style files) gets translated to \n so that the program sees the same data regardless. [1] ... at least the bit I care about :-) This behaviour is immensely useful for uniform treatment of Windows text files, which are an inconsistent mess of \n-only and \r\n conventions. Specifically, I'm looking to replicate this behaviour: >xxd crlf 0000000: 610d 0a62 0d0a a..b.. >xxd lf 0000000: 610a 620a a.b. >python Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> open('crlf').read() 'a\nb\n' >>> open('lf').read() 'a\nb\n' >>> As demonstrated, this is the default in Python 2.5. I'd hope it was so in 3.0 as well. Sorry I can't test this for myself - I don't have the time/toolset to build my own Py3k on Windows... Paul. 
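[Archive note] A minimal sketch of the behaviour Paul asks for above, written against the built-in open() as it eventually shipped in Python 3. That is an assumption relative to this thread, where the semantics were still being discussed. The helper name read_text is hypothetical and exists only for this illustration. With newline=None (the default) both \r\n and \n files read back as \n-terminated text on every platform; newline='' keeps the endings exactly as they appear in the file.

```python
import os
import tempfile


def read_text(data, newline=None):
    """Write raw bytes to a temporary file, then read them back in text mode.

    `read_text` is a hypothetical helper for this note, not a stdlib function.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # newline=None enables universal newlines with translation to "\n".
        with open(path, "r", encoding="ascii", newline=newline) as f:
            return f.read()
    finally:
        os.remove(path)


crlf = read_text(b"a\r\nb\r\n")                  # Windows-style file
lf = read_text(b"a\nb\n")                        # Unix-style file
preserved = read_text(b"a\r\nb\r\n", newline="")  # endings left untouched

print(repr(crlf), repr(lf), repr(preserved))
```

The two default-mode reads give the same string regardless of platform, which is the uniform treatment the message is after; the third read shows the untranslated variant.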
From g.brandl at gmx.net Sun Aug 12 20:24:07 2007 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 12 Aug 2007 20:24:07 +0200 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> References:

<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> Message-ID: Paul Moore schrieb: > Specifically, I'm looking to replicate this behaviour: > >>xxd crlf > 0000000: 610d 0a62 0d0a a..b.. > >>xxd lf > 0000000: 610a 620a a.b. > >>python > Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit > (Intel)] on win32 > Type "help", "copyright", "credits" or "license" for more information. >>>> open('crlf').read() > 'a\nb\n' >>>> open('lf').read() > 'a\nb\n' >>>> > > As demonstrated, this is the default in Python 2.5. I'd hope it was so > in 3.0 as well. Note that Python does nothing special in the above case. For non-Windows platforms, you'd get two different results -- the conversion from \r\n to \n is done by the Windows C runtime since the default open() mode is text mode. Only with mode 'U' does Python use its own universal newline mode. With Python 3.0, the C library is not used and Python uses universal newline mode by default. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From p.f.moore at gmail.com Sun Aug 12 21:12:29 2007 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 12 Aug 2007 20:12:29 +0100 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: References:

<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> Message-ID: <79990c6b0708121212m2490d6f0tb151c3c1d5aa1ea3@mail.gmail.com> On 12/08/07, Georg Brandl wrote: > Note that Python does nothing special in the above case. For non-Windows > platforms, you'd get two different results -- the conversion from \r\n to > \n is done by the Windows C runtime since the default open() mode is text mode. > > Only with mode 'U' does Python use its own universal newline mode. Pah. You're right - I almost used 'U' and then "discovered" that I didn't need it (and got bitten by a portability bug as a result :-() > With Python 3.0, the C library is not used and Python uses universal newline > mode by default. That's what I expected, but I was surprised to find that the PEP is pretty unclear on this. The phrase "universal newlines" is mentioned only once, and never defined. Knowing the meaning, I can see how the PEP is intended to say that universal newlines on input is the default (and you set the newline argument to specify a *specific*, non-universal, newline value) - but I missed it on first reading. Thanks for the clarification. Paul. From skip at pobox.com Mon Aug 13 18:55:26 2007 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Aug 2007 11:55:26 -0500 Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0 In-Reply-To: <79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> References:

<79990c6b0708120958p588aabd1ic6dadf2f65de86d3@mail.gmail.com> Message-ID: <18112.36094.979628.85609@montanaro.dyndns.org> Paul> ... that files can have *either* bare \n, *or* the combination Paul> \r\n, to delimit lines. As someone else pointed out, \r needs to be supported as well. Many Mac applications (Excel comes to mind) still emit text files with \r as the line terminator. Skip From guido at python.org Mon Aug 13 19:25:41 2007 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Aug 2007 10:25:41 -0700 Subject: [Python-Dev] Python 3000 Sprint @ Google Message-ID: It's official! The second annual Python Sprint @ Google is happening again: August 22-25 (Wed-Sat). We're sprinting at two locations, this time Google headquarters in Mountain View and the Google office in Chicago (thanks to Brian Fitzpatrick). We'll connect the two sprints with full-screen videoconferencing. The event is *free* and includes Google's *free gourmet food*. Anyone with a reasonable Python experience is invited to attend. The primary goal is to work on Python 3000, to polish off the first alpha release; other ideas are welcome too. Experienced Python core developers will be available for mentoring. (The goal is not to learn Python; it is to learn *contributing* to Python.) For more information and to sign up, please see the wiki page on python.org: http://wiki.python.org/moin/GoogleSprint Sign-up via the wiki page is strongly recommended to avoid lines getting badges. Please read the whole wiki page to make sure you're prepared. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From trentm at activestate.com Mon Aug 13 19:37:31 2007 From: trentm at activestate.com (Trent Mick) Date: Mon, 13 Aug 2007 10:37:31 -0700 Subject: [Python-Dev] Dropping support for Win9x In-Reply-To: <46BEE0F3.7070904@v.loewis.de> References: <46BEE0F3.7070904@v.loewis.de> Message-ID: <46C096DB.9070608@activestate.com> Martin v. 
Löwis wrote:
> I'd like to remove support for Windows 9x (95, 98(SE), ME)
> soon from the Python trunk. This would primarily affect all
> wide-string APIs (which would be considered present
> unconditionally), as well as certain "new" Win32 functions;
> in this cleanup, I would also drop support for NT+ before
> Windows 2000 (i.e. NT 3.1, 3.5(1), 4.0).
>
> I'm not sure whether w9xpopen should be left in place; Tim
> suggested it should, however, I don't see why (something
> with alternative command line interpreters IIRC).

I'm not entirely sure, but I think that w9xpopen gets used when it looks like the shell (as per %ComSpec%) is command.com rather than cmd.exe. My understanding is that a Win9x machine *upgraded* (rather than a clean re-install) to Win2k or XP will retain command.com as the %ComSpec% setting. If so, that *might* be sufficient reason to keep w9xpopen around.

I don't have a strong opinion though: I'm all for dropping win9x support and would be happy with either a doc note that users need to ensure they aren't using command.com, or a warning in the installer if this is detected.

Trent

-- 
Trent Mick
trentm at activestate.com

From rowen at cesmail.net Mon Aug 13 19:46:08 2007
From: rowen at cesmail.net (Russell E Owen)
Date: Mon, 13 Aug 2007 10:46:08 -0700
Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0
References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
 "Stephen J. Turnbull" wrote:

> Guido van Rossum writes:
>
> > However, the old universal newlines feature also set an attribute named
> > 'newlines' on the file object to a tuple of up to three elements
> > giving the actual line endings that were observed on the file so far
> > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> > implemented. I'm tempted to kill it. Does anyone have a use case for
> > this?
>
> I have run into files that intentionally have more than one newline
> convention used (mbox and Babyl mail folders, with messages received
> from various platforms). However, most of the time multiple newline
> conventions is a sign that the file is either corrupt or isn't text.
> If so, then saving the file may corrupt it. The newlines attribute
> could be used to check for this condition.

There is at least one Mac source code editor (SubEthaEdit) that is all too happy to add one kind of newline to a file that started out with a different line ending character. As a result I have seen a fair number of text files with mixed line endings. I don't see as many these days, though; perhaps because the current version of SubEthaEdit handles things a bit better. So perhaps it won't matter much for Python 3000.

-- Russell

From guido at python.org Mon Aug 13 22:15:03 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 13:15:03 -0700
Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0
In-Reply-To:
References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 8/13/07, Russell E Owen wrote:
> In article <87wsw3p5em.fsf at uwakimon.sk.tsukuba.ac.jp>,
> "Stephen J. Turnbull" wrote:
>
> > Guido van Rossum writes:
> >
> > > However, the old universal newlines feature also set an attribute named
> > > 'newlines' on the file object to a tuple of up to three elements
> > > giving the actual line endings that were observed on the file so far
> > > (\r, \n, or \r\n). This feature is not in PEP 3116, and it is not
> > > implemented. I'm tempted to kill it. Does anyone have a use case for
> > > this?
> >
> > I have run into files that intentionally have more than one newline
> > convention used (mbox and Babyl mail folders, with messages received
> > from various platforms). However, most of the time multiple newline
> > conventions is a sign that the file is either corrupt or isn't text.
> > If so, then saving the file may corrupt it. The newlines attribute
> > could be used to check for this condition.
>
> There is at least one Mac source code editor (SubEthaEdit) that is all
> too happy to add one kind of newline to a file that started out with a
> different line ending character. As a result I have seen a fair number
> of text files with mixed line endings. I don't see as many these days,
> though; perhaps because the current version of SubEthaEdit handles
> things a bit better. So perhaps it won't matter much for Python 3000.

I've seen similar behavior in MS VC++ (long ago, dunno what it does these days). It would read files with \r\n and \n line endings, and whenever you edited a line, that line also got a \r\n ending. But unchanged lines that started out with \n-only endings would keep the \n only. And there was no way for the end user to see or control this.

To emulate this behavior in Python you'd have to read the file in binary mode *or* we'd have to have an additional flag specifying to return line endings as encountered in the file. The newlines attribute (as defined in 2.x) doesn't help, because it doesn't tell which lines used which line ending. I think the newline feature in PEP 3116 falls short too; it seems mostly there to override the line ending *written* (from the default os.linesep).

I think we may need different flags for input and for output.

For input, we'd need two things: (a) which are acceptable line endings; (b) whether to translate acceptable line endings to \n or not. For output, we need two things again: (c) whether to translate line endings at all; (d) which line endings to translate. I guess we could map (c) to (b) and (d) to (a) for a signature that's the same for input and output (and makes sense for read+write files as well). The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True.
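
[Archive note] The input-side flags (a) and (b) described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the API that was adopted: the function name split_lines and the parameter names endings and translate are hypothetical stand-ins for (a) and (b).

```python
import re


def split_lines(data, endings=("\n", "\r\n", "\r"), translate=True):
    """Split text into lines, honouring the two input-side flags.

    endings   -- flag (a): which terminators count as line endings
    translate -- flag (b): whether a recognised terminator becomes "\n"
    Both names are hypothetical, chosen for this sketch only.
    """
    # Try longer terminators first so "\r\n" is one ending, not two.
    ordered = sorted(endings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(e) for e in ordered))
    lines = []
    pos = 0
    for match in pattern.finditer(data):
        terminator = "\n" if translate else match.group()
        lines.append(data[pos:match.start()] + terminator)
        pos = match.end()
    if pos < len(data):
        # Trailing text without a terminator is kept as a final line.
        lines.append(data[pos:])
    return lines


mixed = "a\r\nb\rc\nd"
print(split_lines(mixed))                   # defaults: accept all, translate
print(split_lines(mixed, translate=False))  # accept all, keep as encountered
```

With translate=False the caller sees which ending each line actually used, which is the per-line information the 2.x newlines attribute cannot provide.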
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From gh at ghaering.de Mon Aug 13 23:25:14 2007 From: gh at ghaering.de (=?ISO-8859-1?Q?Gerhard_H=E4ring?=) Date: Mon, 13 Aug 2007 23:25:14 +0200 Subject: [Python-Dev] Python 3000: confused about str8, str, bytes Message-ID: <46C0CC3A.70809@ghaering.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I maintain the sqlite module in the standard library, which makes heavy use of PyString_* C API. Now I've made it work under Python 3000 insofar as tests pass, but the new Python string semantics mean I have more work to do here and make some API choices. I've read in another thread that the future of str8 is not decided yet. To be honest I was confused when I saw it first, it's documented nowhere as far as I can see. Is that decided yet? Is str8 going away? What will happen with the Python C API? Will PyString_* become what PyUnicode_* is in Python 2.x? - -- Gerhard -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGwMw6dIO4ozGCH14RAoyEAJ0eoqZ8gSqKh5/HIXxhbG5xpMedLgCgquQV Qv+CGyoD8eSXaoAKzn2WBSM= =w4HB -----END PGP SIGNATURE----- From brett at python.org Tue Aug 14 00:13:10 2007 From: brett at python.org (Brett Cannon) Date: Mon, 13 Aug 2007 15:13:10 -0700 Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google In-Reply-To: References: Message-ID: On 8/13/07, Guido van Rossum wrote: > It's official! The second annual Python Sprint @ Google is happening > again: August 22-25 (Wed-Sat). I can't attend this year (damn doctor's appt.), but I will try to be on Google Talk (username of bcannon) in case I can help out somehow remotely. 
-Brett

From guido at python.org Tue Aug 14 01:31:32 2007
From: guido at python.org (Guido van Rossum)
Date: Mon, 13 Aug 2007 16:31:32 -0700
Subject: [Python-Dev] Python 3000: confused about str8, str, bytes
In-Reply-To: <46C0CC3A.70809@ghaering.de>
References: <46C0CC3A.70809@ghaering.de>
Message-ID:

When I said it wasn't decided I was totally serious. No decision has been reached.

However, I strongly recommend that you try to write all your code using PyUnicode and PyBytes, avoiding PyString completely. Even if str8/PyString will remain in existence, it will be a last resort type for backwards compatibility.

We *may* at some point rename PyUnicode to PyString (or PyText, to avoid confusion), but don't count on this. If we do, we'll provide a tool or service to do the conversion for you.

--Guido

On 8/13/07, Gerhard Häring wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I maintain the sqlite module in the standard library, which makes heavy use
> of PyString_* C API. Now I've made it work under Python 3000 insofar as
> tests pass, but the new Python string semantics mean I have more work to do
> here and make some API choices.
>
> I've read in another thread that the future of str8 is not decided yet. To
> be honest I was confused when I saw it first, it's documented nowhere as
> far as I can see.
>
> Is that decided yet? Is str8 going away?
>
> What will happen with the Python C API? Will PyString_* become what
> PyUnicode_* is in Python 2.x?
>
> - -- Gerhard
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFGwMw6dIO4ozGCH14RAoyEAJ0eoqZ8gSqKh5/HIXxhbG5xpMedLgCgquQV
> Qv+CGyoD8eSXaoAKzn2WBSM=
> =w4HB
> -----END PGP SIGNATURE-----
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org Tue Aug 14 15:58:32 2007
From: barry at python.org (Barry Warsaw)
Date: Tue, 14 Aug 2007 09:58:32 -0400
Subject: [Python-Dev] [Python-3000] Universal newlines support in Python 3.0
In-Reply-To:
References: <87wsw3p5em.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 13, 2007, at 4:15 PM, Guido van Rossum wrote:

> I've seen similar behavior in MS VC++ (long ago, dunno what it does
> these days). It would read files with \r\n and \n line endings, and
> whenever you edited a line, that line also got a \r\n ending. But
> unchanged lines that started out with \n-only endings would keep the
> \n only. And there was no way for the end user to see or control this.
>
> To emulate this behavior in Python you'd have to read the file in
> binary mode *or* we'd have to have an additional flag specifying to
> return line endings as encountered in the file. The newlines attribute
> (as defined in 2.x) doesn't help, because it doesn't tell which lines
> used which line ending. I think the newline feature in PEP 3116 falls
> short too; it seems mostly there to override the line ending *written*
> (from the default os.linesep).
>
> I think we may need different flags for input and for output.
>
> For input, we'd need two things: (a) which are acceptable line
> endings; (b) whether to translate acceptable line endings to \n or
> not.
For output, we need two things again: (c) whether to translate > line endings at all; (d) which line endings to translate. I guess we > could map (c) to (b) and (d) to (a) for a signature that's the same > for input and output (and makes sense for read+write files as well). > The default would be (a)=={'\n', '\r\n', '\r'} and (b)==True. I haven't thought about the output side of the equation, but I've already hit a situation where I'd like to see the input side (b) option implemented. I'm still sussing out the email package changes (down to 7F/9E of 247 tests!) but in trying to fix things I found myself wanting to open files in text mode so that I got strings out of the file instead of bytes. This was all fine except that some of the tests started failing because of the EOL translation that happens unconditionally now. The file contained \r\n and the test was ensuring these EOLs were preserved in the parsed text. I switched back to opening the file in binary mode, and doing a crufty conversion of bytes to strings (which I suspect is error prone but gets me farther along). It would have been perfect, I think, if I could have opened the file in text mode so that read() gave me strings, with universal newlines and preservation of line endings (i.e. no translation to \n). - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (Darwin) iQCVAwUBRsG1CXEjvBPtnXfVAQKF3AP/X+/E44KI2EB3w0i3N5cGBCajJbMV93fk j2S/lfQf4tjBH3ZFEhUnybcJxsNukYY65T4MdzKh+IgJHV5s0rQtl2Hzr85e7Y0O i5Z3N4TAKc11PjSIk6vKrkgwPCEMzvwIQ5DFxeQBF5kOF6cZuXKaeDzB6z/GBYNv YiJEnOeZkW8= =u6OL -----END PGP SIGNATURE----- From alan.mcintyre at gmail.com Tue Aug 14 20:03:06 2007 From: alan.mcintyre at gmail.com (Alan McIntyre) Date: Tue, 14 Aug 2007 14:03:06 -0400 Subject: [Python-Dev] SimpleXMLRPCServer failure on G4 OS X Message-ID: <1d36917a0708141103j493d558crdf7440cdcae2061a@mail.gmail.com> Hi all, There are some new tests for xmlrpclib/SimpleXMLRPCServer that fail only on the G4 OS X buildbot. 
Unfortunately, the SimpleXMLRPCServer returns a vanilla 500 error for any local exceptions, so there's no obvious (to me) way to find out what's going wrong without having a G4 Mac on hand (which I don't). Can anybody recommend any way to look into this, or suggest any G4-specific issues I could check for? Thanks, Alan From lists at cheimes.de Tue Aug 14 22:46:56 2007 From: lists at cheimes.de (Christian Heimes) Date: Tue, 14 Aug 2007 22:46:56 +0200 Subject: [Python-Dev] Documentation switch imminent In-Reply-To: References: Message-ID: <46C214C0.3050703@cheimes.de> Georg Brandl wrote: > Infos for people who will write docs in the new trees can be found in the > new "Documenting Python" document, at the moment still available from > http://pydoc.gbrandl.de:3000/documenting/, especially the "Differences" > section at http://pydoc.gbrandl.de:3000/documenting/fromlatex/ (which > is not complete, patches are welcome :) http://pydoc.gbrandl.de:3000/documenting/fromlatex/ doesn't work for me: Keyword Not Found The keyword documenting/fromlatex is not directly associated with a page. Christian From brett at python.org Wed Aug 15 01:57:19 2007 From: brett at python.org (Brett Cannon) Date: Tue, 14 Aug 2007 16:57:19 -0700 Subject: [Python-Dev] [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/14/07, Georg Brandl wrote: > Now that the converted documentation is fairly bug-free, I want to > make the switch. > > I will replace the old Doc/ trees in the trunk and py3k branches > tomorrow, moving over the reST ones found at > svn+ssh://svn.python.org/doctools/Doc-{26,3k}. First, that address is wrong; missing a 'trunk' in there. Second, are we going to keep the docs in a separate tree forever, or is this just for now? I am not thinking so much about the tools, but whether we will need to do two separate commits in order to make code changes *and* change the docs? 
Or are you going to add an externals dependency in the trees to their respective doc directories? -Brett From brett at python.org Wed Aug 15 19:16:10 2007 From: brett at python.org (Brett Cannon) Date: Wed, 15 Aug 2007 10:16:10 -0700 Subject: [Python-Dev] [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/15/07, Georg Brandl wrote: > Brett Cannon schrieb: > > On 8/14/07, Georg Brandl wrote: > >> Now that the converted documentation is fairly bug-free, I want to > >> make the switch. > >> > >> I will replace the old Doc/ trees in the trunk and py3k branches > >> tomorrow, moving over the reST ones found at > >> svn+ssh://svn.python.org/doctools/Doc-{26,3k}. > > > > First, that address is wrong; missing a 'trunk' in there. > > Sorry again. > Not a problem. I also noticed, though, that the user (pythondev) is missing as well. =) > > Second, are we going to keep the docs in a separate tree forever, or > > is this just for now? > > They will be moved (in a few minutes...) to the location where the > Latex docs are now. > Yep, just did an update. > > I am not thinking so much about the tools, but > > whether we will need to do two separate commits in order to make code > > changes *and* change the docs? Or are you going to add an externals > > dependency in the trees to their respective doc directories? > > No separate commits will be needed to commit changes to the docs. > However, the tool to build the docs will not be in the tree under Doc/, > but continue to be maintained in the doctools/ toplevel project. > OK. > I spoke with Martin about including them as externals, but we agreed that > they are not needed and cost too much time on every "svn up". Instead, > the Doc/ makefile checks out the tools in a separate directory and runs > them from there. (The Doc/README.txt file explains this in more detail.) Seems simple enough! Thanks again for doing this, Georg (and the doc SIG)! 
-Brett

From martin at v.loewis.de Wed Aug 15 19:51:02 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 15 Aug 2007 19:51:02 +0200
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To:
References:
Message-ID: <46C33D06.9030607@v.loewis.de>

> Okay, I made the switch. I tagged the state of both Python branches
> before the switch as tags/py{26,3k}-before-rstdocs/.

Update instructions:

1. svn diff Doc; any pending changes will need to be redone
2. svn up; this will remove the tex sources, and then likely
   fail if there were still other files present in Doc, e.g.
   from building the documentation
3. review any files left in Doc
4. rm -rf Doc
5. svn up

If you are certain there is nothing of interest in your sandbox copy of Doc, you can start with step 4.

Regards,
Martin

From brett at python.org Wed Aug 15 23:40:22 2007
From: brett at python.org (Brett Cannon)
Date: Wed, 15 Aug 2007 14:40:22 -0700
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To: <46C33D06.9030607@v.loewis.de>
References: <46C33D06.9030607@v.loewis.de>
Message-ID:

On 8/15/07, "Martin v. Löwis" wrote:
> > Okay, I made the switch. I tagged the state of both Python branches
> > before the switch as tags/py{26,3k}-before-rstdocs/.
>
> Update instructions:
>
> 1. svn diff Doc; any pending changes will need to be redone
> 2. svn up; this will remove the tex sources, and then likely
>    fail if there were still other files present in Doc, e.g.
>    from building the documentation
> 3. review any files left in Doc
> 4. rm -rf Doc
> 5. svn up
>
> If you are certain there is nothing of interest in your sandbox
> copy of Doc, you can start with step 4.

Why the 'rm' call? When I did ``svn update`` it deleted the files for me. Is this to ditch some metadata?
-Brett

From martin at v.loewis.de Wed Aug 15 23:54:01 2007
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 15 Aug 2007 23:54:01 +0200
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To:
References: <46C33D06.9030607@v.loewis.de>
Message-ID: <46C375F9.3020908@v.loewis.de>

>> 1. svn diff Doc; any pending changes will need to be redone
>> 2. svn up; this will remove the tex sources, and then likely
>>    fail if there were still other files present in Doc, e.g.
>>    from building the documentation
>> 3. review any files left in Doc
>> 4. rm -rf Doc
>> 5. svn up
>>
>> If you are certain there is nothing of interest in your sandbox
>> copy of Doc, you can start with step 4.
>
> Why the 'rm' call? When I did ``svn update`` it deleted the files for
> me. Is this to ditch some metadata?

No, it's to delete any files in this tree not under version control, see step 2. If you had any such files, step 2 would abort with an error message

svn: Failed to add directory 'Doc': an object of the same name already exists

(or some such)

Regards,
Martin

From janssen at parc.com Thu Aug 16 03:29:37 2007
From: janssen at parc.com (Bill Janssen)
Date: Wed, 15 Aug 2007 18:29:37 PDT
Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google
In-Reply-To:
References:
Message-ID: <07Aug15.182939pdt."57996"@synergy1.parc.xerox.com>

I'd really like an excuse to implement server-side SSL support one of these days. Could that be a sprint activity? Probably against 2.6 (I doubt the Modules/_ssl.c file will change much for 3K).

The idea is that if you call socket.ssl() on a socket that's bound to an address, the socket is assumed to be server-side, the cert passed in is assumed to be a server-side cert, and the SSLObject returned has a couple of extra methods, listen() and accept(). Calling accept() does the SSL dance with the remote side, and returns an SSLObject.

Does this need a PEP?
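
[Archive note] For readers following this proposal today, here is a rough sketch of the same idea (wrap a bound socket, then let accept() perform the handshake) using the ssl module found in current Python 3. This is an assumption relative to the thread: it is not the socket.ssl() extension described above, and the function name make_tls_listener as well as the certificate paths in the usage comment are hypothetical.

```python
import socket
import ssl


def make_tls_listener(host, port, certfile, keyfile):
    """Return a listening socket whose accept() yields TLS-wrapped connections.

    `make_tls_listener` is a hypothetical helper for this note; certfile and
    keyfile must point at a real server certificate and private key.
    """
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Load the server-side certificate first, so a bad path fails before
    # any socket is opened.
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(5)
    # server_side=True marks the socket as the server end of the handshake.
    return context.wrap_socket(sock, server_side=True)


# Usage sketch (paths are placeholders, not real files):
#     listener = make_tls_listener("127.0.0.1", 8443, "server.pem", "server.key")
#     conn, addr = listener.accept()   # the handshake happens here
```

Calling accept() on the wrapped listener plays the role of "doing the SSL dance with the remote side" and hands back an already-secured connection object.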
Bill From guido at python.org Thu Aug 16 04:03:31 2007 From: guido at python.org (Guido van Rossum) Date: Wed, 15 Aug 2007 19:03:31 -0700 Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google In-Reply-To: <2235611917962225454@unknownmsgid> References: <2235611917962225454@unknownmsgid> Message-ID: Sounds like a good plan. I'm not a great coach though since I didn't write _ssl.c and I've never used openssl directly. But I can help you with the Python stuff of course! --Guido On 8/15/07, Bill Janssen wrote: > I'd really like an excuse to implement server-side SSL support one of > these days. Could that be a sprint activity? Probably against 2.6 (I > doubt the Modules/_ssl.c file will change much for 3K). > > The idea is that if you call socket.ssl() on a socket that's bound to > an address, the socket is assumed to be server-side, the cert passed > in is assumed to be a server-side cert, and the SSLObject returned has > a couple of extra methods, listen() and accept(). Calling accept() does > the SSL dance with the remote side, and returns an SSLObject. > > Does this need a PEP? > > Bill > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From janssen at parc.com Thu Aug 16 04:45:55 2007 From: janssen at parc.com (Bill Janssen) Date: Wed, 15 Aug 2007 19:45:55 PDT Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google In-Reply-To: References: <2235611917962225454@unknownmsgid> Message-ID: <07Aug15.194559pdt."57996"@synergy1.parc.xerox.com> > Sounds like a good plan. I'm not a great coach though since I didn't > write _ssl.c and I've never used openssl directly. But I can help you > with the Python stuff of course! Thanks (though I think I can handle the Python end of it, too :-). It's been a while since I wrote any Python C code, though -- are there better tools these days for debugging reference counting? Anyone know? 
Bill From nnorwitz at gmail.com Thu Aug 16 05:13:04 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 15 Aug 2007 20:13:04 -0700 Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google In-Reply-To: <-8815054654740213484@unknownmsgid> References: <2235611917962225454@unknownmsgid> <-8815054654740213484@unknownmsgid> Message-ID: On 8/15/07, Bill Janssen wrote: > > Sounds like a good plan. I'm not a great coach though since I didn't > > write _ssl.c and I've never used openssl directly. But I can help you > > with the Python stuff of course! > > Thanks (though I think I can handle the Python end of it, too :-). > > It's been a while since I wrote any Python C code, though -- are there > better tools these days for debugging reference counting? Anyone know? The way I typically do it is to configure --with-pydebug. That shows the ref count in the interpreter and allows running tests with the -R flag to regrtest. When regrtest reports leaks, narrow down the (Python) code which causes a leak using bisection, find the C code which corresponds, and visually inspect the C code. Most leaks are pretty obvious this way. With good tests, this doesn't take much time. For pure memory leaks, valgrind works pretty well. n From foom at fuhm.net Thu Aug 16 06:06:32 2007 From: foom at fuhm.net (James Y Knight) Date: Thu, 16 Aug 2007 00:06:32 -0400 Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google In-Reply-To: <07Aug15.182939pdt."57996"@synergy1.parc.xerox.com> References: <07Aug15.182939pdt."57996"@synergy1.parc.xerox.com> Message-ID: <118A99D6-8BBB-47A2-A2F1-36C82931C610@fuhm.net> On Aug 15, 2007, at 9:29 PM, Bill Janssen wrote: > I'd really like an excuse to implement server-side SSL support one of > these days. Could that be a sprint activity? Probably against 2.6 (I > doubt the Modules/_ssl.c file will change much for 3K). 
> > The idea is that if you call socket.ssl() on a socket that's bound to > an address, the socket is assumed to be server-side, the cert passed > in is assumed to be a server-side cert, and the SSLObject returned has > a couple of extra methods, listen() and accept(). Calling accept() > does > the SSL dance with the remote side, and returns an SSLObject. > > Does this need a PEP? Maybe one of the three existing Python/SSL libraries should be stdlib- ified instead of starting another new one from scratch? Just a thought... James From nnorwitz at gmail.com Thu Aug 16 07:07:05 2007 From: nnorwitz at gmail.com (Neal Norwitz) Date: Wed, 15 Aug 2007 22:07:05 -0700 Subject: [Python-Dev] [Python-3000] Documentation switch imminent In-Reply-To: References: Message-ID: On 8/15/07, Georg Brandl wrote: > Georg Brandl schrieb: > > > > Neal will change his build scripts, so that the 2.6 and 3.0 devel > > documentation pages at docs.python.org will be built from these new > > trees soon. > > Okay, I made the switch. I tagged the state of both Python branches > before the switch as tags/py{26,3k}-before-rstdocs/. http://docs.python.org/dev/ http://docs.python.org/dev/3.0/ The upgrade went smoothly. Below are all the issues I noticed. I had to install a version of python 2.5 since that is a minimum requirement. I had to change from a plain 'make' in the Doc directory to 'make html'. The output is in build/html rather than html/ now. 2.6 output: trying to load pickled env... failed: [Errno 2] No such file or directory: 'build/doctrees/environment.pickle' writing output... ... 
library/contextlib.rst:3: Warning: 'with' will become a reserved keyword in Python 2.6
tutorial/errors.rst:1: Warning: 'with' will become a reserved keyword in Python 2.6

3.0 output:
Traceback (most recent call last):
  File "tools/sphinx-build.py", line 13, in <module>
    from sphinx import main
  File "/home/neal/python/py3k/Doc/tools/sphinx/__init__.py", line 16, in <module>
    from .builder import builders
  File "/home/neal/python/py3k/Doc/tools/sphinx/builder.py", line 35, in <module>
    from .environment import BuildEnvironment
  File "/home/neal/python/py3k/Doc/tools/sphinx/environment.py", line 34, in <module>
    from docutils.parsers.rst.states import Body
  File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/__init__.py", line 77, in <module>
    from docutils.parsers.rst import states
  File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/states.py", line 110, in <module>
    import roman
ImportError: No module named roman

After this error, I just linked my tools directory to the one in 2.6 (trunk) and that worked. I'm not sure if this will create problems in the future.

trying to load pickled env... failed: [Errno 2] No such file or directory: 'build/doctrees/environment.pickle'
writing output...
... library/contextlib.rst:3: Warning: 'with' will become a reserved keyword in Python 2.6
library/shutil.rst:17: Warning: 'as' will become a reserved keyword in Python 2.6
library/subprocess.rst:7: Warning: 'as' will become a reserved keyword in Python 2.6
tutorial/errors.rst:1: Warning: 'with' will become a reserved keyword in Python 2.6

I realize none of these are a big deal. However, it would be nice if it was cleaned up so that people unfamiliar with building the docs aren't surprised.
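
For anyone hitting the same 'with'/'as' warnings, here is a minimal sketch of one way they might be avoided. It assumes (this is a guess, not something stated in this message) that the warnings come from doc snippets being handed to the built-in compile() under Python 2.5, where 'with' is only a keyword once the with_statement future flag is set. SNIPPET, Managed and compile_snippet are hypothetical names made up for illustration; they are not part of the doc tools.

```python
import __future__

# Hypothetical doc snippet that uses the 'with' statement.
SNIPPET = """
class Managed(object):
    def __enter__(self):
        return "entered"
    def __exit__(self, *exc_info):
        return False

with Managed() as value:
    result = value
"""


def compile_snippet(source, filename="<doc snippet>"):
    """Compile source with the with_statement future flag set.

    Hypothetical helper: on Python 2.5 the flag makes 'with' and 'as'
    keywords for this one compile() call; on later versions it is
    accepted and has no effect.
    """
    flags = __future__.with_statement.compiler_flag
    return compile(source, filename, "exec", flags)


namespace = {}
exec(compile_snippet(SNIPPET), namespace)
print(namespace["result"])  # prints: entered
```

Passing the flag per call keeps the change local to the code that compiles snippets, rather than requiring a `from __future__ import with_statement` line in every documented example.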
n

From barry at python.org Thu Aug 16 06:40:33 2007
From: barry at python.org (Barry Warsaw)
Date: Thu, 16 Aug 2007 00:40:33 -0400
Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google
In-Reply-To: <07Aug15.194559pdt."57996"@synergy1.parc.xerox.com>
References: <2235611917962225454@unknownmsgid> <07Aug15.194559pdt."57996"@synergy1.parc.xerox.com>
Message-ID: <888D07AF-6168-4140-809A-217B3E43408B@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Aug 15, 2007, at 10:45 PM, Bill Janssen wrote:
>
> It's been a while since I wrote any Python C code, though -- are there
> better tools these days for debugging reference counting? Anyone
> know?

No, but /that/ would make an awesome sprint topic .

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)

iQCVAwUBRsPVQnEjvBPtnXfVAQIp9QP/Y9DCqvBdbDdVvTQp8gt4so2HW/AqRyZU
IF3SI/rrzMneslZRbU9PBlKbhq7oE/zwThpPss73W+64CoF2Z7N2dEGJJZncp+RK
bo1jyzG2bituz1ZqXRFW8t373XAWLrMusABXNAD5Ypfd1PfbmziFaa6ttyu2jl5O
4QWxPaw4qU0=
=+Zfi
-----END PGP SIGNATURE-----

From g.brandl at gmx.net Thu Aug 16 08:40:24 2007
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 16 Aug 2007 08:40:24 +0200
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To:
References:
Message-ID:

Neal Norwitz wrote:
> On 8/15/07, Georg Brandl wrote:
>> Georg Brandl wrote:
>> >
>> > Neal will change his build scripts, so that the 2.6 and 3.0 devel
>> > documentation pages at docs.python.org will be built from these new
>> > trees soon.
>>
>> Okay, I made the switch. I tagged the state of both Python branches
>> before the switch as tags/py{26,3k}-before-rstdocs/.
>
> http://docs.python.org/dev/
> http://docs.python.org/dev/3.0/

Great!
> 3.0 output:
> Traceback (most recent call last):
>   File "tools/sphinx-build.py", line 13, in <module>
>     from sphinx import main
>   File "/home/neal/python/py3k/Doc/tools/sphinx/__init__.py", line 16, in <module>
>     from .builder import builders
>   File "/home/neal/python/py3k/Doc/tools/sphinx/builder.py", line 35, in <module>
>     from .environment import BuildEnvironment
>   File "/home/neal/python/py3k/Doc/tools/sphinx/environment.py", line 34, in <module>
>     from docutils.parsers.rst.states import Body
>   File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/__init__.py", line 77, in <module>
>     from docutils.parsers.rst import states
>   File "/home/neal/python/py3k/Doc/tools/docutils/parsers/rst/states.py", line 110, in <module>
>     import roman
> ImportError: No module named roman
>
> After this error, I just linked my tools directory to the one in 2.6
> (trunk) and that worked. I'm not sure if this will create problems in
> the future.

No, it shouldn't. I added roman.py in trunk, but didn't touch py3k since I don't want to disturb svnmerge more than necessary.

> trying to load pickled env... failed: [Errno 2] No such file or
> directory: 'build/doctrees/environment.pickle'

That is expected.

> writing output...
> ... library/contextlib.rst:3: Warning: 'with' will become a
> reserved keyword in Python 2.6
> library/shutil.rst:17: Warning: 'as' will become a reserved
> keyword in Python 2.6
> library/subprocess.rst:7: Warning: 'as' will become a reserved
> keyword in Python 2.6
> tutorial/errors.rst:1: Warning: 'with' will become a reserved
> keyword in Python 2.6
>
> I realize none of these are a big deal. However, it would be nice if
> it was cleaned up so that people unfamiliar with building the docs
> aren't surprised.

I'll have the with/as problem fixed soon, it should be nothing more than setting the future flag for the call to compile().

Thanks,
Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

From janssen at parc.com Thu Aug 16 17:52:02 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 08:52:02 PDT
Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google
In-Reply-To: <118A99D6-8BBB-47A2-A2F1-36C82931C610@fuhm.net>
References: <07Aug15.182939pdt."57996"@synergy1.parc.xerox.com> <118A99D6-8BBB-47A2-A2F1-36C82931C610@fuhm.net>
Message-ID: <07Aug16.085210pdt."57996"@synergy1.parc.xerox.com>

> Maybe one of the three existing Python/SSL libraries should be stdlib-ified
> instead of starting another new one from scratch?

Yep, that's my intent. This should just be a change to _ssl.c.

Bill

From janssen at parc.com Thu Aug 16 17:52:35 2007
From: janssen at parc.com (Bill Janssen)
Date: Thu, 16 Aug 2007 08:52:35 PDT
Subject: [Python-Dev] [Python-3000] Python 3000 Sprint @ Google
In-Reply-To: <888D07AF-6168-4140-809A-217B3E43408B@python.org>
References: <2235611917962225454@unknownmsgid> <07Aug15.194559pdt."57996"@synergy1.parc.xerox.com> <888D07AF-6168-4140-809A-217B3E43408B@python.org>
Message-ID: <07Aug16.085238pdt."57996"@synergy1.parc.xerox.com>

Barry Warsaw suggested:
> > It's been a while since I wrote any Python C code, though -- are there
> > better tools these days for debugging reference counting? Anyone
> > know?
>
> No, but /that/ would make an awesome sprint topic .

Indeed!

Bill

From alexandre at peadrop.com Fri Aug 17 01:43:10 2007
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 16 Aug 2007 19:43:10 -0400
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To:
References:
Message-ID:

On 8/16/07, Neal Norwitz wrote:
> On 8/15/07, Georg Brandl wrote:
> > Okay, I made the switch.
I tagged the state of both Python branches
> > before the switch as tags/py{26,3k}-before-rstdocs/.
>
> http://docs.python.org/dev/
> http://docs.python.org/dev/3.0/
>

Is it just me, or is the markup of the new docs quite heavy?

alex% wget -q -O- http://docs.python.org/api/genindex.html | wc -c
77868
alex% wget -q -O- http://docs.python.org/dev/3.0/genindex.html | wc -c
918359

Firefox, on my fairly recent machine, takes ~5 seconds to render the index of the new docs from disk, compared to a fraction of a second for the old one.

-- Alexandre

From dasn at lavabit.com Fri Aug 17 08:41:55 2007
From: dasn at lavabit.com (Dasn)
Date: Fri, 17 Aug 2007 14:41:55 +0800
Subject: [Python-Dev] mailbox._create_temporary without checking the file permission
Message-ID: <20070817064155.GA5550@lavabit.com>

Hi, guys.

_create_temporary is not tracking the perm bits of the original mbox.

$ ls -l me
-rw------- 1 dasn users 274886 Aug 16 08:43 me
$ python
Python 2.5.1 (r251:54863, May 8 2007, 07:32:21)
[GCC 3.3.5 (propolice)] on openbsd4
Type "help", "copyright", "credits" or "license" for more information.
>>> from mailbox import mbox
>>> m=mbox('me')
>>> m.pop(0)
>>> m.flush()
>>>^D
$ ls -l me
-rwxr-xr-x 1 dasn users 268438 Aug 16 09:26 me*
$

-- Dasn

From dasn at lavabit.com Fri Aug 17 11:33:19 2007
From: dasn at lavabit.com (Dasn)
Date: Fri, 17 Aug 2007 17:33:19 +0800
Subject: [Python-Dev] mailbox._create_temporary without checking the file permission
In-Reply-To: <20070817064155.GA5550@lavabit.com>
References: <20070817064155.GA5550@lavabit.com>
Message-ID: <20070817093319.GA26162@lavabit.com>

On 17/08/07 14:41 +0800, Dasn wrote:
>Hi, guys.
>
>_create_temporary is not tracking the perm bits of the original mbox.
>
>$ ls -l me
>-rw------- 1 dasn users 274886 Aug 16 08:43 me
>$ python
>Python 2.5.1 (r251:54863, May 8 2007, 07:32:21)
>[GCC 3.3.5 (propolice)] on openbsd4
>Type "help", "copyright", "credits" or "license" for more information.
>>>> from mailbox import mbox
>>>> m=mbox('me')
>>>> m.pop(0)
>
>>>> m.flush()
>>>>^D
>$ ls -l me
>-rwxr-xr-x 1 dasn users 268438 Aug 16 09:26 me*
>$
>

I think there are two problems that should be considered in _create_temporary:

1. What to do if we have no write permission to the directory (e.g. /var/mail/). What about using the tempfile module?
2. Keep the temp file in the same mode as the original one.

-- Dasn

From ncoghlan at gmail.com Fri Aug 17 16:00:25 2007
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 18 Aug 2007 00:00:25 +1000
Subject: [Python-Dev] [Python-3000] Documentation switch imminent
In-Reply-To:
References: