From martin at v.loewis.de Tue Jun 1 00:42:50 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 01 Jun 2010 00:42:50 +0200
Subject: [Python-Dev] _XOPEN_SOURCE on Solaris
Message-ID: <4C043B6A.4070900@v.loewis.de>

In issue 1759169 people have been demanding for quite some time that the
definition of _XOPEN_SOURCE on Solaris should be dropped, as it was
unneeded and caused problems for other software.

Now, issue 8864 reports that the multiprocessing module fails to
compile, and indeed, if _XOPEN_SOURCE is not defined, control messages
stop working. Several of the CMSG interfaces are only available if
_XPG4_2 is defined (and, AFAICT, under no other condition); this, in
turn, apparently is only defined if _XOPEN_SOURCE is 500, 600, or
(has an arbitrary value and _XOPEN_SOURCE_EXTENDED is 1).

So how should I go about fixing that?

a) revert the patch for #1759169, documenting that Python compilation
   actually requires _XOPEN_SOURCE to be defined, or
b) define _XOPEN_SOURCE only for the multiprocessing module.

Any input appreciated.

Regards,
Martin

From greg.ewing at canterbury.ac.nz Tue Jun 1 02:33:23 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 01 Jun 2010 12:33:23 +1200
Subject: [Python-Dev] tp_dealloc
In-Reply-To: <20100531184522.173170@gmx.net>
References: <20100531184522.173170@gmx.net>
Message-ID: <4C045553.70909@canterbury.ac.nz>

smarv at gmx.net wrote:

> Now, the problem is, Python appears to read-access the deallocated memory
> still after tp_dealloc.

It's not clear exactly what you mean by "after tp_dealloc".
The usual pattern is for a type's tp_dealloc method to call
the base type's tp_dealloc, which can make further references
to the object's memory. At the end of the tp_dealloc chain,
tp_free gets called, which is what actually deallocates the
memory.

I would say your tp_dealloc shouldn't be modifying anything
in the object struct that your corresponding tp_alloc method
didn't set up, because code further along the tp_dealloc
chain may rely on it. That includes fields in the object
header.

-- 
Greg

From pje at telecommunity.com Tue Jun 1 04:18:02 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 31 May 2010 22:18:02 -0400
Subject: [Python-Dev] Implementing PEP 382, Namespace Packages
In-Reply-To:
References: <20100530074041.2279D3A405F@sparrow.telecommunity.com>
 <20100531050328.40B073A402D@sparrow.telecommunity.com>
Message-ID: <20100601021806.360AA3A402D@sparrow.telecommunity.com>

At 01:19 PM 5/31/2010 -0700, Brett Cannon wrote:
>But as long as whatever mechanism gets exposed allows people to work
>from a module name that will be enough. The path connection is not
>required as load_module is the end-all-be-all method. If we have a
>similar API added for .pth files that works off of module names then
>those loaders that don't want to work from file paths don't have to.

Right - that's why I suggested that a high-level request like
get_pth_contents() would give the implementer the most
flexibility. Then they don't have to fake a filesystem if they don't
actually work that way.

For example, a database that maps module names to code objects has no
need for paths at all, and could just return either ['*'] or None
depending on whether the package was marked as a namespace package in
the database... without needing to fake up the existence of a .pth
file in a virtual file system.

(Of course, since lots of implementations *do* use filesystem-like
backends, giving them some utility functions they can use to
implement the API on top of filesystem operations gives us the best
of both worlds.)
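
[Editorial illustration] A minimal sketch of the database-backed case
described above. Everything here is hypothetical: get_pth_contents() was
only a proposal in this thread, not an existing import API, and the
DatabaseImporter class, its constructor arguments, and the sample module
names are invented for the example. It shows only the return convention
(['*'] for a namespace package, None otherwise) without any fake file
system.

```python
class DatabaseImporter:
    """Toy importer backed by an in-memory 'database' (hypothetical)."""

    def __init__(self, code_by_name, namespace_packages):
        # code_by_name: maps a dotted module name to a compiled code object.
        # namespace_packages: names flagged as namespace packages.
        self._code_by_name = dict(code_by_name)
        self._namespace_packages = set(namespace_packages)

    def get_pth_contents(self, fullname):
        # Proposed hook from the thread; not part of any released API.
        # No paths are involved: the answer comes straight from the flag.
        if fullname in self._namespace_packages:
            return ['*']
        return None


importer = DatabaseImporter(
    code_by_name={"zope.interface": compile("x = 1", "<db>", "exec")},
    namespace_packages={"zope"},
)
print(importer.get_pth_contents("zope"))            # ['*']
print(importer.get_pth_contents("zope.interface"))  # None
```

A filesystem-based loader could implement the same method by reading
real .pth files, which is the "utility functions" case mentioned in the
last paragraph of the message.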

From smarv at gmx.net Tue Jun 1 09:10:20 2010
From: smarv at gmx.net (smarv at gmx.net)
Date: Tue, 01 Jun 2010 09:10:20 +0200
Subject: [Python-Dev] tp_dealloc
Message-ID: <20100601071020.325170@gmx.net>

My tp_dealloc method (of non-subtypable type) calls the freeMem-method
of a memory manager (this manager was also used for the corresponding
allocation).
This freeMem-method deallocates and modifies the memory,
which is a valid action, because after free, the memory-manager
has ownership of the freed memory.
Several memory managers do this (for example the Memory Manager in
Delphi during debug mode, in order to track invalid memory access
after free).

The python31.dll calls tp_alloc and later (after return of tp_alloc)
the python31.dll is still awaiting valid content in the deallocated
memory.
I don't know where this happens, I'm not a developer of CPython,
but at this point the python31.dll causes an access violation.
IMO the python31.dll assumes that freeMem never modifies the memory
(pyobject header), this is valid for many memory managers, but not for
all.
And from my perspective, this assumption is a bug, which can cause
access violations in many applications (for example, applications which
use the PythonForDelphi-package; PyScripter is one of them, but also
many others)

Please, could some CPython-developer take a look, thank you!

-- 
FREE for all GMX members: the maxdome Movie-FLAT!
Activate now at http://portal.gmx.net/de/go/maxdome01

From smarv at gmx.net Tue Jun 1 09:41:12 2010
From: smarv at gmx.net (smarv at gmx.net)
Date: Tue, 01 Jun 2010 09:41:12 +0200
Subject: [Python-Dev] tp_dealloc
Message-ID: <20100601074112.199330@gmx.net>

Sorry, I wrote tp_alloc in last post, it should be always tp_dealloc:

My tp_dealloc method (of non-subtypable type) calls the freeMem-method
of a memory manager (this manager was also used for the corresponding
allocation).
This freeMem-method deallocates and modifies the memory,
which is a valid action, because after free, the memory-manager
has ownership of the freed memory.
Several memory managers do this (for example the Memory Manager in
Delphi during debug mode, in order to track invalid memory access
after free).

The python31.dll calls tp_dealloc and later (after return of tp_dealloc)
the python31.dll is still awaiting valid content in the deallocated
memory.
I don't know where this happens, I'm not a developer of CPython,
but at this point the python31.dll causes an access violation.
IMO the python31.dll assumes that freeMem never modifies the memory
(pyobject header), this is valid for many memory managers, but not for
all.
And from my perspective, this assumption is a bug, which can cause
access violations in many applications (for example, applications which
use the PythonForDelphi-package; PyScripter is one of them, but also
many others)

Please, could some CPython-developer take a look, thank you!

From amauryfa at gmail.com Tue Jun 1 11:52:44 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 1 Jun 2010 11:52:44 +0200
Subject: [Python-Dev] tp_dealloc
In-Reply-To: <20100601074112.199330@gmx.net>
References: <20100601074112.199330@gmx.net>
Message-ID:

2010/6/1 :
> Sorry, I wrote tp_alloc in last post, it should be always tp_dealloc:
>
> My tp_dealloc method (of non-subtypable type) calls the freeMem-method
> of a memory manager (this manager was also used for the corresponding allocation).
> This freeMem-method deallocates and modifies the memory,
> which is a valid action, because after free, the memory-manager
> has ownership of the freed memory.
> Several memory managers do this (for example the Memory Manager in
> Delphi during debug mode, in order to track invalid memory access after free).
>
> The python31.dll calls tp_dealloc and later (after return of tp_dealloc)
> the python31.dll is still awaiting valid content in the deallocated memory.
> I don't know where this happens, I'm not a developer of CPython,
> but at this point the python31.dll causes an access violation.
> IMO the python31.dll assumes that freeMem never modifies the memory
> (pyobject header), this is valid for many memory managers, but not for all.
> And from my perspective, this assumption is a bug, which can cause access
> violations in many applications (for example, applications which use the
> PythonForDelphi-package; PyScripter is one of them, but also many others)
>
> Please, could some CPython-developer take a look, thank you!

CPython does not access memory after the call to tp_dealloc.
There is even a mode (--without-pymalloc) where tp_dealloc calls
free() at the end, and would cause crashes if the memory was read
afterwards.

This said, there may be a bug somewhere, but what do you want us to
look at?
Do you have a case that we could reproduce and investigate?

-- 
Amaury Forgeot d'Arc

From smarv at gmx.net Tue Jun 1 14:21:57 2010
From: smarv at gmx.net (smarv at gmx.net)
Date: Tue, 01 Jun 2010 14:21:57 +0200
Subject: [Python-Dev] tp_dealloc
In-Reply-To:
References: <20100601074112.199330@gmx.net>
Message-ID: <20100601122157.225330@gmx.net>

> This said, there may be a bug somewhere, but what do you want us to look
> at?
> Do you have a case that we could reproduce and investigate?
>
> --
> Amaury Forgeot d'Arc

Thank you, I'm not a C-Developer,
but still I have one more detail:

I call py_decRef( pyObj) of dll (version 3.1.1),
( which calls tp_dealloc, which calls my freeMem() method))
No problem is reported here.
Now, the freed memory should not be accessed anymore by python31.dll.
You may fill the freed pyObjectHead with invalid values,
in my case it's: ob_refcnt= 7851148, ob_type = $80808080

But later, when I call Py_Finalize,
there inside is some access to the same freed memory;
this causes an AV, more precisely,
when the value $80808080 is checked.

My Delphi-Debugger shows the following byte-sequence inside python31.dll:
5EC3568B7424088B4604F74054004000007504

5E                  - pop esi
C3                  - ret
56                  - push esi
8B742408            - mov esi, [esp+$08]
8B4604              - mov eax, [esi+$04]
      // eax = $80808080 //

F7405400400000      - test [eax+$54], $00004000
      // AV exception by read of address $808080D4 //

7504                - jnz $1e03681b


Maybe this can help someone, thank you!

-- 
Marvin

From amauryfa at gmail.com Tue Jun 1 15:00:21 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 1 Jun 2010 15:00:21 +0200
Subject: [Python-Dev] tp_dealloc
In-Reply-To: <20100601122157.225330@gmx.net>
References: <20100601074112.199330@gmx.net>
 <20100601122157.225330@gmx.net>
Message-ID:

2010/6/1 :
>> This said, there may be a bug somewhere, but what do you want us to look
>> at?
>> Do you have a case that we could reproduce and investigate?
>>
>> --
>> Amaury Forgeot d'Arc
>
> Thank you, I'm not a C-Developer,
> but still I have one more detail:
>
> I call py_decRef( pyObj) of dll (version 3.1.1),
> ( which calls tp_dealloc, which calls my freeMem() method))
> No problem is reported here.
> Now, the freed memory should not be accessed anymore by python31.dll.
> You may fill the freed pyObjectHead with invalid values,
> in my case it's: ob_refcnt= 7851148, ob_type = $80808080
>
> But later, when I call Py_Finalize,
> there inside is some access to the same freed memory;
> this causes an AV, more precisely,
> when the value $80808080 is checked.
>
> My Delphi-Debugger shows the following byte-sequence inside python31.dll:
> 5EC3568B7424088B4604F74054004000007504
>
> 5E                  - pop esi
> C3                  - ret
> 56                  - push esi
> 8B742408            - mov esi, [esp+$08]
> 8B4604              - mov eax, [esi+$04]
>       // eax = $80808080 //
>
> F7405400400000      - test [eax+$54], $00004000
>       // AV exception by read of address $808080D4 //
>
> 7504                - jnz $1e03681b
>
>
> Maybe this can help someone, thank you!

I'm sorry but this kind of issue is difficult to investigate without
the source code. Normally I would compile everything (python & your
program) in debug mode, and try to see why the object is used after
tp_dealloc.

For example, it's possible that your code does not handle reference
counts correctly: a call to Py_INCREF() may be missing somewhere.
This is a common error. tp_dealloc() is called when the reference
count falls to zero, but if the object is still referenced elsewhere,
memory will be accessed again!

Without further information, I cannot consider this as a problem in
Python.
I know other extension modules that manage memory in their own way,
and work.
It's more probably an issue in the code of your type.

-- 
Amaury Forgeot d'Arc

From smarv at gmx.net Tue Jun 1 17:42:07 2010
From: smarv at gmx.net (smarv at gmx.net)
Date: Tue, 01 Jun 2010 17:42:07 +0200
Subject: [Python-Dev] tp_dealloc
In-Reply-To:
References: <20100601074112.199330@gmx.net>
 <20100601122157.225330@gmx.net>
Message-ID: <20100601154207.178500@gmx.net>

> Without further information, I cannot consider this as a problem in
> Python.
> I know other extension modules that manage memory in their own way, and
> work.
> It's more probably an issue in the code of your type.
>
> --
> Amaury Forgeot d'Arc

Ok, thank you, but I'm still hoping, someone could test this.
I'm very sure, my app is not the cause;
only the python31.dll (py_finalize) is accessing the freed memory.
Inside py_finalize there is really no call to my hosting app (or reverse),
I even tested this in my debugger.

In most applications this python-problem remains hidden,
because their freeMem() leaves the freed memory unmodified.
(And that's why very good debuggers modify the freed memory
to reveal such hidden errors).

You could simply test this by setting pyObject.ob_type = $80808080
after freeMem( pyObject). Then later, call py_finalize, and you will
see the same problem (Access violation by trying to use ob_type)

From amauryfa at gmail.com Tue Jun 1 18:56:39 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 1 Jun 2010 18:56:39 +0200
Subject: [Python-Dev] tp_dealloc
In-Reply-To: <20100601154207.178500@gmx.net>
References: <20100601074112.199330@gmx.net>
 <20100601122157.225330@gmx.net>
 <20100601154207.178500@gmx.net>
Message-ID:

2010/6/1 :
>> Without further information, I cannot consider this as a problem in
>> Python.
>> I know other extension modules that manage memory in their own way, and
>> work.
>> It's more probably an issue in the code of your type.
>>
>> --
>> Amaury Forgeot d'Arc
>
> Ok, thank you, but I'm still hoping, someone could test this.
> I'm very sure, my app is not the cause;
> only the python31.dll (py_finalize) is accessing the freed memory.
> Inside py_finalize there is really no call to my hosting app (or reverse),
> I even tested this in my debugger.

To be clear:
- you did not provide anything for us to test.
- the fact that the crash is inside python31.dll does not indicate
  a bug in python.

Consider this (bogus) code:

    FILE *fp = fopen("c:/temp/t", "w");
    free(fp);

This will lead to a crash at program exit (when fcloseall() is called
by the system) but the issue is really in the code - it should not
free(fp).

Without knowing what your code really does, we won't be able to help.

-- 
Amaury Forgeot d'Arc

From ncoghlan at gmail.com Wed Jun 2 14:33:19 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 02 Jun 2010 22:33:19 +1000
Subject: [Python-Dev] tp_dealloc
In-Reply-To: <20100601122157.225330@gmx.net>
References: <20100601074112.199330@gmx.net>
 <20100601122157.225330@gmx.net>
Message-ID: <4C064F8F.6000508@gmail.com>

On 01/06/10 22:21, smarv at gmx.net wrote:
>> This said, there may be a bug somewhere, but what do you want us to look
>> at?
>> Do you have a case that we could reproduce and investigate?
>>
>> --
>> Amaury Forgeot d'Arc
>
> Thank you, I'm not a C-Developer,
> but still I have one more detail:
>
> I call py_decRef( pyObj) of dll (version 3.1.1),
> ( which calls tp_dealloc, which calls my freeMem() method))
> No problem is reported here.

As Amaury has pointed out, there are a number of ways this could be a
bug in your extension module, or some other CPython extension you are
using (most obviously, a Py_DECREF without a corresponding Py_INCREF,
but there are probably other more exotic ways to manage it). If you
corrupt the reference count for a module global variable with an extra
Py_DECREF call, then you may get an access violation at interpreter
shutdown (i.e. in response to a Py_Finalize call) as the destruction of
the module attempts to decrement the reference count of an object that
was incorrectly deleted while it was still referenced.

Since the symptoms you have described so far *exactly* match the
expected symptoms of a reference counting bug which may not have
anything whatsoever to do with the interpreter core or the standard
library, you're going to need a much better defined test case (written
in C or Python) to convince us that our code is the problem.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From flashk at gmail.com Wed Jun 2 20:32:55 2010
From: flashk at gmail.com (Farshid Lashkari)
Date: Wed, 2 Jun 2010 11:32:55 -0700
Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set
Message-ID:

Hello,

I noticed that if Py_IgnoreEnvironmentFlag is enabled, the Windows
registry is still used to initialize sys.path during startup. Is this
an oversight or intentional?

I assumed one of the intentions of this flag is to prevent embedded
Python interpreters from being affected by other Python installations.
Ignoring the Windows registry as well as environment variables seems to
make sense in this situation.

If this is an oversight, would it be too late to have this fixed in
Python 2.7?

Cheers,
Farshid

From status at bugs.python.org Fri Jun 4 18:08:50 2010
From: status at bugs.python.org (Python tracker)
Date: Fri, 4 Jun 2010 18:08:50 +0200 (CEST)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20100604160850.026847813C@psf.upfronthosting.co.za>


ACTIVITY SUMMARY (2010-05-28 - 2010-06-04)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.


 2727 open (+38) / 17988 closed (+15) / 20715 total (+53)

Open issues with patches: 1103

Average duration of open issues: 719 days.
Median duration of open issues: 504 days.

Open Issues Breakdown open 2706 (+38) languishing 12 ( +0) pending 8 ( +0) Issues Created Or Reopened (58) _______________________________ Seconds range in time unit 2010-06-03 http://bugs.python.org/issue2568 reopened belopolsky patch, easy 26.rc1: test_signal issue on FreeBSD 6.3 2010-06-03 CLOSED http://bugs.python.org/issue3864 reopened skrah patch, easy, buildbot urllib2 basicauth broken in 2.6.5: RuntimeError: maximum recur 2010-06-04 http://bugs.python.org/issue8797 reopened orsenthil TZ offset description is unclear in docs 2010-06-04 http://bugs.python.org/issue8810 reopened belopolsky easy, needs review truncate() semantics changed in 3.1.2 2010-05-28 http://bugs.python.org/issue8840 reopened tjreedy Condition.wait() doesn't raise KeyboardInterrupt 2010-05-28 http://bugs.python.org/issue8844 created hobb0001 Expose sqlite3 connection inTransaction as read-only in_transa 2010-05-28 CLOSED http://bugs.python.org/issue8845 created r.david.murray patch, easy cgi.py bug report + fix: tailing carriage return and newline c 2010-05-28 http://bugs.python.org/issue8846 created wobsta patch crash appending list and namedtuple 2010-05-28 http://bugs.python.org/issue8847 created benrg Deprecate or remove "U" and "U#" formats of Py_BuildValue() 2010-05-29 http://bugs.python.org/issue8848 created haypo patch python.exe problem with cvxopt 2010-05-29 http://bugs.python.org/issue8849 created jroach Remove "w" format of PyParse_ParseTuple() 2010-05-29 http://bugs.python.org/issue8850 created haypo pkgutil document needs more markups 2010-05-29 http://bugs.python.org/issue8851 created mft patch _socket fails to build on OpenSolaris x64 2010-05-29 http://bugs.python.org/issue8852 created drkirkby patch getaddrinfo should accept port of type long 2010-05-29 http://bugs.python.org/issue8853 created AndiDog patch msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on 2010-05-29 http://bugs.python.org/issue8854 created lemburg 64bit Shelve documentation lacks security 
warning 2010-05-30 http://bugs.python.org/issue8855 created Longpoke Error in ceval.c when building --without-threads 2010-05-30 CLOSED http://bugs.python.org/issue8856 created merwok socket.getaddrinfo needs tests 2010-05-30 http://bugs.python.org/issue8857 created pitrou patch socket.getaddrinfo returns wrong results for IPv6 addresses 2010-05-30 CLOSED http://bugs.python.org/issue8858 created pitrou split() splits on non whitespace char when ther is no separato 2010-05-30 CLOSED http://bugs.python.org/issue8859 created PeterL Rounding in timedelta constructor is inconsistent with that in 2010-05-31 http://bugs.python.org/issue8860 created belopolsky patch curses.wrapper : unnessesary code 2010-05-31 http://bugs.python.org/issue8861 created july patch curses.wrapper does not restore terminal if curses.getkey() ge 2010-05-31 http://bugs.python.org/issue8862 created july patch Segfault handler: display Python backtrace on segfault 2010-05-31 http://bugs.python.org/issue8863 created haypo patch multiprocessing: undefined struct/union member: msg_control 2010-05-31 http://bugs.python.org/issue8864 created srid select.poll is not thread safe 2010-05-31 http://bugs.python.org/issue8865 created apexo socket.getaddrinfo() should support keyword arguments 2010-05-31 http://bugs.python.org/issue8866 created giampaolo.rodola patch serve.py (using wsgiref) cannot serve Python docs under Python 2010-05-31 http://bugs.python.org/issue8867 created r.david.murray Framework install does not behave as a framework 2010-06-01 CLOSED http://bugs.python.org/issue8868 created mdehoon execfile does not work with UNC paths 2010-06-01 http://bugs.python.org/issue8869 created stier08 --user-access-control=force produces invalid installer on Vist 2010-06-01 CLOSED http://bugs.python.org/issue8870 created techtonik --user-access-control=auto has no effect 2010-06-01 http://bugs.python.org/issue8871 created techtonik if/else stament bug? 
2010-06-01 CLOSED http://bugs.python.org/issue8872 created chrits55 Popen uses 333 times as much CPU as a shell pipe on Mac OS X 2010-06-01 http://bugs.python.org/issue8873 created hughsw py3k documentation mentions deprecated opcode LOAD_LOCALS 2010-06-01 CLOSED http://bugs.python.org/issue8874 created Yaniv.Aknin XML-RPC improvement is described twice. 2010-06-02 http://bugs.python.org/issue8875 created naoki distutils should not assume that hardlinks will work 2010-06-02 http://bugs.python.org/issue8876 created samtygier patch 2to3 fixes stdlib import wrongly 2010-06-02 CLOSED http://bugs.python.org/issue8877 created djc IDLE - str(integer) - TypeError: 'str' object is not callable 2010-06-02 CLOSED http://bugs.python.org/issue8878 created Stranger381 Implement os.link on Windows 2010-06-02 http://bugs.python.org/issue8879 created brian.curtin ConfigParser.set does not convert non-string values 2010-06-02 CLOSED http://bugs.python.org/issue8880 created Edwin.Pozharski socket.getaddrinfo() should return named tuples 2010-06-02 http://bugs.python.org/issue8881 created giampaolo.rodola socketmodule.c`getsockaddrarg() should not check the length of 2010-06-03 http://bugs.python.org/issue8882 created Edward.Pilatowicz Proxy exception lookup fails on MacOS in urllib. 2010-06-03 http://bugs.python.org/issue8883 created yorik.sar patch Allow binding to local address in http.client 2010-06-03 CLOSED http://bugs.python.org/issue8884 created Gaz.Davidson markerbase declaration errors aren't recoverable 2010-06-03 http://bugs.python.org/issue8885 created mnot zipfile.ZipExtFile is a context manager, but that is not docum 2010-06-03 http://bugs.python.org/issue8886 created sandberg patch “pydoc str” works but not “pydoc str.translate” 
2010-06-03 http://bugs.python.org/issue8887 created merwok Promote SafeConfigParser and warn about ConfigParser 2010-06-03 http://bugs.python.org/issue8888 created merwok test_support.transient_internet fails on Freebsd because socke 2010-06-03 http://bugs.python.org/issue8889 created r.david.murray patch Modules have dangerous examples in documentation 2010-06-04 http://bugs.python.org/issue8890 reopened Henri.Salo sort files before archiving for consistency 2010-06-03 http://bugs.python.org/issue8891 created techtonik patch 2to3 fails with assertion failure on "from itertools import *" 2010-06-03 http://bugs.python.org/issue8892 created dmalcolm patch file.{read,readlines} behaviour on Solaris 2010-06-03 http://bugs.python.org/issue8893 created kalt patch, needs review urllib2 authentication manager retries forever if password is 2010-06-04 CLOSED http://bugs.python.org/issue8894 created Jurjen newline vs. newlines in io module 2010-06-04 CLOSED http://bugs.python.org/issue8895 created jmfauth email.encoders.encode_base64 sets payload to bytes, should set 2010-06-04 CLOSED http://bugs.python.org/issue8896 created forest_atq patch Issues Now Closed (39) ______________________ distutils sdist add_defaults does not add data_files 813 days http://bugs.python.org/issue2279 merwok Vista UAC/elevation support for bdist_wininst 785 days http://bugs.python.org/issue2581 techtonik patch, patch 26.rc1: test_signal issue on FreeBSD 6.3 0 days http://bugs.python.org/issue3864 skrah patch, easy, buildbot Real segmentation fault handler 609 days http://bugs.python.org/issue3999 haypo patch Fix complex type to avoid coercion in 2.7. 
474 days http://bugs.python.org/issue5211 mark.dickinson easy datetime.monthdelta 453 days http://bugs.python.org/issue5434 belopolsky patch Contradictory documentation for email.mime.text.MIMEText 317 days http://bugs.python.org/issue6521 r.david.murray patch shadows around the io truncate() semantics 253 days http://bugs.python.org/issue6939 ncoghlan patch Improve explanation of tab expansion in doctests 155 days http://bugs.python.org/issue7583 r.david.murray patch Too narrow platform check in test_datetime 113 days http://bugs.python.org/issue7879 belopolsky patch, 26backport Improve test_os._kill (failing on slow machines) 44 days http://bugs.python.org/issue8405 haypo patch Test assumptions for test_itimer_virtual and test_itimer_prof 48 days http://bugs.python.org/issue8424 skrah patch, buildbot Changes to content of Demo/turtle 24 days http://bugs.python.org/issue8616 georg.brandl patch test_winsound fails when no playback devices configured 28 days http://bugs.python.org/issue8618 brian.curtin patch 2.7 regression in tarfile: IOError: link could not be created 17 days http://bugs.python.org/issue8741 lars.gustaebel integer-to-complex comparisons give incorrect results 12 days http://bugs.python.org/issue8748 minge patch urllib.urlencode documentation unclear on doseq 10 days http://bugs.python.org/issue8788 orsenthil IDLE editior not opening 3 days http://bugs.python.org/issue8829 orsenthil tarfile: broken hardlink handling and testcase. 7 days http://bugs.python.org/issue8833 lars.gustaebel patch PyArg_ParseTuple(): remove old and unused "O?" 
format 1 days http://bugs.python.org/issue8837 haypo patch Expose sqlite3 connection inTransaction as read-only in_transa 3 days http://bugs.python.org/issue8845 r.david.murray patch, easy Error in ceval.c when building --without-threads 0 days http://bugs.python.org/issue8856 benjamin.peterson socket.getaddrinfo returns wrong results for IPv6 addresses 1 days http://bugs.python.org/issue8858 pitrou split() splits on non whitespace char when ther is no separato 1 days http://bugs.python.org/issue8859 PeterL Framework install does not behave as a framework 1 days http://bugs.python.org/issue8868 ronaldoussoren --user-access-control=force produces invalid installer on Vist 1 days http://bugs.python.org/issue8870 techtonik if/else stament bug? 0 days http://bugs.python.org/issue8872 r.david.murray py3k documentation mentions deprecated opcode LOAD_LOCALS 1 days http://bugs.python.org/issue8874 benjamin.peterson 2to3 fixes stdlib import wrongly 0 days http://bugs.python.org/issue8877 benjamin.peterson IDLE - str(integer) - TypeError: 'str' object is not callable 0 days http://bugs.python.org/issue8878 mark.dickinson ConfigParser.set does not convert non-string values 1 days http://bugs.python.org/issue8880 Edwin.Pozharski Allow binding to local address in http.client 0 days http://bugs.python.org/issue8884 loewis urllib2 authentication manager retries forever if password is 0 days http://bugs.python.org/issue8894 orsenthil newline vs. 
newlines in io module 0 days http://bugs.python.org/issue8895 merwok email.encoders.encode_base64 sets payload to bytes, should set 0 days http://bugs.python.org/issue8896 forest_atq patch timedelta multiply and divide by floating point 1722 days http://bugs.python.org/issue1289118 belopolsky patch unicode in email.MIMEText and email/Charset.py 1647 days http://bugs.python.org/issue1368247 r.david.murray patch email package and Unicode strings handling 1360 days http://bugs.python.org/issue1555842 r.david.murray improve xrange.__contains__ 1036 days http://bugs.python.org/issue1766304 benjamin.peterson patch Top Issues Most Discussed (10) ______________________________ 18 multipart/form-data encoding 704 days open http://bugs.python.org/issue3244 16 sort files before archiving for consistency 1 days open http://bugs.python.org/issue8891 16 datetime lacks concrete tzinfo impl. for UTC 492 days open http://bugs.python.org/issue5094 12 improve xrange.__contains__ 1036 days closed http://bugs.python.org/issue1766304 10 Modules have dangerous examples in documentation 0 days open http://bugs.python.org/issue8890 10 multiprocessing: undefined struct/union member: msg_control 4 days open http://bugs.python.org/issue8864 9 TZ offset description is unclear in docs 1 days open http://bugs.python.org/issue8810 8 Rounding in timedelta constructor is inconsistent with that in 4 days open http://bugs.python.org/issue8860 8 Expose sqlite3 connection inTransaction as read-only in_transac 3 days closed http://bugs.python.org/issue8845 7 --user-access-control=force produces invalid installer on Vista 1 days closed http://bugs.python.org/issue8870 From skippy.hammond at gmail.com Sat Jun 5 01:47:26 2010 From: skippy.hammond at gmail.com (Mark Hammond) Date: Fri, 04 Jun 2010 16:47:26 -0700 Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set In-Reply-To: References: Message-ID: <4C09908E.3080706@gmail.com> On 2/06/2010 11:32 AM, Farshid Lashkari 
wrote:
> Hello,
>
> I noticed that if Py_IgnoreEnvironmentFlag is enabled, the Windows
> registry is still used to initialize sys.path during startup. Is this an
> oversight or intentional?

I guess it falls somewhere in the middle - the flag refers to the
'environment' so I believe it hasn't really been considered as applying
to the registry - IOW, the reference to 'environment' probably refers to
the specific 'environment variables' rather than the more general
'execution environment'.

> I assumed one of the intentions of this flag is to prevent embedded
> Python interpreters from being affected by other Python installations.
> Ignoring the Windows registry as well as environment variables seems to
> make sense in this situation.

I agree.

> If this is an oversight, would it be too late to have this fixed in
> Python 2.7?

Others will have opinions which carry more weight than mine, but I see
no reason it should not be fixed for *some* Python version. Assuming no
objections from anyone else, I suggest the best way to get this to
happen in the short to medium term would be to open a bug with a patch.
A bug without a patch would also be worthwhile but would almost
certainly cause it to be pushed back to a future 3.x version...

Cheers,

Mark

From kristjan at ccpgames.com Sat Jun 5 10:34:19 2010
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Sat, 5 Jun 2010 08:34:19 +0000
Subject: [Python-Dev] ssl
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local>

Hello there.
I wanted to do some work on the ssl module, but I was a bit daunted by
the prerequisites. Is there anywhere that I can get at precompiled libs
for the openssl that we use?
In general, all those "external" projects seem to be complex to build.
Is there a fast way?

What I want to do, is to implement a separate BIO for OpenSSL, one that
calls back into python for writes and reads.
This is so that I can use my own sockets implementation for the actual IO; in particular, I want to funnel the encrypted data through our IOCompletion-based stackless sockets.

If successful, I think this would be a useful addition to ssl. You would do something like:

    import ssl

    class BIO:
        def write(self): pass
        def read(self): pass

    bio = BIO()
    ssl_socket = ssl.wrap_bio(bio, ca_certs=...)

I am new to OpenSSL and haven't even looked at what a BIO looks like, but I read this: http://marc.info/?l=openssl-users&m=99909952822335&w=2 which indicates that this ought to be possible. Before I start experimenting, I need to get my OpenSSL external ready.

Any thoughts?

Kristján

From exarkun at twistedmatrix.com Sat Jun 5 15:11:09 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Sat, 05 Jun 2010 13:11:09 -0000
Subject: [Python-Dev] ssl
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local>
Message-ID: <20100605131109.1708.564335160.divmod.xquotient.15@localhost.localdomain>

On 08:34 am, kristjan at ccpgames.com wrote:
>Hello there.
>I wanted to do some work on the ssl module, but I was a bit daunted at
>the prerequisites. Is there anywhere that I can get at precompiled
>libs for the openssl that we use?
>In general, gettin all those "external" projects seem to be complex to
>build. Is there a fast way?

I take it the challenge is that you want to do development on Windows? If so, this might help:

http://www.slproweb.com/products/Win32OpenSSL.html

It's what I use for any Windows pyOpenSSL development I need to do.

>
>What I want to do, is to implement a separate BIO for OpenSSL, one that
>calls back into python for writes and reads.
This is so that I can use >my own sockets implementation for the actual IO, in particular, I want >to funnel the encrypted data through our IOCompletion-based stackless >sockets. For what it's worth, Twisted's IOCP SSL support is implemented using pyOpenSSL's support of OpenSSL memory BIOs. This is a little different from your idea: memory BIOs are a built-in part of OpenSSL, and just give you a buffer from which you can pull whatever bytes OpenSSL wanted to write (or a buffer into which to put bytes for OpenSSL to read). I suspect this would work well enough for your use case. Being able to implement an actual BIO in Python would be pretty cool, though. > >If successful, I think this would be a useful addition to ssl. >You would do something like: > >class BIO(): > def write(): pass > def read(): pass > >from ssl.import >bio = BIO() >ssl_socket = ssl.wrap_bio(bio, ca_certs=...) Hopefully this would integrate more nicely with the recent work Antoine has done with SSL contexts. The preferred API for creating an SSL connection is now more like this: import ssl ctx = ssl.SSLContext(...) conn = ctx.wrap_socket(...) So perhaps you want to add a wrap_bio method to SSLContext. In fact, this would be the more general API, and could supercede wrap_socket: after all, socket support is just implemented with the socket BIOs. wrap_socket would become a simple wrapper around something like wrap_bio(SocketBIO(socket)). > >I am new to OpenSSL, I haven't even looked at what a BIO looks like, >but I read this: http://marc.info/?l=openssl- >users&m=99909952822335&w=2 >which indicates that this ought to be possible. And before I start >experimenting, I need to get my OpenSSL external ready. > >Any thoughts? It should be possible. One thing that's pretty tricky is getting threading right, though. Python doesn't have to deal with this problem yet, as far as I know, because it never does something that causes OpenSSL to call back into Python code. 
Once you have a Python BIO implementation, this will clearly be necessary, and you'll have to solve this. It's certainly possible, but quite fiddly. Jean-Paul From guido at python.org Sat Jun 5 16:55:05 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 5 Jun 2010 07:55:05 -0700 Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set In-Reply-To: <4C09908E.3080706@gmail.com> References: <4C09908E.3080706@gmail.com> Message-ID: On Fri, Jun 4, 2010 at 4:47 PM, Mark Hammond wrote: > On 2/06/2010 11:32 AM, Farshid Lashkari wrote: >> >> Hello, >> >> I noticed that if Py_IgnoreEnvironmentFlag is enabled, the Windows >> registry is still used to initialize sys.path during startup. Is this an >> oversight or intentional? > > I guess it falls somewhere in the middle - the flag refers to the > 'environment' so I believe it hasn't really been considered as applying to > the registry - IOW, the reference to 'environment' probably refers to the > specific 'environment variables' rather than the more general 'execution > environment'. > >> I assumed one of the intentions of this flag is to prevent embedded >> Python interpreters from being affected by other Python installations. >> Ignoring the Window registry as well as environment variables seems to >> make sense in this situation. > > I agree. > >> If this is an oversight, would it be too late to have this fixed in >> Python 2.7? > > Others will have opinions which carry more weight than mine, but I see no > reason it should not be fixed for *some* Python version. ?Assuming no > objections from anyone else, I suggest the best way to get this to happen in > the short to medium term would be to open a bug with a patch. ?A bug without > a patch would also be worthwhile but would almost certainly cause it to be > pushed back to a future 3.x version... I don't object (this had never occurred to me), but is Python on Windows fully functioning when the registry is entirely ignored? 
--
--Guido van Rossum (python.org/~guido)

From flashk at gmail.com Sat Jun 5 20:03:25 2010
From: flashk at gmail.com (Farshid Lashkari)
Date: Sat, 5 Jun 2010 11:03:25 -0700
Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set
In-Reply-To:
References: <4C09908E.3080706@gmail.com>
Message-ID:

On Sat, Jun 5, 2010 at 7:55 AM, Guido van Rossum wrote:
>
> I don't object (this had never occurred to me), but is Python on
> Windows fully functioning when the registry is entirely ignored?

I believe so. The paths of the executable and the Python DLL are used to initialize sys.path, which should be enough to find the necessary files.

From kristjan at ccpgames.com Sat Jun 5 20:05:07 2010
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Sat, 5 Jun 2010 18:05:07 +0000
Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set
In-Reply-To:
References: <4C09908E.3080706@gmail.com>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D533E5D@exchis.ccp.ad.local>

Tangentially relevant is the following: when embedding Python, it is currently impossible (well, in 2.x anyway) to completely override Python's magic path-guessing algorithm. This is annoying. At the last PyCon, the talk on embedding Python showed how applications that do that often get started through bootstrapping batch scripts that set up the environment for Python, to guide the path-setting algorithm along.

At CCP, we have patched Python so that we can specify an initial sys.path and completely disable the path-guessing algorithm. This is necessary because Python is _embedded_, and it is the embedding application that knows where it is allowed to look for libraries. This is in addition to telling it to ignore the environment.
In fact, it is my opinion that the path init stuff, as well as command line parsing and so on, really belongs in python.exe and not in python25.lib, although one can argue for the convenience of keeping it in the .lib. But IMHO, it should not be part of Py_Initialize. Perhaps I'll submit this particular patch to the tracker one day. K > -----Original Message----- > From: python-dev-bounces+kristjan=ccpgames.com at python.org > [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf > Of Guido van Rossum > Sent: 5. j?n? 2010 14:55 > To: Mark Hammond > Cc: Python-Dev > Subject: Re: [Python-Dev] Windows registry path not ignored with > Py_IgnoreEnvironmentFlag set > > I don't object (this had never occurred to me), but is Python on > Windows fully functioning when the registry is entirely ignored? > > -- > --Guido van Rossum (python.org/~guido) From fuzzyman at voidspace.org.uk Sat Jun 5 20:32:39 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 05 Jun 2010 19:32:39 +0100 Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set In-Reply-To: References: <4C09908E.3080706@gmail.com> Message-ID: <4C0A9847.6000503@voidspace.org.uk> On 05/06/2010 19:03, Farshid Lashkari wrote: > > On Sat, Jun 5, 2010 at 7:55 AM, Guido van Rossum > wrote: > > I don't object (this had never occurred to me), but is Python on > Windows fully functioning when the registry is entirely ignored? > > Yes, it works fine. This is one of the things py2exe does to create 'standalone' Python programs for Windows. Michael > I believe so. The path of executable and Python DLL are used to > initialize sys.path, which should be enough to find the necessary files. 
> > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Jun 5 20:51:38 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 05 Jun 2010 14:51:38 -0400 Subject: [Python-Dev] Windows registry path not ignored with Py_IgnoreEnvironmentFlag set In-Reply-To: References: <4C09908E.3080706@gmail.com> Message-ID: On 6/5/2010 10:55 AM, Guido van Rossum wrote: > I don't object (this had never occurred to me), but is Python on > Windows fully functioning when the registry is entirely ignored? There have been a couple of portable CPython-on-a-CD or memory stick that supposedly run on any machine without 'installation' (writing to the registry), so they must run without reading anything Python specific. 
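For reference, the environment-variable half of this discussion can be observed from Python itself. The sketch below is not from the thread: it relies on the `-E` command-line option (which sets Py_IgnoreEnvironmentFlag), and the helper name `path_visible_in_child` is invented for illustration. It only shows that PYTHONPATH is ignored under `-E`; it does not read or test the Windows registry behaviour being debated.

```python
import os
import subprocess
import sys
import tempfile


def path_visible_in_child(extra_dir, ignore_env):
    """Return True if extra_dir appears in a child interpreter's sys.path.

    Hypothetical helper for illustration only.
    """
    # Hand extra_dir to the child interpreter through PYTHONPATH.
    env = dict(os.environ, PYTHONPATH=extra_dir)
    cmd = [sys.executable]
    if ignore_env:
        # -E sets Py_IgnoreEnvironmentFlag, so PYTHON* variables are ignored.
        cmd.append("-E")
    # The child prints one sys.path entry per line.
    cmd += ["-c", "import sys; print('\\n'.join(sys.path))"]
    out = subprocess.check_output(cmd, env=env, universal_newlines=True)
    return extra_dir in out.splitlines()


if __name__ == "__main__":
    demo_dir = os.path.realpath(tempfile.mkdtemp())
    print("without -E:", path_visible_in_child(demo_dir, ignore_env=False))
    print("with -E:   ", path_visible_in_child(demo_dir, ignore_env=True))
```

An embedding application would set the C-level flag before Py_Initialize() rather than pass `-E`; the subprocess form is used here only because it is easy to run.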
From martin at v.loewis.de Sun Jun 6 01:51:57 2010 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 06 Jun 2010 01:51:57 +0200 Subject: [Python-Dev] ssl In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local> Message-ID: <4C0AE31D.7030504@v.loewis.de> > In general, gettin all those ?external? projects seem to be complex to > build. Is there a fast way? Run Tools\buildbot\external.bat. Regards, Martin From benjamin at python.org Sun Jun 6 04:08:32 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sat, 5 Jun 2010 21:08:32 -0500 Subject: [Python-Dev] [RELEASE] Python 2.7 release candidate 1 released Message-ID: On behalf of the Python development team, I'm effusive to announce the first release candidate of Python 2.7. Python 2.7 is scheduled (by Guido and Python-dev) to be the last major version in the 2.x series. However, 2.7 will have an extended period of bugfix maintenance. 2.7 includes many features that were first released in Python 3.1. The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. To download Python 2.7 visit: http://www.python.org/download/releases/2.7/ While this is a preview release and is thus not suitable for production use, we strongly encourage Python application and library developers to test the release with their code and report any bugs they encounter to: http://bugs.python.org/ This helps ensure that those upgrading to Python 2.7 will encounter as few bumps as possible. 
2.7 documentation can be found at:

http://docs.python.org/2.7/

Enjoy!

--
Benjamin Peterson
Release Manager
benjamin at python.org
(on behalf of the entire python-dev team and 2.7's contributors)

From kristjan at ccpgames.com Mon Jun 7 12:44:40 2010
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Mon, 7 Jun 2010 10:44:40 +0000
Subject: [Python-Dev] ssl
In-Reply-To: <4C0AE31D.7030504@v.loewis.de>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local> <4C0AE31D.7030504@v.loewis.de>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D533F48@exchis.ccp.ad.local>

Thanks, Martin.
I did as you suggested, and by installing nasm (creating nasmw.exe as a copy of nasm.exe) and without installing Perl, I was able to build the 32-bit debug version.
The 64-bit version didn't want to build, probably because of some strangeness in the .vcprops files.
amd64.vcprops defines PythonExe to $(HOST_PYTHON), which isn't defined. Removing this macro definition makes everything build, right up to the final link:

2>Linking...
2> Creating library D:\pydev\python\trunk\PCbuild\\amd64\\_ssl_d.lib and object D:\pydev\python\trunk\PCbuild\\amd64\\_ssl_d.exp
2>Creating manifest...
2>.\x64-temp-Debug\_ssl\_ssl.exe.intermediate.manifest : general error c1010070: Failed to load and parse the manifest. The system cannot find the file specified.
2>Build log was saved at "file://D:\pydev\python\trunk\PCbuild\x64-temp-Debug\_ssl\BuildLog.htm"
2>_ssl - 1 error(s), 246 warning(s)

The above is using the "trunk", but I got the same result with branches/py3k.
Please don't tell me that I need to install Perl :)

K

> -----Original Message-----
> From: "Martin v. Löwis" [mailto:martin at v.loewis.de]
> Sent: 5 June 2010 23:52
> To: Kristján Valur Jónsson
> Cc: python-dev at python.org
> Subject: Re: [Python-Dev] ssl
>
> > In general, gettin all those "external" projects seem to be complex
> to
> > build. Is there a fast way?
>
> Run Tools\buildbot\external.bat.
>
> Regards,
> Martin

From martin at v.loewis.de Mon Jun 7 22:33:36 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 07 Jun 2010 22:33:36 +0200
Subject: [Python-Dev] ssl
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D533F48@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D533E28@exchis.ccp.ad.local> <4C0AE31D.7030504@v.loewis.de> <930F189C8A437347B80DF2C156F7EC7F0A8D533F48@exchis.ccp.ad.local>
Message-ID: <4C0D57A0.3060309@v.loewis.de>

On 07.06.2010 12:44, Kristján Valur Jónsson wrote:
> Thanks martin.
> I did as you suggested, and by installing nasm (creating nasmw.exe as a copy of nasm.exe) and without installing perl, was able to build the 32 bit debug version.
> The 64 bit version didn't want to build, probably because of some strangeness in the .vcprops files.
> amd64.vcprops defines PythonExe to $(HOST_PYTHON) which isn't defined.

See PCbuild/readme.txt.

> Please don't tell me that I need to install Perl :)

You don't need to install Perl; see PCbuild/readme.txt.

Regards,
Martin

From kristjan at ccpgames.com Tue Jun 8 21:58:53 2010
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Tue, 8 Jun 2010 19:58:53 +0000
Subject: [Python-Dev] issue 8832: Add a context manager to dom.minidom
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D73B85D@exchis.ccp.ad.local>

I haven't had any comments on this patch; are there any objections?

http://bugs.python.org/issue8832

K

From ncoghlan at gmail.com Tue Jun 8 22:49:01 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 09 Jun 2010 06:49:01 +1000
Subject: [Python-Dev] issue 8832: Add a context manager to dom.minidom
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D73B85D@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D73B85D@exchis.ccp.ad.local>
Message-ID: <4C0EACBD.4020202@gmail.com>

On 09/06/10 05:58, Kristján Valur Jónsson wrote:
> I haven't had any comment on this patch, are there any objections?
>
> http://bugs.python.org/issue8832

Sounds good to me. One of the nice things about the context management protocol is that it doesn't interfere with any code that isn't explicitly written to take advantage of it.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From victor.stinner at haypocalc.com Wed Jun 9 01:53:14 2010
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 9 Jun 2010 01:53:14 +0200
Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs
Message-ID: <201006090153.14190.victor.stinner@haypocalc.com>

There are two opposite issues in the bug tracker:

#7475: codecs missing: base64 bz2 hex zlib ...
 -> reintroduce the codecs removed from Python3

#8838: Remove codecs.readbuffer_encode()
 -> remove the last part of the removed codecs

If I understood correctly, the question is: should the codecs module contain only encoding codecs, or also other kinds of codecs?

The encoding codec API is now strict (encode: str->bytes, decode: bytes->str), so it is not possible to reuse str.encode() or bytes.decode() for the other codecs. Marc-Andre Lemburg proposed to add .transform() and .untransform() methods to str, bytes and bytearray types.
If I understood correctly, it would look like: >>> b'abc'.transform("hex") '616263' >>> '616263'.untranform("hex") b'abc' I suppose that each codec will have a different list of accepted input and output types. Example: bz2: encode:bytes->bytes, decode:bytes->bytes rot13: encode:str->str, decode:str->str hex: encode:bytes->str, decode: str->bytes And so "abc".encode("bz2") would raise a TypeError. -- In my opinion, we should not mix codecs of different kinds (compression, cipher, etc.) because the input and output types are different. It would have more sense to create a standard API for each kind of codec. Existing examples of standard APIs in Python: hashlib, shutil.make_archive(), database API, etc. -- Victor Stinner http://www.haypocalc.com/ From alexandre at peadrop.com Wed Jun 9 05:58:10 2010 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 8 Jun 2010 20:58:10 -0700 Subject: [Python-Dev] Future of 2.x. Message-ID: Is there is any plan for a 2.8 release? If not, I will go through the tracker and close outstanding backport requests of 3.x features to 2.x. -- Alexandre From benjamin at python.org Wed Jun 9 06:13:33 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 8 Jun 2010 23:13:33 -0500 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: Message-ID: 2010/6/8 Alexandre Vassalotti : > Is there is any plan for a 2.8 release? If not, I will go through the > tracker and close outstanding backport requests of 3.x features to > 2.x. Not from the core development team. -- Regards, Benjamin From orsenthil at gmail.com Wed Jun 9 06:30:00 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Wed, 9 Jun 2010 10:00:00 +0530 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: Message-ID: On Wed, Jun 9, 2010 at 9:28 AM, Alexandre Vassalotti wrote: > Is there is any plan for a 2.8 release? 
If not, I will go through the
> tracker and close outstanding backport requests of 3.x features to

You mean, simply mark them as Wont-Fix and close? I doubt that this is a desirable action to take. Even though they are new features, it would still be a good idea to introduce some of them in minor releases of 2.7. I know this deviates from the process, but it could be an option, considering that 2.7 is the last 2.x release.

This is just my opinion.

--
Senthil

From fdrake at acm.org Wed Jun 9 07:15:09 2010
From: fdrake at acm.org (Fred Drake)
Date: Wed, 9 Jun 2010 01:15:09 -0400
Subject: [Python-Dev] Future of 2.x.
In-Reply-To:
References:
Message-ID:

On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote:
> it would still be a good idea to
> introduce some of them in minor releases in 2.7. I know, this
> deviating from the process, but it could be an option considering that
> 2.7 is the last of 2.x release.

I disagree.

If there are going to be features going into *any* post-2.7.0 version, there's no reason not to increment the revision number to 2.8.

Since there's also a well-advertised decision that 2.7 will be the last 2.x, such a 2.8 isn't planned. But there's no reason to violate the no-features-in-bugfix-releases policy. We've seen violations cause trouble and confusion, but we've not seen it be successful.

The policy wasn't arbitrary; let's stick to it.

-Fred

--
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written." --Henry Miller

From chrism at plope.com Wed Jun 9 08:26:28 2010
From: chrism at plope.com (Chris McDonough)
Date: Wed, 09 Jun 2010 02:26:28 -0400
Subject: [Python-Dev] Future of 2.x.
In-Reply-To:
References:
Message-ID: <1276064788.2227.122.camel@thinko>

On Wed, 2010-06-09 at 01:15 -0400, Fred Drake wrote:
> On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote:
> > it would still be a good idea to
> > introduce some of them in minor releases in 2.7.
I know, this > > deviating from the process, but it could be an option considering that > > 2.7 is the last of 2.x release. > > I disagree. > > If there are going to be features going into *any* post 2.7.0 version, > there's no reason not to increment the revision number to 2.8, > > Since there's also a well-advertised decision that 2.7 will be the > last 2.x, such a 2.8 isn't planned. But there's no reason to violate > the no-features-in-bugfix-releases policy. We've seen violations > cause trouble and confusion, but we've not seen it be successful. > > The policy wasn't arbitrary; let's stick to it. It might be useful to copy the identifiers and URLs of all the backport request tickets into some other repository, or to create some unique state in roundup for these. Rationale: it's almost certain that if the existing Python core maintainers won't evolve Python 2.X past 2.7, some other group will, and losing existing context for that would kinda suck. - C From stephen at xemacs.org Wed Jun 9 10:07:17 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Jun 2010 17:07:17 +0900 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <1276064788.2227.122.camel@thinko> References: <1276064788.2227.122.camel@thinko> Message-ID: <871vcghbnu.fsf@uwakimon.sk.tsukuba.ac.jp> Chris McDonough writes: > It might be useful to copy the identifiers and URLs of all the backport > request tickets into some other repository, or to create some unique > state in roundup for these. A keyword would do. Please don't add a status or something like that, though. From mal at egenix.com Wed Jun 9 10:41:29 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 09 Jun 2010 10:41:29 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <201006090153.14190.victor.stinner@haypocalc.com> References: <201006090153.14190.victor.stinner@haypocalc.com> Message-ID: <4C0F53B9.2020302@egenix.com> Victor Stinner wrote: > There are two opposite issues in the bug tracker: > > #7475: codecs missing: base64 bz2 hex zlib ... > -> reintroduce the codecs removed from Python3 > > #8838: Remove codecs.readbuffer_encode() > -> remove the last part of the removed codecs > > If I understood correctly, the question is: should codecs module only contain > encoding codecs, or contain also other kind of codecs. Sorry, but I can only repeat what I've already mentioned a few times on the tracker items: this is a misunderstanding. The codec system does not mandate a specific type combination (and that's per design). Only the helper methods .encode() and .decode() on bytes and str objects in Python3 do in order to provide type safety. > Encoding codec API is now strict (encode: str->bytes, decode: bytes->str), > it's not possible to reuse str.encode() or bytes.decode() for the other > codecs. Marc-Andre Lemburg proposed to add .tranform() and .untranform() > methods to str, bytes and bytearray types. If I understood correctly, it would > look like: > > >>> b'abc'.transform("hex") > '616263' > >>> '616263'.untranform("hex") > b'abc' No, .transform() and .untransform() will be interface to same-type codecs, i.e. ones that convert bytes to bytes or str to str. As with .encode()/.decode() these helper methods also implement type safety of the return type. The above example will read: >>> b'abc'.transform("hex") b'616263' >>> b'616263'.untranform("hex") b'abc' > I suppose that each codec will have a different list of accepted input and > output types. Example: > > bz2: encode:bytes->bytes, decode:bytes->bytes > rot13: encode:str->str, decode:str->str > hex: encode:bytes->str, decode: str->bytes hex will do bytes->bytes in both directions, just like it does in Python2. 
The methods to be used will be .transform() for the encode direction and .untransform() for the decode direction. > And so "abc".encode("bz2") would raise a TypeError. Yes. > -- > > In my opinion, we should not mix codecs of different kinds (compression, > cipher, etc.) because the input and output types are different. It would have > more sense to create a standard API for each kind of codec. Existing examples > of standard APIs in Python: hashlib, shutil.make_archive(), database API, etc. If you want, you can have those as well, but then you'd have to introduce new APIs or modules, whereas the codec interface have existed for quite a while in Python2 and are in regular use. For most applications the very simple to use codec interface to these codecs is all that is needed, so I don't see a strong case for adding new interfaces, e.g. hex_data = data.transform('hex') looks clean and neat. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 09 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 39 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Jun 9 13:14:33 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Jun 2010 21:14:33 +1000 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <4C0F53B9.2020302@egenix.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> Message-ID: <4C0F7799.10700@gmail.com> On 09/06/10 18:41, M.-A. Lemburg wrote: > The methods to be used will be .transform() for the encode direction > and .untransform() for the decode direction. +1, although adding this for 3.2 would need an exception to the moratorium approved (since it is adding new methods for builtin types). Adding the same-type codecs back even without the helper methods should be fine though (less useful without the helper methods, obviously, but still valid). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Wed Jun 9 13:35:49 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Jun 2010 13:35:49 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> Message-ID: <20100609133549.578157ed@pitrou.net> On Wed, 09 Jun 2010 10:41:29 +0200 "M.-A. Lemburg" wrote: > > The above example will read: > > >>> b'abc'.transform("hex") > b'616263' > >>> b'616263'.untranform("hex") > b'abc' This doesn't look right to me. Hex-encoded "data" is really text (it's a textual representation of binary, and isn't often used as an opaque binary transport encoding). Of course, this is not necessarily so for all codecs. For base64-encoded data, for example, it is debatable whether you want it as ASCII bytes or unicode text. From fuzzyman at voidspace.org.uk Wed Jun 9 13:38:45 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 09 Jun 2010 12:38:45 +0100 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <20100609133549.578157ed@pitrou.net> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> Message-ID: <4C0F7D45.4060706@voidspace.org.uk> On 09/06/2010 12:35, Antoine Pitrou wrote: > On Wed, 09 Jun 2010 10:41:29 +0200 > "M.-A. Lemburg" wrote: > >> The above example will read: >> >> >>> b'abc'.transform("hex") >> b'616263' >> >>> b'616263'.untranform("hex") >> b'abc' >> > This doesn't look right to me. Hex-encoded "data" is really text (it's > a textual representation of binary, and isn't often used as an opaque > binary transport encoding). > Of course, this is not necessarily so for all codecs. For > base64-encoded data, for example, it is debatable whether you want it > as ASCII bytes or unicode text. > But in both cases you probably want bytes -> bytes and str -> str. If you want text out then put text in, if you want bytes out then put bytes in. Michael > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
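For readers weighing the bytes-versus-text question, here is a minimal sketch of both views of hex. It is not from the thread and does not use the proposed .transform()/.untransform() methods; it assumes a recent Python 3 and uses only standard-library calls (binascii, bytes.fromhex, and the codecs module functions with the "hex_codec" and "rot_13" codec names). The variable names are illustrative.

```python
import binascii
import codecs

data = b"abc"

# bytes -> bytes: the hex digits come back as ASCII bytes.
hex_bytes = binascii.hexlify(data)

# bytes -> str: the same digits as text, via an explicit ASCII decode.
hex_text = hex_bytes.decode("ascii")

# Either form can be converted back to the original bytes.
from_bytes = binascii.unhexlify(hex_bytes)
from_text = bytes.fromhex(hex_text)

# Same-type codecs reached through the codecs module functions rather than
# str.encode()/bytes.decode(): bytes -> bytes for hex, str -> str for rot13.
via_codec = codecs.encode(data, "hex_codec")
rot13_text = codecs.encode("abc", "rot_13")

if __name__ == "__main__":
    print(hex_bytes, hex_text, from_bytes, from_text, via_codec, rot13_text)
```

Whichever type the hex digits arrive in, the conversion between the two is a plain ASCII encode or decode, which is why the disagreement is about the default return type rather than about capability.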
From solipsis at pitrou.net Wed Jun 9 13:40:50 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 09 Jun 2010 13:40:50 +0200
Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs
In-Reply-To: <4C0F7D45.4060706@voidspace.org.uk>
References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk>
Message-ID: <1276083650.3143.1.camel@localhost.localdomain>

On Wednesday 09 June 2010 at 12:38 +0100, Michael Foord wrote:
> On 09/06/2010 12:35, Antoine Pitrou wrote:
> > On Wed, 09 Jun 2010 10:41:29 +0200
> > "M.-A. Lemburg" wrote:
> >
> >> The above example will read:
> >>
> >> >>> b'abc'.transform("hex")
> >> b'616263'
> >> >>> b'616263'.untranform("hex")
> >> b'abc'
> >>
> > This doesn't look right to me. Hex-encoded "data" is really text (it's
> > a textual representation of binary, and isn't often used as an opaque
> > binary transport encoding).
> > Of course, this is not necessarily so for all codecs. For
> > base64-encoded data, for example, it is debatable whether you want it
> > as ASCII bytes or unicode text.
> >
>
> But in both cases you probably want bytes -> bytes and str -> str. If
> you want text out then put text in, if you want bytes out then put bytes in.

No, I don't think so. If I'm using hex "encoding", it's because I want to see a text representation of some arbitrary bytestring (in order to display it inside another piece of text, for example). In other words, the purpose of hex is precisely to give a textual display of non-textual data.

From rdmurray at bitdance.com Wed Jun 9 13:42:27 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Wed, 09 Jun 2010 07:42:27 -0400
Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ...
codecs In-Reply-To: <4C0F7799.10700@gmail.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <4C0F7799.10700@gmail.com> Message-ID: <20100609114228.5059821701A@kimball.webabinitio.net> On Wed, 09 Jun 2010 21:14:33 +1000, Nick Coghlan wrote: > On 09/06/10 18:41, M.-A. Lemburg wrote: > > The methods to be used will be .transform() for the encode direction > > and .untransform() for the decode direction. > > +1, although adding this for 3.2 would need an exception to the > moratorium approved (since it is adding new methods for builtin types). > > Adding the same-type codecs back even without the helper methods should > be fine though (less useful without the helper methods, obviously, but > still valid). Agreed. And I think making an exception to the moratorium for translate/untranslate is justified, given that this is restoring a feature that Python2 had, in a Python3 compatible manner. -- R. David Murray www.bitdance.com From mal at egenix.com Wed Jun 9 13:45:28 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 09 Jun 2010 13:45:28 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C0F7799.10700@gmail.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <4C0F7799.10700@gmail.com> Message-ID: <4C0F7ED8.9000000@egenix.com> Nick Coghlan wrote: > On 09/06/10 18:41, M.-A. Lemburg wrote: >> The methods to be used will be .transform() for the encode direction >> and .untransform() for the decode direction. > > +1, although adding this for 3.2 would need an exception to the > moratorium approved (since it is adding new methods for builtin types). Good point. We already discussed these methods in 2008 and Guido approved them back then, so perhaps that's a good argument for an exception. 
> Adding the same-type codecs back even without the helper methods should > be fine though (less useful without the helper methods, obviously, but > still valid). Agreed. The new methods would make it easier to port to Python3, though, since e.g. data.encode('hex') is easier to convert to data.transform('hex'). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 09 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 39 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Wed Jun 9 13:53:08 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 09 Jun 2010 13:53:08 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <20100609133549.578157ed@pitrou.net> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> Message-ID: <4C0F80A4.8070002@egenix.com> Antoine Pitrou wrote: > On Wed, 09 Jun 2010 10:41:29 +0200 > "M.-A. Lemburg" wrote: >> >> The above example will read: >> >> >>> b'abc'.transform("hex") >> b'616263' >> >>> b'616263'.untranform("hex") >> b'abc' > > This doesn't look right to me. Hex-encoded "data" is really text (it's > a textual representation of binary, and isn't often used as an opaque > binary transport encoding). 
Then we'd need new .encode() and .decode() methods, so that we could write: >>> b'abc'.encode("hex") '616263' >>> '616263'.decode("hex") b'abc' The reason is that we don't have helper methods for the directions encoding: bytes->str and decoding: str->bytes. We do in Python2, so perhaps adding those back as well would be a possibility, but I don't want to strain all this too much. It's always possible to use: codecs.encode(b'abc', 'hex') and codecs.decode('616263', 'hex') instead. > Of course, this is not necessarily so for all codecs. For > base64-encoded data, for example, it is debatable whether you want it > as ASCII bytes or unicode text. Since there are multiple ways of choosing types, I would like to use the ones that Python2 already chose, if possible. The only one I'm not sure about is 'rot13': this is an encoding that is only defined for text and works by creating mangled text, so str->str appears to be more correct than str->bytes (which we have in Python2). -- Marc-Andre Lemburg eGenix.com From dirkjan at ochtman.nl Wed Jun 9 13:57:05 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Wed, 9 Jun 2010 13:57:05 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <1276083650.3143.1.camel@localhost.localdomain> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> Message-ID: On Wed, Jun 9, 2010 at 13:40, Antoine Pitrou wrote: > No, I don't think so. If I'm using hex "encoding", it's because I want > to see a text representation of some arbitrary bytestring (in order to > display it inside another piece of text, for example). > In other words, the purpose of hex is precisely to give a textual > display of non-textual data. Or I want to encode binary data in a non-binary-safe protocol, in which case I probably want bytes. Cheers, Dirkjan From p.f.moore at gmail.com Wed Jun 9 13:58:17 2010 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 9 Jun 2010 12:58:17 +0100 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <1276064788.2227.122.camel@thinko> References: <1276064788.2227.122.camel@thinko> Message-ID: On 9 June 2010 07:26, Chris McDonough wrote: > On Wed, 2010-06-09 at 01:15 -0400, Fred Drake wrote: >> On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote: >> > it would still be a good idea to >> > introduce some of them in minor releases in 2.7. I know, this >> > deviating from the process, but it could be an option considering that >> > 2.7 is the last of 2.x release. >> >> I disagree. >> >> If there are going to be features going into *any* post 2.7.0 version, >> there's no reason not to increment the revision number to 2.8, >> >> Since there's also a well-advertised decision that 2.7 will be the >> last 2.x, such a 2.8 isn't planned. But there's no reason to violate >> the no-features-in-bugfix-releases policy. We've seen violations >> cause trouble and confusion, but we've not seen it be successful. >> >> The policy wasn't arbitrary; let's stick to it. 
> > It might be useful to copy the identifiers and URLs of all the backport > request tickets into some other repository, or to create some unique > state in roundup for these. Rationale: it's almost certain that if the > existing Python core maintainers won't evolve Python 2.X past 2.7, some > other group will, and losing existing context for that would kinda suck. Personally, as a user of Python, I'm already getting tired of the "we won't let Python 2.x die" arguments. Unless and until some other group comes along and says they definitely plan to pick up Python 2.x development (and set up or agree shared usage of all the relevant infrastructure, bug tracker, developers list, VCS, etc) I see the core developers' decision as made. 2.7 is the last Python 2.x release, and all further development will be on 3.x. On that basis I'm +1 on Alexandre's proposal. A 3rd party planning on working on a 2.8 release (not that I think such a party currently exists) can step up and extract the relevant tickets for their later reference if they feel the need. Let's not stop moving forward for the convenience of a hypothetical 2.8 development team. Paul. From solipsis at pitrou.net Wed Jun 9 14:17:48 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Jun 2010 14:17:48 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> Message-ID: <20100609141748.733d3e94@pitrou.net> On Wed, 9 Jun 2010 13:57:05 +0200 Dirkjan Ochtman wrote: > On Wed, Jun 9, 2010 at 13:40, Antoine Pitrou wrote: > > No, I don't think so. If I'm using hex "encoding", it's because I want > > to see a text representation of some arbitrary bytestring (in order to > > display it inside another piece of text, for example). 
> > In other words, the purpose of hex is precisely to give a textual > > display of non-textual data. > > Or I want to encode binary data in a non-binary-safe protocol, in > which case I probably want bytes. In this case you would probably choose a more space-efficient representation, such as base64 or base85. Which is why I think the purpose of hex is mostly for textual representation. Regards Antoine. From victor.stinner at haypocalc.com Wed Jun 9 14:18:44 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 9 Jun 2010 14:18:44 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C0F53B9.2020302@egenix.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> Message-ID: <201006091418.44680.victor.stinner@haypocalc.com> On Wednesday 09 June 2010 10:41:29, M.-A. Lemburg wrote: > No, .transform() and .untransform() will be interface to same-type > codecs, i.e. ones that convert bytes to bytes or str to str. As with > .encode()/.decode() these helper methods also implement type safety > of the return type. What about buffer compatible objects like array.array(), memoryview(), etc.? Should we use codecs.encode() / codecs.decode() for these types? -- Victor Stinner http://www.haypocalc.com/ From mal at egenix.com Wed Jun 9 14:34:13 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 09 Jun 2010 14:34:13 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <201006091418.44680.victor.stinner@haypocalc.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> Message-ID: <4C0F8A45.5050500@egenix.com> Victor Stinner wrote: > On Wednesday 09 June 2010 10:41:29, M.-A. Lemburg wrote: >> No, .transform() and .untransform() will be interface to same-type >> codecs, i.e. ones that convert bytes to bytes or str to str. 
As with >> .encode()/.decode() these helper methods also implement type safety >> of the return type. > > What about buffer compatible objects like array.array(), memoryview(), etc.? > Should we use codecs.encode() / codecs.decode() for these types? Yes, or call the encoders/decoders directly by first fetching them via codecs.lookup(). -- Marc-Andre Lemburg eGenix.com From ncoghlan at gmail.com Wed Jun 9 14:47:22 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 09 Jun 2010 22:47:22 +1000 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <201006091418.44680.victor.stinner@haypocalc.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> Message-ID: <4C0F8D5A.8010706@gmail.com> On 09/06/10 22:18, Victor Stinner wrote: > On Wednesday 09 June 2010 10:41:29, M.-A. Lemburg wrote: >> No, .transform() and .untransform() will be interface to same-type >> codecs, i.e. ones that convert bytes to bytes or str to str. As with >> .encode()/.decode() these helper methods also implement type safety >> of the return type. > > What about buffer compatible objects like array.array(), memoryview(), etc.? 
> Should we use codecs.encode() / codecs.decode() for these types? There are probably enough subtleties that this is all worth specifying in a PEP: - which codecs from 2.x are to be restored - the domain each codec operates in (binary data or text)* - review behaviour of codecs.encode and codecs.decode - behaviour of the new str, bytes and bytearray (un)transform methods - whether to add helper methods for reverse codecs (like base64) The PEP would also serve as a reference back to both this discussion and the previous one (which was long enough ago that I've forgotten most of it). *Some are obvious, such as rot13 being text only, and bz2 being binary data only, but others are less clear. hex could be either str->str or bytes->bytes, since ''.join(map(chr, seq)) and bytes(map(ord, seq)) allow each of them to be implemented trivially in terms of the other. As Antoine pointed out, base64 is really a reverse codec (encode from bytes->str, decode from str->bytes), so it still wouldn't be covered by the new transformation helper methods. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From facundobatista at gmail.com Wed Jun 9 14:55:43 2010 From: facundobatista at gmail.com (Facundo Batista) Date: Wed, 9 Jun 2010 09:55:43 -0300 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> Message-ID: On Wed, Jun 9, 2010 at 8:58 AM, Paul Moore wrote: > On that basis I'm +1 on Alexandre's proposal. A 3rd party planning on > working on a 2.8 release (not that I think such a party currently > exists) can step up and extract the relevant tickets for their later > reference if they feel the need. Let's not stop moving forward for the > convenience of a hypothetical 2.8 development team. 
Yes, closing the tickets as "won't fix" and tagging them as "will-never-happen-in-2.x" or something, is the best combination of both worlds: it will clean the tracker and ease further developments, and will allow anybody to pick up those tickets later. (I'm +1 too to Alexandre's proposal, btw) -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From steve at holdenweb.com Wed Jun 9 14:56:30 2010 From: steve at holdenweb.com (Steve Holden) Date: Wed, 09 Jun 2010 20:56:30 +0800 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> Message-ID: Paul Moore wrote: > On 9 June 2010 07:26, Chris McDonough wrote: >> On Wed, 2010-06-09 at 01:15 -0400, Fred Drake wrote: >>> On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote: >>>> it would still be a good idea to >>>> introduce some of them in minor releases in 2.7. I know, this >>>> deviating from the process, but it could be an option considering that >>>> 2.7 is the last of 2.x release. >>> I disagree. >>> >>> If there are going to be features going into *any* post 2.7.0 version, >>> there's no reason not to increment the revision number to 2.8, >>> >>> Since there's also a well-advertised decision that 2.7 will be the >>> last 2.x, such a 2.8 isn't planned. But there's no reason to violate >>> the no-features-in-bugfix-releases policy. We've seen violations >>> cause trouble and confusion, but we've not seen it be successful. >>> >>> The policy wasn't arbitrary; let's stick to it. >> It might be useful to copy the identifiers and URLs of all the backport >> request tickets into some other repository, or to create some unique >> state in roundup for these. Rationale: it's almost certain that if the >> existing Python core maintainers won't evolve Python 2.X past 2.7, some >> other group will, and losing existing context for that would kinda suck. 
> > Personally, as a user of Python, I'm already getting tired of the "we > won't let Python 2.x die" arguments. Unless and until some other group > comes along and says they definitely plan to pick up Python 2.x > development (and set up or agree shared usage of all the relevant > infrastructure, bug tracker, developers list, VCS, etc) I see the core > developers' decision as made. 2.7 is the last Python 2.x release, and > all further development will be on 3.x. > > On that basis I'm +1 on Alexandre's proposal. A 3rd party planning on > working on a 2.8 release (not that I think such a party currently > exists) can step up and extract the relevant tickets for their later > reference if they feel the need. Let's not stop moving forward for the > convenience of a hypothetical 2.8 development team. > How does throwing away information represent "moving forward"? I have to say I am surprised by the current lack of momentum behind 3.x, but I do know users who consider that their current investment in the 2.x series is unlikely to migrate to 3.x in the next five years, and it would be strange if they didn't continue to develop 2.x (including backporting some 3.x features). I don't see why we have to make such work harder than it need be. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From fuzzyman at voidspace.org.uk Wed Jun 9 15:05:51 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 09 Jun 2010 14:05:51 +0100 Subject: [Python-Dev] Future of 2.x. 
In-Reply-To: References: <1276064788.2227.122.camel@thinko> Message-ID: <4C0F91AF.1000401@voidspace.org.uk> On 09/06/2010 13:56, Steve Holden wrote: > Paul Moore wrote: > >> On 9 June 2010 07:26, Chris McDonough wrote: >> >>> On Wed, 2010-06-09 at 01:15 -0400, Fred Drake wrote: >>> >>>> On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote: >>>> >>>>> it would still be a good idea to >>>>> introduce some of them in minor releases in 2.7. I know, this >>>>> deviating from the process, but it could be an option considering that >>>>> 2.7 is the last of 2.x release. >>>>> >>>> I disagree. >>>> >>>> If there are going to be features going into *any* post 2.7.0 version, >>>> there's no reason not to increment the revision number to 2.8, >>>> >>>> Since there's also a well-advertised decision that 2.7 will be the >>>> last 2.x, such a 2.8 isn't planned. But there's no reason to violate >>>> the no-features-in-bugfix-releases policy. We've seen violations >>>> cause trouble and confusion, but we've not seen it be successful. >>>> >>>> The policy wasn't arbitrary; let's stick to it. >>>> >>> It might be useful to copy the identifiers and URLs of all the backport >>> request tickets into some other repository, or to create some unique >>> state in roundup for these. Rationale: it's almost certain that if the >>> existing Python core maintainers won't evolve Python 2.X past 2.7, some >>> other group will, and losing existing context for that would kinda suck. >>> >> Personally, as a user of Python, I'm already getting tired of the "we >> won't let Python 2.x die" arguments. Unless and until some other group >> comes along and says they definitely plan to pick up Python 2.x >> development (and set up or agree shared usage of all the relevant >> infrastructure, bug tracker, developers list, VCS, etc) I see the core >> developers' decision as made. 2.7 is the last Python 2.x release, and >> all further development will be on 3.x. 
>> >> On that basis I'm +1 on Alexandre's proposal. A 3rd party planning on >> working on a 2.8 release (not that I think such a party currently >> exists) can step up and extract the relevant tickets for their later >> reference if they feel the need. Let's not stop moving forward for the >> convenience of a hypothetical 2.8 development team. >> >> > How does throwing away information represent "moving forward"? > > I'm inclined to agree. There is no *need* to close these tickets now. > I have to say I am surprised by the current lack of momentum behind 3.x, > but I do know users who consider that their current investment in the > 2.x series is unlikely to migrate to 3.x in the next five years, and it > would be strange if they didn't continue to develop 2.x (including > backporting some 3.x features). > Who is the 'they' in your last sentence here? It seems to imply the 'users'... Certainly no-one specific (neither individual nor group) have stepped up and said they will continue to develop Python 2.x. Even if they did it is not clear that they would use the python.org infrastructure to do it. The Python core developers (basically) *have* moved on and are unlikely to further develop 2.x. We'll see though, it's all speculation at the moment. All the best, Michael > I don't see why we have to make such work harder than it need be. > > regards > Steve > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. 
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From barry at python.org Wed Jun 9 16:12:24 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 9 Jun 2010 10:12:24 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: Message-ID: <20100609101224.4425723d@heresy> On Jun 09, 2010, at 01:15 AM, Fred Drake wrote: >On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote: >> it would still be a good idea to >> introduce some of them in minor releases in 2.7. I know, this >> deviating from the process, but it could be an option considering that >> 2.7 is the last of 2.x release. > >I disagree. > >If there are going to be features going into *any* post 2.7.0 version, >there's no reason not to increment the revision number to 2.8, > >Since there's also a well-advertised decision that 2.7 will be the >last 2.x, such a 2.8 isn't planned. But there's no reason to violate >the no-features-in-bugfix-releases policy. We've seen violations >cause trouble and confusion, but we've not seen it be successful. > >The policy wasn't arbitrary; let's stick to it. I completely agree with Fred. New features in point releases will cause many more headaches than opening up a 2.8, which I still hope we don't do. I'd rather see all that pent up energy focussed on doing whatever we can to help people transition to Python 3. -Barry From victor.stinner at haypocalc.com Wed Jun 9 16:35:38 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 9 Jun 2010 16:35:38 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <4C0F8D5A.8010706@gmail.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> Message-ID: <201006091635.38538.victor.stinner@haypocalc.com> On Wednesday 09 June 2010 14:47:22, Nick Coghlan wrote: > *Some are obvious, such as rot13 being text only, Should rot13 shift any unicode character, or just a-z and A-Z? Python2 only changes characters a-z and A-Z, and uses ISO-8859-1 to encode unicode to a byte string. >>> u"abc é".encode("rot13") 'nop \xe9' >>> u"abc \u2c01".encode("rot13") Traceback (most recent call last): ... UnicodeEncodeError: 'charmap' codec can't encode character u'\u2c01' in position 4: character maps to <undefined> -- Victor Stinner http://www.haypocalc.com/ From mal at egenix.com Wed Jun 9 16:42:28 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 09 Jun 2010 16:42:28 +0200 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <4C0F91AF.1000401@voidspace.org.uk> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> Message-ID: <4C0FA854.1080400@egenix.com> Michael Foord wrote: >> How does throwing away information represent "moving forward"? > > I'm inclined to agree. There is no *need* to close these tickets now. > >> I have to say I am surprised by the current lack of momentum behind 3.x, >> but I do know users who consider that their current investment in the >> 2.x series is unlikely to migrate to 3.x in the next five years, and it >> would be strange if they didn't continue to develop 2.x (including >> backporting some 3.x features). >> > > Who is the 'they' in your last sentence here? It seems to imply the > 'users'... Certainly no-one specific (neither individual nor group) have > stepped up and said they will continue to develop Python 2.x. Even if > they did it is not clear that they would use the python.org > infrastructure to do it. 
The Python core developers (basically) *have* > moved on and are unlikely to further develop 2.x. We'll see though, it's > all speculation at the moment. I think it also depends on which core developers you ask :-) Many of them are not keen on having to maintain Python2 for much longer, but some of them may have assets codified in Python2 or interests based Python2 that they'll want to keep for more than just another 5 years. E.g. we still have customers that are on Python 2.3 and have just recently considered moving to Python 2.5. Depending on where you look, motivations are rather diverse. It's certainly not fair to require all core developers to continue working on Python2, but it would also be unfair to cancel out that possibility for a subset of interested devs. Even more so, since it doesn't really create any extra work for those that have no interest. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 09 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 39 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From barry at python.org Wed Jun 9 17:12:38 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 9 Jun 2010 11:12:38 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <4C0FA854.1080400@egenix.com> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> Message-ID: <20100609111238.7c017907@heresy> On Jun 09, 2010, at 04:42 PM, M.-A. 
Lemburg wrote: >Many of them are not keen on having to maintain Python2 for much >longer, but some of them may have assets codified in Python2 >or interests based Python2 that they'll want to keep for >more than just another 5 years. > >E.g. we still have customers that are on Python 2.3 and have >just recently considered moving to Python 2.5. Depending on where >you look, motivations are rather diverse. > >It's certainly not fair to require all core developers to >continue working on Python2, but it would also be unfair to >cancel out that possibility for a subset of interested devs. >Even more so, since it doesn't really create any extra work >for those that have no interest. Note that Python 2.7 will be *maintained* for a very long time, which should satisfy those folks who still require Python 2. Anybody on older (and currently unmaintained) versions of Python 2 will not care about new features so a Python 2.8 wouldn't help them anyway. -Barry From janssen at parc.com Wed Jun 9 18:07:04 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 9 Jun 2010 09:07:04 PDT Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <1276083650.3143.1.camel@localhost.localdomain> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> Message-ID: <19213.1276099624@parc.com> Antoine Pitrou wrote: > On Wednesday 09 June 2010 at 12:38 +0100, Michael Foord wrote: > > On 09/06/2010 12:35, Antoine Pitrou wrote: > > > On Wed, 09 Jun 2010 10:41:29 +0200 > > > "M.-A. 
Lemburg" wrote: > > > > > >> The above example will read: > > >> > > >> >>> b'abc'.transform("hex") > > >> b'616263' > > >> >>> b'616263'.untranform("hex") > > >> b'abc' > > >> > > > This doesn't look right to me. Hex-encoded "data" is really text (it's > > > a textual representation of binary, and isn't often used as an opaque > > > binary transport encoding). > > > Of course, this is not necessarily so for all codecs. For > > > base64-encoded data, for example, it is debatable whether you want it > > > as ASCII bytes or unicode text. > > > > > > > But in both cases you probably want bytes -> bytes and str -> str. If > > you want text out then put text in, if you want bytes out then put bytes in. > > No, I don't think so. If I'm using hex "encoding", it's because I want > to see a text representation of some arbitrary bytestring (in order to > display it inside another piece of text, for example). > In other words, the purpose of hex is precisely to give a textual > display of non-textual data. Yes. And base64, and quoted-printable, etc. Bill From janssen at parc.com Wed Jun 9 18:13:20 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 9 Jun 2010 09:13:20 PDT Subject: [Python-Dev] Future of 2.x. In-Reply-To: <20100609111238.7c017907@heresy> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> Message-ID: <19370.1276100000@parc.com> Barry Warsaw wrote: > On Jun 09, 2010, at 04:42 PM, M.-A. Lemburg wrote: > > >Many of them are not keen on having to maintain Python2 for much > >longer, but some of them may have assets codified in Python2 > >or interests based Python2 that they'll want to keep for > >more than just another 5 years. > > > >E.g. we still have customers that are on Python 2.3 and have > >just recently considered moving to Python 2.5. Depending on where > >you look, motivations are rather diverse. 
> > > >It's certainly not fair to require all core developers to > >continue working on Python2, but it would also be unfair to > >cancel out that possibility for a subset of interested devs. > >Even more so, since it doesn't really create any extra work > >for those that have no interest. > > Note that Python 2.7 will be *maintained* for a very long time, which > should satisfy those folks who still require Python 2. Anybody on > older (and currently unmaintained) versions of Python 2 will not care > about new features so a Python 2.8 wouldn't help them anyway. There are two kinds of new features, though. Those added to improve (or at any rate modify :-) the product, and those added to keep the product relevant to a changing external world (new operating systems, new communication protocols, etc.) I think it would take a pretty strong crystal ball to be able to rule out the latter kind of feature add from the 2.x line. Bill From barry at python.org Wed Jun 9 18:32:23 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 9 Jun 2010 12:32:23 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <19370.1276100000@parc.com> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> <19370.1276100000@parc.com> Message-ID: <20100609123223.27838ab4@heresy> On Jun 09, 2010, at 09:13 AM, Bill Janssen wrote: >Barry Warsaw wrote: > >> Note that Python 2.7 will be *maintained* for a very long time, which >> should satisfy those folks who still require Python 2. Anybody on >> older (and currently unmaintained) versions of Python 2 will not care >> about new features so a Python 2.8 wouldn't help them anyway. > >There are two kinds of new features, though. Those added to improve (or >at any rate modify :-) the product, and those added to keep the product >relevant to a changing external world (new operating systems, new >communication protocols, etc.) 
I think it would take a pretty strong >crystal ball to be able to rule out the latter kind of feature add from >the 2.x line. The latter should mostly be supported by third party packages available in the Cheeseshop. To the extent that such support can't be effected by add-ons (e.g. new OS support), I think a better approach would be to encourage and allow unofficial ports by utilizing dvcs branches (we *are* moving to Mercurial after Python 2.7 final is released, right?). I think we should plan on 2.7 being the last Python 2, and spend lots of effort to get people onto Python 3, partially by offering big carrots like Unladen Swallow, a better/no GIL, etc. I think it should be part of the PSF's mission to help that happen through directed sponsorship, sprints, and other tools. -Barry From jnoller at gmail.com Wed Jun 9 19:16:30 2010 From: jnoller at gmail.com (Jesse Noller) Date: Wed, 9 Jun 2010 13:16:30 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <20100609123223.27838ab4@heresy> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> <19370.1276100000@parc.com> <20100609123223.27838ab4@heresy> Message-ID: On Wed, Jun 9, 2010 at 12:32 PM, Barry Warsaw wrote: > On Jun 09, 2010, at 09:13 AM, Bill Janssen wrote: > >>Barry Warsaw wrote: >> >>> Note that Python 2.7 will be *maintained* for a very long time, which >>> should satisfy those folks who still require Python 2. Anybody on >>> older (and currently unmaintained) versions of Python 2 will not care >>> about new features so a Python 2.8 wouldn't help them anyway. >> >>There are two kinds of new features, though. 
Those added to improve (or >>at any rate modify :-) the product, and those added to keep the product >>relevant to a changing external world (new operating systems, new >>communication protocols, etc.) I think it would take a pretty strong >>crystal ball to be able to rule out the latter kind of feature add from >>the 2.x line. > > The latter should mostly be supported by third party packages available in the > Cheeseshop. To the extent that such support can't be effected by add-ons > (e.g. new OS support), I think a better approach would be to encourage and > allow unofficial ports by utilizing dvcs branches (we *are* moving to > Mercurial after Python 2.7 final is released, right?). > > I think we should plan on 2.7 being the last Python 2, and spend lots of effort > to get people onto Python 3, partially by offering big carrots like Unladen > Swallow, a better/no GIL, etc. I think it should be part of the PSF's mission > to help that happen through directed sponsorship, sprints, and other tools. > > -Barry +1 fearless FLUFL From brett at python.org Wed Jun 9 19:41:47 2010 From: brett at python.org (Brett Cannon) Date: Wed, 9 Jun 2010 10:41:47 -0700 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <20100609111238.7c017907@heresy> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> Message-ID: On Wed, Jun 9, 2010 at 08:12, Barry Warsaw wrote: > On Jun 09, 2010, at 04:42 PM, M.-A. Lemburg wrote: > >>Many of them are not keen on having to maintain Python2 for much >>longer, but some of them may have assets codified in Python2 >>or interests based Python2 that they'll want to keep for >>more than just another 5 years. >> >>E.g. we still have customers that are on Python 2.3 and have >>just recently considered moving to Python 2.5. Depending on where >>you look, motivations are rather diverse. 
>> >>It's certainly not fair to require all core developers to >>continue working on Python2, but it would also be unfair to >>cancel out that possibility for a subset of interested devs. >>Even more so, since it doesn't really create any extra work >>for those that have no interest. > > Note that Python 2.7 will be *maintained* for a very long time, which should > satisfy those folks who still require Python 2. Anybody on older (and > currently unmaintained) versions of Python 2 will not care about new features > so a Python 2.8 wouldn't help them anyway. The other point about Alexandre's desire to close the issues is that nothing is really getting deleted; closed issues can still be searched for. Alexandre simply wants to not waste anyone's time who happens to be looking at the tracker with issues that the core team will simply never work on. If some mythical 2.8 fork of Python comes along they can perform a search and find the issues that were closed because they were backports that never happened. So +1 on closing them out. From raymond.hettinger at gmail.com Wed Jun 9 19:45:13 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Wed, 9 Jun 2010 10:45:13 -0700 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: Message-ID: On Jun 8, 2010, at 9:13 PM, Benjamin Peterson wrote: > 2010/6/8 Alexandre Vassalotti : >> Is there is any plan for a 2.8 release? If not, I will go through the >> tracker and close outstanding backport requests of 3.x features to >> 2.x. > > Not from the core development team. The current plan is to make 2.7 the last 2.x release. The theory is that this will encourage people to switch to 3.x. In practice, the users will get a say in this and time will tell. When I do polls at conferences, it seems that most participants have briefly tried 3.x but are continuing to develop in 2.x. Raymond -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rdmurray at bitdance.com Wed Jun 9 19:59:20 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Wed, 09 Jun 2010 13:59:20 -0400 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <201006091635.38538.victor.stinner@haypocalc.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> <201006091635.38538.victor.stinner@haypocalc.com> Message-ID: <20100609175920.1B46A21849A@kimball.webabinitio.net> On Wed, 09 Jun 2010 16:35:38 +0200, Victor Stinner wrote: > Le mercredi 09 juin 2010 14:47:22, Nick Coghlan a écrit : > > *Some are obvious, such as rot13 being text only, > > Should rot13 shift any unicode character, or just a-z and A-Z? The latter, unless you want to do a lot of work: http://unicode.org/mail-arch/unicode-ml/y2007-m12/0047.html -- R. David Murray www.bitdance.com From tjreedy at udel.edu Wed Jun 9 21:28:25 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 09 Jun 2010 15:28:25 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <871vcghbnu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1276064788.2227.122.camel@thinko> <871vcghbnu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/9/2010 4:07 AM, Stephen J. Turnbull wrote: > Chris McDonough writes: > > > It might be useful to copy the identifiers and URLs of all the backport > > request tickets into some other repository, or to create some unique > > state in roundup for these. Closed issues are not lost. They can still be searched and the result downloaded. > A keyword would do. Please don't add a status or something like that, > though. I believe Type: feature request; Version: 2.7; Resolution wont fix should do fine now. I believe Alexander will use the first two to find things to close. Anything else anyone finds could be made to match. 
Terry Jan Reedy From tjreedy at udel.edu Wed Jun 9 21:28:30 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 09 Jun 2010 15:28:30 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <4C0FA854.1080400@egenix.com> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> Message-ID: On 6/9/2010 10:42 AM, M.-A. Lemburg wrote: >> Steve Holden wrote >>> How does throwing away information represent "moving forward"? 'Closing' a tracker issue does not 'throw away' information', it *adds* information as to current intention. > It's certainly not fair to require all core developers to > continue working on Python2, but it would also be unfair to > cancel out that possibility for a subset of interested devs. Closing a set of issues does not cancel out that possibility. If such a subset of devs develops, they can easily reopen (or move) particular issues they are interested in working on. From tjreedy at udel.edu Wed Jun 9 21:39:04 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 09 Jun 2010 15:39:04 -0400 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C0F7ED8.9000000@egenix.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <4C0F7799.10700@gmail.com> <4C0F7ED8.9000000@egenix.com> Message-ID: On 6/9/2010 7:45 AM, M.-A. Lemburg wrote: > Nick Coghlan wrote: >> On 09/06/10 18:41, M.-A. Lemburg wrote: >>> The methods to be used will be .transform() for the encode direction >>> and .untransform() for the decode direction. >> >> +1, although adding this for 3.2 would need an exception to the >> moratorium approved (since it is adding new methods for builtin types). +1 also. This is neither new syntax, nor, really a new feature. > > Good point. > > We already discussed these methods in 2008 and Guido > approved them back then, so perhaps that's a good argument > for an exception. 
> >> Adding the same-type codecs back even without the helper methods should >> be fine though (less useful without the helper methods, obviously, but >> still valid). > > Agreed. > > The new methods would make it easier to port to Python3, though, > since e.g. data.encode('hex') is easier to convert to > data.transform('hex'). That would definitely be a point in favor of getting this in 3.2, with appropriate additions to 2to3. From eric at trueblade.com Wed Jun 9 21:40:08 2010 From: eric at trueblade.com (Eric Smith) Date: Wed, 9 Jun 2010 15:40:08 -0400 (EDT) Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> <871vcghbnu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: > On 6/9/2010 4:07 AM, Stephen J. Turnbull wrote: > Closed issues are not lost. They can still be searched and the result > downloaded. > >> A keyword would do. Please don't add a status or something like that, >> though. > > I believe Type: feature request; Version: 2.7; Resolution wont fix > should do fine now. I believe Alexander will use the first two to find > things to close. Anything else anyone finds could be made to match. Are there any currently existing issues that match that criteria (feature request, 2.7, won't fix)? I don't have good connectivity here so I can't check. Eric. From tjreedy at udel.edu Wed Jun 9 21:45:55 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 09 Jun 2010 15:45:55 -0400 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <20100609141748.733d3e94@pitrou.net> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> <20100609141748.733d3e94@pitrou.net> Message-ID: On 6/9/2010 8:17 AM, Antoine Pitrou wrote: > On Wed, 9 Jun 2010 13:57:05 +0200 > Dirkjan Ochtman wrote: >> On Wed, Jun 9, 2010 at 13:40, Antoine Pitrou wrote: >>> No, I don't think so. If I'm using hex "encoding", it's because I want >>> to see a text representation of some arbitrary bytestring (in order to >>> display it inside another piece of text, for example). >>> In other words, the purpose of hex is precisely to give a textual >>> display of non-textual data. >> >> Or I want to encode binary data in a non-binary-safe protocol, in >> which case I probably want bytes. > > In this case you would probably choose a more space-efficient > representation, such as base64 or base85. Unless the receiver expects hex. Please, hextext = str(somebytes.transform('hex')) is quite easy and explicit and will work for any bytes to ascii-subset transform, not just 'hex'. Keep .transform and .untransform simple by *always* going to/from same type. Terry Jan Reedy From brett at python.org Wed Jun 9 21:55:06 2010 From: brett at python.org (Brett Cannon) Date: Wed, 9 Jun 2010 12:55:06 -0700 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> <871vcghbnu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 9, 2010 at 12:40, Eric Smith wrote: >> On 6/9/2010 4:07 AM, Stephen J. Turnbull wrote: >> Closed issues are not lost. They can still be searched and the result >> downloaded. >> >>> A keyword would do. Please don't add a status or something like that, >>> though. >> >> I believe Type: feature request; Version: 2.7; Resolution wont fix >> should do fine now. 
I believe Alexander will use the first two to find >> things to close. Anything else anyone finds could be made to match. > > Are there any currently existing issues that match that criteria (feature > request, 2.7, won't fix)? 2.7, closed, wont fix has 27 issues at the moment, which is obviously small and easy to peruse. -Brett > > I don't have good connectivity here so I can't check. > > Eric. > From solipsis at pitrou.net Wed Jun 9 21:56:59 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Jun 2010 21:56:59 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> <20100609141748.733d3e94@pitrou.net> Message-ID: <20100609215659.0ea27cde@pitrou.net> On Wed, 09 Jun 2010 15:45:55 -0400 Terry Reedy wrote: > On 6/9/2010 8:17 AM, Antoine Pitrou wrote: > > On Wed, 9 Jun 2010 13:57:05 +0200 > > Dirkjan Ochtman wrote: > >> On Wed, Jun 9, 2010 at 13:40, Antoine Pitrou wrote: > >>> No, I don't think so. If I'm using hex "encoding", it's because I want > >>> to see a text representation of some arbitrary bytestring (in order to > >>> display it inside another piece of text, for example). > >>> In other words, the purpose of hex is precisely to give a textual > >>> display of non-textual data. > >> > >> Or I want to encode binary data in a non-binary-safe protocol, in > >> which case I probably want bytes. > > > > In this case you would probably choose a more space-efficient > > representation, such as base64 or base85. > > Unless the receiver expects hex. In which cases is this true? 
Hex is rarely used for ASCII-encoding of binary data, precisely because its efficiency is poor. > Please, hextext = str(somebytes.transform('hex')) is quite easy and > explicit and will work for any bytes to ascii-subset transform, not just > 'hex'. It will give you the str representation of a bytes object, which is not what you want. Of course, hextext = somebytes.transform('hex').decode('ascii') is not very hard either. But I disagree with the overall idea that bytes is the right output type for hex encoding. Regards Antoine. From martin at v.loewis.de Wed Jun 9 22:13:28 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Wed, 09 Jun 2010 22:13:28 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <1276083650.3143.1.camel@localhost.localdomain> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> Message-ID: <4C0FF5E8.60305@v.loewis.de> >> But in both cases you probably want bytes -> bytes and str -> str. If >> you want text out then put text in, if you want bytes out then put bytes in. > > No, I don't think so. If I'm using hex "encoding", it's because I want > to see a text representation of some arbitrary bytestring (in order to > display it inside another piece of text, for example). > In other words, the purpose of hex is precisely to give a textual > display of non-textual data. I think this is the way it is for consistency reasons (which I would not lightly wish away). I think you agree that base64 is a bytes->bytes transformation (because you typically use it as a payload on some wire protocol). So: py> binascii.b2a_base64(b'foo') b'Zm9v\n' py> binascii.b2a_hex(b'foo') b'666f6f' Now, I'd admit that "b2a" may be a misnomer (binary -> ASCII), but then it may not because ASCII actually *also* implies "bytes" (it's an encoding). 
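The return types in the session above can be checked directly. What follows is a minimal sketch, assuming a current Python 3 interpreter; the variable names are illustrative only and nothing here is proposed API:

```python
import binascii

raw = b"foo"

# binascii returns bytes for both encodings, as in the session above.
b64 = binascii.b2a_base64(raw)   # b'Zm9v\n'
hexed = binascii.b2a_hex(raw)    # b'666f6f'

# str() on a bytes object yields its repr, not the hex digits as text.
as_repr = str(hexed)             # "b'666f6f'"

# Decoding as ASCII is the explicit way to obtain text.
as_text = hexed.decode("ascii")  # '666f6f'

# The round trip accepts the bytes form directly.
assert binascii.a2b_hex(hexed) == raw
```

The explicit .decode('ascii') step is the price of keeping hex output as bytes; whether that price is acceptable is the point being debated in this thread.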
So what would you propose to change: b2a_hex should return a Unicode string? or this future transform method should return a Unicode string, whereas the module returns bytes? Something else? Regards, Martin From martin at v.loewis.de Wed Jun 9 22:18:34 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Jun 2010 22:18:34 +0200 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <1276064788.2227.122.camel@thinko> References: <1276064788.2227.122.camel@thinko> Message-ID: <4C0FF71A.1030702@v.loewis.de> > > It might be useful to copy the identifiers and URLs of all the backport > request tickets into some other repository, or to create some unique > state in roundup for these. Rationale: it's almost certain that if the > existing Python core maintainers won't evolve Python 2.X past 2.7, some > other group will, and losing existing context for that would kinda suck. Roundup keeps track of all status changes, see the bottom of an arbitrary issue for an example. So I don't think any additional recording is necessary. Regards, Martin From martin at v.loewis.de Wed Jun 9 22:23:58 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 09 Jun 2010 22:23:58 +0200 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: Message-ID: <4C0FF85E.9080203@v.loewis.de> Am 09.06.2010 05:58, schrieb Alexandre Vassalotti: > Is there is any plan for a 2.8 release? If not, I will go through the > tracker and close outstanding backport requests of 3.x features to > 2.x. Closing the backport requests is fine. For the feature requests, I'd only close them *after* the 2.7 release (after determining that they won't apply to 3.x, of course). There aren't that many backport requests, anyway, are there? 
Regards, Martin From solipsis at pitrou.net Wed Jun 9 22:26:25 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 9 Jun 2010 22:26:25 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C0FF5E8.60305@v.loewis.de> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> <4C0FF5E8.60305@v.loewis.de> Message-ID: <20100609222625.43a216f2@pitrou.net> On Wed, 09 Jun 2010 22:13:28 +0200 "Martin v. Löwis" wrote: > py> binascii.b2a_base64(b'foo') > b'Zm9v\n' > py> binascii.b2a_hex(b'foo') > b'666f6f' > > Now, I'd admit that "b2a" may be a misnomer (binary -> ASCII), but then > it may not because ASCII actually *also* implies "bytes" (it's an encoding). > > So what would you propose to change: b2a_hex should return a Unicode > string? or this future transform method should return a Unicode string, > whereas the module returns bytes? Something else? Well, I would propose that transform return str whereas b2a_hex returns bytes. But I agree the consistency argument with b2a_hex looks quite strong. (speaking of which, the builtin hex() function returns str, although its purpose is slightly different) Regards Antoine. From steve at holdenweb.com Thu Jun 10 03:01:00 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 10 Jun 2010 09:01:00 +0800 Subject: [Python-Dev] Future of 2.x. 
In-Reply-To: <20100609123223.27838ab4@heresy> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> <19370.1276100000@parc.com> <20100609123223.27838ab4@heresy> Message-ID: <4C10394C.10707@holdenweb.com> Barry Warsaw wrote: > On Jun 09, 2010, at 09:13 AM, Bill Janssen wrote: > >> Barry Warsaw wrote: >> >>> Note that Python 2.7 will be *maintained* for a very long time, which >>> should satisfy those folks who still require Python 2. Anybody on >>> older (and currently unmaintained) versions of Python 2 will not care >>> about new features so a Python 2.8 wouldn't help them anyway. >> There are two kinds of new features, though. Those added to improve (or >> at any rate modify :-) the product, and those added to keep the product >> relevant to a changing external world (new operating systems, new >> communication protocols, etc.) I think it would take a pretty strong >> crystal ball to be able to rule out the latter kind of feature add from >> the 2.x line. > > The latter should mostly be supported by third party packages available in the > Cheeseshop. To the extent that such support can't be effected by add-ons > (e.g. new OS support), I think a better approach would be to encourage and > allow unofficial ports by utilizing dvcs branches (we *are* moving to > Mercurial after Python 2.7 final is released, right?). > > I think we should plan on 2.7 being the last Python 2, and spend lots of effort > to get people onto Python 3, partially by offering big carrots like Unladen > Swallow, a better/no GIL, etc. I think it should be part of the PSF's mission > to help that happen through directed sponsorship, sprints, and other tools. > The current stumbling block isn't the language itself, it's the lack of support from third-party libraries. GSoC is addressing some of these issues, but so far we (the PSF, the dev community, anybody else except R. 
David Murray) haven't really come to grips with intractable problems like the broken state of the email package, and we are not doing well at attracting funds to support it. So I think we need to address a larger issue than just the language. As a development community we decided to change the language. Now we have to do what we can to ensure that the changed language has appropriate support. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Thu Jun 10 03:01:46 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 10 Jun 2010 09:01:46 +0800 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> Message-ID: <4C10397A.9000504@holdenweb.com> Terry Reedy wrote: > On 6/9/2010 10:42 AM, M.-A. Lemburg wrote: > >>> Steve Holden wrote >>>> How does throwing away information represent "moving forward"? > > 'Closing' a tracker issue does not 'throw away' information', it *adds* > information as to current intention. > >> It's certainly not fair to require all core developers to >> continue working on Python2, but it would also be unfair to >> cancel out that possibility for a subset of interested devs. > > Closing a set of issues does not cancel out that possibility. If such a > subset of devs develops, they can easily reopen (or move) particular > issues they are interested in working on. > > As long as that's the case I am fine with the change. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Thu Jun 10 03:02:56 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 10 Jun 2010 09:02:56 +0800 Subject: [Python-Dev] Future of 2.x. 
In-Reply-To: <20100609101224.4425723d@heresy> References: <20100609101224.4425723d@heresy> Message-ID: Barry Warsaw wrote: > On Jun 09, 2010, at 01:15 AM, Fred Drake wrote: > >> On Wed, Jun 9, 2010 at 12:30 AM, Senthil Kumaran wrote: >>> it would still be a good idea to >>> introduce some of them in minor releases in 2.7. I know, this >>> deviating from the process, but it could be an option considering that >>> 2.7 is the last of 2.x release. >> I disagree. >> >> If there are going to be features going into *any* post 2.7.0 version, >> there's no reason not to increment the revision number to 2.8, >> >> Since there's also a well-advertised decision that 2.7 will be the >> last 2.x, such a 2.8 isn't planned. But there's no reason to violate >> the no-features-in-bugfix-releases policy. We've seen violations >> cause trouble and confusion, but we've not seen it be successful. >> >> The policy wasn't arbitrary; let's stick to it. > > I completely agree with Fred. New features in point releases will cause many > more headaches than opening up a 2.8, which I still hope we don't do. I'd > rather see all that pent up energy focussed on doing whatever we can to help > people transition to Python 3. > Though one might ironically suggest that sticking to the policy actually represents a change in policy :) regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From alexandre at peadrop.com Thu Jun 10 03:10:31 2010 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 9 Jun 2010 18:10:31 -0700 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <4C0FF85E.9080203@v.loewis.de> References: <4C0FF85E.9080203@v.loewis.de> Message-ID: On Wed, Jun 9, 2010 at 1:23 PM, "Martin v. Löwis" wrote: > Closing the backport requests is fine. 
For the feature requests, I'd only > close them *after* the 2.7 release (after determining that they won't apply > to 3.x, of course). > > There aren't that many backport requests, anyway, are there? > There are only a few requests (about five). -- Alexandre From alexandre at peadrop.com Thu Jun 10 03:13:32 2010 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 9 Jun 2010 18:13:32 -0700 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <1276064788.2227.122.camel@thinko> Message-ID: On Wed, Jun 9, 2010 at 5:55 AM, Facundo Batista wrote: > Yes, closing the tickets as "won't fix" and tagging them as > "will-never-happen-in-2.x" or something, is the best combination of > both worlds: it will clean the tracker and ease further developments, > and will allow anybody to pick up those tickets later. > The issues I care about are already tagged as 26backport. So, I don't think another keyword is needed. -- Alexandre From orsenthil at gmail.com Thu Jun 10 08:48:05 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Thu, 10 Jun 2010 12:18:05 +0530 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <4C0FF85E.9080203@v.loewis.de> Message-ID: On Thu, Jun 10, 2010 at 6:40 AM, Alexandre Vassalotti wrote: > On Wed, Jun 9, 2010 at 1:23 PM, "Martin v. Löwis" wrote: >> Closing the backport requests is fine. For the feature requests, I'd only >> close them *after* the 2.7 release (after determining that they won't apply >> to 3.x, of course). >> >> There aren't that many backport requests, anyway, are there? >> > > There are only a few requests (about five) I get your point. It is the 'back-ports' that you have tagged. These were designed for 3.x and implemented in 3.x in the first place. I was concerned that a policy would be drawn up, or a practice adopted, that would close any/every existing Feature Request in Python 2.7. There are some cases (in stdlib) which can be debated along the lines of feature request vs bug-fix and those will get hurt in the process. 
Thanks, Senthil From stephen at xemacs.org Thu Jun 10 08:59:48 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 10 Jun 2010 15:59:48 +0900 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <20100609215659.0ea27cde@pitrou.net> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <20100609133549.578157ed@pitrou.net> <4C0F7D45.4060706@voidspace.org.uk> <1276083650.3143.1.camel@localhost.localdomain> <20100609141748.733d3e94@pitrou.net> <20100609215659.0ea27cde@pitrou.net> Message-ID: <19472.36196.673114.398905@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > In which cases is this true? Hex is rarely used for ASCII-encoding of > binary data, precisely because its efficiency is poor. MIME quoted-printable, URL-quoting, and XBM come to mind. From baptiste13z at free.fr Thu Jun 10 12:27:33 2010 From: baptiste13z at free.fr (Baptiste Carvello) Date: Thu, 10 Jun 2010 12:27:33 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <201006090153.14190.victor.stinner@haypocalc.com> References: <201006090153.14190.victor.stinner@haypocalc.com> Message-ID: Victor Stinner a écrit : > > I suppose that each codec will have a different list of accepted input and > output types. Example: > > bz2: encode:bytes->bytes, decode:bytes->bytes > rot13: encode:str->str, decode:str->str > hex: encode:bytes->str, decode: str->bytes A user point of view: please NO. This might be more consistent with the semantics, but it forces users to scratch their heads each time to find out which types are involved. I'd rather all methods take and return the same types, independent of codec, that is: .encode : str->bytes .decode : bytes->str .(un)transform : same type, str->str or bytes->bytes All other uses can be trivially done with .encode('ascii')/.decode('ascii'). Changing the type of *ascii* text is easy, understanding bytes vs str semantics is not! 
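The split suggested above can be sketched with functions that already exist. This assumes a recent Python 3 in which the hex_codec and rot_13 codecs are reachable through codecs.encode() and codecs.decode(); the .transform()/.untransform() methods are only a proposal in this thread, so they are not used, and the variable names are illustrative:

```python
import codecs

# Text <-> bytes: str.encode / bytes.decode change the type.
text = "abc"
data = text.encode("ascii")           # str -> bytes
assert data.decode("ascii") == text   # bytes -> str

# Same-type transforms, spelled with the codecs module functions
# as a stand-in for the proposed (un)transform methods.
hex_bytes = codecs.encode(data, "hex_codec")   # bytes -> bytes
rot_text = codecs.encode(text, "rot_13")       # str -> str

# Getting hex digits as text is one explicit ASCII decode away.
hex_text = hex_bytes.decode("ascii")

# The reverse direction keeps the type as well.
assert codecs.decode(hex_bytes, "hex_codec") == data
```

Under this convention the caller never has to remember a per-codec type table: only .encode()/.decode() cross the str/bytes boundary.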
Cheers, B. From walter at livinglogic.de Thu Jun 10 12:30:01 2010 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Thu, 10 Jun 2010 12:30:01 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C0F8D5A.8010706@gmail.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> Message-ID: <4C10BEA9.4090704@livinglogic.de> On 09.06.10 14:47, Nick Coghlan wrote: > On 09/06/10 22:18, Victor Stinner wrote: >> Le mercredi 09 juin 2010 10:41:29, M.-A. Lemburg a ?crit : >>> No, .transform() and .untransform() will be interface to same-type >>> codecs, i.e. ones that convert bytes to bytes or str to str. As with >>> .encode()/.decode() these helper methods also implement type safety >>> of the return type. >> >> What about buffer compatible objects like array.array(), memoryview(), etc.? >> Should we use codecs.encode() / codecs.decode() for these types? > > There are probably enough subtleties that this is all worth specifying > in a PEP: > > - which codecs from 2.x are to be restored > - the domain each codec operates in (binary data or text)* > - review behaviour of codecs.encode and codecs.decode > - behaviour of the new str, bytes and bytearray (un)transform methods > - whether to add helper methods for reverse codecs (like base64) > > The PEP would also serve as a reference back to both this discussion and > the previous one (which was long enough ago that I've forgotten most of it). I too think that a PEP is required here. Codecs support several types of error handling that don't make sense for transform()/untransform(). What should 'abc'.decode('hex', 'replace') do? (In 2.6 it raises an assertion error, because errors *must* be strict). 
I think we should take this opportunity to implement transform/untransform without being burdened with features we inherited from codecs which don't make sense for transform/untransform. > [...] Servus, Walter From mal at egenix.com Thu Jun 10 13:08:26 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Thu, 10 Jun 2010 13:08:26 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C10BEA9.4090704@livinglogic.de> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> <4C10BEA9.4090704@livinglogic.de> Message-ID: <4C10C7AA.9030300@egenix.com> Walter Dörwald wrote: > On 09.06.10 14:47, Nick Coghlan wrote: > >> On 09/06/10 22:18, Victor Stinner wrote: >>> Le mercredi 09 juin 2010 10:41:29, M.-A. Lemburg a écrit : >>>> No, .transform() and .untransform() will be interface to same-type >>>> codecs, i.e. ones that convert bytes to bytes or str to str. As with >>>> .encode()/.decode() these helper methods also implement type safety >>>> of the return type. >>> >>> What about buffer compatible objects like array.array(), memoryview(), etc.? >>> Should we use codecs.encode() / codecs.decode() for these types? >> >> There are probably enough subtleties that this is all worth specifying >> in a PEP: >> >> - which codecs from 2.x are to be restored >> - the domain each codec operates in (binary data or text)* >> - review behaviour of codecs.encode and codecs.decode >> - behaviour of the new str, bytes and bytearray (un)transform methods >> - whether to add helper methods for reverse codecs (like base64) >> >> The PEP would also serve as a reference back to both this discussion and >> the previous one (which was long enough ago that I've forgotten most of it). > > I too think that a PEP is required here. Fair enough. I'll write a PEP.
> Codecs support several types of error handling that don't make sense for > transform()/untransform(). What should 'abc'.decode('hex', 'replace') > do? (In 2.6 it raises an assertion error, because errors *must* be strict). That's not really an issue since codecs don't have to implement all error handling schemes. For starters, they will all only implement 'strict' mode. > I think we should takt this opportunity to implement > transform/untransform without being burdened with features we inherited > from codecs which don't make sense for transform/untransform. Not sure what you mean here. Those methods are just helper methods which interface to the codec system and provide return type safety. Nothing more or less. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 10 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 38 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From victor.stinner at haypocalc.com Thu Jun 10 14:16:46 2010 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 10 Jun 2010 14:16:46 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... 
codecs In-Reply-To: <4C10BEA9.4090704@livinglogic.de> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> <4C10BEA9.4090704@livinglogic.de> Message-ID: <201006101416.46500.victor.stinner@haypocalc.com> Le jeudi 10 juin 2010 12:30:01, Walter Dörwald a écrit : > Codecs support several types of error handling that don't make sense for > transform()/untransform(). What should 'abc'.decode('hex', 'replace') > do? You mean 'abc'.transform('hex', 'replace'), right? An error handler is useful for encoding codecs (where the input type differs from the output type), but I don't see how it can be used with hex, rot13, bz2, ... (we decided that .transform() and .untransform() will use the same input and output types). Even if bz2+xmlcharref can be something funny :-) .transform() and .untransform() should have only one argument. (If you would really like to play with the error handler, you can still use codecs.encode(obj, name, errors) and codecs.decode(obj, name, errors).) .transform() and .untransform() have to be simple. If you want to control the codec, why not use the real API directly? Examples: - base64.b64encode() has an optional altchars argument - bz2.compress() has an optional compresslevel argument - etc. I don't see how altchars or compresslevel can be added to .transform() / .untransform(). (**kw would be something really ugly.) > (In 2.6 it raises an assertion error, because errors *must* be strict) hex, bz2, rot13, ... codecs should also raise an error if errors is not "strict" (or None which means "strict") in Python3. -- Victor Stinner http://www.haypocalc.com/ From rdmurray at bitdance.com Thu Jun 10 14:18:08 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Thu, 10 Jun 2010 08:18:08 -0400 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ...
codecs In-Reply-To: References: <201006090153.14190.victor.stinner@haypocalc.com> Message-ID: <20100610121808.7D8901FCB52@kimball.webabinitio.net> On Thu, 10 Jun 2010 12:27:33 +0200, Baptiste Carvello wrote: > Victor Stinner wrote: > > > I suppose that each codec will have a different list of accepted input and > > output types. Example: > > > bz2: encode:bytes->bytes, decode:bytes->bytes > > rot13: encode:str->str, decode:str->str > > hex: encode:bytes->str, decode: str->bytes > > A user point of view: please NO. > > This might be more consistent with the semantics, but it forces users to scratch > their head each time to find out which types are involved. I'd rather all > methods take and return the same types, independent of codec, that is: > > .encode : str->bytes > .decode : bytes->str > .(un)transform : same type, str->str or bytes->bytes > > All other uses can be trivially done with .encode('ascii')/.decode('ascii'). > > Changing the type of *ascii* text is easy, understanding bytes vs str semantics is not! +1 Consistency in interface is more important in *this* context than the sensibleness of any particular transform. -- R. David Murray www.bitdance.com From barry at python.org Thu Jun 10 19:51:49 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 10 Jun 2010 13:51:49 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: <4C10394C.10707@holdenweb.com> References: <1276064788.2227.122.camel@thinko> <4C0F91AF.1000401@voidspace.org.uk> <4C0FA854.1080400@egenix.com> <20100609111238.7c017907@heresy> <19370.1276100000@parc.com> <20100609123223.27838ab4@heresy> <4C10394C.10707@holdenweb.com> Message-ID: <20100610135149.50b3d15a@heresy> On Jun 10, 2010, at 09:01 AM, Steve Holden wrote: >The current stumbling block isn't the language itself, it's the lack of >support from third-party libraries. GSoC is addressing some of these >issues, but so far we (the PSF, the dev community, anybody else except >R.
David Murray) haven't really come to grips with intractable problems >like the broken state of the email package, and we are not doing well at >attracting funds to support it. > >So I think we need to address a larger issue than just the language. As >a development community we decided to change the language. Now we have >to do what we can to ensure that the changed language has appropriate >support. This is exactly my point - I totally agree. Let's take all that pent up energy and apply it to porting important libraries to Python 3. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From tjreedy at udel.edu Thu Jun 10 21:25:33 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Jun 2010 15:25:33 -0400 Subject: [Python-Dev] Future of 2.x. In-Reply-To: References: <4C0FF85E.9080203@v.loewis.de> Message-ID: On 6/10/2010 2:48 AM, Senthil Kumaran wrote: > On Thu, Jun 10, 2010 at 6:40 AM, Alexandre Vassalotti > wrote: >> On Wed, Jun 9, 2010 at 1:23 PM, "Martin v. L?wis" wrote: >>> Closing the backport requests is fine. For the feature requests, I'd only >>> close them *after* the 2.7 release (after determining that they won't apply >>> to 3.x, of course). >>> >>> There aren't that many backport requests, anyway, are there? >>> >> >> There is only a few requests (about five) > > I get your point. It is the 'back-ports' that you have tagged. Right, things already in 3.x. > These > were designed for 3.x and implemented in 3.x in the first place. > I was concerned that there will be policy drawn or a practice that > will close any/every existing Feature Request in Python 2.7. > There are some cases (in stdlib) which can debated on the lines of > feature request vs bug-fix and those will get hurt in the process. I have started going through old open issues tagged with 2.5. Many are unclassified. 
Those that are feature requests that are *plausible* for 3.2 I am marking as such and retagging for 3.2, *not* closing. (I am also marking bug reports as such and asking the OP to test in 2.6/7 and maybe 3.1 if I cannot easily do so.) Ideally, all core/stdlib feature requests should be classified as such and tagged for 3.2 (or even 3.3) only. Terry Jan Reedy From tjreedy at udel.edu Thu Jun 10 21:31:58 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 10 Jun 2010 15:31:58 -0400 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ... codecs In-Reply-To: <4C10C7AA.9030300@egenix.com> References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> <4C10BEA9.4090704@livinglogic.de> <4C10C7AA.9030300@egenix.com> Message-ID: On 6/10/2010 7:08 AM, M.-A. Lemburg wrote: > Walter Dörwald wrote: >>> The PEP would also serve as a reference back to both this discussion and >>> the previous one (which was long enough ago that I've forgotten most of it). >> >> I too think that a PEP is required here. > > Fair enough. I'll write a PEP. Thank you from me. > >> Codecs support several types of error handling that don't make sense for >> transform()/untransform(). What should 'abc'.decode('hex', 'replace') >> do? (In 2.6 it raises an assertion error, because errors *must* be strict). I would expect either ValueError: errors arg must be 'strict' for transform or else TypeError: transform takes 1 arg, 2 given. > That's not really an issue since codecs don't have to implement > all error handling schemes. > > For starters, they will all only implement 'strict' mode. Terry Jan Reedy From walter at livinglogic.de Fri Jun 11 13:34:37 2010 From: walter at livinglogic.de (Walter Dörwald) Date: Fri, 11 Jun 2010 13:34:37 +0200 Subject: [Python-Dev] Reintroduce or drop completly hex, bz2, rot13, ...
codecs In-Reply-To: References: <201006090153.14190.victor.stinner@haypocalc.com> <4C0F53B9.2020302@egenix.com> <201006091418.44680.victor.stinner@haypocalc.com> <4C0F8D5A.8010706@gmail.com> <4C10BEA9.4090704@livinglogic.de> <4C10C7AA.9030300@egenix.com> Message-ID: <4C121F4D.5020206@livinglogic.de> On 10.06.10 21:31, Terry Reedy wrote: > On 6/10/2010 7:08 AM, M.-A. Lemburg wrote: >> Walter D?rwald wrote: > >>>> The PEP would also serve as a reference back to both this discussion and >>>> the previous one (which was long enough ago that I've forgotten most of it). >>> >>> I too think that a PEP is required here. >> >> Fair enough. I'll write a PEP. > > Thank you from me. >> >>> Codecs support several types of error handling that don't make sense for >>> transform()/untransform(). What should 'abc'.decode('hex', 'replace') >>> do? (In 2.6 it raises an assertion error, because errors *must* be strict). > > I would expext either ValueError: errors arg must be 'strict' for > trransform What use is an argument that must always have the same value? 'abc'.transform('hex', errors='strict', obey_the_flufl=True) > or else TypeError: tranform takes 1 arg, 2 given. IMHO that's the better option. >> That's not really an issue since codecs don't have to implement >> all error handling schemes. >> >> For starters, they will all only implement 'strict' mode. I would prefer it if transformers were separate from codecs and had their own registry. Servus, Walter From status at bugs.python.org Fri Jun 11 18:07:44 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 11 Jun 2010 18:07:44 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20100611160744.AF7E57816D@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-06-04 - 2010-06-11) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 
2764 open (+54) / 18028 closed (+23) / 20792 total (+77) Open issues with patches: 1115 Average duration of open issues: 714 days. Median duration of open issues: 502 days. Open Issues Breakdown open 2737 (+54) languishing 12 ( +0) pending 14 ( +0) Issues Created Or Reopened (79) _______________________________ datetime.datetime operator methods are not subclass-friendly 2010-06-09 http://bugs.python.org/issue2267 reopened belopolsky patch DeprecationWarning message applies to wrong context with exec( 2010-06-10 http://bugs.python.org/issue3423 reopened ghazel sunau bytes / str TypeError in Py3k 2010-06-04 CLOSED http://bugs.python.org/issue8897 created tjollans patch The email package should defer to the codecs module for all al 2010-06-04 http://bugs.python.org/issue8898 created r.david.murray easy Add docstrings to time.struct_time 2010-06-04 CLOSED http://bugs.python.org/issue8899 created belopolsky patch, easy IDLE crashes if Preference set to At Startup -> Open Edit Wind 2010-06-04 http://bugs.python.org/issue8900 created mhuster Windows registry path not ignored with -E option 2010-06-05 http://bugs.python.org/issue8901 created flashk patch, needs review add datetime.time.now() for consistency 2010-06-05 http://bugs.python.org/issue8902 created techtonik Add module level now() and today() functions to datetime modul 2010-06-05 CLOSED http://bugs.python.org/issue8903 created techtonik quick example how to fix docs 2010-06-05 http://bugs.python.org/issue8904 created techtonik difflib should accept arbitrary line iterators 2010-06-05 http://bugs.python.org/issue8905 created techtonik Document TestCase attributes in class docstring 2010-06-05 http://bugs.python.org/issue8906 created flub time module documentation differs in trunk and py3k 2010-06-05 CLOSED http://bugs.python.org/issue8907 created belopolsky patch friendly errors for UAC misbehavior in windows installers 2010-06-05 http://bugs.python.org/issue8908 created techtonik patch mention bitmap size for 
bdist_wininst 2010-06-05 CLOSED http://bugs.python.org/issue8909 created techtonik patch Write a text file explaining why Lib/test/data exists 2010-06-06 http://bugs.python.org/issue8910 created brett.cannon patch, easy, needs review regrtest.main should have a test skipping argument 2010-06-06 http://bugs.python.org/issue8911 created brett.cannon easy `make patchcheck` should check the whitespace of .c/.h files 2010-06-06 http://bugs.python.org/issue8912 created brett.cannon Document that datetime.__format__ is datetime.strftime 2010-06-06 http://bugs.python.org/issue8913 created brett.cannon easy Run clang's static analyzer 2010-06-06 http://bugs.python.org/issue8914 created brett.cannon Use locale.nl_langinfo in _strptime 2010-06-06 http://bugs.python.org/issue8915 created brett.cannon Move PEP 362 (function signature objects) into inspect 2010-06-06 http://bugs.python.org/issue8916 created brett.cannon Segmentation error happens in Embedding Python. 2010-06-06 http://bugs.python.org/issue8917 created tanaga distutils test failure on solaris: IOError: [Errno 2] No such 2010-06-06 http://bugs.python.org/issue8918 created srid python should read ~/.pythonrc.py by default 2010-06-06 http://bugs.python.org/issue8919 created lesmana PYTHONSTARTUP should expand "~" 2010-06-06 http://bugs.python.org/issue8920 created lesmana 2.7rc1: test_ttk failures on OSX 10.4 2010-06-06 http://bugs.python.org/issue8921 created srid Improve encoding shortcuts in PyUnicode_AsEncodedString() 2010-06-06 CLOSED http://bugs.python.org/issue8922 created haypo patch Remove unused "errors" argument from _PyUnicode_AsDefaultEncod 2010-06-06 http://bugs.python.org/issue8923 created haypo patch Error in error message in logging 2010-06-06 http://bugs.python.org/issue8924 created PeterL Improve c-api/arg.rst: use "bytes" or "str" types instead of " 2010-06-06 CLOSED http://bugs.python.org/issue8925 created haypo patch getargs.c: release the buffer on error 2010-06-06 
http://bugs.python.org/issue8926 created haypo patch Cannot handle complex requirement resolution 2010-06-06 http://bugs.python.org/issue8927 created dabrahams wininst: could not create key 2010-06-06 CLOSED http://bugs.python.org/issue8928 created techtonik wininst: msvcr90 dependency in x64 build 2010-06-06 CLOSED http://bugs.python.org/issue8929 created techtonik messed up formatting after reindenting 2010-06-06 http://bugs.python.org/issue8930 created benjamin.peterson '#' has no affect with 'c' type 2010-06-06 http://bugs.python.org/issue8931 created benjamin.peterson test_capi fails --without-threads 2010-06-07 CLOSED http://bugs.python.org/issue8932 created skrah patch, buildbot Invalid detection of metadata version 2010-06-07 http://bugs.python.org/issue8933 created benliles aifc should use str instead of bytes (wave, sunau compatibilit 2010-06-07 http://bugs.python.org/issue8934 created tjollans patch Syntax error in os.py 2010-06-08 CLOSED http://bugs.python.org/issue8935 created sklein webbrowser regression on windows 2010-06-08 http://bugs.python.org/issue8936 created techtonik SimpleHTTPServer should contain usage example 2010-06-08 http://bugs.python.org/issue8937 created techtonik Mac OS dialogs(Save As..., Load) translation 2010-06-08 http://bugs.python.org/issue8938 created Pavel.Denisow Use C type names (PyUnicode etc;) in the C API docs 2010-06-08 http://bugs.python.org/issue8939 created pitrou patch *HTTPServer need a summary page with API inheritance table 2010-06-08 http://bugs.python.org/issue8940 created techtonik utf-32be codec failing on UCS-2 python build for 32-bit value 2010-06-08 http://bugs.python.org/issue8941 created opstad patch __path__ attribute of modules loaded by zipimporter is unteste 2010-06-08 http://bugs.python.org/issue8942 created exarkun Bug in InteractiveConsole 2010-06-08 http://bugs.python.org/issue8943 created fabioz test_winreg.test_reflection_functions fails on Windows Server 2010-06-08 
http://bugs.python.org/issue8944 created brian.curtin Bug in **kwds expansion on call? 2010-06-08 CLOSED http://bugs.python.org/issue8945 created tjreedy PyBuffer_Release signature in 3.1 documentation is incorrect 2010-06-08 CLOSED http://bugs.python.org/issue8946 created opstad Provide as_integer_ratio() method to Decimal 2010-06-08 http://bugs.python.org/issue8947 created belopolsky patch cleanup functions are not executed with unittest.TestCase.debu 2010-06-08 CLOSED http://bugs.python.org/issue8948 created michael.foord PyArg_Parse*(): "z" should not accept bytes 2010-06-08 http://bugs.python.org/issue8949 created haypo patch In getargs.c, make 'L' code raise TypeError for float argument 2010-06-08 CLOSED http://bugs.python.org/issue8950 created mark.dickinson patch PyArg_Parse*(): factorize code of 's' and 'z' formats, and 'u' 2010-06-08 http://bugs.python.org/issue8951 created haypo patch Doc/c-api/arg.rst: fix documentation of number formats 2010-06-09 http://bugs.python.org/issue8952 created haypo Syntax error in http://docs.python.org/library/decimal.html#re 2010-06-09 CLOSED http://bugs.python.org/issue8953 created Jean.Jordaan wininst regression: errors when building on linux 2010-06-09 http://bugs.python.org/issue8954 created techtonik import doesn't notice changes to working directory 2010-06-09 CLOSED http://bugs.python.org/issue8955 created purpleidea Incorrect ValueError message for subprocess.Popen.send_signal( 2010-06-09 http://bugs.python.org/issue8956 created giampaolo.rodola strptime('%c', ..) fails to parse output of strftime('%c', ..) 
2010-06-09 http://bugs.python.org/issue8957 created belopolsky 2.7rc1 tarfile.py: `bltn_open(targetpath, "wb")` -> IOError: I 2010-06-09 CLOSED http://bugs.python.org/issue8958 created srid WINFUNCTYPE wrapped ctypes callbacks not functioning correctly 2010-06-10 http://bugs.python.org/issue8959 created mdcurran 2.6 README 2010-06-10 http://bugs.python.org/issue8960 created vojta.rylko compile Python-2.7rc1 on AIX 5.3 with xlc_r 2010-06-10 CLOSED http://bugs.python.org/issue8961 created tgulacsi IOError: [Errno 13] permission denied 2010-06-10 CLOSED http://bugs.python.org/issue8962 created Caitlin.Kavanaugh test_urllibnet failure 2010-06-10 http://bugs.python.org/issue8963 created pitrou patch Method _sys_version() module Lib\platform.py does parse correc 2010-06-10 http://bugs.python.org/issue8964 created fredericaltorres test_imp fails on OSX when LANG is set 2010-06-10 CLOSED http://bugs.python.org/issue8965 created belopolsky patch ctypes: remove implicit conversion between unicode and bytes 2010-06-10 http://bugs.python.org/issue8966 created haypo patch Create PyErr_GetWindowsMessage() function 2010-06-11 http://bugs.python.org/issue8967 created haypo patch token type constants are not documented 2010-06-11 http://bugs.python.org/issue8968 created isandler Windows: use (mbcs in) strict mode to encode/decode filenames, 2010-06-11 http://bugs.python.org/issue8969 created haypo patch Tkinter Litmus Test 2010-06-11 CLOSED http://bugs.python.org/issue8970 created rantingrick Tkinter Litmus Test 2010-06-11 CLOSED http://bugs.python.org/issue8971 created rantingrick subprocess.list2cmdline doesn't quote the & character 2010-06-11 http://bugs.python.org/issue8972 created shypike Inconsistent docstrings in struct module 2010-06-11 http://bugs.python.org/issue8973 created belopolsky Issues Now Closed (46) ______________________ Confusing error message when dividing timedelta using / 1008 days http://bugs.python.org/issue1083 belopolsky patch "[Errno 11] Resource 
temporarily unavailable" while using trac 676 days http://bugs.python.org/issue3494 tjreedy email.generator.Generator object bytes/str crash - b64encode() 522 days http://bugs.python.org/issue4768 r.david.murray patch msgfmt.py does not work with plural form 452 days http://bugs.python.org/issue5464 loewis tools\msi\merge.py is sensitive to lack of config.py 451 days http://bugs.python.org/issue5467 loewis patch CVE-2008-5983 python: untrusted python modules search path 423 days http://bugs.python.org/issue5753 akuchling patch httplib fails with HEAD requests to pages with "transfer-encod 9 days http://bugs.python.org/issue6312 orsenthil patch Tkinter import fails when running Python.exe from a network sh 327 days http://bugs.python.org/issue6470 loewis patch, needs review IDLE (python 3.1.1) syntax coloring for b'bytestring' and u'un 230 days http://bugs.python.org/issue7166 taleinat easy raw_input should encode unicode prompt with std.stdout.encodin 136 days http://bugs.python.org/issue7768 naoki [patch] convenience links for subprocess.call() 82 days http://bugs.python.org/issue8151 georg.brandl patch Unified hash for numeric types. 
83 days http://bugs.python.org/issue8188 mark.dickinson patch SkipTest exception in setUpClass or setUpModule is marked as a 63 days http://bugs.python.org/issue8302 michael.foord Suppress large diffs in unitttest.TestCase.assertSequenceEqual 58 days http://bugs.python.org/issue8351 michael.foord patch automate minidom.unlink() with a context manager 13 days http://bugs.python.org/issue8832 merwok patch, patch, easy, needs review PyArg_ParseTuple(): remove "t# format 13 days http://bugs.python.org/issue8839 lemburg patch Deprecate or remove "U" and "U#" formats of Py_BuildValue() 10 days http://bugs.python.org/issue8848 haypo patch multiprocessing: undefined struct/union member: msg_control 4 days http://bugs.python.org/issue8864 loewis --user-access-control=force produces invalid installer on Vist 9 days http://bugs.python.org/issue8870 r.david.murray XML-RPC improvement is described twice. 5 days http://bugs.python.org/issue8875 akuchling newline vs. newlines in io module 0 days http://bugs.python.org/issue8895 r.david.murray sunau bytes / str TypeError in Py3k 3 days http://bugs.python.org/issue8897 haypo patch Add docstrings to time.struct_time 1 days http://bugs.python.org/issue8899 belopolsky patch, easy Add module level now() and today() functions to datetime modul 6 days http://bugs.python.org/issue8903 rhettinger time module documentation differs in trunk and py3k 3 days http://bugs.python.org/issue8907 belopolsky patch mention bitmap size for bdist_wininst 1 days http://bugs.python.org/issue8909 tarek patch Improve encoding shortcuts in PyUnicode_AsEncodedString() 4 days http://bugs.python.org/issue8922 haypo patch Improve c-api/arg.rst: use "bytes" or "str" types instead of " 1 days http://bugs.python.org/issue8925 haypo patch wininst: could not create key 0 days http://bugs.python.org/issue8928 tarek wininst: msvcr90 dependency in x64 build 0 days http://bugs.python.org/issue8929 loewis test_capi fails --without-threads 2 days 
http://bugs.python.org/issue8932 skrah patch, buildbot Syntax error in os.py 0 days http://bugs.python.org/issue8935 ezio.melotti Bug in **kwds expansion on call? 0 days http://bugs.python.org/issue8945 rhettinger PyBuffer_Release signature in 3.1 documentation is incorrect 0 days http://bugs.python.org/issue8946 brian.curtin cleanup functions are not executed with unittest.TestCase.debu 2 days http://bugs.python.org/issue8948 michael.foord In getargs.c, make 'L' code raise TypeError for float argument 2 days http://bugs.python.org/issue8950 mark.dickinson patch Syntax error in http://docs.python.org/library/decimal.html#re 0 days http://bugs.python.org/issue8953 brian.curtin import doesn't notice changes to working directory 0 days http://bugs.python.org/issue8955 r.david.murray 2.7rc1 tarfile.py: `bltn_open(targetpath, "wb")` -> IOError: I 2 days http://bugs.python.org/issue8958 lars.gustaebel compile Python-2.7rc1 on AIX 5.3 with xlc_r 1 days http://bugs.python.org/issue8961 tgulacsi IOError: [Errno 13] permission denied 0 days http://bugs.python.org/issue8962 mark.dickinson test_imp fails on OSX when LANG is set 0 days http://bugs.python.org/issue8965 belopolsky patch Tkinter Litmus Test 0 days http://bugs.python.org/issue8970 merwok Tkinter Litmus Test 0 days http://bugs.python.org/issue8971 r.david.murray Installing w/o admin generates key error 2840 days http://bugs.python.org/issue600952 tarek xmlrpclib can no longer marshal Fault objects 1086 days http://bugs.python.org/issue1739842 tjreedy Top Issues Most Discussed (10) ______________________________ 18 test_urllibnet failure 1 days open http://bugs.python.org/issue8963 18 Add pure Python implementation of datetime module to CPython 109 days open http://bugs.python.org/issue7989 18 datetime lacks concrete tzinfo impl. 
for UTC 499 days open http://bugs.python.org/issue5094 11 crash appending list and namedtuple 14 days open http://bugs.python.org/issue8847 11 tarfile/Windows: Don't use mbcs as the default encoding 21 days open http://bugs.python.org/issue8784 9 test_imp fails on OSX when LANG is set 0 days closed http://bugs.python.org/issue8965 8 Improve c-api/arg.rst: use "bytes" or "str" types instead of "s 1 days closed http://bugs.python.org/issue8925 8 Popen should raise ValueError if pass a string when shell=False 129 days open http://bugs.python.org/issue7839 7 Use C type names (PyUnicode etc;) in the C API docs 3 days open http://bugs.python.org/issue8939 7 Improve encoding shortcuts in PyUnicode_AsEncodedString() 4 days closed http://bugs.python.org/issue8922

From brett at python.org Sat Jun 12 02:35:22 2010
From: brett at python.org (Brett Cannon)
Date: Fri, 11 Jun 2010 17:35:22 -0700
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
Message-ID:

The logging module taught me something today about the difference between a function defined in C and a function defined in Python::

    import importlib

    class Base:
        def imp(self, name):
            return self.import_(name)

    class CVersion(Base):
        import_ = __import__

    class PyVersion(Base):
        import_ = importlib.__import__

    CVersion().imp('tokenize')
    PyVersion().imp('tokenize')  # Fails!

Turns out the use of __import__ works while the importlib version fails. Why does importlib fail? Because the first argument to the importlib.__import__ function is an instance of PyVersion, not a string. And yet the __import__ version works as if the self argument is never passed to it!

This "magical" ignoring of self seems to extend to any PyCFunction. Is this dichotomy intentional or just a "fluke"? Maybe this is a hold-over from before we had descriptors and staticmethod, but now that we have these things perhaps this difference should go away.
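
The behaviour described in this message can be reproduced without the import machinery. The following is a minimal sketch, assuming CPython 3 (other implementations may differ, since they may write "built-ins" in pure Python); py_len and Demo are hypothetical names chosen for the illustration. A built-in such as len has no __get__, so looking it up through an instance never binds self, whereas a function written in Python does bind unless it is wrapped in staticmethod:

```python
def py_len(obj):
    """Pure-Python stand-in for the built-in len()."""
    return len(obj)


class Demo(list):
    c_len = len                    # built-in (PyCFunction): not a descriptor, never bound
    p_len = py_len                 # Python function: a descriptor, bound to the instance
    s_len = staticmethod(py_len)   # explicitly opts out of binding


d = Demo([1, 2, 3])

builtin_is_descriptor = hasattr(len, "__get__")       # False on CPython
function_is_descriptor = hasattr(py_len, "__get__")   # True

unbound_result = d.c_len("ab")   # self is NOT passed: this is len("ab")
bound_result = d.p_len()         # self IS passed: this is py_len(d)
static_result = d.s_len("ab")    # staticmethod restores the unbound behaviour
```

Wrapping the attribute in staticmethod makes both kinds of callable behave the same way, independent of how they were implemented.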
From benjamin at python.org Sat Jun 12 02:41:52 2010 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 11 Jun 2010 19:41:52 -0500 Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method? In-Reply-To: References: Message-ID: 2010/6/11 Brett Cannon : > This "magical" ignoring of self seems to extend to any PyCFunction. Is > this dichotomy intentional or just a "fluke"? Maybe this is a > hold-over from before we had descriptors and staticmethod, but now > that we have these things perhaps this difference should go away. There are several open feature requests about this. It is merely because PyCFunction does not implement __get__. -- Regards, Benjamin From guido at python.org Sat Jun 12 03:30:36 2010 From: guido at python.org (Guido van Rossum) Date: Fri, 11 Jun 2010 18:30:36 -0700 Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method? In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 5:41 PM, Benjamin Peterson wrote: > 2010/6/11 Brett Cannon : >> This "magical" ignoring of self seems to extend to any PyCFunction. Is >> this dichotomy intentional or just a "fluke"? Maybe this is a >> hold-over from before we had descriptors and staticmethod, but now >> that we have these things perhaps this difference should go away. > > There are several open feature requests about this. It is merely > because PyCFunction does not implement __get__. Yeah, but this of course is because before descriptors only Python functions were special-cased as methods, and there was known code that depended on this. I'm sure there's even more code that depends on this today (because there is just more code, period :-). Maybe we could offer a decorator that adds a __get__ to a PyCFunction though. 
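
A decorator of the kind suggested here could look roughly like the sketch below. It is only an illustration under stated assumptions: the name bindable and the Box class are hypothetical, nothing like this exists in the standard library, and it wraps the callable in a Python-level descriptor rather than changing PyCFunction itself. It relies only on types.MethodType, which accepts any callable:

```python
import types


class bindable:
    """Hypothetical wrapper that gives any callable function-style binding."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # Accessed through the class (or called directly): behave like the original.
        return self.func(*args, **kwargs)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Accessed through an instance: bind it as the first argument.
        return types.MethodType(self.func, instance)


class Box(list):
    size = bindable(len)   # len now receives the instance, like a Python method


box = Box([1, 2, 3])
via_instance = box.size()        # equivalent to len(box)
via_class = Box.size([1, 2])     # no instance involved, so nothing is bound
```

Because the wrapping is opt-in, existing code that depends on built-ins not being bound would keep working.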
-- --Guido van Rossum (python.org/~guido) From brett at python.org Sat Jun 12 03:57:05 2010 From: brett at python.org (Brett Cannon) Date: Fri, 11 Jun 2010 18:57:05 -0700 Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method? In-Reply-To: References: Message-ID: On Fri, Jun 11, 2010 at 18:30, Guido van Rossum wrote: > On Fri, Jun 11, 2010 at 5:41 PM, Benjamin Peterson wrote: >> 2010/6/11 Brett Cannon : >>> This "magical" ignoring of self seems to extend to any PyCFunction. Is >>> this dichotomy intentional or just a "fluke"? Maybe this is a >>> hold-over from before we had descriptors and staticmethod, but now >>> that we have these things perhaps this difference should go away. >> >> There are several open feature requests about this. It is merely >> because PyCFunction does not implement __get__. > > Yeah, but this of course is because before descriptors only Python > functions were special-cased as methods, and there was known code that > depended on this. I'm sure there's even more code that depends on this > today (because there is just more code, period :-). > > Maybe we could offer a decorator that adds a __get__ to a PyCFunction though. Well, staticmethod seems to work just as well. I'm going to make this my first request for what to change in Py4K. =) -Brett From pjenvey at underboss.org Sat Jun 12 19:19:07 2010 From: pjenvey at underboss.org (Philip Jenvey) Date: Sat, 12 Jun 2010 10:19:07 -0700 Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method? In-Reply-To: References: Message-ID: On Jun 11, 2010, at 6:57 PM, Brett Cannon wrote: > On Fri, Jun 11, 2010 at 18:30, Guido van Rossum wrote: >> On Fri, Jun 11, 2010 at 5:41 PM, Benjamin Peterson wrote: >>> 2010/6/11 Brett Cannon : >>>> This "magical" ignoring of self seems to extend to any PyCFunction. Is >>>> this dichotomy intentional or just a "fluke"? 
Maybe this is a
>>>> hold-over from before we had descriptors and staticmethod, but now
>>>> that we have these things perhaps this difference should go away.
>>>
>>> There are several open feature requests about this. It is merely
>>> because PyCFunction does not implement __get__.
>>
>> Yeah, but this of course is because before descriptors only Python
>> functions were special-cased as methods, and there was known code that
>> depended on this. I'm sure there's even more code that depends on this
>> today (because there is just more code, period :-).
>>
>> Maybe we could offer a decorator that adds a __get__ to a PyCFunction though.
>
> Well, staticmethod seems to work just as well.
>
> I'm going to make this my first request for what to change in Py4K. =)

+1 on changing this, it's annoying for alternate implementations. They oftentimes implement functions in pure Python whereas user code might be expecting the PyCFunction behavior.

Jython's had a couple cases of this incompatibility reported. It's a rare occurrence but it's very mysterious to the user when it happens.

--
Philip Jenvey

From guido at python.org Sat Jun 12 21:59:54 2010
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Jun 2010 12:59:54 -0700
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References:
Message-ID:

On Sat, Jun 12, 2010 at 10:19 AM, Philip Jenvey wrote:
> +1 on changing this, it's annoying for alternate implementations. They oftentimes implement functions in pure Python whereas user code might be expecting the PyCFunction behavior.
>
> Jython's had a couple cases of this incompatibility reported. It's a rare occurrence but it's very mysterious to the user when it happens.

Well, yeah, but you're presenting an argument *against* changing this
-- existing code will break if it is changed.

I can think of only one way out without just breaking such code: Start
issuing warnings when a bare PyCFunction exists at the class level,
and introduce/recommend decorators that can be used to disambiguate
the two possible intended meanings.

As Brett says, f = staticmethod(func) will work to insist on the
existing PyCFunction semantics. We should also introduce a new
decorator that treats any callable the same way as pure-Python
functions work today: bind the instance to the first argument when it
is called on an instance. I can't think of a good name for that one
right now, but we'll think of one.

I wish the warning could happen at class definition time, but I expect
that there are use cases where the warning is unnecessary (because the
code happens to be structured so as to never call it through the
instance) or even wrong (who knows what introspection might be
thwarted by wrapping something in staticmethod). Perhaps the warning
can be done by adding a __get__ method to PyCFunction that issues the
warning and then returns the original value.

I'm not sure how we'll ever get rid of the warning except in Py4k.

--
--Guido van Rossum (python.org/~guido)

From fuzzyman at voidspace.org.uk Sat Jun 12 23:17:33 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Sat, 12 Jun 2010 22:17:33 +0100
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References:
Message-ID: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>

On 12 Jun 2010, at 20:59, Guido van Rossum wrote:

> On Sat, Jun 12, 2010 at 10:19 AM, Philip Jenvey
> wrote:
>> +1 on changing this, it's annoying for alternate implementations.
>> They oftentimes implement functions in pure Python whereas user
>> code might be expecting the PyCFunction behavior.
>>
>> Jython's had a couple cases of this incompatibility reported. It's
>> a rare occurrence but it's very mysterious to the user when it
>> happens.
>
> Well, yeah, but you're presenting an argument *against* changing this
> -- existing code will break if it is changed.
>
> I can think of only one way out without just breaking such code: Start
> issuing warnings when a bare PyCFunction exists at the class level,
> and introduce/recommend decorators that can be used to disambiguate
> the two possible intended meanings.
>
> As Brett says, f = staticmethod(func) will work to insist on the
> existing PyCFunction semantics. We should also introduce a new
> decorator that treats any callable the same way as pure-Python
> functions work today: bind the instance to the first argument when it
> is called on an instance. I can't think of a good name for that one
> right now, but we'll think of one.
>

method or instancemethod perhaps?

Michael

> I wish the warning could happen at class definition time, but I expect
> that there are use cases where the warning is unnecessary (because the
> code happens to be structured so as to never call it through the
> instance) or even wrong (who knows what introspection might be
> thwarted by wrapping something in staticmethod). Perhaps the warning
> can be done by adding a __get__ method to PyCFunction that issues the
> warning and then returns the original value.
>
> I'm not sure how we'll ever get rid of the warning except in Py4k.
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk

From lists at cheimes.de Sun Jun 13 01:03:44 2010
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 13 Jun 2010 01:03:44 +0200
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
References: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
Message-ID:

> method or instancemethod perhaps?

The necessary code is already in Python 3.0's code base. I added it
in r56469 as requested in my issue http://bugs.python.org/issue1587. It
seems we had this very discussion over two and a half years ago.

Index: Python/bltinmodule.c
===================================================================
--- Python/bltinmodule.c        (revision 81963)
+++ Python/bltinmodule.c        (working copy)
@@ -2351,6 +2351,7 @@
     SETBUILTIN("frozenset",             &PyFrozenSet_Type);
     SETBUILTIN("property",              &PyProperty_Type);
     SETBUILTIN("int",                   &PyLong_Type);
+    SETBUILTIN("instancemethod",        &PyInstanceMethod_Type);
     SETBUILTIN("list",                  &PyList_Type);
     SETBUILTIN("map",                   &PyMap_Type);
     SETBUILTIN("object",                &PyBaseObject_Type);

>>> class Example:
...     iid = instancemethod(id)
...     id = id
...
>>> Example().id()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: id() takes exactly one argument (0 given)
>>> Example().iid()
139941157882144

Christian

From guido at python.org Sun Jun 13 01:15:08 2010
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Jun 2010 16:15:08 -0700
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
Message-ID:

Hey! No borrowing the time machine! :-)

On Sat, Jun 12, 2010 at 4:03 PM, Christian Heimes wrote:
>> method or instancemethod perhaps?
>
> The necessary code is already in Python 3.0's code base. I added it
> in r56469 as requested in my issue http://bugs.python.org/issue1587. It
> seems we had this very discussion over two and a half years ago.
>
> Index: Python/bltinmodule.c
> ===================================================================
> --- Python/bltinmodule.c        (revision 81963)
> +++ Python/bltinmodule.c        (working copy)
> @@ -2351,6 +2351,7 @@
>      SETBUILTIN("frozenset",             &PyFrozenSet_Type);
>      SETBUILTIN("property",              &PyProperty_Type);
>      SETBUILTIN("int",                   &PyLong_Type);
> +    SETBUILTIN("instancemethod",        &PyInstanceMethod_Type);
>      SETBUILTIN("list",                  &PyList_Type);
>      SETBUILTIN("map",                   &PyMap_Type);
>      SETBUILTIN("object",                &PyBaseObject_Type);
>
> >>> class Example:
> ...     iid = instancemethod(id)
> ...     id = id
> ...
> >>> Example().id()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: id() takes exactly one argument (0 given)
> >>> Example().iid()
> 139941157882144
>
> Christian

--
--Guido van Rossum (python.org/~guido)

From guido at python.org Sun Jun 13 01:16:17 2010
From: guido at python.org (Guido van Rossum)
Date: Sat, 12 Jun 2010 16:16:17 -0700
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
Message-ID:

(Of course, I'd still like to see the warning, since it's now a
portability issue.)

On Sat, Jun 12, 2010 at 4:15 PM, Guido van Rossum wrote:
> Hey! No borrowing the time machine! :-)
>
> On Sat, Jun 12, 2010 at 4:03 PM, Christian Heimes wrote:
>>> method or instancemethod perhaps?
>>
>> The necessary code is already in Python 3.0's code base. I added it
>> in r56469 as requested in my issue http://bugs.python.org/issue1587. It
>> seems we had this very discussion over two and a half years ago.
>>
>> Index: Python/bltinmodule.c
>> ===================================================================
>> --- Python/bltinmodule.c        (revision 81963)
>> +++ Python/bltinmodule.c        (working copy)
>> @@ -2351,6 +2351,7 @@
>>      SETBUILTIN("frozenset",             &PyFrozenSet_Type);
>>      SETBUILTIN("property",              &PyProperty_Type);
>>      SETBUILTIN("int",                   &PyLong_Type);
>> +    SETBUILTIN("instancemethod",        &PyInstanceMethod_Type);
>>      SETBUILTIN("list",                  &PyList_Type);
>>      SETBUILTIN("map",                   &PyMap_Type);
>>      SETBUILTIN("object",                &PyBaseObject_Type);
>>
>> >>> class Example:
>> ...     iid = instancemethod(id)
>> ...     id = id
>> ...
>> >>> Example().id()
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> TypeError: id() takes exactly one argument (0 given)
>> >>> Example().iid()
>> 139941157882144
>>
>> Christian
>
> --
> --Guido van Rossum (python.org/~guido)

--
--Guido van Rossum (python.org/~guido)

From lists at cheimes.de Sun Jun 13 01:22:11 2010
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 13 Jun 2010 01:22:11 +0200
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
Message-ID: <4C1416A3.60004@cheimes.de>

On 13.06.2010 01:15, Guido van Rossum wrote:
> Hey! No borrowing the time machine! :-)

Too late, Guido. The keys to the time machine are back at their usual
place. You should hide them better next time. ;)

Christian

From greg.ewing at canterbury.ac.nz Sun Jun 13 02:30:16 2010
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 13 Jun 2010 12:30:16 +1200
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To:
References:
Message-ID: <4C142698.6090209@canterbury.ac.nz>

Guido van Rossum wrote:
> bind the instance to the first argument when it
> is called on an instance. I can't think of a good name for that one
> right now, but we'll think of one.

dynamicmethod?

--
Greg

From python-dev at code2develop.com Mon Jun 14 12:29:08 2010
From: python-dev at code2develop.com (F van der Meeren)
Date: Mon, 14 Jun 2010 12:29:08 +0200
Subject: [Python-Dev] Static linking with libpython.a
Message-ID:

Hello,

I am trying to figure out what files to copy with my app so I am able to initialize the python runtime.
Where can I find information about this?

I am currently targeting Mac OS X 10.5 and above.

Thank you,
Filip

From kristjan at ccpgames.com Mon Jun 14 14:08:02 2010
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Mon, 14 Jun 2010 12:08:02 +0000
Subject: [Python-Dev] debug and release python
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>

Hello there.
I'm sure this has come up before, but here it is again:

Python exports a different api in debug mode, depending on whether PYMALLOC_DEBUG and WITH_PYMALLOC are defined. This means that _d.pyd files that are used must have been compiled with a version of python using the same settings for these macros. It is unfortunate that the _PyObject_DebugMalloc() api is exposed to external applications using macros in objimpl.h.

I would suggest two things:

1) Provide dummy or thunking versions of those in builds that don't have PYMALLOC_DEBUG implemented, that thunk to PyObject_Malloc et al. (This is what we have done at CCP.)

2) Remove _PyObject_DebugMalloc() from the api. It really should be an implementation detail of the exposed PyObject_Malloc() functions whether they use debug functionality at all. The _PyObject_DebugCheckAddress() and _PyObject_DebugDumpAddress() functions can be left in place. But exposing this functionality in macros that external modules compile in is not good at all.

The reason why this is annoying:
Some external software comes with proprietary .pyd bindings. When developing my own application, with modified preprocessor definitions (e.g. to turn off PYMALLOC_DEBUG) we find that those externally provided libraries don't work. It takes a fair amount of detective work to find out why exactly linkage fails. The external API really shouldn't change depending on preprocessor definitions.

Cheers,
K

From martin at v.loewis.de Tue Jun 15 00:12:30 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 00:12:30 +0200
Subject: [Python-Dev] debug and release python
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
Message-ID: <4C16A94E.9020101@v.loewis.de>

> Some external software comes with proprietary .pyd bindings.

Can you please explain what a "proprietary .pyd binding" is?

Do you mean they come with extension modules? If so, there is no chance
of using them in debug mode, anyway, right? So what specifically is the
problem?

Regards,
Martin

From alexander.belopolsky at gmail.com Tue Jun 15 00:45:49 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Jun 2010 18:45:49 -0400
Subject: [Python-Dev] Sharing functions between C extension modules in stdlib
Message-ID:

I learned a long time ago that it is not enough to simply declare
a function in some header file if you want to define it in one module
and use it in another. You have to use what is now known as PyCapsule -
an array of pointers to C functions wrapped in a Python object.

However, while navigating through the time/datetime maze recently I
have come across timefuncs.h which seems to share
_PyTime_DoubleToTimet between time and datetime modules. I did not
expect this to work, but apparently the build machinery somehow knows
how to place _PyTime_DoubleToTimet code in both time.so and
datetime.so:

$ nm build/lib.macosx-10.4-x86_64-3.2-pydebug/datetime.so | grep _PyTime_DoubleToTimet
000000000000f4e2 T __PyTime_DoubleToTimet
$ nm build/lib.macosx-10.4-x86_64-3.2-pydebug/time.so | grep _PyTime_DoubleToTimet
0000000000000996 T __PyTime_DoubleToTimet

I have two questions: 1) how does this happen; and 2) is this intentional?

Thanks.

From alexander.belopolsky at gmail.com Tue Jun 15 01:00:19 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Jun 2010 19:00:19 -0400
Subject: [Python-Dev] Sharing functions between C extension modules in stdlib
In-Reply-To:
References:
Message-ID:

On Mon, Jun 14, 2010 at 6:45 PM, Alexander Belopolsky wrote:
..
> I did not expect this to work, but apparently the build machinery
> somehow knows how to place _PyTime_DoubleToTimet code in both time.so
> and datetime.so:
..
> I have two questions: 1) how does this happen; and 2) is this intentional?
>

OK, the answer to the first question is simple: in setup.py, we have

    exts.append( Extension('datetime', ['datetimemodule.c', 'timemodule.c'],
                           libraries=math_libs) )

but if timemodule.c is compiled in with the datetime module, why does
it also need to be imported to share some other code?
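
The capsule mechanism mentioned at the start of this thread can be observed from the Python side. A small sketch, assuming CPython with the C-accelerated datetime module, which publishes its C API as the attribute datetime.datetime_CAPI; the helper name capsule_names is made up for the illustration:

```python
import datetime


def capsule_names(module):
    """Return the names of module attributes that are PyCapsule objects."""
    return sorted(
        name
        for name, value in vars(module).items()
        if type(value).__name__ == "PyCapsule"
    )


# On a typical CPython build this list contains "datetime_CAPI"; with a
# pure-Python datetime implementation it may simply be empty.
names = capsule_names(datetime)
```

Other extension modules obtain the function pointers by importing the capsule at run time rather than by linking against datetime's object code, which is the alternative to compiling the same source file into two modules.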

From martin at v.loewis.de Tue Jun 15 01:09:44 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Tue, 15 Jun 2010 01:09:44 +0200
Subject: [Python-Dev] Sharing functions between C extension modules in stdlib
In-Reply-To:
References:
Message-ID: <4C16B6B8.3030304@v.loewis.de>

> $ nm build/lib.macosx-10.4-x86_64-3.2-pydebug/datetime.so | grep
> _PyTime_DoubleToTimet
> 000000000000f4e2 T __PyTime_DoubleToTimet
> $ nm build/lib.macosx-10.4-x86_64-3.2-pydebug/time.so | grep
> _PyTime_DoubleToTimet
> 0000000000000996 T __PyTime_DoubleToTimet
>
> I have two questions: 1) how does this happen;

'T' means "defined in text segment", so it looks like the code is
included twice. And indeed, it is:

    exts.append( Extension('time', ['timemodule.c'],
                           libraries=math_libs) )
    exts.append( Extension('datetime', ['datetimemodule.c', 'timemodule.c'],
                           libraries=math_libs) )

> and 2) is this intentional?

This was added with

------------------------------------------------------------------------
r36221 | bcannon | 2004-06-24 03:38:47 +0200 (Thu, 24 Jun 2004) | 3 lines

Add compilation of timemodule.c with datetimemodule.c to get
__PyTime_DoubleToTimet().

------------------------------------------------------------------------

So it's clearly intentional. I doubt it's desirable, though. If only
__PyTime_DoubleToTimet needs to be duplicated, I'd rather put that
function into a separate C file that gets included twice, instead of
including the full timemodule.c into datetimemodule.c.

Regards,
Martin

From alexander.belopolsky at gmail.com Tue Jun 15 01:17:41 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 14 Jun 2010 19:17:41 -0400
Subject: [Python-Dev] Static linking with libpython.a
In-Reply-To:
References:
Message-ID:

On Mon, Jun 14, 2010 at 6:29 AM, F van der Meeren wrote:
..
> I am trying to figure out, what files to copy with my app so I am able to initialize the python runtime.
> Where can I find information about this?
On comp.lang.python forum. This forum is for developing python itself, not applications using python. However, in general, you need code in Python, Parser, and Objects directories. See LIBRARY_OBJS definition in the Makefile. These days you also need some bootstrap code from Lib, AFAIK. From kristjan at ccpgames.com Tue Jun 15 14:48:39 2010 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 15 Jun 2010 12:48:39 +0000 Subject: [Python-Dev] debug and release python In-Reply-To: <4C16A94E.9020101@v.loewis.de> References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local> <4C16A94E.9020101@v.loewis.de> Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D8FBEE0@exchis.ccp.ad.local> What I mean is that a third party software vendor supplies a foobarapp.pyd and a foobarapp_d.pyd dlls that link to python2x.dll and python2x_d.dll respectively. But the latter will have been compiled to match a certain settings of the objimpl.h header, which may not match whatever is being used to build the local python2x_d.dll. And thus, you get strange and hard to debug linker errors when trying to load external libraries. When developing superapp.exe, which uses a custom build of python2x, perhaps even embedded, python2x_d.dll is used extensively both during the development process and the testing process. This is why foobarapp_d.pyd is necessary and why it is supplied by any sensible vendor providing opaque python extensions. But the current objimpl.h api makes it a matter of developer choice whether that foobarapp_d.pyd will successfully link with your python2x_d.dll or not. IMHO, it is not good practice to expose an API that changes depending on preprocessor settings like this. K > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: 14. j?n? 
2010 22:13 > To: Kristj?n Valur J?nsson > Cc: python-dev at python.org > Subject: Re: [Python-Dev] debug and release python > > > Some external software comes with proprietary .pyd bindings. > > Can you please explain what a "proprietary .pyd binding" is? > > Do you mean they come with extension modules? If so, there is no chance > of using them in debug mode, anyway, right? So what specifically is the > problem? > > Regards, > Martin From catherine.devlin at gmail.com Tue Jun 15 21:51:11 2010 From: catherine.devlin at gmail.com (Catherine Devlin) Date: Tue, 15 Jun 2010 15:51:11 -0400 Subject: [Python-Dev] Become a Python contributor at PyOhio Message-ID: Thanks to David Murray, we're going ahead with plans to make a full-fledged introduction to core development at PyOhio. We've just started circulating this announcement to drum up interest, so if there are people or groups who you'd like to recruit to the effort, please forward it to them. By the way, I haven't made a peep on this list yet - or even read it - because I'm intentionally preserving my ignorance so that I can be the leader-learner for the Teach Me session. (It's the first time wilful ignorance has actually been a virtue). Anyway, the announcement: Become a Python contributor at PyOhio ===================================== Working in Python is awesome. Are you ready to work on Python? The quality of Python and the Standard Library depend on volunteers who fix bugs and make improvements to the codebase. If you're interested in joining these volunteers, good for you! Information on core development is right on Python's homepage. However, if you'd like an in-person boost to get you started, come to PyOhio this July 31 - August 3. One of our many events is "Teach Me Python Bugfixing", an introduction to working on Python that's guaranteed newbie-friendly (because a newbie is running it). 
Next come two evenings and two full days of Python core sprinting, so you can put your new skills to use with plenty of helpers around. It's classroom learning and real-life practice at one free event! See you there! Core development: http://www.python.org/dev/ PyOhio: http://www.pyohio.org/ Teach Me Python Bugfixing: http://www.pyohio.org/2010/Talks#A.234_Teach_Me_Python_Bugfixing PyOhio sprints: http://www.pyohio.org/Sprints2010 -- - Catherine http://catherinedevlin.blogspot.com/ *** PyOhio 2010 * July 31 - Aug 1 * Columbus, OH * pyohio.org *** -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin at v.loewis.de Tue Jun 15 22:19:57 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 15 Jun 2010 22:19:57 +0200 Subject: [Python-Dev] debug and release python In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D8FBEE0@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local> <4C16A94E.9020101@v.loewis.de> <930F189C8A437347B80DF2C156F7EC7F0A8D8FBEE0@exchis.ccp.ad.local> Message-ID: <4C17E06D.6030601@v.loewis.de> Am 15.06.2010 14:48, schrieb Kristj?n Valur J?nsson: > What I mean is that a third party software vendor supplies a > foobarapp.pyd and a foobarapp_d.pyd dlls that link to python2x.dll > and python2x_d.dll respectively. But the latter will have been > compiled to match a certain settings of the objimpl.h header, which > may not match whatever is being used to build the local > python2x_d.dll. And thus, you get strange and hard to debug linker > errors when trying to load external libraries. Ok. But your proposed change doesn't fix that, right? I.e. even with the change, it would *still* depend on objimpl.h (and other) settings what ABI this debug DLL exactly has. So I think this problem can't really be fixed. Instead, you have to trust that the vendor did the most sensible thing when building foobarapp.pyd, namely activating *just* the debug build. 
Then, if you do the same, it will interoperate just fine.

> IMHO, it is not good practice to expose an API that changes depending
> on preprocessor settings like this.

But there are tons of ABI changes that may happen in a debug build.
If you want to cope with all of them, you really need to recompile
the source of all extensions.

Regards,
Martin

From amauryfa at gmail.com Tue Jun 15 22:24:05 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 15 Jun 2010 22:24:05 +0200
Subject: [Python-Dev] debug and release python
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
Message-ID:

2010/6/14 Kristján Valur Jónsson :
> Hello there.
>
> I'm sure this has come up before, but here it is again:
>
> Python exports a different api in debug mode, depending on whether
> PYMALLOC_DEBUG and WITH_PYMALLOC are defined. This means that _d.pyd files
> that are used must have been compiled with a version of python using the
> same settings for these macros. It is unfortunate that the
> _PyObject_DebugMalloc() api is exposed to external applications using macros
> in objimpl.h.
>
> I would suggest two things:
>
> 1) Provide dummy or thunking versions of those in builds that don't
> have PYMALLOC_DEBUG implemented, that thunk to PyObject_Malloc et al. (This
> is what we have done at CCP.)
>
> 2) Remove _PyObject_DebugMalloc() from the api. It really should
> be an implementation detail of the exposed PyObject_Malloc() functions whether
> they use debug functionality at all. The _PyObject_DebugCheckAddress() and
> _PyObject_DebugDumpAddress() functions can be left in place. But exposing this
> functionality in macros that external modules compile in is not good at
> all.
>
> The reason why this is annoying:
>
> Some external software comes with proprietary .pyd bindings. When
> developing my own application, with modified preprocessor definitions (e.g.
> to turn off PYMALLOC_DEBUG) we find that those externally provided libraries
> don't work. It takes a fair amount of detective work to find out why
> exactly linkage fails. The external API really shouldn't change depending
> on preprocessor definitions.

I remember having the same issue years ago:
http://mail.python.org/pipermail/python-list/2004-July/855844.html

At the time, I solved the issue by compiling extension modules with
pymalloc options turned on
(which is fortunately the default, so this applies to the supplied
proprietary .pyd),
and I added a (plain) definition for functions like _PyObject_DebugMalloc,
even when PYMALLOC_DEBUG is undefined.

Since the python_d.dll is a custom build anyway, adding the code is
not too much pain.

--
Amaury Forgeot d'Arc

From catherine.devlin at gmail.com Tue Jun 15 23:07:22 2010
From: catherine.devlin at gmail.com (Catherine Devlin)
Date: Tue, 15 Jun 2010 17:07:22 -0400
Subject: [Python-Dev] Become a Python contributor at PyOhio
In-Reply-To: <20100615203439.GT8876@ag.com>
References: <20100615203439.GT8876@ag.com>
Message-ID:

On Tue, Jun 15, 2010 at 4:34 PM, Dan Buch wrote:

> Does this mean I should repurpose my talk slot, currently entitled
> "Intro to Core Involvement"? :)
>
>
Ach! I forgot! Hopefully that's the dumbest mistake I'll make in this
year's PyOhio preparations. Fortunately the PyCon blog can be edited... wish
emails could be.

No, as I wrote to the talk committee, "There is some overlap between this
talk and Dan Buch's submission, though his seems to have a heavier focus on
doc and triage work. If they're both selected, I'll work with Dan to see
that the talks dovetail well together. I would really *love* to see Dan's
talk, this talk, and sprints (weekend sprints AND sprints on the following
weekdays) fuse into a great big contribu-palooza that will put PyOhio on the
map! Well, we're already on the map."
I actually think it'll be ideal if we can get - Your talk midday on Saturday, for a clasically planned introduction on multiple aspects of core involvement - Shortly thereafter, my "teach me" talk, which will be specifically about bugfixing and will focus on points of newbie confusion by means of my own all-natural fumbling. Hopefully some people from your talk's audience will take their brand-new knowledge to participate in the "teach me" session as both teachers and learners... nothing solidifies learning like teaching does. (I think I need to *not* watch your talk until afterward on video, incidentally, to keep my ignorance pure. I might end up as the most ignorant person in the room, which would be perfect.) - That evening, the sprinty goodness begins. -- - Catherine http://catherinedevlin.blogspot.com/ *** PyOhio 2010 * July 31 - Aug 1 * Columbus, OH * pyohio.org *** -------------- next part -------------- An HTML attachment was scrubbed... URL: From catherine.devlin at gmail.com Tue Jun 15 23:08:56 2010 From: catherine.devlin at gmail.com (Catherine Devlin) Date: Tue, 15 Jun 2010 17:08:56 -0400 Subject: [Python-Dev] Become a Python contributor at PyOhio Message-ID: So let's try this again: Become a Python contributor at PyOhio ===================================== Working in Python is awesome. Are you ready to work on Python? The quality of Python and the Standard Library depend on volunteers who fix bugs and make improvements to the codebase. If you're interested in joining these volunteers, good for you! Information on core development is right on Python's homepage. However, if you'd like an in-person boost to get you started, come to PyOhio this July 31 - August 3. Two talks will get you up to speed on Python contribution: "Intro to Core Involvement" and "Teach Me Python Bugfixing". Next come two evenings and two full days of Python core sprinting, so you can put your new skills to use with plenty of helpers around. 
It's classroom learning and real-life practice at one free event! See
you there!

Core development: http://www.python.org/dev/
PyOhio: http://www.pyohio.org/
Intro to Core Involvement: http://www.pyohio.org/2010/Talks#A.2320_Intro_to_Core_Involvement
Teach Me Python Bugfixing: http://www.pyohio.org/2010/Talks#A.234_Teach_Me_Python_Bugfixing
PyOhio sprints: http://www.pyohio.org/Sprints2010

--
- Catherine
http://catherinedevlin.blogspot.com/
*** PyOhio 2010 * July 31 - Aug 1 * Columbus, OH * pyohio.org ***

From stefan_ml at behnel.de Wed Jun 16 09:47:32 2010
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 16 Jun 2010 09:47:32 +0200
Subject: [Python-Dev] Are PyCFunctions supposed to invisibly consume self when used as a method?
In-Reply-To: <4C1416A3.60004@cheimes.de>
References: <9DD65C5E-7E01-4A1E-B480-588F58782262@voidspace.org.uk>
 <4C1416A3.60004@cheimes.de>
Message-ID:

Christian Heimes, 13.06.2010 01:22:
> On 13.06.2010 01:15, Guido van Rossum wrote:
>> Hey! No borrowing the time machine! :-)
>
> Too late

Get the irony?

Stefan

From kristjan at ccpgames.com Wed Jun 16 10:35:58 2010
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 16 Jun 2010 08:35:58 +0000
Subject: [Python-Dev] debug and release python
In-Reply-To:
References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D8FC053@exchis.ccp.ad.local>

> -----Original Message-----
> From: Amaury Forgeot d'Arc [mailto:amauryfa at gmail.com]
> Sent: 15. júní
2010 21:24
> To: Kristján Valur Jónsson
> Cc: python-dev at python.org
> Subject: Re: [Python-Dev] debug and release python
>
> I remember having the same issue years ago:
> http://mail.python.org/pipermail/python-list/2004-July/855844.html
>
> At the time, I solved the issue by compiling extension modules with
> pymalloc options turned on
> (which is fortunately the default, so this applies to the supplied
> proprietary .pyd),
> and I added a (plain) definition for functions like
> _PyObject_DebugMalloc,
> even when PYMALLOC_DEBUG is undefined.
>
> Since the python_d.dll is a custom build anyway, adding the code is
> not too much pain.
>

It is not too much pain, once you realize the problem, no. But I just
got bitten by this and spent the best part of a weekend trying to solve
the problem. On Windows, you get an import failure on the .pyd file
with the message: "Procedure entry point not found". I had come across
this previously, some three years ago perhaps, and forgotten all about
it, so I was sufficiently annoyed to post to python-dev.

We use python27_d.dll a lot and typically have WITH_PYMALLOC disabled
in debug builds for the benefit of using the debug malloc libraries
present on Windows.

I've solved the issue now by making sure that obmalloc.c always exports
_PyObject_DebugMalloc(), much as it always exports PyObject_Malloc()
whether WITH_PYMALLOC is defined or not.

My suggestion for python core would be the same: expose these always
for existing python versions, and remove them from the API in new
python versions.

K

From kristjan at ccpgames.com Wed Jun 16 10:42:07 2010
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 16 Jun 2010 08:42:07 +0000
Subject: [Python-Dev] debug and release python
In-Reply-To: <4C17E06D.6030601@v.loewis.de>
References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local>
 <4C16A94E.9020101@v.loewis.de>
 <930F189C8A437347B80DF2C156F7EC7F0A8D8FBEE0@exchis.ccp.ad.local>
 <4C17E06D.6030601@v.loewis.de>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F0A8D8FC056@exchis.ccp.ad.local>

> -----Original Message-----
> From: "Martin v. Löwis" [mailto:martin at v.loewis.de]
> Sent: 15. júní 2010 21:20
> To: Kristján Valur Jónsson
> Cc: python-dev at python.org
> Subject: Re: [Python-Dev] debug and release python
>
> On 15.06.2010 14:48, Kristján Valur Jónsson wrote:
> > What I mean is that a third party software vendor supplies a
> > foobarapp.pyd and a foobarapp_d.pyd dlls that link to python2x.dll
> > and python2x_d.dll respectively. But the latter will have been
> > compiled to match certain settings of the objimpl.h header, which
> > may not match whatever is being used to build the local
> > python2x_d.dll. And thus, you get strange and hard to debug linker
> > errors when trying to load external libraries.
>
> Ok. But your proposed change doesn't fix that, right?
>
> I.e. even with the change, it would *still* depend on objimpl.h (and
> other) settings what ABI this debug DLL exactly has.
>

I think it does. My proposal was perhaps not clear:

For existing python versions, always export _PyObject_DebugMalloc et
al. irrespective of the WITH_PYMALLOC and PYMALLOC_DEBUG settings.
(PyObject_Malloc() is always exported, even for builds without
WITH_PYMALLOC.)

On new python versions, remove _PyObject_DebugMalloc from the ABI.
Make the switch internal to obmalloc.c, so that you can turn on the
debug library by recompiling pythonxx_d.dll only (currently, you have
to recompile the .pyd files too!)
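
A minimal sketch of how one might check, from Python, whether the
interpreter library that is actually loaded exports the symbols
discussed above, before a .pyd import fails with "Procedure entry point
not found". It relies only on ctypes.pythonapi from the standard
library; the helper name exports_symbol is hypothetical, and the result
for _PyObject_DebugMalloc depends entirely on the Python version and
build options in use.

```python
import ctypes


def exports_symbol(name):
    """Return True if the running interpreter's C library exports `name`.

    ctypes.pythonapi resolves symbols lazily on attribute access and
    raises AttributeError for a missing one, which hasattr() turns
    into False.
    """
    return hasattr(ctypes.pythonapi, name)


if __name__ == "__main__":
    # Symbol names taken from the thread; only the first is expected
    # to be present on every build.
    for symbol in ("PyObject_Malloc", "_PyObject_DebugMalloc"):
        print(symbol, exports_symbol(symbol))
```

Running this against both pythonXY.dll and pythonXY_d.dll would show
whether the two builds expose the same allocator entry points, which is
the property the proposal above is trying to guarantee.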

> But there are tons of ABI changes that may happen in a debug build.
> If you want to cope with all of them, you really need to recompile the
> source of all extensions.

Are there? Can you give me an example? I thought we were careful to
keep the interface shown to pyd files constant regardless of
configuration settings.

K

From msenecal.sc at gmail.com Wed Jun 16 07:45:54 2010
From: msenecal.sc at gmail.com (Mart)
Date: Wed, 16 Jun 2010 01:45:54 -0400
Subject: [Python-Dev] Release manager/developer (10 years + experience) would like to help and volunteer time if needed
Message-ID: <601BF624-5D0B-4E2A-A019-2589491AB6D3@gmail.com>

Hi,

I have worked 10 years at Adobe Systems as a Release Developer for the
LiveCycle ES team and have been employed as a Release Manager (for a
team of one, me ;) ) at Nuance Communications since last March.

I have put a lot of effort into keeping Python alive and well at Adobe
by providing complete build/release solutions & processes, automation
and tooling in my favourite language, Python.
I have been promoting, planning and implementing a completely new
build/release infrastructure at Nuance, where my expectation is to have
a 100% python shop to manage builds and releases.

I would very much like to offer any help you may require, provided I am
a good fit. I can provide references, resume, etc. if requested.

In hopes of pursuing further discussions, please accept my best
regards,

Martin Senecal
Gatineau (Quebec)
Canada

From orsenthil at gmail.com Wed Jun 16 13:20:51 2010
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Wed, 16 Jun 2010 16:50:51 +0530
Subject: [Python-Dev] Release manager/developer (10 years + experience) would like to help and volunteer time if needed
Message-ID:

Welcome!
You might just want to hook on to the process mentioned at
http://www.python.org/dev

That's it.

--
Senthil

On 16 Jun 2010 16:44, "Mart" wrote:

> Hi,
>
> I have worked 10 years at Adobe Systems as a Release Developer for the
> LiveCycle ES team and am now employed as a Release Manager (for a team
> of one, me ;) ) at Nuance Communications since last March.
>
> I have put lots of effort to keep Python alive and well at Adobe by
> providing complete build/release solutions & processes, automation and
> tooling in my favourite language, Python.
> I have been promoting, planning and implementing a completely new
> build/release infrastructure at Nuance, where my expectation is to have
> a 100% python shop to manage builds and releases.
>
> I would very much like to offer any help you may require, provided I am
> a good fit. I can provide references, resume, etc. if requested.
>
> In hopes of pursuing further discussions, please accept my best
> regards,
>
> Martin Senecal
> Gatineau (Quebec)
> Canada
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/orsenthil%40gmail.com

From ncoghlan at gmail.com Wed Jun 16 15:19:20 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 16 Jun 2010 23:19:20 +1000
Subject: [Python-Dev] Release manager/developer (10 years + experience) would like to help and volunteer time if needed
In-Reply-To: <601BF624-5D0B-4E2A-A019-2589491AB6D3@gmail.com>
References: <601BF624-5D0B-4E2A-A019-2589491AB6D3@gmail.com>
Message-ID:

On Wed, Jun 16, 2010 at 3:45 PM, Mart wrote:
>
> I have put lots of effort to keep Python alive and well at Adobe by providing complete build/release solutions & processes, automation and tooling in my favourite language, Python.
> I have been promoting, planning and implementing a completely new build/release infrastructure at Nuance, where my expectation is to have a 100% python shop to manage builds and releases.

Hi Martin,

With that kind of background there are likely a number of ways you
could contribute. From a general Python programming point of view, I'd
start with Brett's intro to CPython development at
http://www.python.org/dev/intro/ and the other links in the dev
section of the web site. There are plenty of bug fixes and feature
requests relating to pure Python components of the standard library
that always need work (even comments just saying "I tried this patch
and it worked for me" can be very helpful).

Specifically in the area of automated build and test management,
Martin von Loewis may have some suggestions for improvements that
could be made to our Buildbot infrastructure that he doesn't have the
time to do himself. It may also be worth checking with Dirkjan Ochtman
to see if there is anything in this space that still needs to be
handled for the transition from svn to hg that will hopefully be
happening later this year. With any luck, those two will actually
chime in here (as they're both python-dev subscribers).

We don't go in for automated binary releases for a variety of reasons
- I definitely advise trawling through the python-dev archives for a
while before getting too enthusiastic on that particular front.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From msenecal.sc at gmail.com Wed Jun 16 16:19:39 2010
From: msenecal.sc at gmail.com (Mart)
Date: Wed, 16 Jun 2010 10:19:39 -0400
Subject: [Python-Dev] Release manager/developer (10 years + experience) would like to help and volunteer time if needed
In-Reply-To:
References: <601BF624-5D0B-4E2A-A019-2589491AB6D3@gmail.com>
Message-ID: <7A14D4CD-0708-4521-BC1F-785D88BDFAFA@gmail.com>

Hi Nick,

That sounds great!

I assume since python-dev has been cc'ed that both Martin von Loewis
and Dirkjan Ochtman are listening on this thread. If so, then let me
know if there is anything specific that either of you would need a
hand with.
I would be more than happy to take on some of your "still TODO but no time" items. Meanwhile I will take a closer look @ http://www.python.org/dev/intro and see where/if I can roll up my sleeves and lend a hand. Thanks for the reply & info and I look forward to contributing! Mart :) On 2010-06-16, at 9:19 AM, Nick Coghlan wrote: > On Wed, Jun 16, 2010 at 3:45 PM, Mart wrote: >> >> I have put lots of effort to keep Python alive and well at Adobe by providing complete build/release solutions & processes, automation and tooling in my favourite language, Python. >> I have been promoting, planning and implementing a completely new build/release infrastructure at Nuance, where my expectation is have a 100% python shop to manage builds and releases. > > Hi Martin, > > With that kind of background there are likely a number of ways you > could contribute. From a general Python programming point of view, I'd > start with Brett's intro to CPython development at > http://www.python.org/dev/intro/ and the other links in the dev > section of the web site. There are plenty of bug fixes and feature > requests relating to pure Python components of the standard library > that always need work (even comments just saying "I tried this patch > and it worked for me" can be very helpful). > > Specifically in the area of automated build and test management, > Martin von Loewis may have some suggestions for improvements that > could be made to our Buildbot infrastructure that he doesn't have the > time to do himself. It may also be worth checking with Dirkjan Ochtman > to see if there is anything in this space that still needs to be > handled for the transition from svn to hg that will hopefully be > happening later this year. With any luck, those two will actually > chime in here (as they're both python-dev subscribers). 
>
> We don't go in for automated binary releases for a variety of reasons
> - I definitely advise trawling through the python-dev archives for a
> while before getting too enthusiastic on that particular front.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From alexander.belopolsky at gmail.com Wed Jun 16 17:54:06 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 16 Jun 2010 11:54:06 -0400
Subject: [Python-Dev] Sharing functions between C extension modules in stdlib
In-Reply-To: <4C16B6B8.3030304@v.loewis.de>
References: <4C16B6B8.3030304@v.loewis.de>
Message-ID:

On Mon, Jun 14, 2010 at 7:09 PM, "Martin v. Löwis" wrote:
..
> So it's clearly intentional. I doubt it's desirable, though. If only
> __PyTime_DoubleToTimet needs to be duplicated, I'd rather put that
> function into a separate C file that gets included twice, instead of
> including the full timemodule.c into datetimemodule.c.

Thanks for your research, Martin. I've opened an issue for this at
http://bugs.python.org/issue9012 .

From lutz at rmi.net Wed Jun 16 22:48:49 2010
From: lutz at rmi.net (lutz at rmi.net)
Date: Wed, 16 Jun 2010 20:48:49 -0000
Subject: [Python-Dev] email package status in 3.X
Message-ID: <6wwifklfk7n7tup216062010044853@SMTP>

[copied to pydev from email-sig because of the broader scope]

Well, it looks like I've stumbled onto the "other shoe" on this
issue--that the email package's problems are also apparently
behind the fact that CGI binary file uploads don't work in 3.1
(http://bugs.python.org/issue4953). Yikes.

I trust that people realize this is a show-stopper for broader
Python 3.X adoption. Why 3.0 was rolled out anyhow is beyond
me; it seems that it would have been better if Python developers
had gotten their own code to work with 3.X, before expecting the
world at large to do so.

FWIW, after rewriting Programming Python for 3.1, 3.x still feels
a lot like a beta to me, almost 2 years after its release.
How did this happen? Maybe nobody is using 3.X enough to care, but
I have a feeling that issues like this are part of the reason why.

No offense to people who obviously put in an incredible amount of
work on 3.X. As someone who remembers 0.X, though, it's hard not
to find the current situation a bit disappointing.

--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)

> -----Original Message-----
> From: lutz at rmi.net
> To: "R. David Murray"
> Subject: Re: email package status in 3.X
> Date: Sun, 13 Jun 2010 15:30:06 -0000
>
> Come to think of it, here was another oddness I just recalled: this
> may have been reported already, but header decoding returns mixed types
> depending upon the structure of the header. Converting to a str for
> display isn't too difficult to handle, but this seems a bit inconsistent
> and contrary to Python's type neutrality:
>
> >>> from email.header import decode_header
> >>> S1 = 'Man where did you get that assistant?'
> >>> S2 = '=?utf-8?q?Man_where_did_you_get_that_assistant=3F?='
> >>> S3 = 'Man where did you get that =?UTF-8?Q?assistant=3F?='
>
> # str: don't decode()
> >>> decode_header(S1)
> [('Man where did you get that assistant?', None)]
>
> # bytes: do decode()
> >>> decode_header(S2)
> [(b'Man where did you get that assistant?', 'utf-8')]
>
> # bytes: do decode(), using raw-unicode-escape applied in package
> >>> decode_header(S3)
> [(b'Man where did you get that', None), (b'assistant?', 'utf-8')]
>
> I can work around this with the following code, but it
> feels a bit too tightly coupled to the package's internal details
> (further evidence that email.* can be made to work as is today,
> even if it may be seen as less than ideal aesthetically):
>
>     parts = email.header.decode_header(rawheader)
>     decoded = []
>     for (part, enc) in parts:                      # for all substrings
>         if enc is None:                            # part unencoded?
>             if not isinstance(part, bytes):        # str: full hdr unencoded
>                 decoded += [part]                  # else do unicode decode
>             else:
>                 decoded += [part.decode('raw-unicode-escape')]
>         else:
>             decoded += [part.decode(enc)]
>     return ' '.join(decoded)
>
> Thanks,
> --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
>
>
> > -----Original Message-----
> > From: lutz at rmi.net
> > To: "R. David Murray"
> > Subject: Re: email package status in 3.X
> > Date: Sat, 12 Jun 2010 16:52:32 -0000
> >
> > Hi David,
> >
> > All sounds good, and thanks again for all your work on this.
> >
> > I appreciate the difficulties of moving this package to 3.X
> > in a backward-compatible way. My suggestions stem from the fact
> > that it does work as is today, albeit in a less than ideal way.
> >
> > That, and I'm seeing that Python 3.X in general is still having
> > a great deal of trouble gaining traction in the "real world"
> > almost 2 years after its release, and I'd hate to see further
> > disincentives for people to migrate. This is a bigger issue
> > than both the email package and this thread, of course.
> >
> > > > 3) Type-dependent text part encoding
> > > >
> > > ...
> > > So, in the next releases of Python all MIMEText input should be string,
> > > and it will fail if you pass bytes. I consider this as email previously
> > > not living up to its published API, but do you think I should hack
> > > in a way for it to accept bytes too, for backward compatibility in the
> > > 3 line?
> >
> > Decoding can probably be safely delegated to package clients.
> > Typical email clients will probably have str for display of the
> > main text. They may wish to read attachments in binary mode, but
> > can always read in text mode instead or decode manually, because
> > they need a known encoding to send the part correctly (my client
> > has to ask or use configurations in some cases).
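
The header-decoding workaround quoted earlier in this message can be
sketched as a self-contained function. This is only an illustration
under stated assumptions: the function name decode_whole_header is
hypothetical, the raw-unicode-escape fallback is the one named in the
quoted workaround, and the exact pieces returned by decode_header()
(for example, whether trailing whitespace is kept) vary between Python
3 releases.

```python
from email.header import decode_header


def decode_whole_header(rawheader):
    """Return a header as a single str, whatever mix of types comes back.

    decode_header() yields (part, charset) pairs in which `part` may be
    str (header had no encoded words) or bytes (anything else).
    """
    decoded = []
    for part, enc in decode_header(rawheader):
        if isinstance(part, bytes):
            # Declared charset if there is one; otherwise the fallback
            # used by the quoted workaround.
            part = part.decode(enc or "raw-unicode-escape")
        decoded.append(part)
    return " ".join(decoded)


if __name__ == "__main__":
    print(decode_whole_header("=?utf-8?q?Man_where_did_you_get_that_assistant=3F?="))
```

Checking the type of each part, rather than only its charset, keeps the
caller independent of which of the three header shapes it was handed.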
> >
> > B/W compatibility probably isn't a concern; I suspect that my
> > temporary workaround will still work with your patch anyhow,
> > and this code didn't work at all for some encodings before.
> >
> > > > There are some additional cases that now require decoding per mail
> > > > headers today due to the str/bytes split, but these are just a
> > > > normal artifact of supporting Unicode character sets in general,
> > > > and seem like issues for package client to resolve (e.g., the bytes
> > > > returned for decoded payloads in 3.X didn't play well with existing
> > > > str-based text processing code written for 2.X).
> > >
> > > I'm not following you here. Can you give me some more specific
> > > examples? Even if these "normal artifacts" must remain with
> > > the current API, I'd like to make things as easy as practical when
> > > using the new API.
> >
> > This was just a general statement about things in my own code that
> > didn't jive with the 3.X string model. For instance, line wrapping
> > logic assumed str; tkinter text widgets do much better rendering str
> > than the bytes fetched for decoded payloads; and my Pyedit text editor
> > component had to be overhauled to handle display/edit/save of payloads
> > of arbitrary encodings. If I remember any more specific issues with
> > the email package itself, I'll forward your way.
> >
> > I'll watch for an opportunity to get the book's new PyMailGUI
> > client code to you as a candidate test case, but please ping
> > me about it later if I haven't acted on this. It works well,
> > but largely because of all the work that went into the email
> > package underlying it.
> >
> > Thanks,
> > --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
> >
> >
> > > -----Original Message-----
> > > From: "R.
David Murray" > > > To: lutz at rmi.net > > > Subject: Re: email package status in 3.X > > > Date: Thu, 10 Jun 2010 10:18:48 -0400 > > > > > > On Thu, 10 Jun 2010 09:21:52 -0400, lutz at rmi.net wrote: > > > > In other words, some of my concern may have been a bit premature. > > > > I hope that in the future we'll either strive for compatibility > > > > or keep the current version around; it's a lot of very useful code. > > > > > > The plan is to have a compatibility layer that will accept calls based > > > on the old API and forward appropriately to the new API. So far I'm > > > thinking I can succeed in doing this in a fairly straightforward manner, > > > but I won't know for sure until I get some more pieces in place. > > > > > > > In fact, I recommend that any new email package be named distinctly, > > > > > > I'm going to avoid that if I can (though the PyPI package will be > > > named email6 when we publish it for public testing). If, however, > > > it turns out that I can't correctly support both the old and the > > > new API, then I'll have to do that. > > > > > > > and that the current package be retained for a number of releases to > > > > come. After all the breakages that 3.X introduced in general, doing > > > > the same to any email-based code seems a bit too much, especially > > > > given that the current package is largely functional as is. To me, > > > > after having just used it extensively, fixing its few issues seems > > > > a better approach than starting from scratch. > > > > > > Well, the thing is, as you found, existing 2.x code needs to be fixed to > > > correctly handle the distinction between strings and bytes no matter what. > > > The goal is to make it easier to write correct programs, while providing > > > the compatibility layer to make porting smoother. 
But I doubt that any
> > > non-trivial 2.x email program will port without significant changes,
> > > even if the compatibility layer is close to 100% compatible with the
> > > current Python3 email package, simply because the previous conflation
> > > of text and bytes must be untangled in order to work correctly in
> > > Python3, and email involves lots of transitions between text and bytes.
> > >
> > > As for "starting from scratch", it is true that the current plan involves
> > > considerable changes in the recommended API (in the direction of greater
> > > flexibility and power), but I'm hoping that significant portions of the
> > > code will carry forward with minor changes, and that this will make it
> > > easier to support the old API.
> > >
> > > > As far as other issues, the things I found are described below my
> > > > signature. I don't know what the utf-8 issue is that you refer
> > > > to; I'm able to parse and send with this encoding as is without
> > > > problems (both payloads and headers), but I'm probably not using the
> > > > interfaces you fixed, and this may be the same as one of the items listed.
> > >
> > > It is, see below.
> > >
> > > > Another thought: it might be useful to use the book's email client
> > > > as a sort of test case for the package; it's much more rigorous in
> > > > the new edition because it now has to be, given 3.X's Unicode model
> > > > (it's about 4,900 lines of code, though not all is email-related).
> > > > I'd be happy to donate the code as soon as I find out what the
> > > > copyright will be this time around; it will be at O'Reilly's site
> > > > this Fall in any event.
> > >
> > > That would be great. I am planning to write my own sample app to
> > > demonstrate the new API, but if I can use yours to test the compatibility
> > > layer that will help a lot, since I otherwise have no Python3 email
> > > application to test against unless I port something from Python2.
> > >
> > > > Major issues I found...
> > > > ------------------------------------------------------------------
> > > > 1) Str required for parsing, but bytes returned from poplib
> > > >
> > > > The initial decode from bytes to str of full mail text; in
> > > > retrospect, probably not a major issue, since original email
> > > > standards called for ASCII. An 8-bit encoding like Latin-1 is
> > > > probably sufficient for most conforming mails. For the book,
> > > > I try a set of different encodings, beginning with an optional
> > > > configuration module setting, then ascii, latin-1, and utf-8;
> > > > this is probably overkill, but a GUI has to be defensive.
> > >
> > > This works (mostly) for conforming email, but some important Python email
> > > applications need to deal with non-conforming email. That's where the
> > > inability to parse bytes directly really causes problems.
> > >
> > > > 2) Binary attachments encoding
> > > >
> > > > The binary attachments byte-to-str issue that you've just
> > > > fixed. As I mentioned, I worked around this by passing in a
> > > > custom encoder that calls the original and runs an extra decode
> > > > step. Here's what my fix looked like in the book; your patch
> > > > may do better, and I will minimally add a note about the 3.1.3
> > > > and 3.2 fix for this:
> > >
> > > Yeah, our patch was a lot simpler since we could fix the encoding inside
> > > the loop producing the encoded lines :)
> > >
> > > > 3) Type-dependent text part encoding
> > > >
> > > > There's a str/bytes confusion issue related to Unicode encodings
> > > > in text payload generation: some encodings require the payload to
> > > > be str, but others expect bytes. Unfortunately, this means that
> > > > clients need to know how the package will react to the encoding
> > > > that is used, and special-case based upon that.
> > >
> > > This was the UTF-8 bug I fixed.
I shouldn't have called it "the UTF-8
> > > bug", because it applies equally to the other charsets that use base64,
> > > as you note. I called it that because UTF-8 was where the problem was
> > > noticed and is mentioned in the title of the bug report.
> > >
> > > I had a suspicion that the quoted-printable encoding wasn't being done
> > > correctly either, so to hear that it is working for you is good news.
> > > There may still be bugs to find there, though.
> > >
> > > So, in the next releases of Python all MIMEText input should be string,
> > > and it will fail if you pass bytes. I consider this as email previously
> > > not living up to its published API, but do you think I should hack
> > > in a way for it to accept bytes too, for backward compatibility in the
> > > 3 line?
> > >
> > > > There are some additional cases that now require decoding per mail
> > > > headers today due to the str/bytes split, but these are just a
> > > > normal artifact of supporting Unicode character sets in general,
> > > > and seem like issues for package client to resolve (e.g., the bytes
> > > > returned for decoded payloads in 3.X didn't play well with existing
> > > > str-based text processing code written for 2.X).
> > >
> > > I'm not following you here. Can you give me some more specific
> > > examples? Even if these "normal artifacts" must remain with
> > > the current API, I'd like to make things as easy as practical when
> > > using the new API.
> > >
> > > Thanks for all your feedback!
> > >
> > > --David
> > >

From ncoghlan at gmail.com Wed Jun 16 23:47:27 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 17 Jun 2010 07:47:27 +1000
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <6wwifklfk7n7tup216062010044853@SMTP>
References: <6wwifklfk7n7tup216062010044853@SMTP>
Message-ID:

On Thu, Jun 17, 2010 at 6:48 AM, wrote:
> I trust that people realize this is a show-stopper for broader
> Python 3.X adoption.
Why 3.0 was rolled out anyhow is beyond
> me; it seems that it would have been better if Python developers
> had gotten their own code to work with 3.X, before expecting the
> world at large to do so.
>
> FWIW, after rewriting Programming Python for 3.1, 3.x still feels
> a lot like a beta to me, almost 2 years after its release.  How
> did this happen?  Maybe nobody is using 3.X enough to care, but
> I have a feeling that issues like this are part of the reason why.
>
> No offense to people who obviously put in an incredible amount of
> work on 3.X.  As someone who remembers 0.X, though, it's hard not
> to find the current situation a bit disappointing.

Agreed, but the binary/text distinction in 2.x (or rather, the lack
thereof) makes the unicode handling situation so hopelessly confused
that there is a lot of 2.x code (including in the standard library)
that silently mixes the two, often without really testing the
consequences (as clearly happened here).

3.x was rolled out anyway because the vast majority of it works.
Obviously people affected by the problems specific to the email
package and any other binary vs text parsing problems that are still
lingering are out of luck at the moment, but leaving 3.x sitting on a
shelf indefinitely would hardly have inspired anyone to clean it up.

My personal perspective is that a lot of that code was likely already
broken in hard to detect ways when dealing with mixed encodings -
releasing 3.x just made the associated errors significantly easier to
detect. If we end up being able to add your email client code to the
standard library's unit test suite, that should help the situation
immensely.

Regards,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From martin at v.loewis.de Thu Jun 17 09:29:01 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 17 Jun 2010 09:29:01 +0200 Subject: [Python-Dev] debug and release python In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F0A8D8FC056@exchis.ccp.ad.local> References: <930F189C8A437347B80DF2C156F7EC7F0A8D73C303@exchis.ccp.ad.local> <4C16A94E.9020101@v.loewis.de> <930F189C8A437347B80DF2C156F7EC7F0A8D8FBEE0@exchis.ccp.ad.local> <4C17E06D.6030601@v.loewis.de> <930F189C8A437347B80DF2C156F7EC7F0A8D8FC056@exchis.ccp.ad.local> Message-ID: <4C19CEBD.9080304@v.loewis.de> >> I.e. even with the change, it would *still* depend on objimpl.h >> (and other) settings what ABI this debug DLL exactly has. >> > I think it does. My proposal was perhaps not clear: For existing > python versions, always export _PyObject_DebugMalloc et al. Hmm. That's still not clear. What are "existing Python versions"? You can't change them anymore; any change can only affect future, as-of-yet-non-existing Python versions. Also, what do you mean by "always"? Even in release builds? Would this really help? You shouldn't be mixing PyObject_DebugMalloc and PyObject_Malloc in a single process. > On new python versions, remove the > _PyObject_DebugMalloc from the ABI. Make the switch internal to > obmalloc.c, so that you can turn on the debug library by recompiling > pythonxx_d.dll only (currently, you have to recompile the .pyd files > too!) That sounds fine. >> But there are tons of ABI changes that may happen in a debug >> build. If you want to cope with all of them, you really need to >> recompile the source of all extensions. > Are there? Can you give me an example? If you define Py_TRACE_REFS, every object has two additional pointers, which aren't there if you don't. So extensions compiled with it are incompatible with extensions compiled without it. 
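
A small sketch of how one might check, from Python, whether an
interpreter was built with the kind of option described here. It
assumes CPython's documented behaviour that sys.gettotalrefcount()
exists only in debug builds and sys.getobjects() only in builds
compiled with Py_TRACE_REFS; the helper name debug_build_features is
hypothetical.

```python
import sys


def debug_build_features():
    """Report which debug-only sys hooks this interpreter provides."""
    return {
        # Present in debug builds (reference-count debugging enabled).
        "gettotalrefcount": hasattr(sys, "gettotalrefcount"),
        # Present only when every object carries the two extra
        # Py_TRACE_REFS pointers.
        "getobjects": hasattr(sys, "getobjects"),
    }


if __name__ == "__main__":
    for name, present in debug_build_features().items():
        print(name, present)
```

An extension built against one combination of these options cannot be
assumed to load safely into an interpreter that reports another.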

If you define COUNT_ALLOCS, every type object will have additional
slots; again, you can't mix extensions that have a different setting
here than the interpreter.

Regards,
Martin

From barry at python.org Thu Jun 17 17:43:29 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 17 Jun 2010 11:43:29 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <6wwifklfk7n7tup216062010044853@SMTP>
References: <6wwifklfk7n7tup216062010044853@SMTP>
Message-ID: <20100617114329.254db9ac@heresy>

On Jun 16, 2010, at 08:48 PM, lutz at rmi.net wrote:

>Well, it looks like I've stumbled onto the "other shoe" on this
>issue--that the email package's problems are also apparently
>behind the fact that CGI binary file uploads don't work in 3.1
>(http://bugs.python.org/issue4953). Yikes.
>
>I trust that people realize this is a show-stopper for broader
>Python 3.X adoption.

We know it, we have extensively discussed how to fix it, we have IMO a
good design, and we even have someone willing and able to tackle the
problem. We need to find a sufficient source of funding to enable him
to do the work it will take, and so far that's been the biggest
stumbling block. It will take a focused and determined effort to see
this through, and it's obvious that volunteers cannot make it happen.
I include myself in the latter category, as I've tried and failed at
least twice to do it in my spare time.

-Barry

From janssen at parc.com Thu Jun 17 20:11:22 2010
From: janssen at parc.com (Bill Janssen)
Date: Thu, 17 Jun 2010 11:11:22 PDT
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <6wwifklfk7n7tup216062010044853@SMTP>
Message-ID: <58318.1276798282@parc.com>

Nick Coghlan wrote:

> My personal perspective is that a lot of that code was likely already
> broken in hard to detect ways when dealing with mixed encodings -
> releasing 3.x just made the associated errors significantly easier to
> detect.

I have to agree with this, and not just about encodings. I think much
of the stdlib code dealing with all aspects of HTTP (urllib and the
http package which now includes cgi) is kind of shaky. And it affects
(infects) other parts of the stdlib, too; sockets are hacked to support
the read-after-close paradigm that httplib uses, for instance. Which
means that SSL and other socket-using code also has to support it, etc.

Some of this was cleaned up in the move to 3.x, but more work needs to
be done. Kudos to the folks working on httplib2
(http://code.google.com/p/httplib2/) and WSGI.

There's a related meta-issue having to do with antique protocols. FTP,
for instance, was designed when the Internet had only 19 nodes
connected together with custom-built refrigerator-sized routers. A very
early experiment in application protocols. It does a few odd things
that we've since learned to be inefficient/unwise/unnecessary. Does it
make sense that Python support every part of it? On the other hand, it
was fairly static when the Python support was added (unlike HTTP, which
was under very active development!) so that module is pretty robust.
Bill From brett at python.org Thu Jun 17 21:24:54 2010 From: brett at python.org (Brett Cannon) Date: Thu, 17 Jun 2010 12:24:54 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100617114329.254db9ac@heresy> References: <6wwifklfk7n7tup216062010044853@SMTP> <20100617114329.254db9ac@heresy> Message-ID: On Thu, Jun 17, 2010 at 08:43, Barry Warsaw wrote: > On Jun 16, 2010, at 08:48 PM, lutz at rmi.net wrote: > >>Well, it looks like I've stumbled onto the "other shoe" on this >>issue--that the email package's problems are also apparently >>behind the fact that CGI binary file uploads don't work in 3.1 >>(http://bugs.python.org/issue4953). Yikes. >> >>I trust that people realize this is a show-stopper for broader >>Python 3.X adoption. > > We know it, we have extensively discussed how to fix it, we have IMO a good > design, and we even have someone willing and able to tackle the problem. We > need to find a sufficient source of funding to enable him to do the work it > will take, and so far that's been the biggest stumbling block. It will take a > focused and determined effort to see this through, and it's obvious that > volunteers cannot make it happen. I include myself in the latter category, as > I've tried and failed at least twice to do it in my spare time. And in general I think this is the reason some modules have not transitioned as well as others: there are only so many of us. The stdlib passes its test suite, but obviously some unit tests do not cover enough of the code in the ways people need it covered. As for using Python 3 for my code, I do and have since Python 3 became more-or-less usable. I just happen to not work with internet-related stuff in my day-to-day work. Plus we have needed to maintain FOUR branches for a while. That is a nasty time sink when you are having to port bug fixes and such.
It also means that python-dev has been focused on making sure Python 2.7 is a solid release instead of getting to focus on the stdlib in Python 3. This is a nasty chicken-and-egg issue; we could ignore Python 2 and focus on Python 3, but then the community would complain about us not supporting the transition from 2 to 3 better, but obviously focusing on 2 has led to 3 not getting enough TLC. Once Python 2.7 is done and out the door the entire situation for Python 3 should start to improve as python-dev as a whole will have a chance to begin to focus solely on Python 3. From g.rodola at gmail.com Fri Jun 18 00:40:16 2010 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Fri, 18 Jun 2010 00:40:16 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <58318.1276798282@parc.com> References: <6wwifklfk7n7tup216062010044853@SMTP> <58318.1276798282@parc.com> Message-ID: 2010/6/17 Bill Janssen : > There's a related meta-issue having to do with antique protocols. Can I ask what meta-issue you are talking about exactly? > FTP, for instance, was designed when the Internet had only 19 nodes connected > together with custom-built refrigerator-sized routers. A very early > experiment in application protocols. It does a few odd things that > we've since learned to be inefficient/unwise/unnecessary. Does it make > sense that Python support every part of it? Since the FTP protocol is still quite widespread, I'd say it makes a lot of sense. That aside, what parts of urllib/http* are penalized because of FTP support?
--- Giampaolo http://code.google.com/p/pyftpdlib http://code.google.com/p/psutil From steve at holdenweb.com Fri Jun 18 04:32:51 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 18 Jun 2010 11:32:51 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100617114329.254db9ac@heresy> References: <6wwifklfk7n7tup216062010044853@SMTP> <20100617114329.254db9ac@heresy> Message-ID: <4C1ADAD3.9070808@holdenweb.com> Barry Warsaw wrote: > On Jun 16, 2010, at 08:48 PM, lutz at rmi.net wrote: > >> Well, it looks like I've stumbled onto the "other shoe" on this >> issue--that the email package's problems are also apparently >> behind the fact that CGI binary file uploads don't work in 3.1 >> (http://bugs.python.org/issue4953). Yikes. >> >> I trust that people realize this is a show-stopper for broader >> Python 3.X adoption. > > We know it, we have extensively discussed how to fix it, we have IMO a good > design, and we even have someone willing and able to tackle the problem. We > need to find a sufficient source of funding to enable him to do the work it > will take, and so far that's been the biggest stumbling block. It will take a > focused and determined effort to see this through, and it's obvious that > volunteers cannot make it happen. I include myself in the latter category, as > I've tried and failed at least twice to do it in my spare time. > > -Barry > Lest the readership think that the PSF is unaware of this issue, allow me to point out that we have already partially funded this effort, and are still offering R. David Murray some further matching funds if he can raise sponsorship to complete the effort (on which he has made a very promising start). We are also attempting to enable tax-deductible fund raising to increase the likelihood of David's finding support. Perhaps we need to think about a broader campaign to increase the quality of the python 3 libraries. 
I find it very annoying that the #python IRC group still has "Don't use Python 3" in it's topic. They adamantly refuse to remove it until there is better library support, and they are the guys who see the issues day in day out so it is hard to argue with them (and I don't think an autocratic decision-making process would be appropriate). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From arcriley at gmail.com Fri Jun 18 05:16:47 2010 From: arcriley at gmail.com (Arc Riley) Date: Thu, 17 Jun 2010 23:16:47 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C1ADAD3.9070808@holdenweb.com> References: <6wwifklfk7n7tup216062010044853@SMTP> <20100617114329.254db9ac@heresy> <4C1ADAD3.9070808@holdenweb.com> Message-ID: David and his Google Summer of Code student, Shashwat Anand. You can read Shashwat's weekly progress updates at http://l0nwlf.in/ or subscribe to http://twitter.com/l0nwlf for more micro updates. We have more than 30 paid students working on Python 3 tasks this year, most of them participating under the PSF umbrella but also a few with 3rd party projects such as Mercurial porting those various packages to Py3.
Given all this "on the horizon" work, I think the Py3 package situation will look a lot brighter by Python 3.2's release. On Thu, Jun 17, 2010 at 10:32 PM, Steve Holden wrote: > > Lest the readership think that the PSF is unaware of this issue, allow > me to point out that we have already partially funded this effort, and > are still offering R. David Murray some further matching funds if he can > raise sponsorship to complete the effort (on which he has made a very > promising start). > > We are also attempting to enable tax-deductible fund raising to increase > the likelihood of David's finding support. > From stephen at xemacs.org Fri Jun 18 07:52:17 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 18 Jun 2010 14:52:17 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <6wwifklfk7n7tup216062010044853@SMTP> References: <6wwifklfk7n7tup216062010044853@SMTP> Message-ID: <87d3volwfi.fsf@uwakimon.sk.tsukuba.ac.jp> lutz at rmi.net writes: > FWIW, after rewriting Programming Python for 3.1, 3.x still feels > a lot like a beta to me, almost 2 years after its release. Email, of course, is a big wart. But guess what? Python 2's email module doesn't actually work! Sure, the program runs most of the time, but every program that depends on email must acquire inches of armorplate against all the things that can go wrong. You simply can't rely on it to DTRT except in a pre-MIME, pre-HTML, ASCII-only world. Although they're often addressing general problems, these hacks are *not* integrated back into the email module in most cases, but remain app-specific voodoo. If you live in Kansas, sure, you can concentrate on dodging tornados and completely forget about Unicode and MIME and text/bogus content. For the rest of the world, though, the problem is not Python 3. It's STD 11 (which still points at RFC 822, dated 1982!)
It's really inappropriate to point at the email module, whose developers are trying *not* to punt on conformance and robustness, when even the IETF can only "run in circles, scream and shout"! Maybe there are other problems with Python 3 that deserve to be pointed at, but given the general scarcity of resources I think the email module developers are working on the right things. Unlike many other modules, email really needs to be rewritten from the ground (Python 3) up, because of the centrality of bytes/unicode confusion to all email problems. Python 3 completely changes the assumptions there; a Python 2-style email module really can't work properly. Then on top of that, today we know a lot more about handling issues like text/html content and MIME in general than when the Python 2 email module was designed. New problems have arisen over the period of Python 3 development, like "domain keys", which email doesn't handle out of the box AFAIK, but email for Python 3 should IMHO. Should Python 3 have been held back until email was fixed? Dunno, but I personally am very glad it was not; where I have a choice, I always use Python 3 now, and have yet to run into a problem. I expect that to change if I can find the time to get involved in email and Mailman 3 development, of course. From stephen at thorne.id.au Fri Jun 18 07:07:12 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Fri, 18 Jun 2010 15:07:12 +1000 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) Message-ID: <20100618050712.GC20639@thorne.id.au> Steve Holden Wrote: > We are also attempting to enable tax-deductible fund raising to increase > the likelihood of David's finding support. Perhaps we need to think > about a broader campaign to increase the quality of the python 3 > libraries. I find it very annoying that the #python IRC group still has > "Don't use Python 3" in it's topic. 
They adamantly refuse to remove it > until there is better library support, and they are the guys who see the > issues day in day out so it is hard to argue with them (and I don't > think an autocratic decision-making process would be appropriate). Yes, #python keeps the text "It's too early to use Python 3.x" in its topic. Library support is the only reason. -- Regards, Stephen Thorne Development Engineer From techtonik at gmail.com Fri Jun 18 14:44:15 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 18 Jun 2010 15:44:15 +0300 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100618050712.GC20639@thorne.id.au> References: <20100618050712.GC20639@thorne.id.au> Message-ID: On Fri, Jun 18, 2010 at 8:07 AM, Stephen Thorne wrote: >> We are also attempting to enable tax-deductible fund raising to increase >> the likelihood of David's finding support. Perhaps we need to think >> about a broader campaign to increase the quality of the python 3 >> libraries. I find it very annoying that the #python IRC group still has >> "Don't use Python 3" in it's topic. They adamantly refuse to remove it >> until there is better library support, and they are the guys who see the >> issues day in day out so it is hard to argue with them (and I don't >> think an autocratic decision-making process would be appropriate). > > Yes, #python keeps the text "It's too early to use Python 3.x" in its topic. > Library support is the only reason. I do not know what you are intending to do, but my opinion is that fund raising for patching the library is a waste of money. PSF should concentrate on enhancing tools to make lives of library supporters easier. I do not want to become a maintainer, and I believe there was a lot of spam about this topic from me. The latest thread was in http://bugs.python.org/issue9008 in short: `pydotorg` tools - there is no: 1.
separate commit notifications for the module with ability to reply to dedicated group for review 2. separate bug tracker category for my module with giving users ability to change every property of it 3. bug tracker timeline for the module that includes ticket changes, wiki edits, commits and everything else. Filtered. 4. roadmap page with actual status, plans and coverage 5. dashboard page with links to all the above `python development tools`: 1. no way to get all related code for the module 1.1. source code location (repository, branches) 1.2. source code components (source file, tests, documentation) 2. no code coverage (test/user story/rfc/pep) 3. no convenient way to run module-related tests http://bugs.python.org/issue9027 4. no code review management process 5. no way to notify interested parties -- anatoly t. From techtonik at gmail.com Fri Jun 18 15:08:49 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 18 Jun 2010 16:08:49 +0300 Subject: [Python-Dev] cmdline arguments in test_support.run_unittest Message-ID: I thought that some arguments to test_support.run_unittest would be useful. Would like to hear your feedback before making anything. http://bugs.python.org/issue9028 -- anatoly t. From jnoller at gmail.com Fri Jun 18 15:19:37 2010 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 18 Jun 2010 09:19:37 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> Message-ID: On Fri, Jun 18, 2010 at 8:44 AM, anatoly techtonik wrote: > On Fri, Jun 18, 2010 at 8:07 AM, Stephen Thorne wrote: >>> We are also attempting to enable tax-deductible fund raising to increase >>> the likelihood of David's finding support. Perhaps we need to think >>> about a broader campaign to increase the quality of the python 3 >>> libraries. I find it very annoying that the #python IRC group still has >>> "Don't use Python 3" in it's topic.
They adamantly refuse to remove it >>> until there is better library support, and they are the guys who see the >>> issues day in day out so it is hard to argue with them (and I don't >>> think an autocratic decision-making process would be appropriate). >> >> Yes, #python keeps the text "It's too early to use Python 3.x" in its topic. >> Library support is the only reason. > > I do not know what are you intending to do, but my opinion that fund > raising for patching library is a waste of money. PSF should > concentrate on enhancing tools to make lives of library supporters > easier. I do not want to become a maintainer, and I believe there was > a lot of spam about this topic from me. The latest thread was in > http://bugs.python.org/issue9008 in short: Awesome. I plan on wasting as much money on the useless effort of moving python 3 forward as humanly possible. From barry at python.org Fri Jun 18 15:45:57 2010 From: barry at python.org (Barry Warsaw) Date: Fri, 18 Jun 2010 09:45:57 -0400 Subject: [Python-Dev] [Email-SIG] email package status in 3.X In-Reply-To: <4C1ADAD3.9070808@holdenweb.com> References: <6wwifklfk7n7tup216062010044853@SMTP> <20100617114329.254db9ac@heresy> <4C1ADAD3.9070808@holdenweb.com> Message-ID: <20100618094557.77a07994@heresy> On Jun 18, 2010, at 11:32 AM, Steve Holden wrote: >Lest the readership think that the PSF is unaware of this issue, allow >me to point out that we have already partially funded this effort, and >are still offering R. David Murray some further matching funds if he can >raise sponsorship to complete the effort (on which he has made a very >promising start). Right, sorry, I didn't mean to imply the PSF isn't doing anything. More that we need a coordinated effort among all the companies and organizations that use Python to help fund Python 3 library development (and not just in the stdlib).
I think the PSF is best suited to coordinating and managing those efforts, and through its tax-exempt status, collecting and distributing donations specifically targeted to Python 3 work. -Barry From steve at pearwood.info Fri Jun 18 16:09:45 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 19 Jun 2010 00:09:45 +1000 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> Message-ID: <201006190009.46122.steve@pearwood.info> On Fri, 18 Jun 2010 11:19:37 pm Jesse Noller wrote: > Awesome. I plan on wasting as much money on the useless effort of > moving python 3 forward as humanly possible. I'm sorry, but if that's sarcasm, it's far too subtle for me :( -- Steven D'Aprano From jnoller at gmail.com Fri Jun 18 16:24:29 2010 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 18 Jun 2010 10:24:29 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <201006190009.46122.steve@pearwood.info> References: <20100618050712.GC20639@thorne.id.au> <201006190009.46122.steve@pearwood.info> Message-ID: On Fri, Jun 18, 2010 at 10:09 AM, Steven D'Aprano wrote: > On Fri, 18 Jun 2010 11:19:37 pm Jesse Noller wrote: > >> Awesome. I plan on wasting as much money on the useless effort of >> moving python 3 forward as humanly possible. > > I'm sorry, but if that's sarcasm, it's far too subtle for me :( > Yes, it is. See: http://jessenoller.com/2010/05/20/announcing-python-sprint-sponsorship/ This, in my mind is but a start. Along with RDM's sponsorship for the email module, the PSF and the community as a whole should be spending time and money (if they can) to port and help push Python 3 along.
Therefore, I was responding directly to Anatoly's: "I do not know what are you intending to do, but my opinion that fund raising for patching library is a waste of money" To which my response stands: I intend on, based on his opinion, on wasting as much money as I can. jesse From brian.curtin at gmail.com Fri Jun 18 17:04:31 2010 From: brian.curtin at gmail.com (Brian Curtin) Date: Fri, 18 Jun 2010 10:04:31 -0500 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> Message-ID: On Fri, Jun 18, 2010 at 07:44, anatoly techtonik wrote: > On Fri, Jun 18, 2010 at 8:07 AM, Stephen Thorne > wrote: > >> We are also attempting to enable tax-deductible fund raising to increase > >> the likelihood of David's finding support. Perhaps we need to think > >> about a broader campaign to increase the quality of the python 3 > >> libraries. I find it very annoying that the #python IRC group still has > >> "Don't use Python 3" in it's topic. They adamantly refuse to remove it > >> until there is better library support, and they are the guys who see the > >> issues day in day out so it is hard to argue with them (and I don't > >> think an autocratic decision-making process would be appropriate). > > > > Yes, #python keeps the text "It's too early to use Python 3.x" in its > topic. > > Library support is the only reason. > > I do not know what are you intending to do, but my opinion that fund > raising for patching library is a waste of money. PSF should > concentrate on enhancing tools to make lives of library supporters > easier. I do not want to become a maintainer, and I believe there was > a lot of spam about this topic from me. The latest thread was in > http://bugs.python.org/issue9008 in short: > > `pydotorg` tools - theres is no: > 1. 
separate commit notifications for the module with ability to reply > to dedicated group for review If you know how to set this up, feel free to implement it. > 2. separate bug tracker category for my module with giving users > ability to change every property of it > The Python bug tracker isn't the place for "my module". The second part of this sentence has been brought up and I think it's a good point. For example, those who lack developer privileges can't assign issues to themselves. I think Twisted's tracker does well in this area, as the fields are inclusive rather than exclusive. Assignment is open to anyone willing to work on it, and the field is used to prod the next responsible person into acting (I think, correct me if I'm wrong). > 3. bug tracker timeline for the module that includes ticket changes, > wiki edits, commits and everything else. Filtered. That seems like information overload. It might be cool to see all of that, but I'm not sure what the gain is. Some modules get worked on in spurts and sometimes modules don't see action for months. It doesn't actually mean anything, though. > 4. roadmap page with actual status, plans and coverage > 5. dashboard page with links to all the above > If you know how to do this, you are more than welcome to whip up some code and show how it would help. `python development tools`: > 1. no way to get all related code for the module > 1.1. source code location (repository, branches) > 1.2. source code components (source file, tests, documentation) > What exactly do you mean? Since you have submitted several issues, some with patches, I have a hard time believing that you've done all of that work without knowing where any of that information was. > 2. no code coverage (test/user story/rfc/pep) > If you know of a way to incorporate code coverage tools and metrics into the current process, I believe a number of people would be interested. 
There currently exists some coverage tool that runs on the current repository, but I'm not sure of its location or status. > 4. no code review management process > I agree, this is an area that could use work. It has been suggested that Rietveld be incorporated into Roundup both visually ("upload to Rietveld" button) and as a part of the workflow (possible requirement before commit). As with many of these comments, lack of time and a lack of available volunteers are two of many answers as to why there is no traction on this. > 5. no way to notify interested parties > I'm not sure what this is specifically addressing. anatoly t. From lutz at rmi.net Fri Jun 18 17:09:40 2010 From: lutz at rmi.net (lutz at rmi.net) Date: Fri, 18 Jun 2010 15:09:40 -0000 Subject: [Python-Dev] email package status in 3.X Message-ID: Replying en masse to save bandwidth here... Barry Warsaw writes: > We know it, we have extensively discussed how to fix it, we have IMO a good > design, and we even have someone willing and able to tackle the problem. We > need to find a sufficient source of funding to enable him to do the work it > will take, and so far that's been the biggest stumbling block. It will take a > focused and determined effort to see this through, and it's obvious that > volunteers cannot make it happen. I include myself in the latter category, as > I've tried and failed at least twice to do it in my spare time. All understood, and again, not to disparage anyone here. My comments are directed to the development community at large to underscore the grave p/r problems 3.X faces. I realize email parsing is a known issue; I also realize that most people evaluating 3.X today won't care that it is. Most will care only that the new version of a language reportedly used by Google and YouTube still doesn't support CGI uploads a year and a half after its release.
As an author, that's a downright horrible story to have to tell the world. "Stephen J. Turnbull" writes: > Email, of course, is a big wart. But guess what? Python 2's email > module doesn't actually work! Yes it does (see next point). > If you live in Kansas, sure, you can concentrate on dodging tornados > and completely forget about Unicode and MIME and text/bogus content. > For the rest of the world, though, the problem is not Python 3 Yes it is, and Kansas is a lot bigger than you seem to think. I want to reiterate that I was able to build a feature rich email client with the email package as it exists in 3.1. This includes support on both the receiving and sending sides for HTML, arbitrary attachments, and decoding and encoding of both text payloads and headers according to email, MIME, and Unicode/I18N standards. It's an amazingly useful package, and does work as is in 3.X. The two main issues I found have been recently fixed. It's unfortunate that this package is also the culprit behind CGI breakage, but it's not clear why it became a critical path for so much utility in the first place. The package might not be aesthetically ideal, but to me it seems that an utterly incompatible overhaul of this in the name of supporting potentially very different data streams is a huge functional overload. And to those people in Kansas who live outside the pydev clique, replacing it with something different at this point will look as if an incompatible Python is already incompatible with releases in its own line. Why in the world would anyone base a new project on that sort of thrashing? For my part, I've had to add far too many notes to the upcoming edition of Programming Python about major pieces of functionality that worked in 2.X but no longer do in 3.X. That's disappointing to me personally, but it will probably seem a lot worse to the book's tens of thousands of readers. Yet this is the reality that 3.X has created for itself. 
> Should Python 3 have been held back until email was fixed? Dunno, but > I personally am very glad it was not; where I have a choice, I always > use Python 3 now, and have yet to run into a problem. I guess we'll just have to disagree on that. IMHO, Python 3 shot itself in the foot by releasing in half-baked form. And the 3.0 I/O speed issue (remember that?) came very close to blowing its leg clean off. The reality out there in Kansas today is that 3.X is perceived as so bad that it could very well go the way of POP4 if its story does not improve. I don't know what sort of Python world will be left behind in the wake, but I do know it will probably be much smaller. Steve Holden writes: > Lest the readership think that the PSF is unaware of this issue, allow > me to point out that we have already partially funded this effort, and > are still offering R. David Murray some further matching funds if he can > raise sponsorship to complete the effort (on which he has made a very > promising start). > > We are also attempting to enable tax-deductible fund raising to increase > the likelihood of David's finding support. Perhaps we need to think > about a broader campaign to increase the quality of the python 3 > libraries. I find it very annoying that the #python IRC group still has > "Don't use Python 3" in it's topic. They adamantly refuse to remove it > until there is better library support, and they are the guys who see the > issues day in day out so it is hard to argue with them (and I don't > think an autocratic decision-making process would be appropriate). I'm all for people getting paid for work they do, but with all due respect, I think this underscores part of the problem in the Python world today. If funding had been as stringent a prerequisite in the 90s, I doubt there would be a Python today. It was about the fun and the code, not the bucks and the bureaucracy. As far as I can recall, there was no notion of creating a task force to get things done. 
Of course, this may just be the natural evolutionary pattern of human enterprises. As it is today, though, the Python community has a formal diversity statement, but it still does not have a fully functional 3.X almost two years after the fact. I doubt that I'm the only one who sees the irony in that. Again, I mean no disrespect to people contributing to Python today on so many fronts, and I have no answers to offer here. For better or worse, though, this is a personal issue to me too. After spending much of the last 2 years updating the best selling Python books for all the changes this group has seen fit to make, I believe I can say with some authority that 3.X still faces a very uncertain future. --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) From fuzzyman at voidspace.org.uk Fri Jun 18 17:31:09 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 18 Jun 2010 16:31:09 +0100 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <4C1B913D.60401@voidspace.org.uk> On 18/06/2010 16:09, lutz at rmi.net wrote: > Replying en masse to save bandwidth here... > > Barry Warsaw writes: > >> We know it, we have extensively discussed how to fix it, we have IMO a good >> design, and we even have someone willing and able to tackle the problem. We >> need to find a sufficient source of funding to enable him to do the work it >> will take, and so far that's been the biggest stumbling block. It will take a >> focused and determined effort to see this through, and it's obvious that >> volunteers cannot make it happen. I include myself in the latter category, as >> I've tried and failed at least twice to do it in my spare time. >> > All understood, and again, not to disparage anyone here. My > comments are directed to the development community at large > to underscore the grave p/r problems 3.X faces. 
> > I realize email parsing is a known issue; I also realize that > most people evaluating 3.X today won't care that it is. Most > will care only that the new version of a language reportedly > used by Google and YouTube still doesn't support CGI uploads > a year and a half after its release. As an author, that's a > downright horrible story to have to tell the world. > > Really? How widely used is the CGI module these days? Maybe there is a reason nobody appeared to notice... > [snip...] >> Should Python 3 have been held back until email was fixed? Dunno, but >> I personally am very glad it was not; where I have a choice, I always >> use Python 3 now, and have yet to run into a problem. >> > I guess we'll just have to disagree on that. IMHO, Python 3 shot > itself in the foot by releasing in half-baked form. And the 3.0 > I/O speed issue (remember that?) came very close to blowing its > leg clean off. > > Whilst I agree that there are plenty of issues to work on, and I don't underestimate the difficulty of some of them, I think "half-baked" is very much overblown. Whilst you have a lot to say about how much of a problem this is, I don't understand what you are suggesting be *done*? Python 3.0 was *declared* to be an experimental release, and by most standards 3.1 (in terms of the core language and functionality) was a solid release. Any reasonable expectation about Python 3 adoption predicted that it would take years, and would include going through a phase of difficulty and disappointment... All the best, Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS")
that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From status at bugs.python.org Fri Jun 18 18:09:47 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 18 Jun 2010 18:09:47 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20100618160947.8D29D7816D@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-06-11 - 2010-06-18) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2777 open (+43) / 18070 closed (+12) / 20847 total (+55) Open issues with patches: 1122 Average duration of open issues: 713 days. Median duration of open issues: 503 days. Open Issues Breakdown open 2747 (+43) languishing 13 ( +0) pending 16 ( +0) Issues Created Or Reopened (64) _______________________________ New SSL module doesn't seem to verify hostname against commonN 2010-06-15 http://bugs.python.org/issue1589 reopened pitrou struct allows repeat spec. without a format specifier 2010-06-12 CLOSED http://bugs.python.org/issue3129 reopened belopolsky patch datetime lacks concrete tzinfo implementation for UTC 2010-06-15 CLOSED http://bugs.python.org/issue5094 reopened belopolsky patch datetime.strptime doesn't support %z format ? 
2010-06-18 http://bugs.python.org/issue6641 reopened belopolsky patch libffi update to 3.0.9 2010-06-12 http://bugs.python.org/issue8142 reopened haypo patch, buildbot struct - please make sizes explicit 2010-06-15 CLOSED http://bugs.python.org/issue8469 reopened mark.dickinson patch test_distutils fails if srcdir != builddir 2010-06-15 http://bugs.python.org/issue8577 reopened pitrou patch Expose sqlite3 connection inTransaction as read-only in_transa 2010-06-12 http://bugs.python.org/issue8845 reopened haypo patch, easy Tkinter Litmus Test 2010-06-11 CLOSED http://bugs.python.org/issue8971 reopened merwok svnmerge errors in msgfmt.py 2010-06-11 http://bugs.python.org/issue8974 created merwok patch Bug in cookiejar 2010-06-11 http://bugs.python.org/issue8975 created Popa.Claudiu subprocess module causes segmentation fault 2010-06-11 http://bugs.python.org/issue8976 created Chris.Blazick Globalize lonely augmented assignment 2010-06-11 CLOSED http://bugs.python.org/issue8977 created serprex patch "tarfile.ReadError: file could not be opened successfully" if 2010-06-11 http://bugs.python.org/issue8978 created flox OptParse __getitem__ 2010-06-12 CLOSED http://bugs.python.org/issue8979 created bcward distutils.tests.test_register.RegisterTestCase.test_strict fai 2010-06-12 http://bugs.python.org/issue8980 created Arfrever patch _struct.__version__ should be string, not bytes 2010-06-12 CLOSED http://bugs.python.org/issue8981 created belopolsky easy argparse docs cross reference Namespace as a class but the Nam 2010-06-12 http://bugs.python.org/issue8982 created r.david.murray Docstrings should refer to help(name), not name.__doc__ 2010-06-12 http://bugs.python.org/issue8983 created belopolsky patch Python 3 doesn't register script arguments 2010-06-12 CLOSED http://bugs.python.org/issue8984 created Sworddragon String format() has problems parsing numeric indexes 2010-06-12 CLOSED http://bugs.python.org/issue8985 created gosella math.erfc OverflowError 2010-06-12 
CLOSED http://bugs.python.org/issue8986 created debatem1 patch Distutils doesn't quote Windows command lines properly 2010-06-13 http://bugs.python.org/issue8987 created mgiuca patch import + coding = failure (3.1.2/win32) 2010-06-13 http://bugs.python.org/issue8988 created gonegown email.utils.make_msgid: specify domain 2010-06-13 http://bugs.python.org/issue8989 created avbidder at fortytwo.ch patch array constructor and array.fromstring should accept bytearray 2010-06-13 http://bugs.python.org/issue8990 created tjollans patch PyArg_Parse*() functions: reject discontinious buffers 2010-06-13 http://bugs.python.org/issue8991 created haypo patch convertsimple() doesn't need to call converterr() if an except 2010-06-13 http://bugs.python.org/issue8992 created haypo patch Small typo in docs for PySys_SetArgv 2010-06-14 CLOSED http://bugs.python.org/issue8993 created flashk patch pydoc does not support non-ascii docstrings 2010-06-14 http://bugs.python.org/issue8994 created torsten Performance issue with multiprocessing queue (3.1 VS 2.6) 2010-06-14 http://bugs.python.org/issue8995 created bob Add a default role to allow writing bare `len` instead of :fun 2010-06-14 http://bugs.python.org/issue8996 created merwok Write documentation for codecs.readbuffer_encode() 2010-06-14 http://bugs.python.org/issue8997 created lemburg add crypto routines to stdlib 2010-06-14 http://bugs.python.org/issue8998 created debatem1 Add Mercurial support to patchcheck 2010-06-15 http://bugs.python.org/issue8999 created merwok patch Provide parseable repr to datetime.timezone 2010-06-15 http://bugs.python.org/issue9000 created belopolsky easy PyFile_FromFd wrong documentation 2010-06-15 CLOSED http://bugs.python.org/issue9001 created trovao patch Add a pointer on where to find a better description of PyFile_ 2010-06-15 CLOSED http://bugs.python.org/issue9002 created trovao patch urllib about https behavior 2010-06-16 http://bugs.python.org/issue9003 created debatem1 datetime.utctimetuple() 
should not set tm_isdst flag to 0 2010-06-16 http://bugs.python.org/issue9004 created belopolsky Year range in timetuple 2010-06-16 http://bugs.python.org/issue9005 created belopolsky xml-rpc Server object does not propagate the encoding to Unmar 2010-06-16 http://bugs.python.org/issue9006 created Timoth??e.CEZARD CGIHTTPServer supports only Python CGI scripts 2010-06-16 http://bugs.python.org/issue9007 created techtonik CGIHTTPServer support for arbitrary CGI scripts 2010-06-16 http://bugs.python.org/issue9008 created techtonik Improve quality of Python/dtoa.c 2010-06-16 http://bugs.python.org/issue9009 created mark.dickinson patch Infinite loop in imaplib.IMAP4_SSL when used with Gmail 2010-06-16 CLOSED http://bugs.python.org/issue9010 created Ruben.Bakker ast_for_factor unary minus optimization changes AST 2010-06-16 http://bugs.python.org/issue9011 created alexhsamuel patch Separate compilation of time and datetime modules 2010-06-16 CLOSED http://bugs.python.org/issue9012 reopened haypo patch Implement tzinfo.dst() method in timezone 2010-06-16 http://bugs.python.org/issue9013 created belopolsky Incorrect documentation of the PyObject_HEAD macro 2010-06-16 http://bugs.python.org/issue9014 created trovao array.array.tofile cannot write arrays of sizes > 4GB, even co 2010-06-16 http://bugs.python.org/issue9015 created Bill.Steinmetz IDLE won't launch (Win XP) 2010-06-17 http://bugs.python.org/issue9016 created jonseger doctest option flag to enable/disable some chunk of doctests? 
2010-06-17 http://bugs.python.org/issue9017 created harobed os.path.normcase(None) does not raise an error on linux and sh 2010-06-17 http://bugs.python.org/issue9018 created r.david.murray easy wsgiref.headers.Header() does not update headers list it was c 2010-06-17 http://bugs.python.org/issue9019 created Marcel.Hellkamp 2.7: eval hangs on AIX 2010-06-17 http://bugs.python.org/issue9020 created srid no copy.copy problem description 2010-06-17 http://bugs.python.org/issue9021 created techtonik TypeError in wsgiref.handlers when using CGIHandler 2010-06-18 http://bugs.python.org/issue9022 created toxicdav3 distutils relative path errors 2010-06-18 http://bugs.python.org/issue9023 created ghazel PyDateTime_IMPORT macro incorrectly marked up 2010-06-18 http://bugs.python.org/issue9024 created tim.golden patch Non-uniformity in randrange for large arguments. 2010-06-18 http://bugs.python.org/issue9025 created mark.dickinson patch [argparse] Subcommands not printed in the same order they were 2010-06-18 http://bugs.python.org/issue9026 created jcollado patch add test_support.run_unittest command line options and argumen 2010-06-18 http://bugs.python.org/issue9027 created techtonik test_support.run_unittest cmdline options and arguments 2010-06-18 CLOSED http://bugs.python.org/issue9028 created techtonik Issues Now Closed (53) ______________________ New style vs. old style classes __ror__() operator overloadin 854 days http://bugs.python.org/issue2102 tjreedy Backport 3.0 struct module changes to 2.6 818 days http://bugs.python.org/issue2397 mark.dickinson confusing action of struct.pack and struct.unpack with fmt 'p' 748 days http://bugs.python.org/issue2981 mark.dickinson struct allows repeat spec. 
without a format specifier 3 days http://bugs.python.org/issue3129 belopolsky patch Python doesn't handle SIGINT well if it arrives during interpr 726 days http://bugs.python.org/issue3137 haypo patch [patch] allow mmap take file offset as argument 651 days http://bugs.python.org/issue3765 tjreedy warning: unknown conversion type character `z' in format 574 days http://bugs.python.org/issue4370 mark.dickinson patch Incorrect docstring of os.setpgrp 566 days http://bugs.python.org/issue4452 orsenthil datetime lacks concrete tzinfo implementation for UTC 0 days http://bugs.python.org/issue5094 belopolsky patch os.makedirs' mode argument has bad default value 488 days http://bugs.python.org/issue5220 smyrman msgfmt.py does not work with plural form 459 days http://bugs.python.org/issue5464 merwok email feedparser.py CRLFLF bug: $ vs \Z 443 days http://bugs.python.org/issue5610 r.david.murray patch traceback presented in wrong encoding 331 days http://bugs.python.org/issue6543 haypo patch, needs review Document 2.x -> 3.x round changes in "What's New" documents. 
225 days http://bugs.python.org/issue7261 mark.dickinson patch PyDateTime_IMPORT() causes compiler warnings 185 days http://bugs.python.org/issue7463 belopolsky logger.StreamHandler emit encoding fallback is wrong 184 days http://bugs.python.org/issue7470 merwok patch IGNORE_EXCEPTION_DETAIL should ignore the module name 181 days http://bugs.python.org/issue7490 ncoghlan patch IDLE about dialog credits raises UnicodeDecodeError 87 days http://bugs.python.org/issue8203 haypo patch getargs.c in Python3 contains some TODO and the documentation 82 days http://bugs.python.org/issue8215 haypo patch Add Misc/maintainers.rst to 2.x branch 68 days http://bugs.python.org/issue8362 techtonik patch, needs review Broken zipfile with python 3.2 on osx 58 days http://bugs.python.org/issue8442 ronaldoussoren struct - please make sizes explicit 2 days http://bugs.python.org/issue8469 mark.dickinson patch 'y' does not check for embedded NUL bytes 43 days http://bugs.python.org/issue8592 haypo patch undo findsource regression/change 33 days http://bugs.python.org/issue8720 r.david.murray patch tarfile/Windows: Don't use mbcs as the default encoding 21 days http://bugs.python.org/issue8784 haypo patch Remove codecs.readbuffer_encode() and codecs.charbuffer_encode 18 days http://bugs.python.org/issue8838 haypo Add module level now() and today() functions to datetime modul 10 days http://bugs.python.org/issue8903 techtonik quick example how to fix docs 7 days http://bugs.python.org/issue8904 georg.brandl Error in error message in logging 5 days http://bugs.python.org/issue8924 merwok Improve c-api/arg.rst: use "bytes" or "str" types instead of " 7 days http://bugs.python.org/issue8925 merwok patch SimpleHTTPServer should contain usage example 10 days http://bugs.python.org/issue8937 techtonik utf-32be codec failing on UCS-2 python build for 32-bit value 3 days http://bugs.python.org/issue8941 pitrou patch 2.7rc1 tarfile.py: `bltn_open(targetpath, "wb")` -> IOError: I 5 days 
http://bugs.python.org/issue8958 srid 2.6 README 2 days http://bugs.python.org/issue8960 georg.brandl test_imp fails on OSX when LANG is set 1 days http://bugs.python.org/issue8965 haypo patch Windows: use (mbcs in) strict mode to encode/decode filenames, 3 days http://bugs.python.org/issue8969 haypo patch Tkinter Litmus Test 0 days http://bugs.python.org/issue8971 merwok Inconsistent docstrings in struct module 1 days http://bugs.python.org/issue8973 belopolsky patch Globalize lonely augmented assignment 1 days http://bugs.python.org/issue8977 mark.dickinson patch OptParse __getitem__ 1 days http://bugs.python.org/issue8979 merwok _struct.__version__ should be string, not bytes 0 days http://bugs.python.org/issue8981 mark.dickinson easy Python 3 doesn't register script arguments 0 days http://bugs.python.org/issue8984 mark.dickinson String format() has problems parsing numeric indexes 1 days http://bugs.python.org/issue8985 eric.smith math.erfc OverflowError 0 days http://bugs.python.org/issue8986 mark.dickinson patch Small typo in docs for PySys_SetArgv 0 days http://bugs.python.org/issue8993 georg.brandl patch PyFile_FromFd wrong documentation 0 days http://bugs.python.org/issue9001 pitrou patch Add a pointer on where to find a better description of PyFile_ 0 days http://bugs.python.org/issue9002 pitrou patch Infinite loop in imaplib.IMAP4_SSL when used with Gmail 1 days http://bugs.python.org/issue9010 r.david.murray Separate compilation of time and datetime modules 0 days http://bugs.python.org/issue9012 haypo patch test_support.run_unittest cmdline options and arguments 0 days http://bugs.python.org/issue9028 r.david.murray Speed up function calls/can add more introspection info 1970 days http://bugs.python.org/issue1107887 collinwinter patch prompt_user_passwd() in FancyURLopener masks 401 Unauthorized 1663 days http://bugs.python.org/issue1368368 orsenthil patch readline problem on ia64-unknown-linux-gnu 1316 days http://bugs.python.org/issue1593035 tjreedy 
Top Issues Most Discussed (10) ______________________________ 30 add crypto routines to stdlib 4 days open http://bugs.python.org/issue8998 24 datetime lacks concrete tzinfo implementation for UTC 0 days closed http://bugs.python.org/issue5094 22 Add pure Python implementation of datetime module to CPython 116 days open http://bugs.python.org/issue7989 17 `make patchcheck` should check the whitespace of .c/.h files 13 days open http://bugs.python.org/issue8912 12 Python 3 doesn't register script arguments 0 days closed http://bugs.python.org/issue8984 12 Inconsistent docstrings in struct module 1 days closed http://bugs.python.org/issue8973 12 sys.argv contains only scriptname 123 days open http://bugs.python.org/issue7936 10 Improve quality of Python/dtoa.c 2 days open http://bugs.python.org/issue9009 10 CGIHTTPServer support for arbitrary CGI scripts 2 days open http://bugs.python.org/issue9008 9 subprocess.list2cmdline doesn't quote the & character 7 days pending http://bugs.python.org/issue8972 From walter at livinglogic.de Fri Jun 18 18:32:00 2010 From: walter at livinglogic.de (=?ISO-8859-1?Q?Walter_D=F6rwald?=) Date: Fri, 18 Jun 2010 18:32:00 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> Message-ID: <4C1B9F80.6080203@livinglogic.de> On 18.06.10 17:04, Brian Curtin wrote: > [...] > 2. no code coverage (test/user story/rfc/pep) > > > If you know of a way to incorporate code coverage tools and metrics into > the current process, I believe a number of people would be interested. > There currently exists some coverage tool that runs on the current > repository, but I'm not sure of its location or status. http://coverage.livinglogic.de/ I haven't touched the code in a year, but the job's still running. > [...] 
Servus, Walter From lutz at rmi.net Fri Jun 18 19:22:10 2010 From: lutz at rmi.net (lutz at rmi.net) Date: Fri, 18 Jun 2010 17:22:10 -0000 Subject: [Python-Dev] email package status in 3.X Message-ID: > Python 3.0 was *declared* to be an experimental release, and by most > standards 3.1 (in terms of the core language and functionality) was a > solid release. > > Any reasonable expectation about Python 3 adoption predicted that it > would take years, and would include going through a phase of difficulty > and disappointment... Declaring something to be a turd doesn't change the fact that it's a turd. I have a feeling that most people outside this list would have much rather avoided the difficulty and disappointment altogether. Let's be honest here; 3.X was released to the community in part as an extended beta. That's not a problem, unless you drop the word "beta". And if you're still not buying that, imagine the sort of response you'd get if you tried to sell software that billed itself as "experimental", and promised a phase of "disappointment". Why would you expect the Python world to react any differently? > Whilst I agree that there are plenty of issues to work on, and I don't > underestimate the difficulty of some of them, I think "half-baked" is > very much overblown. Whilst you have a lot to say about how much of a > problem this is I don't understand what you are suggesting be *done*? I agree that 3.X isn't all bad, and I very much hope it succeeds. And no, I have no answers; I'm just reporting the perception from downwind. So here it is: The prevailing view is that 3.X developers foisted things on users that they did not fully work through themselves. Unicode is prime among these: for all the talk here about how 2.X was broken in this regard, the implications of the 3.X string solution remain to be fully resolved in the 3.X standard library to this day. What is a common Python user to make of that?
--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) From fuzzyman at voidspace.org.uk Fri Jun 18 19:27:46 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Fri, 18 Jun 2010 18:27:46 +0100 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <4C1BAC92.70500@voidspace.org.uk> On 18/06/2010 18:22, lutz at rmi.net wrote: >> Python 3.0 was *declared* to be an experimental release, and by most >> standards 3.1 (in terms of the core language and functionality) was a >> solid release. >> >> Any reasonable expectation about Python 3 adoption predicted that it >> would take years, and would include going through a phase of difficulty >> and disappointment... >> > Declaring something to be a turd doesn't change the fact that > it's a turd. Right - but *you're* the one calling it a turd, which is not a helpful approach or likely to achieve *anything* useful. I still have no idea what you are actually suggesting. > I have a feeling that most people outside this > list would have much rather avoided the difficulty and > disappointment altogether. > > Let's be honest here; 3.X was released to the community in part > as an extended beta. Correction - 3.0 was an experimental release. That is not true of 3.1 and future releases. All the best, Michael > That's not a problem, unless you drop the > word "beta". And if you're still not buying that, imagine the sort > of response you'd get if you tried to sell software that billed > itself as "experimental", and promised a phase of "disappointment". > Why would you expect the Python world to react any differently? > > >> Whilst I agree that there are plenty of issues to work on, and I don't >> underestimate the difficulty of some of them, I think "half-baked" is >> very much overblown. Whilst you have a lot to say about how much of a >> problem this is I don't understand what you are suggesting be *done*? >> > I agree that 3.X isn't all bad, and I very much hope it succeeds.
And > no, I have no answers; I'm just reporting the perception from downwind. > > So here it is: The prevailing view is that 3.X developers foisted things > on users that they did not fully work through themselves. Unicode is > prime among these: for all the talk here about how 2.X was broken in > this regard, the implications of the 3.X string solution remain to be > fully resolved in the 3.X standard library to this day. What is a > common Python user to make of that? > > --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) > > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From janssen at parc.com Fri Jun 18 19:46:22 2010 From: janssen at parc.com (Bill Janssen) Date: Fri, 18 Jun 2010 10:46:22 PDT Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <6wwifklfk7n7tup216062010044853@SMTP> <58318.1276798282@parc.com> Message-ID: <60565.1276883182@parc.com> Giampaolo Rodolà wrote: > 2010/6/17 Bill Janssen : > > > There's a related meta-issue having to do with antique protocols. > > Can I know what meta-issue you are talking about exactly? Giampaolo, I believe that you and I have already discussed this on one of the FTP issues.
Bill From g.rodola at gmail.com Fri Jun 18 20:23:17 2010 From: g.rodola at gmail.com (Giampaolo Rodolà) Date: Fri, 18 Jun 2010 20:23:17 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <60565.1276883182@parc.com> References: <6wwifklfk7n7tup216062010044853@SMTP> <58318.1276798282@parc.com> <60565.1276883182@parc.com> Message-ID: 2010/6/18 Bill Janssen : > Giampaolo Rodolà wrote: > >> 2010/6/17 Bill Janssen : >> >> > There's a related meta-issue having to do with antique protocols. >> >> Can I know what meta-issue you are talking about exactly? > > Giampaolo, I believe that you and I have already discussed this on one > of the FTP issues. > > Bill I only remember a discussion in which I was against removing OOB data support from asyncore, because certain parts of the FTP protocol rely on it, but that's all. I don't see how urllib or any other stdlib module is supposed to be penalized by the FTP protocol in any way. --- Giampaolo http://code.google.com/p/pyftpdlib http://code.google.com/p/psutil From lutz at rmi.net Fri Jun 18 20:52:45 2010 From: lutz at rmi.net (lutz at rmi.net) Date: Fri, 18 Jun 2010 18:52:45 -0000 Subject: [Python-Dev] email package status in 3.X Message-ID: I wasn't calling Python 3 a turd. I was trying to show the strangeness of the logic behind your rationalization. And failing badly... (maybe I should have used "tar ball"?) What I'm suggesting is that extreme caution be exercised from this point forward with all things 3.X-related. Whether you wish to accept this or not, 3.X has a negative image to many. This suggestion specifically includes not abandoning current 3.X email package users as a case in point. Ripping the rug out from new 3.X users after they took the time to port seems like it may be just enough to tip the scales altogether.
--Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) > -----Original Message----- > From: Michael Foord > To: lutz at rmi.net > Subject: Re: [Python-Dev] email package status in 3.X > Date: Fri, 18 Jun 2010 18:27:46 +0100 > > On 18/06/2010 18:22, lutz at rmi.net wrote: > >> Python 3.0 was *declared* to be an experimental release, and by most > >> standards 3.1 (in terms of the core language and functionality) was a > >> solid release. > >> > >> Any reasonable expectation about Python 3 adoption predicted that it > >> would take years, and would include going through a phase of difficulty > >> and disappointment... > >> > > Declaring something to be a turd doesn't change the fact that > > it's a turd. > > Right - but *you're* the one calling it a turd, which is not a helpful > approach or likely to achieve *anything* useful. I still have no idea > what you are actually suggesting. > > > I have a feeling that most people outside this > > list would have much rather avoided the difficulty and > > disappointment altogether. > > > > Let's be honest here; 3.X was released to the community in part > > as an extended beta. > > Correction - 3.0 was an experimental release. That is not true of 3.1 > and future releases. > > All the best, > > Michael > > That's not a problem, unless you drop the > > word "beta". And if you're still not buying that, imagine the sort > > of response you'd get if you tried to sell software that billed > > itself as "experimental", and promised a phase of "disappointment". > > Why would you expect the Python world to react any differently? > > > > > >> Whilst I agree that there are plenty of issues to work on, and I don't > >> underestimate the difficulty of some of them, I think "half-baked" is > >> very much overblown. Whilst you have a lot to say about how much of a > >> problem this is I don't understand what you are suggesting be *done*? > >> > > I agree that 3.X isn't all bad, and I very much hope it succeeds.
And > > no, I have no answers; I'm just reporting the perception from downwind. > > > > So here it is: The prevailing view is that 3.X developers foisted things > > on users that they did not fully work through themselves. Unicode is > > prime among these: for all the talk here about how 2.X was broken in > > this regard, the implications of the 3.X string solution remain to be > > fully resolved in the 3.X standard library to this day. What is a > > common Python user to make of that? > > > > --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) > > > > > > > > > -- > http://www.ironpythoninaction.com/ > http://www.voidspace.org.uk/blog > > READ CAREFULLY. By accepting and reading this email you agree, on behalf of > your employer, to release me from all obligations and waivers arising from > any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, > clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and > acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with > your employer, its partners, licensors, agents and assigns, in perpetuity, > without prejudice to my ongoing rights and privileges. You further represent > that you have the authority to release me from any BOGUS AGREEMENTS on behalf > of your employer. > > > From pje at telecommunity.com Fri Jun 18 22:48:21 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 18 Jun 2010 16:48:21 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> At 05:22 PM 6/18/2010 +0000, lutz at rmi.net wrote: >So here it is: The prevailing view is that 3.X developers foisted things >on users that they did not fully work through themselves. Unicode is >prime among these: for all the talk here about how 2.X was broken in >this regard, the implications of the 3.X string solution remain to be >fully resolved in the 3.X standard library to this day.
What is a >common Python user to make of that? Certainly, this was my impression as well, after all the Web-SIG discussions regarding the state of the stdlib in 3.x with respect to URL parsing, joining, opening, etc. To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that actually addresses these kinds of stdlib usage issues, so that I don't have to think about it or futz around with experimenting, possibly to find that some things can't be done at all. IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious ways to do it, but, as per the Zen of Python, "that way may not be obvious at first unless you're Dutch". ;-) Since at the moment Python 3 offers me only cosmetic improvements over 2.x (apart from argument annotations), it's hard to get excited enough about it to want to muck about with porting anything to it, or even trying to learn about all the ramifications of the changes. :-( From tjreedy at udel.edu Fri Jun 18 22:53:42 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 18 Jun 2010 16:53:42 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <4C1B9F80.6080203@livinglogic.de> References: <20100618050712.GC20639@thorne.id.au> <4C1B9F80.6080203@livinglogic.de> Message-ID: On 6/18/2010 12:32 PM, Walter Dörwald wrote: > http://coverage.livinglogic.de/ I am a bit puzzled as to the meaning of the gray/red/green bars since the correlation between coverage % and bars is not very high. From jnoller at gmail.com Fri Jun 18 23:02:09 2010 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 18 Jun 2010 17:02:09 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> Message-ID: On Fri, Jun 18, 2010 at 4:48 PM, P.J.
Eby wrote:
> At 05:22 PM 6/18/2010 +0000, lutz at rmi.net wrote:
>>
>> So here it is: The prevailing view is that 3.X developers hoisted things
>> on users that they did not fully work through themselves. Unicode is
>> prime among these: for all the talk here about how 2.X was broken in
>> this regard, the implications of the 3.X string solution remain to be
>> fully resolved in the 3.X standard library to this day. What is a
>> common Python user to make of that?
>
> Certainly, this was my impression as well, after all the Web-SIG discussions
> regarding the state of the stdlib in 3.x with respect to URL parsing,
> joining, opening, etc.

Nothing is set in stone; if something is incredibly painful, or worse
yet broken, then someone needs to file a bug, bring it to this list,
or bring up a patch. This is code we're talking about - nothing is set
in stone, and if something is criminally broken it needs to be first
identified, and then fixed.

> To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that
> actually addresses these kinds of stdlib usage issues, so that I don't have
> to think about it or futz around with experimenting, possibly to find that
> some things can't be done at all.

I guess tutorial welcome, rather than patch welcome then ;)

> IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious
> ways to do it, but, as per the Zen of Python, "that way may not be obvious
> at first unless you're Dutch". ;-)

What areas? We need specifics which can either be:

1> Shot down.
2> Turned into bugs, so they can be fixed
3> Documented in the core documentation.
jesse

From brett at python.org Fri Jun 18 23:09:11 2010
From: brett at python.org (Brett Cannon)
Date: Fri, 18 Jun 2010 14:09:11 -0700
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au> <4C1B9F80.6080203@livinglogic.de>
Message-ID:

On Fri, Jun 18, 2010 at 13:53, Terry Reedy wrote:
> On 6/18/2010 12:32 PM, Walter Dörwald wrote:
>
>> http://coverage.livinglogic.de/
>
> I am a bit puzzled as to the meaning of the gray/red/green bars since the
> correlation between coverage % and bars is not very high.

Gray is lines that are unexecutable (comments, etc.), green are lines
that were executed, and red is lines not executed.

From fuzzyman at voidspace.org.uk Sat Jun 19 00:08:32 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 18 Jun 2010 23:08:32 +0100
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References:
Message-ID: <4C1BEE60.4040508@voidspace.org.uk>

On 18/06/2010 19:52, lutz at rmi.net wrote:
> I wasn't calling Python 3 a turd. I was trying to show
> the strangeness of the logic behind your rationalization.
> And failing badly... (maybe I should have used "tar ball"?)
>
>

I didn't make myself clear. The expected disappointment I was
referring to was about the rate of adoption, not about the quality of
the product.

I'm still baffled as to how a bug in the cgi module (along with the
acknowledged email problems) is such a big deal. Was it reported and
then languished in the bug tracker? That would be bad on its own but
if it was only recently discovered that indicates that it probably
isn't such a big deal - either way it needs fixing, but using Python
for writing cgis hasn't been a big use case for a long time.

All the best,

Michael

> What I'm suggesting is that extreme caution be exercised from
> this point forward with all things 3.X-related. Whether you
> wish to accept this or not, 3.X has a negative image to many.
> This suggestion specifically includes not abandoning current
> 3.X email package users as a case in point. Ripping the rug
> out from new 3.X users after they took the time to port seems
> like it may be just enough to tip the scales altogether.
>
> --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
>
>
>
>> -----Original Message-----
>> From: Michael Foord
>> To: lutz at rmi.net
>> Subject: Re: [Python-Dev] email package status in 3.X
>> Date: Fri, 18 Jun 2010 18:27:46 +0100
>>
>> On 18/06/2010 18:22, lutz at rmi.net wrote:
>>
>>>> Python 3.0 was *declared* to be an experimental release, and by most
>>>> standards 3.1 (in terms of the core language and functionality) was a
>>>> solid release.
>>>>
>>>> Any reasonable expectation about Python 3 adoption predicted that it
>>>> would take years, and would include going through a phase of difficulty
>>>> and disappointment...
>>>>
>>>>
>>> Declaring something to be a turd doesn't change the fact that
>>> it's a turd.
>>>
>> Right - but *you're* the one calling it a turd, which is not a helpful
>> approach or likely to achieve *anything* useful. I still have no idea
>> what you are actually suggesting.
>>
>>
>>> I have a feeling that most people outside this
>>> list would have much rather avoided the difficulty and
>>> disappointment altogether.
>>>
>>> Let's be honest here; 3.X was released to the community in part
>>> as an extended beta.
>>>
>> Correction - 3.0 was an experimental release. That is not true of 3.1
>> and future releases.
>>
>> All the best,
>>
>> Michael
>>
>>> That's not a problem, unless you drop the
>>> word "beta". And if you're still not buying that, imagine the sort
>>> of response you'd get if you tried to sell software that billed
>>> itself as "experimental", and promised a phase of "disappointment".
>>> Why would you expect the Python world to react any differently?
>>>
>>>
>>>
>>>> Whilst I agree that there are plenty of issues to work on, and I don't
>>>> underestimate the difficulty of some of them, I think "half-baked" is
>>>> very much overblown. Whilst you have a lot to say about how much of a
>>>> problem this is I don't understand what you are suggesting be *done*?
>>>>
>>>>
>>> I agree that 3.X isn't all bad, and I very much hope it succeeds. And
>>> no, I have no answers; I'm just reporting the perception from downwind.
>>>
>>> So here it is: The prevailing view is that 3.X developers hoisted things
>>> on users that they did not fully work through themselves. Unicode is
>>> prime among these: for all the talk here about how 2.X was broken in
>>> this regard, the implications of the 3.X string solution remain to be
>>> fully resolved in the 3.X standard library to this day. What is a
>>> common Python user to make of that?
>>>
>>> --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz)
>>>
>>>
>>
>> --
>> http://www.ironpythoninaction.com/
>> http://www.voidspace.org.uk/blog
>>
>> READ CAREFULLY. By accepting and reading this email you agree, on behalf of
>> your employer, to release me from all obligations and waivers arising from
>> any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap,
>> clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and
>> acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with
>> your employer, its partners, licensors, agents and assigns, in perpetuity,
>> without prejudice to my ongoing rights and privileges. You further represent
>> that you have the authority to release me from any BOGUS AGREEMENTS on behalf
>> of your employer.
>>
>>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY.
By accepting and reading this email you agree, on behalf of
your employer, to release me from all obligations and waivers arising
from any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I
have entered into with your employer, its partners, licensors, agents
and assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to
release me from any BOGUS AGREEMENTS on behalf of your employer.

From tjreedy at udel.edu Sat Jun 19 00:08:19 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 18 Jun 2010 18:08:19 -0400
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au> <201006190009.46122.steve@pearwood.info>
Message-ID:

On 6/18/2010 10:24 AM, Jesse Noller wrote:

> http://jessenoller.com/2010/05/20/announcing-python-sprint-sponsorship/

This does not specify what expenses you are thinking of covering. Food
is the most obvious.

Anyway, this got me to think about offering my house as a site for US
east coast mid-atlantic sprints (near I95, halfway between NY and WDC,
FIOS internet, TV/Playstation/Netflix for breaks ;-).

Terry Jan Reedy

From nyamatongwe at gmail.com Sat Jun 19 00:31:40 2010
From: nyamatongwe at gmail.com (Neil Hodgson)
Date: Sat, 19 Jun 2010 08:31:40 +1000
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <4C1B913D.60401@voidspace.org.uk>
References: <4C1B913D.60401@voidspace.org.uk>
Message-ID:

Michael Foord:

> Python 3.0 was *declared* to be an experimental release, and by most
> standards 3.1 (in terms of the core language and functionality) was a solid
> release.

That looks to me like an after-the-event rationalization.
The release note for Python 3.0 (and the "What's new") gives no
indication that it is experimental but does say

"""
We are confident that Python 3.0 is of the same high quality as our
previous releases ... you can safely choose either version (or both) to
use in your projects.
"""
http://mail.python.org/pipermail/python-dev/2008-December/083824.html

Neil

From jnoller at gmail.com Sat Jun 19 00:37:56 2010
From: jnoller at gmail.com (Jesse Noller)
Date: Fri, 18 Jun 2010 18:37:56 -0400
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au> <201006190009.46122.steve@pearwood.info>
Message-ID:

On Fri, Jun 18, 2010 at 6:08 PM, Terry Reedy wrote:
> On 6/18/2010 10:24 AM, Jesse Noller wrote:
>
>> http://jessenoller.com/2010/05/20/announcing-python-sprint-sponsorship/
>
> This does not specify what expenses you are thinking of covering. Food is
> the most obvious.
>
> Anyway, this got me to think about offering my house as a site for US east
> coast mid-atlantic sprints (near I95, halfway between NY and WDC, FIOS
> internet, TV/Playstation/Netflix for breaks ;-).
>
> Terry Jan Reedy

Yup, I'm putting the site together now - essentially what's covered is
"anything up to this amount" - meaning, if you spend 200$ on room
space, then this could go to that. Or 200$ in food for 20 people, etc.
We'll have basic guidelines.

jesse

From raymond.hettinger at gmail.com Sat Jun 19 00:51:10 2010
From: raymond.hettinger at gmail.com (Raymond Hettinger)
Date: Fri, 18 Jun 2010 15:51:10 -0700
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <4C1BEE60.4040508@voidspace.org.uk>
References: <4C1BEE60.4040508@voidspace.org.uk>
Message-ID: <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com>

On Jun 18, 2010, at 3:08 PM, Michael Foord wrote:
> I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal.
Was it reported and then languished in the bug tracker? That would be bad on its own but if it was only recently discovered that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time.

That's one possible explanation. Another possible explanation is that
the product isn't being heavily exercised for serious work and that it
has yet to be shaken-out thoroughly. There has been a disappointing
lack of bug reports across the board for 3.x. That doesn't mean that
the bugs aren't there and that they won't be reported when adoption is
heavier.

In the cases of email, mime handling, cgi and whatnot, the important
point is not whether a given technology is popular. The important part
is that it hints at the kind of bytes/text issues that people are going
to face and that we will need to help them address (i.e. such as blobs
containing multiple encodings, a need to use byte oriented tools such
as md5 in conjunction with text oriented applications, etc.)

One other thought: In addition to not getting many 3.x specific bug
reports, we don't seem to be getting many 3.x specific help questions
(i.e. asking about dictviews or how to make a priority queue in an
environment where many callables don't support ordering operations,
etc.).

> Mark Lutz wrote
> What I'm suggesting is that extreme caution be exercised from
> this point forward with all things 3.X-related. Whether you
> wish to accept this or not, 3.X has a negative image to many.
> This suggestion specifically includes not abandoning current
> 3.X email package users as a case in point. Ripping the rug
> out from new 3.X users after they took the time to port seems
> like it may be just enough to tip the scales altogether.

A couple other areas that need work (some of them are minor):

* BeautifulSoup was left behind when SGML parsing was removed from the standard lib.
* Shelves were crippled for Windows users when bsddb was ripped out.
* Lists containing None for missing values are no longer sortable.
* The basic heapq approach to making a priority queue no longer works well.
  Simply decorating with (priority_level, callable_or_object) fails with two tasks at the
  same priority if the callable or other objects aren't orderable.

Raymond

P.S. I do think it would be great if we could direct some attention
to parts of 3.x that are really nice. Am hoping that this conversation
doesn't drown in negativity. Instead, it should focus on what
improvements are needed to win broader adoption.

From fuzzyman at voidspace.org.uk Sat Jun 19 00:56:40 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 18 Jun 2010 23:56:40 +0100
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com>
References: <4C1BEE60.4040508@voidspace.org.uk> <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com>
Message-ID: <4C1BF9A8.7030208@voidspace.org.uk>

On 18/06/2010 23:51, Raymond Hettinger wrote:
> On Jun 18, 2010, at 3:08 PM, Michael Foord wrote:
>
>
>> I'm still baffled as to how a bug in the cgi module (along with the acknowledged email problems) is such a big deal. Was it reported and then languished in the bug tracker? That would be bad on its own but if it was only recently discovered that indicates that it probably isn't such a big deal - either way it needs fixing, but using Python for writing cgis hasn't been a big use case for a long time.
>>
> That's one possible explanation. Another possible explanation is the product isn't being heavily exercised for serious work and that it has yet to be shaken-out thoroughly. There has been a disappointing lack of bug reports across the board for 3.x. That doesn't mean that the bugs aren't there and that they won't be reported when adoption is heavier.
>
>

Oh, I quite agree. I don't think it makes py3k a turd either.
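
For reference, the None-sorting and heapq points in Raymond's list can be reproduced with a minimal sketch along these lines. The task functions, the sort key, and the counter tie-breaker are illustrative assumptions rather than code from the thread; the counter is one common workaround, not necessarily the fix Raymond had in mind.

```python
import heapq
import itertools


def task_a():
    return "a"


def task_b():
    return "b"


# Naive decoration: (priority, task). When two priorities tie, tuple
# comparison falls through to comparing the functions themselves,
# which Python 3 refuses to order.
naive = []
heapq.heappush(naive, (1, task_a))
try:
    heapq.heappush(naive, (1, task_b))
    naive_failed = False
except TypeError:
    naive_failed = True

# Workaround (an assumption, not from the thread): insert a unique,
# increasing counter so ties are broken before the task is compared.
counter = itertools.count()
queue = []
for priority, task in [(1, task_a), (1, task_b), (0, task_b)]:
    heapq.heappush(queue, (priority, next(counter), task))

order = [heapq.heappop(queue)[2]() for _ in range(3)]

# Lists with None for missing values: a key function that groups the
# None entries first (as 2.x ordering did) avoids comparing None to int.
values = [3, None, 1, None]
cleaned = sorted(values, key=lambda v: (v is not None, v))
```

With the counter in place, tasks at the same priority come out in insertion order, so the queue is also stable.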
> In the cases of email, mime handling, cgi and whatnot, the important point is not whether a given technology is popular. The important part is that it hints at the kind of bytes/text issues that people are going to face and that we will need to help them address (i.e. such as blobs containing multiple encodings, a need to use byte oriented tools such as md5 in conjunction with text oriented applications, etc.)
>
> One other thought: In addition to not getting many 3.x specific bug reports, we don't seem to be getting many 3.x specific help questions (i.e. asking about dictviews or how to make a priority queue in an environment where many callables don't support ordering operations, etc.).
>
>

Most of the questions I've seen about Python 3 are from library
authors doing porting rather than application developers. This is to
be expected I guess.

>
>> Mark Lutz wrote
>>
>
>> What I'm suggesting is that extreme caution be exercised from
>> this point forward with all things 3.X-related. Whether you
>> wish to accept this or not, 3.X has a negative image to many.
>> This suggestion specifically includes not abandoning current
>> 3.X email package users as a case in point. Ripping the rug
>> out from new 3.X users after they took the time to port seems
>> like it may be just enough to tip the scales altogether.
>>
> A couple other areas that need work (some of them are minor):
>
> * BeautifulSoup was left behind when SGML parsing was removed from the standard lib.
> * Shelves were crippled for Windows users when bsddb was ripped out.
> * Lists containing None for missing values are no longer sortable.
>

Yeah, this one can be a bugger. :-)

> * The basic heapq approach to making a priority queue no longer works well.
> Simply decorating with (priority_level, callable_or_object) fails with two tasks at the
> same priority if the callable or other objects aren't orderable.
>
>
> Raymond
>
> P.S.
I do think it would be great if we could direct some attention
> to parts of 3.x that are really nice. Am hoping that this conversation
> doesn't drown in negativity. Instead, it should focus on what
> improvements are needed to win broader adoption.
>
>
>

I definitely agree that our focus should be on fixing problems as we
find them and working on increasing adoption. No argument from me.

All the best,

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on
behalf of your employer, to release me from all obligations and waivers
arising from any and all NON-NEGOTIATED agreements, licenses,
terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality,
non-disclosure, non-compete and acceptable use policies ("BOGUS
AGREEMENTS") that I have entered into with your employer, its partners,
licensors, agents and assigns, in perpetuity, without prejudice to my
ongoing rights and privileges. You further represent that you have the
authority to release me from any BOGUS AGREEMENTS on behalf of your
employer.

From tjreedy at udel.edu Sat Jun 19 04:39:36 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 18 Jun 2010 22:39:36 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com>
References: <4C1BEE60.4040508@voidspace.org.uk> <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com>
Message-ID:

On 6/18/2010 6:51 PM, Raymond Hettinger wrote:

> There has been a disappointing
> lack of bug reports across the board for 3.x.

Here is one from this week involving the interaction of array and
bytearray. It needs a comment from someone who can understand the C-API
based patch, which is beyond me.
http://bugs.python.org/issue8990

Another possible reason for the lack: 500 of the current 2800 open
issues have NO comment (ie, message count = 1), some with patches.
I just posted '500 tracker orphans; we need more reviewers' on
python-list to encourage more participation.

Terry Jan Reedy

From walter at livinglogic.de Sat Jun 19 11:57:35 2010
From: walter at livinglogic.de (=?utf-8?Q?Walter_D=C3=B6rwald?=)
Date: Sat, 19 Jun 2010 11:57:35 +0200
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au> <4C1B9F80.6080203@livinglogic.de>
Message-ID:

On 18.06.2010 at 22:53, Terry Reedy wrote:

> On 6/18/2010 12:32 PM, Walter Dörwald wrote:
>
>> http://coverage.livinglogic.de/
>
> I am a bit puzzled as to the meaning of the gray/red/green bars
> since the correlation between coverage % and bars is not very high.

The gray bar is the uncoverable part of the source (empty lines,
comments etc.), the green bar is the covered part (i.e. those lines
that really got executed) and the red bar is the uncovered part (i.e.
those lines that could have been executed but weren't). So coverage is
green / (green + red)

Just click on the coverage header to sort by coverage and you *will*
see a correlation.

Servus,
Walter

From arcriley at gmail.com Sat Jun 19 12:59:44 2010
From: arcriley at gmail.com (Arc Riley)
Date: Sat, 19 Jun 2010 06:59:44 -0400
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To: <20100618050712.GC20639@thorne.id.au>
References: <20100618050712.GC20639@thorne.id.au>
Message-ID:

You mean Twisted support, because library support is at the point
where there are fewer actively maintained packages not yet ported than
those which are. Of course if your Python experience is hyper-focused
to one framework that isn't ported yet, it will certainly seem like a
lot, and you guys who run #Python are clearly hyper-focused on Twisted.

Great example of the current state: about an hour ago I needed an
inotify Python package for a Py3 project.
I googled for "Python inotify", found pyinotify, saw that they have
several recent releases but no mention of Py3, typed "sudo emerge -av
pyinotify", and it installed pyinotify for Python 2.6, 3.1, and 3.2_pre
at the same time. Run python interactively, imports and works great.

Portage (Gentoo's package system, emerge being the primary command) is
Python based and fully ported to Python 3. Most of my workstations and
production servers report "/usr/bin/python --version" as "Python 3.1.2"
(Python 2.6 is /usr/bin/python2), my Apache's mod_wsgi is compiled for
Python 3 and save for a few Django and Trac sites (fastcgi) all of my
Python-based webapps run on it. CherryPy and SQLAlchemy have had Py3
support for some time.

I can name in a short list the legacy Python packages I use:

   - Django
   - Trac
   - Mercurial (they have a Summer of Code student working to port it now)
   - PIL (apparently will have a Python 3 release out soon)
   - pygtk (Python 3 support planned for Gnome 3 in a few months)
   - xmpppy

The list of Python 3 packages I use regularly is at least 50 names long
and I have only contributed to porting a dozen or so of those.

This anti-Py3 rhetoric is damaging to the community and needs to stop.
We're moving forward toward Python 3.2 and beyond, complaining about it
only saps valuable developer time (including your own) from getting
these libraries you need ported faster.

On Fri, Jun 18, 2010 at 1:07 AM, Stephen Thorne wrote:

>
> Yes, #python keeps the text "It's too early to use Python 3.x" in its
> topic.
> Library support is the only reason.
>
>
From breamoreboy at yahoo.co.uk Sat Jun 19 13:20:01 2010
From: breamoreboy at yahoo.co.uk (Mark Lawrence)
Date: Sat, 19 Jun 2010 12:20:01 +0100
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au>
Message-ID:

On 19/06/2010 11:59, Arc Riley wrote:
> You mean Twisted support, because library support is at the point where
> there are fewer actively maintained packages not yet ported than those which
> are. Of course if your Python experience is hyper-focused to one framework
> that isn't ported yet, it will certainly seem like a lot, and you guys who
> run #Python are clearly hyper-focused on Twisted.
>
> Great example of the current state: about an hour ago I needed an inotify
> Python package for a Py3 project. I googled for "Python inotify", found
> pyinotify, saw that they have several recent releases but no mention of Py3,
> typed "sudo emerge -av pyinotify", and it installed pyinotify for Python
> 2.6, 3.1, and 3.2_pre at the same time. Run python interactively, imports
> and works great.
>
> Portage (Gentoo's package system, emerge being the primary command) is
> Python based and fully ported to Python 3. Most of my workstations and
> production servers report "/usr/bin/python --version" as "Python 3.1.2"
> (Python 2.6 is /usr/bin/python2), my Apache's mod_wsgi is compiled for
> Python 3 and save for a few Django and Trac sites (fastcgi) all of my
> Python-based webapps run on it. CherryPy and SQLAlchemy have had Py3 support
> for some time.
>
> I can name in a short list the legacy Python packages I use:
>
> - Django
> - Trac
> - Mercurial (they have a Summer of Code student working to port it now)
> - PIL (apparently will have a Python 3 release out soon)
> - pygtk (Python 3 support planned for Gnome 3 in a few months)
> - xmpppy
>
> The list of Python 3 packages I use regularly is at least 50 names long and
> I have only contributed to porting a dozen or so of those.
>
> This anti-Py3 rhetoric is damaging to the community and needs to stop.
> We're moving forward toward Python 3.2 and beyond, complaining about it only
> saps valuable developer time (including your own) from getting these
> libraries you need ported faster.
>

Fair comment, but how many people are waiting for numpy for Python 3?
I'd guess that it's many, many thousands, given that there are people
such as myself who use it indirectly, in my case via matplotlib. Note
that I am aware that the numpy Python 3 support is very close to
release.

Kindest regards.

Mark Lawrence.

From stephen at xemacs.org Sat Jun 19 13:34:41 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 19 Jun 2010 20:34:41 +0900
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au>
Message-ID: <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp>

anatoly techtonik writes:

 > I do not know what are you intending to do, but my opinion that
 > fund raising for patching library is a waste of money.

Of course it's not a waste of money. The need is real, so as long as
the PSF and other organizations (GSoC) choose reasonable projects/
people to support, progress will be steady. Merely the sense that
real resources are flowing into the stdlib from outside the volunteer
core will encourage more volunteers as well.

 > PSF should concentrate on enhancing tools to make lives of library
 > supporters easier. I do not want to become a maintainer,
I do not want to become a maintainer, Well, the current maintainers, while not yet happy with the state of the infrastructure, have been steadily engaged in improving it by adding features that have consensus support. But getting consensus support is not easy. Eg, I thought that with three plausible candidates, of which Mercurial was obviously satisfactory (although I preferred git, myself, and a at least couple people advocated Bazaar strongly), a switch to a dVCS was a no-brainer. It wasn't. Several people opposed it strongly until it became clear that in theory at least it would require *no* changes to current workflow (although I think most of those developers will find much to like about the changes Mercurial will bring). And even now implementation is hanging up on the requirement that it not affect Windows-based developers adversely ... and it turns out that even being Python-based is nowhere near enough to guarantee that, but rather it requires further effort before that will become reality -- and it's not forthcoming from the Mercurial developers, who unsurprisingly like Mercurial enough to deal with the minor flaws. IMO, if you want to improve the infrastructure, you need to work on getting consensus behind a few of your proposals, rather than making one after another and not following up with code or a PEP. From solipsis at pitrou.net Sat Jun 19 13:51:04 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 19 Jun 2010 13:51:04 +0200 Subject: [Python-Dev] Mercurial References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100619135104.5b0f22ed@pitrou.net> On Sat, 19 Jun 2010 20:34:41 +0900 "Stephen J. Turnbull" wrote: > > And even now > implementation is hanging up on the requirement that it not affect > Windows-based developers adversely ... 
and it turns out that even
> being Python-based is nowhere near enough to guarantee that, but
> rather it requires further effort before that will become reality --
> and it's not forthcoming from the Mercurial developers, who
> unsurprisingly like Mercurial enough to deal with the minor flaws.

FWIW, the EOL extension is now part of Mercurial:
http://mercurial.selenic.com/wiki/EolExtension

Antoine.

From exarkun at twistedmatrix.com Sat Jun 19 14:12:56 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Sat, 19 Jun 2010 12:12:56 -0000
Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au>
Message-ID: <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain>

On 10:59 am, arcriley at gmail.com wrote:
>You mean Twisted support, because library support is at the point where
>there are fewer actively maintained packages not yet ported than those
>which
>are. Of course if your Python experience is hyper-focused to one
>framework
>that isn't ported yet, it will certainly seem like a lot, and you guys
>who
>run #Python are clearly hyper-focused on Twisted.

Arc,

This isn't about Twisted. Let's not waste everyone's time by trying to
make it into a conflict between Twisted users and the rest of the
Python community.

You listed six other major packages that you yourself use that aren't
available on Python 3 yet, so why are you trying to say here that this
is all about Twisted?

>[snip]
>
>This anti-Py3 rhetoric is damaging to the community and needs to stop.
>We're moving forward toward Python 3.2 and beyond, complaining about it
>only
>saps valuable developer time (including your own) from getting these
>libraries you need ported faster.

No, it's not damaging. Critical self-evaluation is a useful tool.
Trying to silence differing perspectives is what's damaging to the
community.
Jean-Paul

From orsenthil at gmail.com Sat Jun 19 14:13:02 2010
From: orsenthil at gmail.com (Senthil Kumaran)
Date: Sat, 19 Jun 2010 17:43:02 +0530
Subject: [Python-Dev] Mercurial
In-Reply-To: <20100619135104.5b0f22ed@pitrou.net>
References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net>
Message-ID: <20100619121302.GB12233@remy>

On Sat, Jun 19, 2010 at 01:51:04PM +0200, Antoine Pitrou wrote:
> FWIW, the EOL extension is now part of Mercurial:
> http://mercurial.selenic.com/wiki/EolExtension

Should we all move soon now?
Any target date you have in mind, Antoine?

--
Senthil

From barry at python.org Sat Jun 19 14:33:17 2010
From: barry at python.org (Barry Warsaw)
Date: Sat, 19 Jun 2010 08:33:17 -0400
Subject: [Python-Dev] Mercurial
In-Reply-To: <20100619121302.GB12233@remy>
References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy>
Message-ID: <20100619083317.11355342@heresy>

On Jun 19, 2010, at 05:43 PM, Senthil Kumaran wrote:

>On Sat, Jun 19, 2010 at 01:51:04PM +0200, Antoine Pitrou wrote:
>> FWIW, the EOL extension is now part of Mercurial:
>> http://mercurial.selenic.com/wiki/EolExtension
>
>Should we all move soon now?
>Any target date you have in mind, Antoine?

I believe the plan was to migrate right after 2.7 final is released. I
hope that is still the plan. Since that is only 2 weeks away, are we
ready?

-Barry
From solipsis at pitrou.net Sat Jun 19 14:42:18 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 19 Jun 2010 14:42:18 +0200
Subject: [Python-Dev] Mercurial
In-Reply-To: <20100619121302.GB12233@remy>
References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy>
Message-ID: <20100619144218.4209e881@pitrou.net>

On Sat, 19 Jun 2010 17:43:02 +0530
Senthil Kumaran wrote:
> On Sat, Jun 19, 2010 at 01:51:04PM +0200, Antoine Pitrou wrote:
> > FWIW, the EOL extension is now part of Mercurial:
> > http://mercurial.selenic.com/wiki/EolExtension
>
> Should we all move soon now?
> Any target date you have in mind, Antoine?

I should point out that I am in no way responsible for the migration.
I think Dirkjan and Brett said they would tackle this after the 2.7
release. But they'd better answer by themselves :)

From prologic at shortcircuit.net.au Sat Jun 19 15:05:37 2010
From: prologic at shortcircuit.net.au (James Mills)
Date: Sat, 19 Jun 2010 23:05:37 +1000
Subject: [Python-Dev] Mercurial
In-Reply-To: <20100619144218.4209e881@pitrou.net>
References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy> <20100619144218.4209e881@pitrou.net>
Message-ID:

On Sat, Jun 19, 2010 at 10:42 PM, Antoine Pitrou wrote:
> I should point out that I am in no way responsible for the migration.
> I think Dirkjan and Brett said they would tackle this after the 2.7
> release. But they'd better answer by themselves :)

I'm willing to help out if needed. Can't hurt to have another
set of hands :) I'm sure there are others in the Mercurial/Python
community that would be willing to help too!
cheers james From martin at v.loewis.de Sat Jun 19 15:07:33 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 15:07:33 +0200 Subject: [Python-Dev] Mercurial In-Reply-To: <20100619083317.11355342@heresy> References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy> <20100619083317.11355342@heresy> Message-ID: <4C1CC115.60709@v.loewis.de> Am 19.06.2010 14:33, schrieb Barry Warsaw: > On Jun 19, 2010, at 05:43 PM, Senthil Kumaran wrote: > >> On Sat, Jun 19, 2010 at 01:51:04PM +0200, Antoine Pitrou wrote: >>> FWIW, the EOL extension is now part of Mercurial: >>> http://mercurial.selenic.com/wiki/EolExtension >> >> Should we all move soon now? >> Any target date you have in mind, Antoine? > > I believe the plan was to migrate right after 2.7 final is released. I don't think so. The last update to the plan that I know of was in http://mail.python.org/pipermail/python-dev/2010-February/097497.html and it said that we would migrate on May 1. This hasn't happened, but there was no update to the plan since (that I know of). > I hope > that is still the plan. Since that is only 2 weeks away, are we ready? Not nearly. AFAICT, the conversion process isn't complete yet, and the hook scripts are missing. Also, I would really like to see a /final/ demo installation *before* the switchover; because these things are all missing, the final demo installation is missing, as well. 
Regards, Martin From arcriley at gmail.com Sat Jun 19 15:09:55 2010 From: arcriley at gmail.com (Arc Riley) Date: Sat, 19 Jun 2010 09:09:55 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: Just because legacy Python needs to be kept around for a bit longer for a few uses does not mean that "Python 3 is not ready yet". Any decent package system can have two or more versions of Python installed at the same time. It is not "critical self-evaluation" to repeat "Python 3 is not ready" as litany in #Python and your supporting website. I use the word "litany" here because #Python refers users to what appears to be a religious website http://python-commandments.org/python3.html I have further witnessed (and even been the other party to) you and other ops in #Python telling package developers, who have clearly said that they are working to port their legacy package to Py3, that "Python 3 is not ready". One of our Summer of Code students this year actually included in his application that he was told (strongly) in #Python that he shouldn't be working with Py3 - even after he expressed his intent to apply under the PSF to help with the Py3 migration effort as his project. Besides rally against it what have you, as a Twisted developer, done regarding the Python 3 migration process? On Sat, Jun 19, 2010 at 8:12 AM, wrote: > On 10:59 am, arcriley at gmail.com wrote: > >> You mean Twisted support, because library support is at the point where >> there are fewer actively maintained packages not yet ported than those >> which >> are. 
Of course if your Python experience is hyper-focused to one >> framework >> that isn't ported yet, it will certainly seem like a lot, and you guys who >> run #Python are clearly hyper-focused on Twisted. >> > > Arc, > > This isn't about Twisted. Let's not waste everyone's time by trying to > make it into a conflict between Twisted users and the rest of the Python > community. > > You listed six other major packages that you yourself use that aren't > available on Python 3 yet, so why are you trying to say here that this is > all about Twisted? > >> [snip] >> >> >> This anti-Py3 rhetoric is damaging to the community and needs to stop. >> We're moving forward toward Python 3.2 and beyond, complaining about it >> only >> saps valuable developer time (including your own) from getting these >> libraries you need ported faster. >> > > No, it's not damaging. Critical self-evaluation is a useful tool. Trying > to silence differing perspectives is what's damaging to the community. > > Jean-Paul > From martin at v.loewis.de Sat Jun 19 15:11:40 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 15:11:40 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <4C1CC20C.7060709@v.loewis.de> >> This anti-Py3 rhetoric is damaging to the community and needs to stop. >> We're moving forward toward Python 3.2 and beyond, complaining about >> it only >> saps valuable developer time (including your own) from getting these >> libraries you need ported faster. > > No, it's not damaging. Critical self-evaluation is a useful tool. It's useful only if constructive.
Stating a problem is, in itself, just frustrating. One needs to accompany it with proposals of actions. In the specific case, I'm optimistic, though. 2.7 will be the last release of 2.x, so it will then be easier to focus on fixing the 3.x bugs. Regards, Martin From martin at v.loewis.de Sat Jun 19 15:23:25 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 19 Jun 2010 15:23:25 +0200 Subject: [Python-Dev] Mercurial In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy> <20100619144218.4209e881@pitrou.net> Message-ID: <4C1CC4CD.6080801@v.loewis.de> Am 19.06.2010 15:05, schrieb James Mills: > On Sat, Jun 19, 2010 at 10:42 PM, Antoine Pitrou wrote: >> I should point out that I am in no way responsible for the migration. >> I think Dirkjan and Brett said they would tackle this after the 2.7 >> release. But they'd better answer by themselves :) > > I'm willing to help out if needed. Can't hurt to have > another set of hands :) I'm sure there are others in the > Mercurial/Python community that would be willing to help too! Take a look at http://hg.python.org/pymigr/ What I *think* is missing is all the hook scripts (but you would need to check with Dirkjan whether they are already somewhere). In theory, I would expect that you can run this migration suite yourself, and get a working installation - but I never tried myself. See also PEP 385, which is the master plan. I'm not sure whether the approach to branches has been approved (or who could really approve it); I just notice that the current conversion produces a ridiculously large repository (which fails to download with older versions of hg because of size). On the meta level, what seems to be missing as well is a clear view on what the status is - so if you manage to get it working somehow, don't forget to post what you think the status is. 
Regards, Martin From g.brandl at gmx.net Sat Jun 19 15:43:55 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 19 Jun 2010 15:43:55 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: Am 19.06.2010 15:09, schrieb Arc Riley: > Just because legacy Python needs to be kept around for a bit longer for > a few uses does not mean that "Python 3 is not ready yet". Any decent > package system can have two or more versions of Python installed at the > same time. > > It is not "critical self-evaluation" to repeat "Python 3 is not ready" > as litany in #Python and your supporting website. I use the word > "litany" here because #Python refers users to what appears to be a > religious website http://python-commandments.org/python3.html > > I have further witnessed (and even been the other party to) you and > other ops in #Python telling package developers, who have clearly said > that they are working to port their legacy package to Py3, that "Python > 3 is not ready". One of our Summer of Code students this year actually > included in his application that he was told (strongly) in #Python that > he shouldn't be working with Py3 - even after he expressed his intent to > apply under the PSF to help with the Py3 migration effort as his project. Ouch. Looks like it's time for the PSU to release the 10-ton wei From tseaver at palladion.com Sat Jun 19 15:57:47 2010 From: tseaver at palladion.com (Tres Seaver) Date: Sat, 19 Jun 2010 09:57:47 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C1BEE60.4040508@voidspace.org.uk> References: <4C1BEE60.4040508@voidspace.org.uk> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Michael Foord wrote: > I didn't make myself clear. 
The expected disappointment I was referring > to was about the rate of adoption, not about the quality of the product. > > I'm still baffled as to how a bug in the cgi module (along with the > acknowledged email problems) is such a big deal. Was it reported and > then languished in the bug tracker? That would be bad on its own but if > it was only recently discovered that indicates that it probably isn't > such a big deal - either way it needs fixing, but using Python for > writing cgis hasn't been a big use case for a long time. FWIW: some APIs in the cgi module are actually used by a number of Python2 web frameworks and libraries: Paste, for instance, uses it, and is in turn used by BFG, Pylons, TurboGears. Zope has used it that way since forever. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwczNsACgkQ+gerLs4ltQ7IjACfVcUshd10OQfZJqLMmU5p1nZ6 5OcAmwSsn7+q1GO67I1HuOH1waEDI8v/ =1geT -----END PGP SIGNATURE----- From stephen at xemacs.org Sat Jun 19 15:55:29 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 19 Jun 2010 22:55:29 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> lutz at rmi.net writes: > I agree that 3.X isn't all bad, and I very much hope it succeeds. And > no, I have no answers; I'm just reporting the perception from downwind. The fact is, though, that many of your "downwind" readers are not the audience for Python 3, not yet. If you want to do Python 3 a favor, make sure that they understand that Python 3 is *not* an "upgrade" of Python 2.
It's a hard task for you, but IMO one strategy is to write in the style that we wrote the DVCS PEP (#374) in: here's how you do the same task in these similar languages. And just as git and Bazaar turned out to have fatal defects in terms of adoption *in that time frame*, Python 3 is not yet adoptable for many, many users. Python 3 is a Python-2-like language, but even though it's built on the same design principles, and uses nearly identical syntax, there are fundamental differences. And it is *very* young. So it's a new language and should be approached in the same way as any new language. Try it on non-mission critical projects, on projects where its library support has a good reputation, etc. Many of your readers have no time (or perhaps no approval "from upstairs") for that kind of thing. Too bad, but that's what happens to every great new language. > So here it is: The prevailing view is that 3.X developers hoisted things > on users that they did not fully work through themselves. Unicode is > prime among these: for all the talk here about how 2.X was broken in > this regard, the implications of the 3.X string solution remain to be > fully resolved in the 3.X standard library to this day. What is a > common Python user to make of that? Why should she make anything of that? Python 3 is a *new* language, possibly as different from Python 2 as C++ was from C (and *more* different in terms of fundamental incompatibilities). And as long as C++ was almost entirely dependent on C libraries, there were problems. (Not to mention that even today there are plenty of programmers who are proud to be C programmers, not C++ programmers.) Today, Python 3 is entirely dependent on Python 2 libraries. It's human to hope there will be no problems, but not realistic. BTW, I think what you're missing is that you're wrong about the money. Python 3 is still about the fun and the code. 
"Fun and code" are why the core developers spent about five years developing it, because doing that was fun, because the new code has high value as code, and because it promised *them* a more fun and more productive future. Library support, on the other hand, *is* about money. Your readers, down in the trenches of WWW, intraweb, and sysadmin implementation and support, depend on robust libraries to get their day jobs done. They really don't care that writing Python 3 was fun, and that programming in Python 3 is more fun than ever. That doesn't compensate for even one lingering str/bytes bogosity to most of them, and since they don't get paid for fixing Python library bugs, they don't, and they're in no mood to *forgive* any, either. So tell users who feel that way to use Python 2, for now, and check on Python 3 progress every 6 months or so. And users who are just a bit more adventurous to stick to applications where the libraries already have a good reputation *in Python 3*. It's as simple as that, I think. Regards, From tseaver at palladion.com Sat Jun 19 16:13:34 2010 From: tseaver at palladion.com (Tres Seaver) Date: Sat, 19 Jun 2010 10:13:34 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jesse Noller wrote: > On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby wrote: >> At 05:22 PM 6/18/2010 +0000, lutz at rmi.net wrote: >>> So here it is: The prevailing view is that 3.X developers hoisted things >>> on users that they did not fully work through themselves. Unicode is >>> prime among these: for all the talk here about how 2.X was broken in >>> this regard, the implications of the 3.X string solution remain to be >>> fully resolved in the 3.X standard library to this day. What is a >>> common Python user to make of that? 
>> Certainly, this was my impression as well, after all the Web-SIG discussions >> regarding the state of the stdlib in 3.x with respect to URL parsing, >> joining, opening, etc. > > Nothing is set in stone; if something is incredibly painful, or worse > yet broken, then someone needs to file a bug, bring it to this list, > or bring up a patch. Or walk away. > This is code we're talking about - nothing is set > in stone, and if something is criminally broken it needs to be first > identified, and then fixed. > >> To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x that >> actually addresses these kinds of stdlib usage issues, so that I don't have >> to think about it or futz around with experimenting, possibly to find that >> some things can't be done at all. > > I guess tutorial welcome, rather than patch welcome then ;) The only folks who can write the tutorial are the ones who have already drunk the koolaid. Note that I've been making my living with Python for about twelve years now, and would *like* to use Python3, but can't, yet, and therefore haven't taken the first sip. >> IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious >> ways to do it, but, as per the Zen of Python, "that way may not be obvious >> at first unless you're Dutch". ;-) > > What areas. We need specifics which can either be: > > 1> Shot down. > 2> Turned into bugs, so they can be fixed > 3> Documented in the core documentation. That's bloody ironic in a thread which had pointed at reasons why people are not even considering Py3 for their projects: those folks won't even find the issues due to the lack of confidence in the suitability of the platform. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwc0I0ACgkQ+gerLs4ltQ6aDgCguYv+BXou0a42Yi7ERGCHOfIv 6REAnjejq4LDbE9c/gCqB+xs1yGfQ4KR =/9fw -----END PGP SIGNATURE----- From exarkun at twistedmatrix.com Sat Jun 19 16:28:00 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Sat, 19 Jun 2010 14:28:00 -0000 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <20100619142800.2412.647073227.divmod.xquotient.206@localhost.localdomain> On 01:09 pm, arcriley at gmail.com wrote: >[snip] >It is not "critical self-evaluation" to repeat "Python 3 is not ready" >as >litany in #Python and your supporting website. I use the word "litany" >here >because #Python refers users to what appears to be a religious website >http://python-commandments.org/python3.html It's not my website. I don't own the domain, I don't control the hosting, I didn't generate the content, I have no access to change anything on it. I've barely even frequent #python in the last three years. Perhaps you were directing those comments at Stephen Thorne though (although I don't know if he's any more involved in it than I am so don't take this as anything but idle speculation). >I have further witnessed (and even been the other party to) you and >other >ops in #Python telling package developers, who have clearly said that >they >are working to port their legacy package to Py3, that "Python 3 is not >ready". I'm not going to condone or condemn events which I didn't observe. 
However you've never witnessed me discouraging developers who were actively porting software to Python 3 because I've never done it. I'm sure this was an honest mistake and you simply confused me with someone else. >Besides rally against it what have you, as a Twisted developer, done >regarding the Python 3 migration process? This, however, I find extremely insulting. I don't answer to you. The only reason I'm replying at all is to correct the two pieces of misinformation in your message. I don't see how this discussion can go anywhere productive, so I'll do my best to make this my last post on the subject. Obviously I made a mistake posting to the thread at all. Jean-Paul From jnoller at gmail.com Sat Jun 19 16:59:18 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 19 Jun 2010 10:59:18 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> Message-ID: <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> On Jun 19, 2010, at 10:13 AM, Tres Seaver wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Jesse Noller wrote: >> On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby >> wrote: >>> At 05:22 PM 6/18/2010 +0000, lutz at rmi.net wrote: >>>> So here it is: The prevailing view is that 3.X developers hoisted >>>> things >>>> on users that they did not fully work through themselves. >>>> Unicode is >>>> prime among these: for all the talk here about how 2.X was broken >>>> in >>>> this regard, the implications of the 3.X string solution remain >>>> to be >>>> fully resolved in the 3.X standard library to this day. What is a >>>> common Python user to make of that? >>> Certainly, this was my impression as well, after all the Web-SIG >>> discussions >>> regarding the state of the stdlib in 3.x with respect to URL >>> parsing, >>> joining, opening, etc. 
>> >> Nothing is set in stone; if something is incredibly painful, or worse >> yet broken, then someone needs to file a bug, bring it to this list, >> or bring up a patch. > > Or walk away. > Ok. If you want. >> This is code we're talking about - nothing is set >> in stone, and if something is criminally broken it needs to be first >> identified, and then fixed. >> >>> To be honest, I'm waiting to see some sort of tutorial(s) for >>> using 3.x that >>> actually addresses these kinds of stdlib usage issues, so that I >>> don't have >>> to think about it or futz around with experimenting, possibly to >>> find that >>> some things can't be done at all. >> >> I guess tutorial welcome, rather than patch welcome then ;) > > The only folks who can write the tutorial are the ones who have > already > drunk the koolaid. Note that I've been making my living with Python > for > about twelve years now, and would *like* to use Python3, but can't, > yet, > and therefore haven't taken the first sip. Why can't you? Is it a bug? Let's file it and fix it. Is it that you need a dependency ported? Cool - let's bring it up to the maintainers, or this list, or ask the PSF to push resources into helping port. Anything but nothing. If what you're saying is that python 3 is a completely unsuitable platform, well, then yeah - we can all "fix" it or walk away. > >>> IOW, 3.x has broken TOOOWTDI for me in some areas. There may be >>> obvious >>> ways to do it, but, as per the Zen of Python, "that way may not be >>> obvious >>> at first unless you're Dutch". ;-) >> >> What areas. We need specifics which can either be: >> >> 1> Shot down. >> 2> Turned into bugs, so they can be fixed >> 3> Documented in the core documentation. > > That's bloody ironic in a thread which had pointed at reasons why > people > are not even considering Py3 for their projects: those folks won't > even > find the issues due to the lack of confidence in the suitability of > the > platform. 
What I saw was a thread about some issues in email, and cgi. We have some work being done to address the issue. This will help resolve some of the issues. I'd there are other issues, then we should step up and either help, or get out of the way. Arguing about the viability of a platform we knew would take a bit for adoption is silly and breeds ill will. It's not a turd, and it's not hopeless, in fact rumor has it NumPy will be ported soon which is a major stepping stone. The only way to counteract this meme that python 3 is horribly broken is to prove that it's not, fix bugs, and move on. There's no point debating relative turdiness here. Jesse From jnoller at gmail.com Sat Jun 19 17:07:06 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 19 Jun 2010 11:07:06 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> Message-ID: On Sat, Jun 19, 2010 at 10:59 AM, Jesse Noller wrote: > > > On Jun 19, 2010, at 10:13 AM, Tres Seaver wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Jesse Noller wrote: >>> >>> On Fri, Jun 18, 2010 at 4:48 PM, P.J. Eby wrote: >>>> >>>> At 05:22 PM 6/18/2010 +0000, lutz at rmi.net wrote: >>>>> >>>>> So here it is: The prevailing view is that 3.X developers hoisted >>>>> things >>>>> on users that they did not fully work through themselves. ?Unicode is >>>>> prime among these: for all the talk here about how 2.X was broken in >>>>> this regard, the implications of the 3.X string solution remain to be >>>>> fully resolved in the 3.X standard library to this day. ?What is a >>>>> common Python user to make of that? >>>> >>>> Certainly, this was my impression as well, after all the Web-SIG >>>> discussions >>>> regarding the state of the stdlib in 3.x with respect to URL parsing, >>>> joining, opening, etc.
>>> >>> Nothing is set in stone; if something is incredibly painful, or worse >>> yet broken, then someone needs to file a bug, bring it to this list, >>> or bring up a patch. >> >> Or walk away. >> > > Ok. If you want. > >>> This is code we're talking about - nothing is set >>> in stone, and if something is criminally broken it needs to be first >>> identified, and then fixed. >>> >>>> To be honest, I'm waiting to see some sort of tutorial(s) for using 3.x >>>> that >>>> actually addresses these kinds of stdlib usage issues, so that I don't >>>> have >>>> to think about it or futz around with experimenting, possibly to find >>>> that >>>> some things can't be done at all. >>> >>> I guess tutorial welcome, rather than patch welcome then ;) >> >> The only folks who can write the tutorial are the ones who have already >> drunk the koolaid. Note that I've been making my living with Python for >> about twelve years now, and would *like* to use Python3, but can't, yet, >> and therefore haven't taken the first sip. > > Why can't you? Is it a bug? Let's file it and fix it. Is it that you need a > dependency ported? Cool - let's bring it up to the maintainers, or this > list, or ask the PSF to push resources into helping port. Anything but > nothing. > > If what you're saying is that python 3 is a completely unsuitable platform, > well, then yeah - we can all "fix" it or walk away. > >> >>>> IOW, 3.x has broken TOOOWTDI for me in some areas. There may be obvious >>>> ways to do it, but, as per the Zen of Python, "that way may not be >>>> obvious >>>> at first unless you're Dutch". ;-) >>> >>> What areas. We need specifics which can either be: >>> >>> 1> Shot down. >>> 2> Turned into bugs, so they can be fixed >>> 3> Documented in the core documentation.
>> >> That's bloody ironic in a thread which had pointed at reasons why people >> are not even considering Py3 for their projects: those folks won't even >> find the issues due to the lack of confidence in the suitability of the >> platform. > > What I saw was a thread about some issues in email, and cgi. We have some > work being done to address the issue. This will help resolve some of the > issues. > > I'd there are other issues, then we should step up and either help, or get > out of the way. Arguing about the viability of a platform we knew would take > a bit for adoption is silly and breeds ill will. > s/I'd/If - stupid phone. From arcriley at gmail.com Sat Jun 19 17:14:51 2010 From: arcriley at gmail.com (Arc Riley) Date: Sat, 19 Jun 2010 11:14:51 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100619142800.2412.647073227.divmod.xquotient.206@localhost.localdomain> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <20100619142800.2412.647073227.divmod.xquotient.206@localhost.localdomain> Message-ID: python-commandments.org is owned and hosted by the same person (Allen Short aka dash aka washort) as pound-python.org which is the "official" website for #Python and which links to it. #Python is co-managed by Stephen Thorne (aka Jerub) and Allen Short (aka dash aka washort). According to Freenode services, the channel operators include more than half the active Twisted Matrix developers, including yourself. Each of you has had the ability to change the topic at any time. I may have cast an overly broad net in including you, I don't have IRC logs to review. I do remember that you have contributed a great deal of time to helping people in #Python and that you were fairly active as a channel operator in #Python when the anti-Py3 rhetoric got started.
Perhaps you can shine some light on who is actually responsible for promoting this? I'm sorry if we're in uncomfortable finger-pointing mode, but in the spirit of critical self-evaluation I think its time we take a long look at who is actually representing the Python community in operating our primary community help channel and whether that situation should continue. On Sat, Jun 19, 2010 at 10:28 AM, wrote: > On 01:09 pm, arcriley at gmail.com wrote: > >> [snip] >> >> It is not "critical self-evaluation" to repeat "Python 3 is not ready" as >> litany in #Python and your supporting website. I use the word "litany" >> here >> because #Python refers users to what appears to be a religious website >> http://python-commandments.org/python3.html >> > > It's not my website. I don't own the domain, I don't control the hosting, > I didn't generate the content, I have no access to change anything on it. > I've barely even frequent #python in the last three years. > > Perhaps you were directing those comments at Stephen Thorne though > (although I don't know if he's any more involved in it than I am so don't > take this as anything but idle speculation). > > I have further witnessed (and even been the other party to) you and other >> ops in #Python telling package developers, who have clearly said that they >> are working to port their legacy package to Py3, that "Python 3 is not >> ready". >> > > I'm not going to condone or condemn events which I didn't observe. > > However you've never witnessed me discouraging developers who were actively > porting software to Python 3 because I've never done it. I'm sure this was > an honest mistake and you simply confused me with someone else. > > Besides rally against it what have you, as a Twisted developer, done >> regarding the Python 3 migration process? >> > > This, however, I find extremely insulting. I don't answer to you. The > only reason I'm replying at all is to correct the two pieces of > misinformation in your message. 
> > I don't see how this discussion can go anywhere productive, so I'll do my > best to make this my last post on the subject. Obviously I made a mistake > posting to the thread at all. > > Jean-Paul > From solipsis at pitrou.net Sat Jun 19 17:43:26 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 19 Jun 2010 17:43:26 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <20100619142800.2412.647073227.divmod.xquotient.206@localhost.localdomain> Message-ID: <20100619174326.22cac1a3@pitrou.net> On Sat, 19 Jun 2010 11:14:51 -0400 Arc Riley wrote: > python-commandments.org is owned and hosted by the same person (Allen Short > aka dash aka washort) as pound-python.org which is the "official" website > for #Python and which links to it. > > #Python is co-managed by Stephen Thorne (aka Jerub) and Allen Short (aka > dash aka washort). According to Freenode services, the channel operators > include more than half the active Twisted Matrix developers, including > yourself. Each of you has had the ability to change the topic at any time. I don't think it's constructive to treat the Twisted developers as a uniform society. I would expect #python (which I don't think I have ever participated in) to function like any community, where you don't make unilateral changes if others disagree with you. Jean-Paul said "I've barely even frequent #python in the last three years". Knowing this, I don't know how he could impose a topic change on his own.
> I'm sorry if we're in uncomfortable finger-pointing mode, but in the spirit > of critical self-evaluation I think its time we take a long look at who is > actually representing the Python community in operating our primary > community help channel and whether that situation should continue. Well, perhaps, but whether Python 3 is misrepresented shouldn't be the only metric, then. Regards Antoine. From pje at telecommunity.com Sat Jun 19 18:07:43 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 19 Jun 2010 12:07:43 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100619160755.4E50C3A4060@sparrow.telecommunity.com> At 10:55 PM 6/19/2010 +0900, Stephen J. Turnbull wrote: >They really don't care that writing Python 3 was fun, and that >programming in Python 3 is more fun than ever. That doesn't >compensate for even one lingering str/bytes bogosity to most of >them, and since they don't get paid for fixing Python library bugs, >they don't, and they're in no mood to *forgive* any, either. This is pretty much where I'm at, except that the only potential fun increase Py3 appears to offer me are argument annotations and keyword-only args -- but these are partly balanced by the loss of argument tuple unpacking. The metaclass keyword argument is nice, but the loss of dynamically-settable __metaclass__ is just plain annoying. Really, just about everything that Py3 offers in the way of added fun, seems offset by a matching loss somewhere else. So it's hard to get excited about it - it seems like, "ho hum, a new language that's kind of like Python, but just different enough to be annoying." OTOH, I don't know what to do about that, besides adding some sort of "killer app" feature that makes Python 3 the One Obvious Way to do some specific application domain. 
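For readers who have not tried 3.x yet, the trade-offs listed above might look roughly like this. It is only a minimal sketch, not code from this thread: scale, Registry and Plugin are made-up names, and it simply puts keyword-only arguments, annotations, the missing tuple parameters and the metaclass keyword side by side.

```python
# Python 3 sketch of the features named above; all names are hypothetical.

def scale(point, *, factor: float = 2.0) -> tuple:
    # 'factor' is keyword-only because it follows the bare '*'.
    # Python 2 allowed 'def scale((x, y), factor)'; Python 3 removed
    # tuple parameters (PEP 3113), so the unpacking moves into the body.
    x, y = point
    return (x * factor, y * factor)


class Registry(type):
    """Hypothetical metaclass that records the name of every class it creates."""
    created = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registry.created.append(name)
        return cls


# Python 3 selects the metaclass with a keyword in the class header;
# a '__metaclass__' attribute in the class body is no longer consulted.
class Plugin(metaclass=Registry):
    pass


print(scale((1, 2), factor=3.0))   # (3.0, 6.0)
print(scale.__annotations__)       # {'factor': <class 'float'>, 'return': <class 'tuple'>}
print(Registry.created)            # ['Plugin']
```

Calling scale((1, 2), 3.0) raises TypeError, which is the whole point of the bare '*': the second argument can only be passed by name.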
From breamoreboy at yahoo.co.uk Sat Jun 19 18:28:17 2010 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 19 Jun 2010 17:28:17 +0100 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On 19/06/2010 14:43, Georg Brandl wrote: > Am 19.06.2010 15:09, schrieb Arc Riley: >> Just because legacy Python needs to be kept around for a bit longer for >> a few uses does not mean that "Python 3 is not ready yet". Any decent >> package system can have two or more versions of Python installed at the >> same time. >> >> It is not "critical self-evaluation" to repeat "Python 3 is not ready" >> as litany in #Python and your supporting website. I use the word >> "litany" here because #Python refers users to what appears to be a >> religious website http://python-commandments.org/python3.html >> >> I have further witnessed (and even been the other party to) you and >> other ops in #Python telling package developers, who have clearly said >> that they are working to port their legacy package to Py3, that "Python >> 3 is not ready". One of our Summer of Code students this year actually >> included in his application that he was told (strongly) in #Python that >> he shouldn't be working with Py3 - even after he expressed his intent to >> apply under the PSF to help with the Py3 migration effort as his project. > > Ouch. Looks like it's time for the PSU to release the 10-ton wei > Please raise a new issue, the weight should be 16 ton to conform to Python standards. Cheers. Mark Lawrence. 
From debatem1 at gmail.com Sat Jun 19 21:02:53 2010 From: debatem1 at gmail.com (geremy condra) Date: Sat, 19 Jun 2010 12:02:53 -0700 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <20100619142800.2412.647073227.divmod.xquotient.206@localhost.localdomain> Message-ID: On Sat, Jun 19, 2010 at 8:14 AM, Arc Riley wrote: > python-commandments.org is owned and hosted by the same person (Allen Short > aka dash aka washort) as pound-python.org which is the "official" website > for #Python and which links to it. > > #Python is co-managed by Stephen Thorne (aka Jerub) and Allen Short (aka > dash aka washort). According to Freenode services, the channel operators > include more than half the active Twisted Matrix developers, including > yourself. Each of you has had the ability to change the topic at any time. > > I may have cast an overly broad net in including you, I don't have IRC logs > to review. I do remember that you have contributed a great deal of time to > helping people in #Python and that you were fairly active as a channel > operator in #Python when the anti-Py3 rhetoric got started. Perhaps you can > shine some light on who is actually responsible for promoting this? > > I'm sorry if we're in uncomfortable finger-pointing mode, but in the spirit > of critical self-evaluation I think its time we take a long look at who is > actually representing the Python community in operating our primary > community help channel and whether that situation should continue. Amen. I've heard about people being told not to use python3 on the irc *way* too many times for it to be all make believe.
Geremy Condra From simon at ikanobori.jp Sat Jun 19 21:55:34 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Sat, 19 Jun 2010 21:55:34 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) Message-ID: <1A151F99-EAA2-4897-88B5-B3E765AF66BD@ikanobori.jp> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Dear all, Sorry for the maybe somewhat late response but I am not a subscriber on the python-dev mailinglists. Someone else pointed me towards this thread and I want to shortly clarify a few things regarding the following two statements: > It is not "critical self-evaluation" to repeat "Python 3 is not > ready" as > litany in #Python and your supporting website. I use the word > "litany" here > because #Python refers users to what appears to be a religious website > http://python-commandments.org/python3.html > python-commandments.org is owned and hosted by the same person > (Allen Short > aka dash aka washort) as pound-python.org which is the "official" > website > for #Python and which links to it. Both python-commandments.org and pound-python.org are my websites. I own both the domains and I do all administrative tasks regarding these domains. pound-python.org is the official #python website and as such is maintained on Launchpad by a team of volunteers, see: https://launchpad.net/ ~pound-python which is indeed owned by Allen Short. However, Allen Short has nothing to do with the Python Commandments page. That is an endeavor for which I am the sole responsible person. I have asked some people to contribute texts but that doesn't change that I should be spoken to regarding the content on that website. If there are any issues with the content on either website please do not hesitate to contact me at this email address or on IRC where I go by the nickname of ikanobori. 
As for the potentially harmful text on Python 3 which is included on the python-commandments website I do get the hint that it might not be clear enough that the text does not apply to people who are porting libraries. This is a complaint I have heard before and to which I will take affirmative action by explicitly adding text to clarify that. Hope all is well, Regards, Simon de Vlieger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQIcBAEBAgAGBQJMHSC2AAoJEBBSHP7i+JXf5pMQANPBCUzDwx2xjTP8shA1E4mx 7/OQk27nxt+wOZNT0Ybe/iNXLetF6qa8At7kTau/yU3l/xJWVODjfJUICkDv/0ad ebMKiFeKO8jqdvEe+RL3ck7jTXEM73C2PLNtge9FLTY6HhYrXnOJakNbpWPJR/PG TQQ+mY/8ZvSP+n98RrY9kcVaVJMSmXUJWHvWVh+LkcIDwF/h30EH/e5PUGzylINI NiV5955pNRXTnwdgjsouljUI/rrod3zphnUEyL22QvSUx0b7YXMfC24eRGTpwrLg 9cyQAMjjbuVqkhSJhYFnm+DKwsZEAHxxOvu50Xwuy3i1C7c8L6/QDT1txoSTVuaP 4xw8GSFEblbHviz7hY7KCe5nMpBNHNfcGFHFSWd+WYogRXjpDitlMDNW8HT56pRW lwzs1WENnoOSCAn4Xds+xPJj9JyAGnS8rWz70RVMyrkHDFaJhDlIDNpEFdlAlywT R0uCQrlxs/uWzAXK2IA0wXPtm/m8fYLR3q8mD4++QotZKQcT4ciN7Xv913/ZT2b2 NtR1WEoTZAV+gWrFyFsgmMFAmZhvUdI8Ludxs3l2smHHaCFUkj2Ur9BrkMiEv5Z8 wLN+/LRaHgGnmVT2SF0LOCeOLz97dP728OKBO0DwxqT89Cla8445z7ktdHnJ3amA gjbsfG7W+yx9L2v0IDFC =YDiR -----END PGP SIGNATURE----- From turnbull at sk.tsukuba.ac.jp Sat Jun 19 22:23:09 2010 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sun, 20 Jun 2010 05:23:09 +0900 Subject: [Python-Dev] Python Library Support in 3.x In-Reply-To: <1A151F99-EAA2-4897-88B5-B3E765AF66BD@ikanobori.jp> References: <1A151F99-EAA2-4897-88B5-B3E765AF66BD@ikanobori.jp> Message-ID: <87hbkydb6a.fsf@uwakimon.sk.tsukuba.ac.jp> Simon de Vlieger writes: > As for the potentially harmful text on Python 3 which is included on > the python-commandments website I do get the hint that it might not be > clear enough that the text does not apply to people who are porting > libraries. 
It also doesn't apply to people who don't need unported libraries, e.g., where the task is plain old text filtering or command line scripting. Don't ask me for the list of "unported libraries", I know of none from personal experience. You might also want to withdraw the claim that Python 2.x is actively developed. With the release of 2.7, that's not true any more, not in the sense that most people think of "actively developed."

From alexander.belopolsky at gmail.com Sat Jun 19 22:43:18 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Jun 2010 16:43:18 -0400 Subject: [Python-Dev] Year 0 and year 10,000 in timetuple Message-ID:

While the datetime range is limited to years from 1 through 9999, it is possible to produce a time tuple with year 0 or year 10,000:

>>> t1 = datetime.min.replace(tzinfo=timezone.max)
>>> t2 = datetime.max.replace(tzinfo=timezone.min)
>>> t1.utctimetuple().tm_year
0
>>> t2.utctimetuple().tm_year
10000

Most if not all functions consuming timetuples are not designed to handle years beyond 9999, and such timetuples cannot be converted back to datetime. I would like to make the utctimetuple() method raise OverflowError on values like t1 or t2 above. These values are most certainly a mistake in the application, and it is better to detect them early, before they make their way into system functions that cannot handle them. See issues 9005 and 6608 on the tracker.

http://bugs.python.org/issue9005
http://bugs.python.org/issue6608

From guido at python.org Sun Jun 20 00:12:29 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Jun 2010 15:12:29 -0700 Subject: [Python-Dev] Year 0 and year 10,000 in timetuple In-Reply-To: References: Message-ID:

But what if they are used intentionally as "impossible" or sentinel values?
--Guido (on Android)

On Jun 19, 2010 2:37 PM, "Alexander Belopolsky" <alexander.belopolsky at gmail.com> wrote:
> While datetime range is limited to years from 1 through 9999, it is
> possible to produce time tuple with year 0 or year 10,000:
>
> >>> t1 = datetime.min.replace(tzinfo=timezone.max)
> >>> t2 = datetime.max.replace(tzinfo=timezone.min)
> >>> t1.utctimetuple().tm_year
> 0
> >>> t2.utctimetuple().tm_year
> 10000
>
> Most if not all functions consuming timetuples are not designed to
> handle years beyond 9999 and such timetuples cannot be converted back
> to datetime.
>
> I would like to make utctimetuple() method to raise OverflowError on
> values like t1 or t2 above. These values are most certainly a mistake
> in application ad it is better to detect them earlier before they make
> their way into system functions that cannot handle them.
>
> See issues 9005 and 6608 on the tracker.
>
> http://bugs.python.org/issue9005
> http://bugs.python.org/issue6608

From raymond.hettinger at gmail.com Sun Jun 20 00:27:11 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 19 Jun 2010 15:27:11 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <4C1BEE60.4040508@voidspace.org.uk> <606BEE82-71C7-4033-A273-F9E945ACDF90@gmail.com> Message-ID:

On Jun 18, 2010, at 7:39 PM, Terry Reedy wrote:
> On 6/18/2010 6:51 PM, Raymond Hettinger wrote:
>> There has been a disappointing
>> lack of bug reports across the board for 3.x.
>
> Here is one from this week involving the interaction of array and bytearray. It needs a comment from someone who can understand the C-API based patch, which is beyond me.
> http://bugs.python.org/issue8990 I'll take a look at this one. Raymond P.S. For those who are interested, here is the story on BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/3.1-problems.html From alexander.belopolsky at gmail.com Sun Jun 20 00:31:52 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 19 Jun 2010 18:31:52 -0400 Subject: [Python-Dev] Year 0 and year 10,000 in timetuple In-Reply-To: References: Message-ID: On Jun 19, 2010, at 6:12 PM, Guido van Rossum wrote: > But what if they are used intentionally as "impossible" or sentinel > values? > That would be another reason not to produce them accidentally. Note that I am proposing disallowing production of out of range years from valid datetime objects, not consumption of them if that is allowed anywhere. From tjreedy at udel.edu Sun Jun 20 02:02:03 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 19 Jun 2010 20:02:03 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: After reading the discussion in the previous thread, I signed in to #python and verified that the intro message starts with a lie about python3. I also verified that the official #python site links to "Python Commandment Don't use Python 3... yet". The excuse that the negative commandment site is not part of the official site does not wash. The #python site maintainer chose that as the authoritative word on the topic "On using Python 2.x or Python 3.x". Since a fair, half-intelligent person would know that the usability of Python3 depends on the user, this all strikes me as conscious sabotage. To me, this, along with other reports, is really ugly. I do not wish to fight such people; but I would rather ask python3 questions in a pro- rather than anti-python3 atmosphere.
#python is certainly not a place that I would refer new people to. Given that the 'owners' of #python have been asked and refuse to remove their negative-opinion-stated-as-leading-headline-fact, it seems to me that we need a separate #python3 channel. The topic could be "Welcome to discussion of Python3, the latest, greatest version of Python." The first link might be to the current stable Python3 docs. Hence the '!' in the subject line. However, I have very little experience with IRC and consequently have little idea what getting a permanent, owned, channel like #python entails. Hence the '?' that follows. What do others think? From glyph at twistedmatrix.com Sun Jun 20 02:24:07 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sat, 19 Jun 2010 17:24:07 -0700 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote: > HoweverI have very little experience with IRC and consequently have little idea what getting a permanent, owned, channel like #python entails. Hence the '?' that follows. > > What do others think? Sure, this is a good idea. Technically speaking, this is extremely easy. Somebody needs to "/msg chanserv register #python3" and that's about it. (In this case, that "someone" may need to be Brett Cannon, since he is the official group contact for Freenode regarding Python-related channels.) Practically speaking, you will need a group of at least a dozen contributors, each in a different timezone, who sit there all day answering questions :). Otherwise the ownership of the channel is just a signpost pointing at an empty room. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From debatem1 at gmail.com Sun Jun 20 02:39:44 2010 From: debatem1 at gmail.com (geremy condra) Date: Sat, 19 Jun 2010 17:39:44 -0700 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On Sat, Jun 19, 2010 at 5:02 PM, Terry Reedy wrote: > After reading the discussion in the previous thread, signed in to #python > and verified that the intro message starts with a lie about python3. I also > verified that the official #python site links to "Python Commandment Don't > use Python 3? yet". The excuse that the negative commandment site is not > part of the official site is does not wash. The #python site maintainer > choose that as the authoritative word on the topic "On using Python 2.x or > Python 3.x". > > Since a fair, half-intelligent person would know that the usability of > Python3 depends on the user, this all strikes as conscious sabotage. > > To me, this, along with other reports, is really ugly. I do not wish to > fight such people; but I would rather ask python3 questions in a pro- rather > than anti-python3 atmosphere. #python is certainly not a place that I would > refer new people to. > > Given that the 'owners' of #python have been asked and refuse to remove > their negative-opinion-stated-as-leading-headline-fact, it seems to me that > we need a separate #python3 channel. The topic could be "Welcome to > discussion of Python3, the latest, greated version of Python." The first > link might be to the current stable Python3 docs. Hence the '!' in the > subject line. > > HoweverI have very little experience with IRC and consequently have little > idea what getting a permanent, owned, channel like #python entails. Hence > the '?' that follows. > > What do others think? Seems like it turns a disagreement into a power struggle that python-dev is unlikely to win. 
If people here were interested in the irc, the irc culture would never have become as disconnected from the core group as it has, and even the most impassioned call isn't going to build an active community overnight. Furthermore, if #python has 200 people in it and #python3 is a ghost town, they can just tell anybody asking a python3 question to go to #python3 and snicker, reinforcing the widely held belief that python3 itself is a failure. It also runs the risk of hardening their existing position, and in any event begins the process of fracturing the community at a point where 3.x is probably not going to come out on top. Bottom line, what I'd really like to do is kick them all off of #python, but practically I see very little that can be done to rectify the situation at this point. Geremy Condra From glyph at twistedmatrix.com Sun Jun 20 02:56:38 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Sat, 19 Jun 2010 17:56:38 -0700 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On Jun 19, 2010, at 5:39 PM, geremy condra wrote: > Bottom line, what I'd really like to do is kick them all off of #python, but > practically I see very little that can be done to rectify the situation at this > point. Here's something you can do: port libraries to python 3 and make the ecosystem viable. It's as simple as that. Nobody on #python has an ideological axe to grind, they just want to tell users to use tools which actually solve their problems. (Well, unless you think that "helping users" is ideological axe-grinding, in which case I think you may want to re-examine your own premises.) 
If Python 3 had all the same features and libraries as Python 2, and ran in all the same places (for example, as Stephen Thorne reminded me when I asked him about this, the oldest supported version of Red Hat Enterprise Linux...) then it would be an equally viable answer on IRC. It's going to take a lot of work to get it to that point. Even if you write code, of course, it's too much work for one person to fill the whole gap. Have some patience. The PSF is funding these efforts, and more library authors are porting all the time. Eventually, resistance in forums like Freenode's #python will disappear. But you can't make it go away by wishing it away, you have to get rid of the cause. From raymond.hettinger at gmail.com Sun Jun 20 03:12:35 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Sat, 19 Jun 2010 18:12:35 -0700 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> On Jun 19, 2010, at 5:39 PM, geremy condra wrote: > Bottom line, what I'd really like to do is kick them all off of #python, This is so profoundly wrong on so many levels it is hard to know how to respond. Raymond From jacob at jacobian.org Sun Jun 20 03:19:28 2010 From: jacob at jacobian.org (Jacob Kaplan-Moss) Date: Sat, 19 Jun 2010 18:19:28 -0700 Subject: [Python-Dev] #Python3 ! ?
(was Python Library Support in 3.x) In-Reply-To: <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> Message-ID: On Sat, Jun 19, 2010 at 6:12 PM, Raymond Hettinger wrote: > This is so profoundly wrong on so many levels it is hard to know how to respond. C'mon, Raymond, that's not any more helpful. Geremy wasn't trying to argue for that course of action; he was expressing his frustration with the culture that's developed in #python. There's nothing wrong with frustration, and there's nothing wrong with expressing those -- or any -- feelings. Indeed, I'm happy that folks are blowing off a bit of steam here instead of doing something silly in public. Let's all try to simmer down here a little bit and cut each other some slack: this is a frustrating situation, and we're not going to help it by heaping more fuel on the fire. Jacob From steve at pearwood.info Sun Jun 20 04:04:30 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Jun 2010 12:04:30 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <201006201204.30795.steve@pearwood.info> On Sat, 19 Jun 2010 11:55:29 pm Stephen J. Turnbull wrote: > If you want to do Python 3 a favor, > make sure that they understand that Python 3 is *not* an "upgrade" of > Python 2. [...] > Python 3 is a Python-2-like language, but even though it's built on > the same design principles, and uses nearly identical syntax, there > are fundamental differences. And it is *very* young. So it's a new > language and should be approached in the same way as any new > language.
I haven't written any large projects in Python3, so take this with a grain of salt, but I just don't see that Python3 is a "new language" as most people understand the term. It might be splitting hairs, but I see it as a new dialect *at worst*, and probably not even that, in the sense that any half decent human coder who can read Python 2.x code should be able to make sense of Python 3.x code, and vice versa. As I see it, the changes to the language and syntax between 2.x and 3.x are much smaller than those between 1.x to 2.x: Python 2.x introduced a brand new object model (new style classes). Python 3.x does not. Python 2.x introduced radically new syntax, namely list comprehensions, while 3.x merely extends the same idea to set and dict comprehensions. Python 2.x introduced lexical scoping AND closures. Python 3.x does nothing as radical. Python 2.x introduced a new (to Python) programming model, namely iterators, complete with TWO extensions to syntax (generator functions including yield, generator expressions), *and* then went and made yield a function so as to introduce coroutines as well. Python 3.x merely uses iterators in more places. Python 2.x introduced Unicode strings. Python 3.x merely makes them the default. The only major difference is that Python 3 takes away as well as adding, but even there, Python 2 did the same, e.g. there is no provision to get the old scoping behaviour except to go back and use 2.1 or older. Frankly, I believe that pushing the meme that "Python 3 is different" is a strategic mistake. People hate and fear change. I should know this. I resisted Python 2.x and stuck with 1.5 until Python 2.3 was released, and then was amazed at how *easy* the transition was. Of course, I wasn't using third party libraries that hadn't been ported to 2.3, if I had my experience would have been different. 
It's bad enough to have to tell people "Python 3 is currently lacking some critical libraries, particularly third-party libraries" without also telling them (wrongly IMO) "oh, and it's a new language too". -- Steven D'Aprano From steve at pearwood.info Sun Jun 20 04:05:46 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 20 Jun 2010 12:05:46 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <201006201205.46507.steve@pearwood.info> On Sun, 20 Jun 2010 12:13:34 am Tres Seaver wrote: > > I guess tutorial welcome, rather than patch welcome then ;) > > The only folks who can write the tutorial are the ones who have > already drunk the koolaid. Note that I've been making my living with > Python for about twelve years now, and would *like* to use Python3, > but can't, yet, and therefore haven't taken the first sip. You emphatically say you would "like" to use Python3, but describe those who already have as having drunk the Koolaid. Comparing those who can and have successfully moved to Python3 with the Jonestown cult mass-suicide doesn't really strike me as a sign that you want to join them. -- Steven D'Aprano From guido at python.org Sun Jun 20 04:21:35 2010 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Jun 2010 19:21:35 -0700 Subject: [Python-Dev] Year 0 and year 10,000 in timetuple In-Reply-To: References: Message-ID: On Sat, Jun 19, 2010 at 3:31 PM, Alexander Belopolsky wrote: > On Jun 19, 2010, at 6:12 PM, Guido van Rossum wrote: >> But what if they are used intentionally as "impossible" or sentinel >> values? > That would be another reason not to produce them accidently. Note that I am > proposing disallowing production of out of range years from valid datetime > objects, not consumption of them if that is allowed anywhere. OK.
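The distinction drawn here (refuse to produce an out-of-range year, say nothing about consuming one) can be sketched in a few lines. This is a minimal illustration, not the patch attached to issue 9005: the helper name utc_year_checked is hypothetical, and it relies only on the documented behaviour that datetime arithmetic raises OverflowError when the result leaves the 1..9999 range.

```python
from datetime import datetime, timedelta, timezone


def utc_year_checked(dt):
    """Return the year dt has in UTC, or raise OverflowError if it is outside 1..9999.

    Hypothetical helper for illustration only.
    """
    # Naive datetimes have no offset; treat them as already being in UTC.
    offset = dt.utcoffset() or timedelta(0)
    # Subtracting the offset from the naive value raises OverflowError
    # when the result would fall before datetime.min or after datetime.max.
    return (dt.replace(tzinfo=None) - offset).year
```

With this check, both datetime.min.replace(tzinfo=timezone.max) and datetime.max.replace(tzinfo=timezone.min) are rejected up front instead of yielding year 0 or year 10000, while ordinary aware and naive values pass through unchanged.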
-- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sun Jun 20 04:44:49 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 19 Jun 2010 22:44:49 -0400 Subject: [Python-Dev] #Python3 ! ? In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On 6/19/2010 8:56 PM, Glyph Lefkowitz wrote: > On Jun 19, 2010, at 5:39 PM, geremy condra wrote: > >> Bottom line, what I'd really like to do is kick them all off of >> #python, but >> practically I see very little that can be done to rectify the >> situation at this >> point. Given the experiences you reported, I can understand that sentiment, but I explicitly disclaimed any intent to fight or power struggle. > Here's something you can do: port libraries to python 3 and make the > ecosystem viable. > > It's as simple as that. Nobody on #python has an ideological axe to > grind, Then why are they grinding an anti-Python3 axe? As I explained in my original post, I did not take anyone's word for it, but verified for myself that they are indeed doing so and why I thought so. There are people who are opposed to Python3 and have the fantasy that if it fails, the devs would continue to pile new features, sometimes duplicative features into 2.x and never remove anything. They do not care that this would make the language harder and harder for new learners. However, I will consider taking your claim at face value and, ignoring the insulting login message and site, try a Python3 question and see what response I get. > they just want to tell users to use tools which actually solve > their problems. But that is not what they are doing. Python3 solved many of *my* problems with Python2, and there they are, commanding me and potential readers of my book-in-progress not to use it. 
If they wanted to help people make an intelligent choice between Python2 and Python3, they would point people to a discussion of the pros and cons of each. There have been several posted on python-list. Anyone who posted either "Do not use Python3" or "Do not use Python2" as a sweeping answer to a generic enquiry about 2 versus 3 might rightfully be blasted as a troll. > If Python 3 had all the features and libraries as Python 2, Python3 has several features that Python2 does not. To me, nearly all the deletions and changes make the language better, much better, for *my* purposes. However, I am glad that the PSF exists to make all versions of Python available indefinitely for anyone who has need of them. I would not dream of saying "Python2: do not use it" to anyone except in response to a question about a specific problem solved in Python3 and not in Python2. Terry Jan Reedy From ncoghlan at gmail.com Sun Jun 20 05:27:52 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jun 2010 13:27:52 +1000 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <1A151F99-EAA2-4897-88B5-B3E765AF66BD@ikanobori.jp> References: <1A151F99-EAA2-4897-88B5-B3E765AF66BD@ikanobori.jp> Message-ID: On Sun, Jun 20, 2010 at 5:55 AM, Simon de Vlieger wrote: > As for the potentially harmful text on Python 3 which is included on the > python-commandments website I do get the hint that it might not be clear > enough that the text does not apply to people who are porting libraries. > This is a complaint I have heard before and to which I will take affirmative > action by explicitly adding text to clarify that. I just read that page, and I believe it could do with a little refinement even from an application developer point of view. Specifically, rather than "Why shouldn't I use it, yet?", a more positive phrasing would be "Should I use it, yet?" or "Is Python 3 ready for me, yet?". 
And then suggest to app developers that they check the status of Py3k support for libraries they need or think they will need, as these days many of them will provide a 3.x compatible version. Staying on 2.x for now is certainly a viable choice - there's a reason that backports to 2.7 have been a prominent python-dev activity for the last year or two. With that nearly out the door, the focus will switch more to Py3k. Cheers, Nick. P.S. wind the clock back 12 months or so, and I think the page as it currently stands would have been perfectly good advice to app developers. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 20 06:05:07 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jun 2010 14:05:07 +1000 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> Message-ID: On Sun, Jun 20, 2010 at 11:19 AM, Jacob Kaplan-Moss wrote: > Let's all try to simmer down here a little bit and cut each other some > slack: this is a frustration situation, and we're not going to help it > by heaping more fuel on the fire. The other thing to keep in mind is that there was a time when what the #python folks are still saying *wasn't wrong*. Yes, their advice is too negative for the situation as it stands now. But go back 12 or 18 months and their description would have been far more apt. It sounds like they're happy to update the relevant pages to provide a more balanced perspective now, and that's the important point. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From debatem1 at gmail.com Sun Jun 20 07:39:35 2010 From: debatem1 at gmail.com (geremy condra) Date: Sun, 20 Jun 2010 01:39:35 -0400 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <7F841A1E-213A-40A0-8BAC-3CDEF9838E94@gmail.com> Message-ID: On Sat, Jun 19, 2010 at 9:12 PM, Raymond Hettinger wrote: > > On Jun 19, 2010, at 5:39 PM, geremy condra wrote: >> Bottom line, what I'd really like to do is kick them all off of #python, > > This is so profoundly wrong on so many levels it is hard to know how to respond. Alright, so, yeah- I said it in the heat of the moment and shouldn't have. I apologize. I just hate having to explain to folks that don't know any better that #python doesn't represent the opinions of the people who actually develop python, and I'm going to STFU before I get sucked into this again. Geremy Condra From stephen at xemacs.org Sun Jun 20 11:14:02 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 20 Jun 2010 18:14:02 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <201006201204.30795.steve@pearwood.info> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > Frankly, I believe that pushing the meme that "Python 3 is different" is > a strategic mistake. I agree that it's strategically undesirable. Unfortunately, the genuine backward incompatibility, as well as the huge mind-share already garnered by what I consider wrong-headed advice from certain quarters have made pushing the meme that "Python 3 is very nearly the same" untenable. It's hard to beat something like "it's not yet time to use Python 3" with a nuanced explanation. > had my experience would have been different. 
It's bad enough to have to > tell people "Python 3 is currently lacking some critical libraries, > particularly third-party libraries" without also telling them (wrongly > IMO) "oh, and it's a new language too". That's why I propose the C to C++ analogy. True, C++ does introduce a lot of new features, but most programmers migrating from C to C++ don't learn to use them properly for years, if ever, I'm told. Note also that I don't propose this as PSF advertising. I proposed it as a response to Mark's question, "what should I tell my readers?" From solipsis at pitrou.net Sun Jun 20 11:29:36 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 11:29:36 +0200 Subject: [Python-Dev] [OT] Re: email package status in 3.X References: <201006201205.46507.steve@pearwood.info> Message-ID: <20100620112936.7ae73935@pitrou.net> On Sun, 20 Jun 2010 12:05:46 +1000 Steven D'Aprano wrote: > On Sun, 20 Jun 2010 12:13:34 am Tres Seaver wrote: > > > > I guess tutorial welcome, rather than patch welcome then ;) > > > > The only folks who can write the tutorial are the ones who have > > already drunk the koolaid. Note that I've been making my living with > > Python for about twelve years now, and would *like* to use Python3, > > but can't, yet, and therefore haven't taken the first sip. > > You emphatically say you would "like" to use Python3, but describe those > who already have as having drunk the Koolaid. Comparing those who can > and have successfully moved to Python3 with the Jonestown cult > mass-suicide doesn't really strike me as a sign that you want to join > them. I have read the expression "drinking the Koolaid" more than once but I didn't know it related to a mass-suicide at all. It changes my comprehension of it quite a bit... Regards Antoine. 
From solipsis at pitrou.net Sun Jun 20 11:32:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 11:32:56 +0200 Subject: [Python-Dev] email package status in 3.X References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100620113256.7ba8d86a@pitrou.net> On Sun, 20 Jun 2010 18:14:02 +0900 "Stephen J. Turnbull" wrote: > > > had my experience would have been different. It's bad enough to have to > > tell people "Python 3 is currently lacking some critical libraries, > > particularly third-party libraries" without also telling them (wrongly > > IMO) "oh, and it's a new language too". > > That's why I propose the C to C++ analogy. I think it's an unfortunate analogy. C++ needs new libraries (with brand new APIs) to take advantage of its abstraction capabilities. Python 3 has almost the same abstraction capabilities as Python 2, you don't need to write new libraries: just port the existing ones. > True, C++ does introduce a > lot of new features, but most programmers migrating from C to C++ > don't learn to use them properly for years, if ever, I'm told. I don't see how Python 3 has that problem. You can be productive here and now in Python 3, re-using your knowledge of Python 2 with a bit of added information. Regards Antoine. From ben+python at benfinney.id.au Sun Jun 20 12:31:30 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 20 Jun 2010 20:31:30 +1000 Subject: [Python-Dev] [OT] the Kool-Aid Acid Test (was: email package status in 3.X) References: <201006201205.46507.steve@pearwood.info> Message-ID: <87d3vmov0d.fsf_-_@benfinney.id.au> Steven D'Aprano writes: > Comparing those who can and have successfully moved to Python3 with > the Jonestown cult mass-suicide doesn't really strike me as a sign > that you want to join them. In my experience, many who refer to "drinking the Kool-Aid"
are not referring to the Jonestown suicide cult, but rather to the earlier Electric Kool-Aid Acid Test events of the psychedelic era. -- "Whenever you read a good book, it's like the author is right there, in the room talking to you, which is why I don't like to read good books." -- Jack Handey | Ben Finney From lvh at laurensvh.be Sun Jun 20 12:35:24 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 12:35:24 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: Hello, I'm one of the active people in #python that some people dislike for behavior with respect to Python 3. First of all I'd like to defuse the situation, much like Jacob. Seriously. It's been a bunch of posts so far and most of them have been pretty angry. Let's take a deep breath and try to fix the situation that's getting people frustrated like grownups :-) (FWIW: I find being called worse than half-intelligent pretty offensive. Let's stop doing that?) The idea being expressed in the IRC topic is _way_ bigger than the room an IRC topic gives you. Yes, it's an imperfect medium, yes, it's probably partially based on the use case: it's just that experience leads us to believe that the vast majority of use cases ends up being more in 2.x turf than 3.x turf, at the very least in the past. I'm sorry if you had the impression people wanted to nail you at the stake for using Python 3. If that's how you felt, it isn't true. I basically agree with Glyph. I don't think we've recently (I'm not omnipresent) told anyone who had any good reasons to use Python 3 to stop using it. If someone's doing work that actually needs Python 3 (most recent example a GSOC student porting Sphinx), we try our best to help, and AFAICT we've mostly been successful. (Please correct me if you think this is erroneous.)
We don't get too many people that actually want or need that, but I'm guessing that's mostly because people porting libraries to py3k usually already know what they're doing so they don't need the first-line-of-defense thing for Python questions that #python tries to be. Maybe you disagree on what good reasons are. #python is a bunch of volunteers giving help, free of charge, which is usually of a pretty high standard because they're professional Python developers and have been for a long time. Maybe that biases some of us against Py3k? Fact remains that there's a bunch of active people on IRC who pour a lot of time and effort into #python and make a lot of newbies really happy, and I think the picture you're painting based on a single issue that clearly not everyone agrees on is a bit disrespectful and somewhat unfair. Also, I'm pretty sure nobody has ever said that Python 3.x was a "failure", or anything like it. #python claims that Python 3.x, as a platform for building production apps, is a work in progress because of third party library support, and the language itself is pretty much done and okay -- a cleaner version of 2.x. People ask why it's too early to use Py3k, and that's _always_ the answer they get: at least the first half, and usually the second half too. In the meantime, we encourage people to write code that will be easy to port and behave well in 3.x: new-style classes, don't use eager versions when the Py3k default is lazy and you don't actually need the eager thing, use as many third party libraries as possible (the idea being that this would minimize effort needed to make the switch on the grand scale of things), use absolute imports always (and only explicit relative, but it's discouraged), always have a full unit test suite. This is advice that generally makes a lot of sense, and it's the recommended thing in PEP 3000 for porting to 3.x as well. We're still telling people to use Python 2.x by default because of a few major things: 1.
going out on a limb here: well over 90% of those people are completely new to Python and out of those most of them completely new to programming too, 2. the nicest libraries for doing a lot of stuff aren't ported yet, or are in the process of being ported but not yet recommended for actual use by their authors, (this seems to be a point of contention?) 3. we know how to help people better with it Which are all basically different incarnations of the same issue. People are working on libraries everywhere and I really don't want to pretend those people haven't gotten any work done, but AFAICT a lot of these for existing mature projects that you'd want people to use in order to be happy productive Python users don't really exist yet or are at best experimental. At the very least I think most people can agree that 2.x is still the default release for existing, mature software projects and most new ones too. I can only speak for my own area of interest: Python is way too big for anyone to have used every piece of software for it ever. I, personally, don't use 3.x because I develop for PyS60 devices, PythonCE devices (2.5 only), and Twisted servers (2.6), and none of those work on 3.x yet. The other thing we build is websites, and AFAIK the web situation, for now, is still "use python2.x", too? (for any non-trivial website, of course). We use AMQP, and the best thing we've found for it is 2.x only (maybe Carrot and Pika do 3.x now, but I can't find any evidence of it). Nobody here (here = place of business) hates Python 3. We just can't use it. I'm very sorry if you've been offended. Like Glyph said: we're not grinding ideological axes. We're just recommending what we honestly believe is the right tool for the job. We're just humans, we're not perfect. We make mistakes. If you feel we've made them, please just tell us and don't start a war.
If you tried and failed, please feel free to tell me how (doesn't have to be in public) and why it failed, and maybe I (or someone else!) can try to fix it: that's *not* how stuff is supposed to happen. Maybe someone was being a troll, I haven't checked but I trust the people I run #python with enough to say that it probably wasn't a regular. That's IRC for you: the problem is that if you let everyone speak, once in a while trolls open their mouths. Perhaps something someone said was just taken too seriously. I don't know the situation you're referring to, I just know #python. Again, just because someone asked and nobody removed that line ('It's too early...') doesn't mean we're evil pricks that want people to use Python2.x because of some hidden agenda. It just means that person disagrees with the idea that it's a good time to start doing it. IRC can be a harsh place, not because the people are jerks but because the medium just lends itself to it. People are generally a lot nicer than they appear. Like Nick said: not too long ago this was perfectly sound advice. I'm convinced it still is; maybe I've (and a lot of people active on #python) been out of touch with recent evolutions and it's no longer true. I don't know. I'm just a bit sad that it had to come to angry ventings (no grudges, I realise most of it is probably just frustration). I like to think I'm not wrong when I think that if people just ask "Hey, guys, this Python 3.x rule, don't you think it's about time we reviewed that? It's been up for a long while." people wouldn't get banned or anything. Maybe people disagree and think it should still be up there: but at least we could have a productive discussion hopefully resulting in something that makes everyone happy or at the very least less frustrated. I just asked two regulars and despite the fact that we're about as widespread as we possibly could be timezone-wise (SE Asia; Western Europe; WA, USA) nobody remembers that happening.
Also, on Twisted: yes, #python is very Twisted-minded, we have a bunch of people that like it, develop it, have built cool software with it and we think it could help other people too. It's not ideological axe grinding: a lot of the regulars just genuinely like Twisted. I'm sorry if you felt that not liking Twisted was going to get you smacked across the face, but that's not true either: Ronny Pfannschmidt is a regular, and he really doesn't like Twisted. We just think that for a lot of questions people come in with, Twisted is a great solution. That doesn't mean you're not allowed to have contrary opinions or that all dissent is crushed with an iron fist: it just means that the people who actually bother to help others day in day out know Twisted, like Twisted, and think Twisted is a great tool for a lot of problems. If you don't like Twisted, feel free to use something else: just don't complain when nobody can help you because the people offering help are all Twisted users that don't understand your software and don't have time or incentive to. It's a purely pragmatic thing. There's no hidden agenda. I've put bits of this up for review to #python regulars, so when I say 'we' it usually does mean 'we, #python regulars'. Most of it resonates. Maybe we're just in the distortion field? thanks for listening, Laurens From ncoghlan at gmail.com Sun Jun 20 12:57:56 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jun 2010 20:57:56 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100620113256.7ba8d86a@pitrou.net> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> Message-ID: On Sun, Jun 20, 2010 at 7:32 PM, Antoine Pitrou wrote: >> True, C++ does introduce a >> lot of new features, but most programmers migrating from C to C++ >> don't learn to use them properly for years, if ever, I'm told.
> > I don't see how Python 3 has that problem. You can be productive here > and now in Python 3, re-using your knowledge of Python 2 with a bit of > added information. Yeah, the significant issues with Python 3 over Python 2 *only* apply to people with legacy Python 2 code to worry about. The one thing that makes Python 3 potentially less desirable than Python 2 for some new applications is that the third party library support isn't quite as good yet. As more of the "big" libraries and frameworks provide Python 3 compatible versions, that factor will go away. As far as I can tell, with 3 years still to go on my own original prediction of 5+ years for Python 3 to start to be competitive with Python 2 for programming mindshare, adoption actually seems to be progressing fairly well. A lot of key functionality is either already supported in Python 3 or will be soon, and a lot of the rest is at least talking about plans for Python 3 compatibility. It's just that 5 years can seem like an eternity in the internet age, so sometimes people see the relative lack of adoption of Python 3 at this stage and start to panic about it being a failure. Now, if we're still having this conversation in 2013, then I'll admit we have a problem with the Python 3 uptake rate ;) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From fuzzyman at voidspace.org.uk Sun Jun 20 13:24:59 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 20 Jun 2010 12:24:59 +0100 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> On 20 Jun 2010, at 11:35, Laurens Van Houtven wrote: > Hello, > > > > I'm one of the active people in #python that some people dislike for > behavior with respect to Python 3. 
> > First of all I'd like to defuse the situation, much like Jacob. > Seriously. It's been a bunch of posts so far and most of them have > been pretty angry. Let's take a deep breath and try to fix the > situation that's getting people frustrated like grownups :-) (FWIW: I > find being called worse than half-intelligent pretty offensive. Let's > stop doing that?) > > The idea being expressed in the IRC topic is _way_ bigger than the > room an IRC topic gives you. Hey Laurens - I don't have an issue with with anything you've said, but given the topic is far more nuanced than an IRC topic can express maybe that just isn't the right place for it. Michael > Yes, it's an imperfect medium, yes, it's > probably partially based on the use case: it's just that experience > leads us to believe that the vast majority of use cases ends up being > more in 2.x turf then 3.x turf, at the very least in the past. > > I'm sorry if you had the impression people wanted to nail you at the > stake for using Python 3. If that's how you felt, it isn't true. I > basically agree with Glyph. I don't think we've recently (I'm not > omnipresent) told anyone who had any good reasons to to stop using > Python 3. If someone's doing work that actually needs Python 3 (most > recent example a GSOC student porting Sphinx), we try our best to > help, and AFAICT we've mostly been successful. (Please correct me if > you think this is erroneous.). We don't get too many people that > actually want or need that, but I'm guessing that's mostly because > people porting libraries to py3k usually already know what they're > doing so they don't need the first-line-of-defense thing for Python > questions that #python tries to be. > > Maybe you disagree on what good reasons are. #python is a bunch of > volunteers giving help, free of charge, which is usually of a pretty > high standard because they're professional Python developers and have > been for a long time. Maybe that biases some of us against Py3k? 
Fact > remains that there's a bunch of active people on IRC who pour a lot of > time and effort into #python and make a lot of newbies really happy, > and I think the picture you're painting based on a single issue that > clearly not everyone agrees on is a bit disrespectful and somewhat > unfair. > > Also, I'm pretty sure nobody has ever said that Python 3.x was a > "failure", or anything like it. #python has claims that Python 3.x, as > a platform for building production apps, is a work in progress because > of third party library support, and the language itself is pretty much > done and okay -- a cleaner version of 2.x. People ask why it's too > early to use Py3k, and that's _always_ the answer they get: at least > the first half, and usually the second half too. > > In the mean while, we encourage people to write code that will be easy > to port and behave well in 3.x: new-style classes, don't use eager > versions when the Py3k default is lazy and you don't actually need the > eager thing, use as many third party libraries as possible (the idea > being that this would minimize effort needed to make the switch on the > grand scale of things), use absolute imports always (and only explicit > relative, but it's discouraged), always have a full unit test suite. > This is advice that generally makes a lot of sense, and it's the > recommended thing in PEP 3000 for porting to 3.x as well. > > We're still telling people to use Python 2.x by default because of a > few major things: > > 1. going out on a limb here: well over 90% of those people are > completely new to Python and out of those most of them completely new > to programming too, > 2. the nicest libraries for doing a lot of stuff aren't ported yet, or > are in the process of being ported but not yet recommended for actual > use by their authors, (this seems to be a point of contention?) > 3. we know how to help people better with it > > Which are all basically different incarnations of the same issue. 
> People are working on libraries everywhere and I really don't want to > pretend those people haven't gotten any work done, but AFAICT a lot of > these for existing mature projects that you'd want people to use in > order to be happy productive Python users don't really exist yet or > are at best experimental. At the very least I think most people can > agree that 2.x is still the default release for existing, mature > software projects and most new ones too. > > I can only speak for my own area of intrest: Python is way too big for > anyone to have used every piece of software for it ever. I, > personally, don't use 3.x because I develop for PyS60 devices, > PythonCE devices (2.5 only), and Twisted servers (2.6), and none of > those work on 3.x yet. The other thing we build is websites, and AFAIK > the web situation, for now, is still "use python2.x", too? (for any > non-trivial website, of course). We use AMQP, and the best thing we've > found for it is 2.x only (maybe Carrot and Pika do 3.x now, but I > can't find any evidence of it). Nobody here (here = place of business) > hates Python 3. We just can't use it. > > I'm very sorry if you've been offended. Like Glyph said: we're not > grinding ideological axes. We're just recommending what we honestly > believe is the right tool for the job. We're just humans, we're not > perfect. We make mistakes. If you feel we've made them, please just > tell us and don't start a war. If you tried and failed, please feel > free to tell me how (doesn't have to be in public) and why it failed, > and maybe I (or someone else!) can try to fix it: that's *not* how > stuff is supposed to happen. Maybe someone was being a troll, I > haven't checked but I trust the people I run #python with enough to > say that it probably wasn't a regular. That's IRC for you: the problem > is that if you let everyone speak once in a while trolls open their > mouths. Perhaps something someone said was just taken too seriously. 
I > don't know the situation you're referring to, I just know #python. > > Again, just because someone asked and nobody removed that line ('It's > too early...') doesn't mean we're evil pricks that want people to use > Python2.x because of some hidden agenda. It just means that person > disagrees with the idea that it's a good time to start doing it. IRC > can be a harsh place, not because the people are jerks but because the > medium just lends itself to it. People are generally a not nicer than > they appear. > > Like Nick said: not too long ago this was perfectly sound advice. I'm > convinced it still is; maybe I've (and a lot of people active on > #python) been out of touch with recent evolutions and it's no longer > true. I don't know. I'm just a bit sad that it had to come angry > ventings (no grudges, I realise most of it is probably just > frustration). I like to think I'm not wrong when I think that if > people just ask "Hey, guys, this Python 3.x rule, don't you think it's > about time we reviewed that? It's been up for a long while." people > would get banned or anything. Maybe people disagree and think it > should still be up there: but at least we could have a productive > discussion hopefully resulting in something that makes everyone happy > or at the very least less frustrated. I just asked a two regulars and > despite the fact that we're about as widespread as we possibly could > timezone wise (SE Asia; Western Europe; WA, USA) nobody remembers that > happening. > > Also, on tiwsted Twisted: yes, #python is very Twisted-minded, we have > a bunch of people that like it, develop it, have built cool software > with it and we think it could help other people too. It's not > ideological axe grinding: a lot of the regulars just genuinely like > Twisted. I'm sorry if you felt that not liking Twisted was going to > get you smacked across the face, but that's not true either: Ronny > Pfannschmidt is a regular, and he really doesn't like Twisted. 
We just > think that for a lot of questions people come in with, Twisted is a > great solution. That doesn't mean you're not allowed to have contrary > opinions or that all dissent is crushed with an iron fist: it just > means that the people who actually bother to help others day in day > out know Twisted, like Twisted, and think Twisted is a great tool for > a lot of problems. If you don't like Twisted, feel free to use > something else: just don't complain when nobody can help you because > the people offering help are all Twisted users that don't understand > your software and don't have time or incentive to. It's a purely > pragmatic thing. There's no hidden agenda. > > I've put bits of this up for review to #python regulars, so when I say > 'we' it usually does mean 'we, #python regulars'. Most of it > resonates. Maybe we're just in the distortion field? > > > thanks for listening, > Laurens > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk From lvh at laurensvh.be Sun Jun 20 13:33:35 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 13:33:35 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> Message-ID: Michael, Fair point! It's mostly put in the topic so people can ask about it and we can give them more detailed answers, because, as other people have mentioned, the exact answer depends largely on what *precisely* someone is doing. I'm not sure what sort of an effect it would have if we took it out. Maybe something we could try? 
I'm not sure it'd have much of a practical effect since most of the regulars' expertise isn't going to shift instantly, so getting actual help is probably going to be a bit rough on 3.x users. At the very least I'm going to take this suggestion to #python's regulars and see what they have to say about it :-) (One of the problems that people I've talked to in private were "pretty miffed" about is the dissonance between #python and python-dev, and that there's some problem with people taking things said on #python as very authoritative answers (ha ha). I think this is really bad for Python as a whole and I'd love to hear ideas on how you guys think it could be fixed.) thanks Laurens From g.rodola at gmail.com Sun Jun 20 14:26:28 2010 From: g.rodola at gmail.com (Giampaolo Rodolà) Date: Sun, 20 Jun 2010 14:26:28 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <201006201204.30795.steve@pearwood.info> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: 2010/6/20 Steven D'Aprano : > Python 2.x introduced Unicode strings. Python 3.x merely makes them the > default. "Merely"? To me this looks like the main reason why a lot of projects haven't been ported to Python 3 yet. I attempted to port pyftpdlib to python 3 several times and the biggest show stopper has always been the bytes / string difference introduced by Python 3 which forces you to *know* and *use* Unicode every time you deal with some text and 2to3 is completely useless here. I can only imagine how difficult it can be to do such a conversion in a project like Twisted or Django where the I/O plays a fundamental role. The choice of forcing the user to use Unicode and "think in Unicode" was a very brave one, and I'm sure it's for the better, but not everyone wants to deal with that because Unicode is hard to swallow.
The majority of people prefer to stay with bytes and eventually learn and introduce Unicode only when that is actually needed. --- Giampaolo http://code.google.com/p/pyftpdlib http://code.google.com/p/psutil From ncoghlan at gmail.com Sun Jun 20 14:30:08 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jun 2010 22:30:08 +1000 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> Message-ID: On Sun, Jun 20, 2010 at 9:33 PM, Laurens Van Houtven wrote: > I'm not sure what sort of an effect it would have if we took it out. > Maybe something we could try? I'm not sure it'd have much of a > practical effect since most of the regulars expertise isn't going to > shift instantly, so getting actual help is probably going to be a bit > rough on 3.x users. Given the number of other links that are already in the status message, it would be really nice if the comment could be updated to something like: "Is Python3 ready for me? http://python-commandments.org/python3.html" i.e. make it clear that this is a question where the answer will vary based on your use case, and provide a clear direction on where to get more information. That page could then be updated to give a more balance view of the pros of Python 3 (e.g. cleaner core language design, future direction of the language, much better Unicode support) and the pros of Python 2 (e.g. wider installed base, better current third party library support, greater existing developer base, larger support ecosystem, greater #python expertise) > (One of the problems people I've talked to in private that were > "pretty miffed" about is the dissonance between #python and > python-dev, and that there's some problem with people assuming things > said on #python as being very authoritative answers (ha ha). 
I think > this is really bad for Python as a whole and I've love to hear ideas > on how you guys think it could be fixed.) There are always going to be differences in how different communities see the world and even the "Python community" is far too large to have a consistent point of view on almost any topic. So we'll likely have to muddle through with various ideas slowly percolating through to different parts of the community. That said, keeping in touch with the #python crew is certainly something we haven't paid much attention to in the past, but is probably just as important as staying in touch with major library developers and the developers of other implementations. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Jun 20 14:51:40 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 14:51:40 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> Message-ID: <20100620145140.68f22791@pitrou.net> On Sun, 20 Jun 2010 13:33:35 +0200 Laurens Van Houtven wrote: > > (One of the problems people I've talked to in private that were > "pretty miffed" about is the dissonance between #python and > python-dev, and that there's some problem with people assuming things > said on #python as being very authoritative answers (ha ha). I think > this is really bad for Python as a whole and I've love to hear ideas > on how you guys think it could be fixed.) Perhaps lower the tone a bit on http://pound-python.org/ ? "foremost support system for developing quality Python applications" ... "crack team of Python experts" ... "Your time won't be wasted by architecture astronauts or trivial repetitions of the docs".
(I understand these are slightly tongue-in-cheek but, if this page is intended mainly for beginners, I think being descriptive is more valuable) Also, mention other support options there - primarily comp.lang.python, of course, and the official documentation pages. Regards Antoine. From N.D.Efford at leeds.ac.uk Sun Jun 20 15:08:15 2010 From: N.D.Efford at leeds.ac.uk (Nick Efford) Date: Sun, 20 Jun 2010 14:08:15 +0100 (BST) Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: Message-ID: > I'm sorry if you had the impression people wanted to nail you at the > stake for using Python 3. If that's how you felt, it isn't true. I > basically agree with Glyph. I don't think we've recently (I'm not > omnipresent) told anyone who had any good reasons to to stop using > Python 3. If someone's doing work that actually needs Python 3 (most > recent example a GSOC student porting Sphinx), we try our best to > help, and AFAICT we've mostly been successful. (Please correct me if > you think this is erroneous.). We don't get too many people that > actually want or need that, but I'm guessing that's mostly because > people porting libraries to py3k usually already know what they're > doing so they don't need the first-line-of-defense thing for Python > questions that #python tries to be. Thanks for explaining your position on this so carefully, Laurens. You've made many reasonable points which I hope will help to cool things down a little. Clearly, there are situations where it makes sense to advocate Python 2.X and other situations where people can be encouraged to consider Python 3. The issues that potential users need to consider are too subtle to be represented fairly by the simple advice to 'avoid Python 3', so can we not all agree to remove it as a #python topic as a gesture of goodwill?
Nobody need change their opinions or advocacy as a result, but it would have the benefit of presenting #python in a more neutral and inclusive light. I've not used IRC much in the past, but if it would be useful for someone like myself - a longtime Python user but recent and enthusiastic Python 3 adopter - to offer my opinions and advice on the issue to newcomers then I'm certainly willing to get involved. > We're still telling people to use Python 2.x by default because of a > few major things: > > 1. going out on a limb here: well over 90% of those people are > completely new to Python and out of those most of them completely new > to programming too, Not sure if I agree with you here; I regard people new to programming as the prime candidates for using Python 3. Many of the language changes have the effect of making it significantly easier to learn for newcomers (I wrote about this a while ago - see http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html). Also, people new to Python or programming in general won't have the burden of legacy code that needs to be converted. The only situation in which I'd direct someone new to programming away from Python 3 would be if they had a specific need to use a library that wasn't yet supported. > 2. the nicest libraries for doing a lot of stuff aren't ported yet, or > are in the process of being ported but not yet recommended for actual > use by their authors, (this seems to be a point of contention?) This has certainly been the key issue for me. Only in the past two or three months have we got to the point where I feel I can commit to Python 3 fully. Six months ago, I definitely could not have done so. This is progress, and we need to be positive about it.
Regards, Nick -- Dr Nick Efford, School of | E: N.D.Efford at leeds.ac.uk Computing, University of | T: +44 113 343 6809 Leeds, Leeds, LS2 9JT, UK | W: http://www.comp.leeds.ac.uk/nde/ --------------------------+----------------------------------------- PGP fingerprint: 6ADF 16C2 4E2D 320B F537 8F3C 402D 1C78 A668 8492 From lvh at laurensvh.be Sun Jun 20 16:46:03 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 16:46:03 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> Message-ID: On Sun, Jun 20, 2010 at 2:30 PM, Nick Coghlan wrote: > On Sun, Jun 20, 2010 at 9:33 PM, Laurens Van Houtven wrote: > Given the number of other links that are already in the status > message, it would be really nice if the comment could be updated to > something like: > > "Is Python3 ready for me? http://python-commandments.org/python3.html" Sounds like a great idea, I'll run it past the other folks. > i.e. make it clear that this is a question where the answer will vary > based on your use case, and provide a clear direction on where to get > more information. I think the reason #python regulars never saw this as a problem is because people who actually ask do get this answer. At least they do if Aaron, Allen, Brendon, Clovis, Stephen, Devin, me... (list of names way too numerous to be exhaustive) are awake. Maybe the strong language does scare people off from that critical asking-for-more-information step, so yes, reviewing that would be a good idea. > There are always going to be differences in how different communities > see the world and even the "Python community" is far too large to have > a consistent point of view on almost any topic. 
So we'll likely have > to muddle through with various ideas slowly percolating through to > different parts of the community. That said, keeping in touch with the > #python crew is certainly something we haven't paid much attention to > in the past, but is probably just as important as staying in touch > with major library developers and the developers of other > implementations. My thoughts exactly on both counts. Communication good, embrace heterogeneity :) > Cheers, > Nick. > Thanks for your input, Laurens From lvh at laurensvh.be Sun Jun 20 16:50:28 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 16:50:28 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <20100620145140.68f22791@pitrou.net> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <73C61D25-CDCC-498E-B1EA-BDA608E2EDEC@voidspace.org.uk> <20100620145140.68f22791@pitrou.net> Message-ID: On Sun, Jun 20, 2010 at 2:51 PM, Antoine Pitrou wrote: > On Sun, 20 Jun 2010 13:33:35 +0200 > Laurens Van Houtven wrote: > Perhaps lower the tone a bit on http://pound-python.org/ ? > "foremost support system for developing quality Python > applications" ... "crack team of Python experts" ... "Your time won't > be wasted by architecture astronauts or trivial repetitions of the > docs". Noted, we'll say the same thing but differently. > (I understand these are slightly tongue-in-cheek but, if this page is > intended mainly for beginners, I think being descriptive is more > valuable) Yes, it is tongue-in-cheek, but perhaps a bit too much so :-) I didn't write it, it just never struck me as a problem at the time. I think the problem is that that page was created to fix a very specific problem (explaining why #python isn't a search engine), and it probably got written more out of something snapping than an attempt to be informative.
> Also, mention other support options there - primarily comp.lang.python, > of course, and the official documentation pages. Will do. > Regards > > Antoine. Thanks for your input, Laurens From ncoghlan at gmail.com Sun Jun 20 16:58:59 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 00:58:59 +1000 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: Message-ID: On Sun, Jun 20, 2010 at 11:08 PM, Nick Efford wrote: > Not sure if I agree with you here; I regard people new to > programming as the prime candidates for using Python 3. Many of > the language changes have the effect of making it significantly > easier to learn for newcomers (I wrote about this a while ago - > see http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html). That's actually one of the better write-ups I've seen regarding several of the key benefits of the Python 3 transition. They're easy to lose sight of when discussing the topic with the existing developers that are bearing the cost of converting their code due to changes that were made primarily for the benefit of new users. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at holdenweb.com Sun Jun 20 17:37:53 2010 From: steve at holdenweb.com (Steve Holden) Date: Mon, 21 Jun 2010 00:37:53 +0900 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: Glyph Lefkowitz wrote: > On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote: > >> However I have very little experience with IRC and consequently have >> little idea what getting a permanent, owned, channel like #python >> entails. Hence the '?' that follows. >> >> What do others think? > > Sure, this is a good idea.
> > Technically speaking, this is extremely easy. Somebody needs to "/msg > chanserv register #python3" and that's about it. (In this case, that > "someone" may need to be Brett Cannon, since he is the official group > contact for Freenode regarding Python-related channels.) > > Practically speaking, you will need a group of at least a dozen > contributors, each in a different timezone, who sit there all day > answering questions :). Otherwise the ownership of the channel is just > a signpost pointing at an empty room. > Which is yet another reason I don't think it would be productive to attempt any kind of pre-emptive action against the #python team. They do serve a very useful purpose, and there is reasoned logic behind their position even if we might wish it were different. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From stephen at thorne.id.au Sun Jun 20 17:38:33 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Mon, 21 Jun 2010 01:38:33 +1000 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> Message-ID: <20100620153833.GD20639@thorne.id.au> On 2010-06-19, Arc Riley wrote: > You mean Twisted support, No. I don't. Often, on #python, we get the situation where someone approaches us saying, "I have this problem in my python code, why does this not work for me?" and usually very quickly we establish the programmer has followed a tutorial or attempted to use a library that depends on python 2, but the programmer is running python 3. Queried on why they are using python 3, the answer is frequently, "Because I downloaded the latest version." For those people, we believe it is too early to use python 3. 
When talking to these people with a world view of "why shouldn't i use the latest version" having a concrete preexisting statement in the topic we can point to is invaluable. We don't always ask those who are having python 3 problems to go to python2. Often we simply explain about all strings being unicode or print now being a function, and the conversation dies. There are also programmers who definitely should be using python 3 for their work. They know who they are. They do receive support in #python. -- In writing this email to python-dev, I have reviewed my logs of #python specifically looking for the phrase 'python 3'. Here are some packages that were named in the conversations: - py2exe - cx_Freeze - twisted - PIL - ctypes - email I present this list because they are what programmers are coming to #python to ask about, and that may be relevant to your discussion about python 3 ports. -- Regards, Stephen Thorne From lvh at laurensvh.be Sun Jun 20 17:51:28 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 17:51:28 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: Message-ID: On Sun, Jun 20, 2010 at 3:08 PM, Nick Efford wrote: > Thanks for explaining your position on this so carefully, > Laurens. You've made many reasonable points which I hope will > help to cool things down a little. Cool, glad it's appreciated. > Clearly, there are situations where it makes sense to advocate > Python 2.X and other situations where people can be encouraged to > consider Python 3. The issues that potential users need to > consider are too subtle to be represented fairly by the simple > advice to 'avoid Python 3', so can we not all agree to remove > it as a #python topic as a gesture of goodwill? I like the idea of changing it to something that points to a more detailed thing as someone suggested above. Ideally short and completely neutral, like "2.x or 3.x? http://shorturl/whatever".
> Nobody need change their opinions or advocacy as a result, I very much doubt that'd happen anyway ;-) > but it would have the benefit of presenting #python in a more > neutral and inclusive light. +1 > I've not used IRC much in the past, but if it would be useful for > someone like myself - a longtime Python user but recent and > enthusiastic Python 3 adopter - to offer my opinions and advice > on the issue to newcomers then I'm certainly willing to get > involved. Everybody's very welcome, the entire reason I'm putting time into this is because apparently some people felt less welcome than I'd like them to feel :-) >> We're still telling people to use Python 2.x by default because of a >> few major things: >> >> 1. going out on a limb here: well over 90% of those people are >> completely new to Python and out of those most of them completely new >> to programming too, > > Not sure if I agree with you here; I regard people new to > programming as the prime candidates for using Python 3. Many of > the language changes have the effect of making it significantly > easier to learn for newcomers (I wrote about this a while ago - > see http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html). > Also, people new to Python or programming in general won't have > the burden of legacy code that needs to be converted. Very nice read. Most points are indeed common questions, we just tell people how to work around them in 2.x. ie, whenever someone posts old-style classes, someone will always point out to them that they really probably want new-style even if they don't get the difference yet; for integer division we tell people to convert to float or from __future__ import division, if you use print call it with exactly one string and just build that string, never ever ever use input, just use raw_input, that sort of stuff. Not always very clean, more of a workaround. Also stuff like chevron print is actively discouraged in favor of using a logging module or eg sys.stderr.
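The 2.x workarounds listed above can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `Account`, `average` and `warn` are made up for the example, and the snippet is written so that it should behave the same under 2.x and 3.x. The `raw_input` advice is left out because it only applies to 2.x and would block waiting on stdin.

```python
import sys


class Account(object):  # inheriting from object makes this new-style on 2.x
    def __init__(self, balance):
        self.balance = balance


def average(values):
    # Converting to float avoids 2.x integer division; the alternative is
    # `from __future__ import division` as the first statement of the module.
    return sum(values) / float(len(values))


def warn(message, stream=None):
    # Write to a stream explicitly instead of using the 2.x-only chevron print.
    if stream is None:
        stream = sys.stderr
    stream.write(message + "\n")


# Build the complete string first, then print exactly one argument, so the
# line works both as a 2.x print statement and as a 3.x function call.
print("average: %s" % average([1, 2]))
warn("this goes to stderr")
```

On 3.x none of these precautions are needed, but they are harmless there, which is what makes them reasonable habits for 2.x code that may be ported later.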
Of course, in py3k where you don't have to, which is even nicer :-) I'm guessing it's okay to link to this from the newer, more neutral pages? :-) > The only situation in which I'd direct someone new to programming > away from Python 3 would be if they had a specific need to use a > library that wasn't yet supported. Yeah, I think the reason for that rule is that the majority of people asking about new software actually start or end up in this category. No statistics to back that up, but the regulars seem to agree (again, maybe we're biased). See Steve Thorne (Jerub)'s post in a parallel thread. Usually it's because they want to do something that people have already solved, and #python is pretty strict about discouraging implementing software that already exists. Of course, as the porting of Python 3.x packages progresses this point becomes more and more moot. A possible solution is that we suggest that people, instead of rolling their own thing from scratch, help to port an existing good 2.x lib to 3.x, or use 2.x? I don't think it's a good idea to start encouraging NIH in new programmers :-) >> 2. the nicest libraries for doing a lot of stuff aren't ported yet, or >> are in the process of being ported but not yet recommended for actual >> use by their authors, (this seems to be a point of contention?) > > This has certainly been the key issue for me. Only in the past > two or three months have we got to the point where I feel I can commit > to Python 3 fully. Six months ago, I definitely could not have > done so. This is progress, and we need to be positive about it. Yeah, that message has been in the /topic for _WAY_ longer than 6 months.
> > Regards, > > > Nick Thank you very much for your input, Laurens From steve at holdenweb.com Sun Jun 20 18:00:03 2010 From: steve at holdenweb.com (Steve Holden) Date: Mon, 21 Jun 2010 01:00:03 +0900 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100620153833.GD20639@thorne.id.au> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> Message-ID: <4C1E3B03.5060802@holdenweb.com> Stephen Thorne wrote: > On 2010-06-19, Arc Riley wrote: >> You mean Twisted support, > > No. I don't. > > Often, on #python, we get the situation where someone approaches us saying, "I > have this problem in my python code, why does this not work for me?" and > usually very quickly we establish the programmer has followed a tutorial or > attempted to use a library that depends on python 2, but the programmer is > running python 3. > > Queried on why they are using python 3, the answer is frequently, "Because I > downloaded the latest version." > > For those people, we believe it is too early to use python 3. When talking to > these people with a world view of "why shouldn't i use the latest version" > having a concrete preexisting statement in the topic we can point to is > invaluable. > > We don't always ask those who are having python 3 problems to go to python2. > Often we simply explain about all strings being unicode or print now being a > function, and the conversation dies. > > There are also programmers who definitely should be using python 3 for their > work. They know who they are. They do receive support in #python. > > -- > > In writing this email to python-dev, I have reviewed my logs of #python > specifically looking for the phrase 'python 3'.
Here are some packages that > were named in the conversations: > > - py2exe > - cx_Freeze > - twisted > - PIL > - ctypes > - email > > I present this list because they are what programmers are coming to #python to > ask about, and that may be relevant to your discussion about python 3 ports. > Given the amount of interest this thread has generated I can't help wondering why it isn't more prominent in python.org content. Is the developer community completely disjoint with the web content editor community? If there is such a disconnect we should think about remedying it: a large "Python 2 or 3?" button could link to a reasoned discussion of the pros and cons as evinced in this thread. That way people will end up with the right version more often (and be writing Python 2 that will more easily migrate to Python 3, if they cannot yet use 3). There seems to be a perception that the PSF can help fund developments, and indeed Jesse Noller has made a small start with his sprint funding proposal (which now has some funding behind it). I think if it is to do so the Foundation will have to look for substantial new funding. I do not currently understand where this funding would come from, and would like to tap your developer creativity in helping to define how the Foundation can effectively commit more developer time to Python. GSoC and GHOP are great examples, but there is plenty of room for all sorts of initiatives that result in development opportunities. I'd like to help. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From lvh at laurensvh.be Sun Jun 20 18:15:01 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 18:15:01 +0200 Subject: [Python-Dev] #Python3 ! ?
(was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: On Sun, Jun 20, 2010 at 5:37 PM, Steve Holden wrote: > Glyph Lefkowitz wrote: >> On Jun 19, 2010, at 5:02 PM, Terry Reedy wrote: > Which is yet another reason I don't think it would be productive to > attempt any kind of pre-emptive action against the #python team. They do > serve a very useful purpose, and there is reasoned logic behind their > position even if we might wish it were different. > > regards > Steve I'm one of them so I'm a bit biased, but I'd say the biggest argument is that it's not set in stone (I'm trying to fix it and the regulars have been nothing but cooperative). Nobody from the #python people realized this was a huge thing for people up until today. It's been up there for a long time, and it's becoming less and less defensible every passing day (and that's a good thing!), so we're basically debating what ought to change and when. It's not really a matter of disliking, it's more of a matter of "um, it's still up there because nobody thought it had to go" :-) FWIW: I think a separate #python3 channel would be a really bad idea. thanks Laurens From lvh at laurensvh.be Sun Jun 20 18:20:22 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 18:20:22 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: Status update: Topic now says: NO LOL | Don't paste in here: use http://paste.pocoo.org/ | http://pound-python.org/ | Include Python version in questions | 2.x or 3.x?
http://tinyurl.com/py2or3 | Tutorial: http://docs.python.org/tut/ | FAQ: http://effbot.org/pyfaq/ | New Programmer? Read http://tinyurl.com/thinkcspy2e | #python.web #wsgi #python-fr #python.de #python-es #python.tw #python.pl #python-br #python-nl Right now the shorturl points to the excellent http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html by Nick Efford, until we get the Py2.x vs Py3.x page as suggested above done, which will hopefully be in the next few hours. pound-python.org not touched yet because 1) the appropriate person isn't available right now 2) it's not as pressing a matter as the other thing. Thanks again for everyone's input on all of python-dev, #python, #python-offtopic, Laurens From solipsis at pitrou.net Sun Jun 20 19:01:59 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 19:01:59 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: <20100620190159.76973b55@pitrou.net> On Mon, 21 Jun 2010 01:00:03 +0900 Steve Holden wrote: > > Given the amount of interest this thread has generated I can't help > wondering why it isn't more prominent in python.org content. Is the > developer community completely disjoint with the web content editor > community? Sorry for a naive question, but what is the web content editor community? Regards Antoine. From steve at holdenweb.com Sun Jun 20 19:07:01 2010 From: steve at holdenweb.com (Steve Holden) Date: Mon, 21 Jun 2010 02:07:01 +0900 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: <4C1E4AB5.9070502@holdenweb.com> Laurens Van Houtven wrote: > Status update: > > Topic now says: > > NO LOL | Don't paste in here: use http://paste.pocoo.org/ | > http://pound-python.org/ | Include Python version in questions | 2.x or 3.x? > http://tinyurl.com/py2or3 | Tutorial: http://docs.python.org/tut/ | FAQ: > http://effbot.org/pyfaq/ | New Programmer? Read > http://tinyurl.com/thinkcspy2e | #python.web #wsgi #python-fr #python.de > #python-es #python.tw #python.pl #python-br #python-nl > > Right now the shorturl points to the excellent > http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html by Nick Efford, > until we get the Py2.x vs Py3.x page as suggested above done, which > will hopefully be in the next few hours. > > pound-python.org not touched yet because 1) the appropriate person > isn't available right now 2) it's not as pressing a matter as the > other thing. > > > Thanks again for everyone's input on all of python-dev, #python, > #python-offtopic, > Laurens > And thanks for engaging so directly and responsively. The Python community has impressed me again with its maturity. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From martin at v.loewis.de Sun Jun 20 19:23:55 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 Jun 2010 19:23:55 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <20100620190159.76973b55@pitrou.net> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> <20100620190159.76973b55@pitrou.net> Message-ID: <4C1E4EAB.5000404@v.loewis.de> Am 20.06.2010 19:01, schrieb Antoine Pitrou: > On Mon, 21 Jun 2010 01:00:03 +0900 > Steve Holden wrote: >> >> Given the amount of interest this thread has generated I can't help >> wondering why it isn't more prominent in python.org content. Is the >> developer community completely disjoint with the web content editor >> community? > > Sorry for a naive question, but what is the web content editor > community? I think he's talking about the editors of www.python.org. Regards, Martin From martin at v.loewis.de Sun Jun 20 19:30:40 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 Jun 2010 19:30:40 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: <4C1E5040.9040504@v.loewis.de> Am 20.06.2010 18:20, schrieb Laurens Van Houtven: > 2.x or 3.x? > http://tinyurl.com/py2or3 If you are interested, we could host any material that somebody would want to provide on http://python.org/py2or3 (which would be one letter shorter :-). We could also make this a redirect. 
Regards, Martin From stephen at xemacs.org Sun Jun 20 19:30:17 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 02:30:17 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100620113256.7ba8d86a@pitrou.net> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> Message-ID: <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > I think it's an unfortunate analogy. Propose a better one, then. I'm definitely not wedded to the ones I've proposed! But we have a PR problem *now*. The loyal opposition clearly intend to continue trash-talking Python 3 until the libraries get to 100% (or a government-approved approximation of 100%). The topic on #python seems unlikely to change at this point, with both Glyph and JP pointedly failing to denounce it publicly, while Stephen defends it and says it's not going to change as long as the libraries aren't done. What do you suggest? Or do you think there's no PR problem we should worry about, just accept that this is going to be a further drag on adoption and improvement, and keep on keeping on? From martin at v.loewis.de Sun Jun 20 19:41:37 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 Jun 2010 19:41:37 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <4C1E52D1.4030801@v.loewis.de> > I can only imagine how difficult can it be to do such a conversion in > a project like Twisted or Django where the I/O plays a fundamental > role.
For Django, you don't need to imagine, but can look at the actual changes: http://bitbucket.org/loewis/django-3k/ > The choice of forcing the user to use Unicode and "think in Unicode" > was a very brave one, and I'm sure it's for the better, but not > everyone wants to deal with that because Unicode is hard to swallow. > The majority of people prefer to stay with bytes and eventually learn > and introduce Unicode only when that is actually needed. It's not really an issue with "Unicode", but rather with "characters". Surprisingly, most people don't grasp the notion of "abstract character". This is similar to not grasping the notion of "abstract integral number", which most programmers master over time (although my students typically need a year or more to get the difference between "decimal number", "two's complement", and "abstract integer"; the difference between "character string" and "number" is easier (*)). For numbers, programmers are forced to accept the abstraction. For character strings, they apparently resist much more. Regards, Martin (*) An anecdotal dialog may read like this Teacher: "How are numbers represented in Python?" Student: "In decimal." T: "How so?" S: "I can do x = 47 and it is decimal. I can then do print x and get "47". See?" From stephen at xemacs.org Sun Jun 20 19:42:53 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 02:42:53 +0900 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: Message-ID: <87eig1fvmq.fsf@uwakimon.sk.tsukuba.ac.jp> Laurens Van Houtven writes: > > The only situation in which I'd direct someone new to programming > > away from Python 3 would be if they had a specific need to use a > > library that wasn't yet supported. > > Yeah, I think the reason for that rule is that the majority of people > asking about new software actually start or end up in this category. 
I think that the most experienced people have absurdly high standards for "support" compared to those new to programming. I hope they check their advice against the real requirements of the new programmer. > Usually it's because they want to do something that people have > already solved, If they're new to programming, they're already in adventure mode. Why not point out the Road Less Traveled? That will make all the difference. Of course you should point out that it's going to be bumpier, and of course that is likely to push the majority of practical folks back to Python 2. But some of them are likely to be willing to endure a bit of frustration, especially if they're told that their bug reports will be listened to seriously on python-dev (given help from an experienced hand in formatting them!) > A possible solution is that we suggest that people, instead of > rolling their own thing from scratch, help to port an existing good > 2.x lib to 3.x, or use 2.x? Exactly. Don't give them rose-colored glasses about porting, and warn that some are just plain broken (eg, because of inappropriate assumptions about bytes vs Unicode). But on the other hand, some will mostly work for them, and their bug reports on the corner cases will be helpful. > I don't think it's a good idea to start encouraging NIH in new > programmers :-) Agreed. From solipsis at pitrou.net Sun Jun 20 19:47:47 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 19:47:47 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100620194747.0c3d0a82@pitrou.net> On Mon, 21 Jun 2010 02:30:17 +0900 "Stephen J. Turnbull" wrote: > Antoine Pitrou writes: > > > I think it's an unfortunate analogy. 
> > Propose a better one, then. I'm definitely not wedded to the ones > I've proposed! I'm not sure why you want an analogy. Python 3 improves the language and drops legacy cruft. Bringing C++ makes the description unnecessarily contentious and loaded (because C++ has a rather bad reputation amongst many people; recently Linus Torvalds explained again why he thought C was much more appropriate a programming language). And it's not even warranted, because the situation is vastly different. > What do you suggest? Or do you think there's no PR problem we should > worry about, just accept that this going to be a further drag on > adoption and improvement, and keep on keeping on? I suppose the PR problem could be solved by having an official page on python.org explain what the new features and advantages of Python 3 over Python 2 are. There's no such thing right now; actually, I'm not sure there's a Web page explaining clearly what the difference is about, why it was done in such a compatibility-breaking way, and what we advise (both actual and potential) users to do. I suppose that's a task for the "Web content editor community". Regards Antoine. From turnbull at sk.tsukuba.ac.jp Sun Jun 20 19:48:01 2010 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 02:48:01 +0900 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> Laurens Van Houtven writes: > Also, I'm pretty sure nobody has ever said that Python 3.x was a > "failure", or anything like it. #python has claims that Python 3.x, as > a platform for building production apps, is a work in progress How about "Python 3 is a work in progress" for the topic? That seems to me to strike exactly the right balance, and encourage the interested to ask the right kind of question. 
From solipsis at pitrou.net Sun Jun 20 19:55:47 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 19:55:47 +0200 Subject: [Python-Dev] email package status in 3.X References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <20100620195547.3c7882ca@pitrou.net> On Sun, 20 Jun 2010 14:26:28 +0200 Giampaolo Rodolà wrote: > I attempted to port pyftpdlib to python 3 several times and the > biggest show stopper has always been the bytes / string difference > introduced by Python 3 which forces you to *know* and *use* Unicode > every time you deal with some text and 2to3 is completely useless > here. I don't really understand what the difficulties are. A character is a character; converting from bytes to characters requires knowing the encoding, which your protocol should specify somewhere (of course, I suppose FTP is old and crummy enough that it may not specify anything). An "encoding" is nothing more than a transformation. When you get gzipped data, you must decompress it before doing anything useful with it. Similarly, when you get (say) UTF-8 data, you must decode it before doing anything useful with it. > I can only imagine how difficult can it be to do such a conversion in > a project like Twisted or Django where the I/O plays a fundamental > role. Twisted actually seems to enforce the bytes / unicode separation quite well already, so I don't think they should have many problems on that front. Modern Web frameworks seem to be in the same boat (they already give the Web developer unicode strings to play with, and handle the encoding/decoding at the IO boundary transparently). > The choice of forcing the user to use Unicode and "think in Unicode" > was a very brave one, and I'm sure it's for the better, but not > everyone wants to deal with that because Unicode is hard to swallow. Could Google fund a project named "Unicode Swallow"? Regards Antoine.
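[Archive note] The comparison between decompressing and decoding can be sketched in a few lines of Python 3. This is only an illustration: the name decode_payload and the UTF-8 default are assumptions made for the example and do not come from the thread.

```python
import gzip


def decode_payload(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn gzipped, encoded bytes into text in two explicit steps.

    Hypothetical helper: the name and the UTF-8 default are assumptions.
    """
    decompressed = gzip.decompress(raw)    # bytes -> bytes (undo compression)
    return decompressed.decode(encoding)   # bytes -> str   (undo encoding)


# Build some "wire" data by applying both transformations in reverse order.
wire = gzip.compress("déjà vu".encode("utf-8"))
text = decode_payload(wire)
```

Both steps are plain transformations; the only extra requirement for the second one is that the caller knows which encoding was used.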
From guido at python.org Sun Jun 20 19:57:05 2010 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Jun 2010 10:57:05 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: On Sun, Jun 20, 2010 at 5:26 AM, Giampaolo Rodolà wrote: > 2010/6/20 Steven D'Aprano : >> Python 2.x introduced Unicode strings. Python 3.x merely makes them the >> default. > > "Merely"? To me this looks as the main reason why a lot of projects > haven't been ported to Python 3 yet. > I attempted to port pyftpdlib to python 3 several times and the > biggest show stopper has always been the bytes / string difference > introduced by Python 3 which forces you to *know* and *use* Unicode > every time you deal with some text Ah, but this is the crux of the difference between Python 2 and 3. The distinction between text and bytes is crucial, and Python 2 tried to paper over the differences in a way that led to endless pain. Many clumsy and shaky hacks have been invented to alleviate the pain but it never goes away. Python 3 takes a much clearer stance on the difference -- your code *must* be aware of the distinction and it *must* deal with it. The problem comes exactly where you find it: when *porting* existing code that uses aforementioned ways to alleviate the pain, you find that the hacks no longer work and a properly layered design is needed that clearly distinguishes between which variables contain bytes and which text. > and 2to3 is completely useless here. Alas, this is true, because it is not a matter of changing some simple things. The old ways are no longer supported. > I can only imagine how difficult can it be to do such a conversion in > a project like Twisted or Django where the I/O plays a fundamental > role.
Django actually took one of the most principled stances towards this issue and has already been ported (although the port is not maintained by the core Django developers yet). I can't speak for Twisted but I know they have some funding towards a port. The problem is often worse for smaller libraries (like I presume pyftplib is) which don't have a clear stance about bytes vs. text. Another problem is some internet protocols (of which FTP I believe is one) which use antiquated models for dealing with binary vs. text data, often focusing entirely on encodings (usually and mistakenly called "character sets") rather than on proper Unicode support. > The choice of forcing the user to use Unicode and "think in Unicode" > was a very brave one, and I'm sure it's for the better, but not > everyone wants to deal with that because Unicode is hard to swallow. Education is needed. When you search Google (or Bing, for that matter :-) for "python unicode" the first hit is http://www.amk.ca/python/howto/unicode, which is highly detailed but probably too much information for the typical person faced with a UnicodeError exception traceback (that page is also focused on Python 2). What we need is a cookbook on how to deal with various common situations. > The majority of people prefer to stay with bytes and eventually learn > and introduce Unicode only when that is actually needed. This is exactly what we tried to do in Python 2 and it was a flagrant disaster. It's just that the work-arounds people have created to deal with it don't port clearly -- which is by design. This is why I've always said that I assumed that the Python 3 transition would take 5 years. On the #python issue, I expect that IRC is much less influential that some here fear (and than some fervent IRC users believe). I don't see reason for panic or heavy-handed interference. OTOH engaging the channel operators more in python-dev sounds like a useful approach. 
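[Archive note] A minimal sketch of what a "properly layered design" might look like for a line-based protocol, in Python 3. Everything here is hypothetical: parse_command and format_reply are made-up names, and UTF-8 is an assumed encoding, not a statement about what FTP or pyftpdlib actually use.

```python
from typing import Tuple


def parse_command(line: bytes, encoding: str = "utf-8") -> Tuple[str, str]:
    """Input boundary: bytes come in, text goes out (hypothetical helper)."""
    text = line.rstrip(b"\r\n").decode(encoding)
    verb, _, arg = text.partition(" ")
    return verb.upper(), arg


def format_reply(code: int, message: str, encoding: str = "utf-8") -> bytes:
    """Output boundary: text comes in, bytes go out (hypothetical helper)."""
    return "{} {}\r\n".format(code, message).encode(encoding)


# Between the two boundaries, every variable holds text (str), never bytes.
verb, arg = parse_command(b"retr caf\xc3\xa9.txt\r\n")
reply = format_reply(150, "Opening " + arg)
```

The point of the layering is that only these two functions need to know the encoding; code in between works on str alone.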
-- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Sun Jun 20 19:58:14 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 02:58:14 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87bpb5fux5.fsf@uwakimon.sk.tsukuba.ac.jp> Pass the ketchup, I need to eat my words. I wrote: > The loyal opposition clearly intend to continue trash-talking > Python 3 until the libraries get to 100% (or a government-approved > approximation of 100%). The topic on #python seems unlikely to > change at this point, with both Glyph and JP pointedly failing to > denounce it publicly, while Stephen defends it and says it's not > going to change as long as the libraries aren't done. It would seem from posts I received after replying (local mail glitch, should have known there was more coming :-( ) that the facts are that the topic is quite likely to change soonish, and that "trash-talking" is being done, if at all, by trolls. (Having spent a few hours on #python today, I see that's a lot more possible than I would have believed in this community. Nobody's immune.) Glyph, JP, and Stephen have my personal apologies. From lvh at laurensvh.be Sun Jun 20 20:02:57 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 20:02:57 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jun 20, 2010 at 7:30 PM, Stephen J.
Turnbull wrote: > Antoine Pitrou writes: > But we have a PR problem *now*. The loyal opposition clearly intend > to continue trash-talking Python 3 until the libraries get to 100% (or > a government-approved approximation of 100%). The topic on #python > seems unlikely to change at this point, with both Glyph and JP > pointedly failing to denounce it publicly, while Stephen defends it > and says it's not going to change as long as the libraries aren't > done. Huh? We just changed the topic on #python because people complained about it. We didn't do it earlier because we didn't know it was a problem. Defending it doesn't mean it's set in stone :-) I don't wanna come across like a jerk but could we please not use loaded terms like "loyal opposition" and "trash-talking"? I don't really think that's what people do or are (or at least want to be/intend to do). I've really honestly tried my best to fix this situation (see the other thread) and the people whom I've gotten input from (both here and in the IRC channels) have been nothing but helpful. > What do you suggest? Or do you think there's no PR problem we should > worry about, just accept that this going to be a further drag on > adoption and improvement, and keep on keeping on? I very much like Martin and Antoine's ideas of putting the thing up on python.org, that might also solve people's problems with the apparent dissonance between #python and python-dev/the PSF that neither side really wants. To the contrary, I think everyone wants this situation to improve, including Guido, apparently. Myself included, I think everyone stands to gain here. thanks for listening Laurens From lvh at laurensvh.be Sun Jun 20 20:08:19 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 20:08:19 +0200 Subject: [Python-Dev] #Python3 ! ?
(was Python Library Support in 3.x) In-Reply-To: <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jun 20, 2010 at 7:48 PM, Stephen J. Turnbull wrote: > Laurens Van Houtven writes: > > > Also, I'm pretty sure nobody has ever said that Python 3.x was a > > "failure", or anything like it. #python has claims that Python 3.x, as > > a platform for building production apps, is a work in progress > > How about "Python 3 is a work in progress" for the topic? That seems > to me to strike exactly the right balance, and encourage the > interested to ask the right kind of question. I think even that's a bit too loaded, as a sign of goodwill I think we're going to go with something completely neutral like "2.x vs 3.x". But I'm not going to argue that ad nauseam because it's really just bikeshedding. thanks for your input Laurens From pje at telecommunity.com Sun Jun 20 20:09:43 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 20 Jun 2010 14:09:43 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <4C1E3B03.5060802@holdenweb.com> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: <20100620180957.A71943A4099@sparrow.telecommunity.com> At 01:00 AM 6/21/2010 +0900, Steve Holden wrote: >If there is such a disconnect we should think about remedying it: a >large "Python 2 or 3?" button could link to a reasoned discussion of the >pros and cons as evinced in this thread. That way people will end up >with the right version more often (and be writing Python 2 that will >more easily migrate to Python 3, if they cannot yet use 3).
+1 From lvh at laurensvh.be Sun Jun 20 20:15:33 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 20:15:33 +0200 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <4C1E3B03.5060802@holdenweb.com> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: > If there is such a disconnect we should think about remedying it: a > large "Python 2 or 3?" button could link to a reasoned discussion of the > pros and cons as evinced in this thread. That way people will end up > with the right version more often (and be writing Python 2 that will > more easily migrate to Python 3, if they cannot yet use 3). Me and ikanobori (Simon De Vlieger) are working on this. Laurens From stephen at xemacs.org Sun Jun 20 20:35:30 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 03:35:30 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <87aaqpft71.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On the #python issue, I expect that IRC is much less influential that > some here fear (and than some fervent IRC users believe). I don't see > reason for panic or heavy-handed interference. OTOH engaging the > channel operators more in python-dev sounds like a useful approach. More vice-versa, I now think. Ie, (somewhat) greater python-dev presence on #python is more important. I sort of assumed that people actually participated in #python, as a number do in c.l.p, but that doesn't seem to be so. At least while I was there, I didn't see anybody else who seemed to be python-dev, whether core or the regular denizens of the peanut gallery. From a few hours monitoring and participating in #python, Laurens gives a pretty accurate summary of the kind of people in the channel.
I didn't see anything about Python 3, but I can definitely imagine there being Python-3-baiting trolls. There certainly were a few trollish posters. Anyway, what I personally plan to do is put in a couple of hours a week on #python, and I probably mostly won't mention Python 3 unless asked, and maybe in discussing Unicode issues. While I don't claim to be particularly *representative* of python-dev, an additional dimension of diversity should go a long way. From pje at telecommunity.com Sun Jun 20 20:40:56 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 20 Jun 2010 14:40:56 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote: >The problem comes exactly where you find it: when *porting* existing >code that uses aforementioned ways to alleviate the pain, you find >that the hacks no longer work and a properly layered design is needed >that clearly distinguishes between which variables contain bytes and >which text. Actually, I would say that it's more that (in the network protocol case) we *have* bytes, some of which we would like to *treat* as text, yet do not wish to constantly convert back and forth to full-blown unicode -- especially since the protocols themselves designate ASCII or latin-1 at the transport layer (sometimes with odder encodings above, but these already have to be explicitly dealt with by existing code). While reading over this thread, I'm wondering whether at least my (WSGI-related) problems in this area would be solved by the availability of a type (say "bstr") that was simply a wrapper providing string-like behavior over an underlying bytes, byte array, or memoryview, that would produce objects of compatible type when combined with strings (by encoding them to match). 
Then, I could wrap bytes with it to pass them to string operations, and then feed them back into everything else. The bstr type ideally would be directly compatible with bytes I/O, or at least have a .bytes attribute that would be. It seems like that would reduce WSGI porting issues quite a bit, since it would mostly consist of throwing extra bstr() calls in where things are breaking, and maybe grabbing the .bytes attribute for I/O. This approach would still be explicit as to what types you're working with, but would not require O(n) *conversions* at every interaction boundary. It would be limited, of course, to single-byte encodings with all characters (0-255) valid. OTOH, maybe there should just be a bytestrings module with bytestrings.ascii and bytestrings.latin1, and between the two that should cover the network protocol needs quite well. Actually, if the Python 3 str() constructor could do O(1) conversion for the latin-1 case (i.e., just wrapped the underlying bytes), I would just put, "bstr = lambda x: str(x,'latin-1')" at the top of my programs and have roughly the same effect. This idea is still a bit half-baked, but a more baked version might be just the ticket for porting stuff that used str to work with bytes in 2.x, if only because writing, e.g.: newurl = bstr(urljoin(bstr(base), 'subdir')) seems so much saner than writing *this* everywhere: newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') It is perhaps a bit late to propose this idea, since ideally we would also want to use it in 2.x to aid porting. But I'm curious if any other people here experiencing byte/unicode woes in relation to network protocols would find this a solution to their chief frustration. (i.e., that the stdlib often insists now on strings, where effectively bytes were usable before, and thus one must do conversions both coming and going.) 
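[Archive note] The one-line "bstr" idea can be tried out in Python 3 roughly as follows. This is a sketch under stated assumptions: to_bytes is a made-up companion helper for the return trip, and the decode here copies the data (O(n)), so it is not the O(1) wrapper type the message wishes for.

```python
from urllib.parse import urljoin


def bstr(data: bytes) -> str:
    """Treat bytes as text, one character per byte (latin-1)."""
    return str(data, "latin-1")


def to_bytes(text: str) -> bytes:
    """Hypothetical inverse of bstr(); fails on characters above U+00FF."""
    return text.encode("latin-1")


base = b"http://example.com/app/"
newurl = urljoin(bstr(base), "subdir")   # string operation on decoded bytes
newurl_bytes = to_bytes(newurl)          # back to bytes for I/O
```

latin-1 maps each byte value 0-255 to the code point with the same number, which is why every byte string survives the round trip unchanged. Note that with this plain-function version bstr() cannot be applied to a str result, so the return trip needs its own helper.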
From benjamin at python.org Sun Jun 20 20:49:02 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 20 Jun 2010 13:49:02 -0500 Subject: [Python-Dev] issue 8959 Message-ID: We currently have one release blocker for 2.7: http://bugs.python.org/issue8959 It is a Windows and a ctypes regression. As far as I can tell, the offending revision could just be reverted but it does not merge cleanly. Can anyone offer more expertise? -- Regards, Benjamin From martin at v.loewis.de Sun Jun 20 21:10:54 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 20 Jun 2010 21:10:54 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <4C1E67BE.7050804@v.loewis.de> Am 20.06.2010 19:48, schrieb Stephen J. Turnbull: > Laurens Van Houtven writes: > > > Also, I'm pretty sure nobody has ever said that Python 3.x was a > > "failure", or anything like it. #python has claims that Python 3.x, as > > a platform for building production apps, is a work in progress > > How about "Python 3 is a work in progress" for the topic? I wouldn't say that, either - not more than Python 2 was a work in progress over the last 10 years. Regards, Martin From lvh at laurensvh.be Sun Jun 20 21:25:41 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 21:25:41 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <87eig1fvmq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87eig1fvmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Jun 20, 2010 at 7:42 PM, Stephen J. Turnbull wrote: > Laurens Van Houtven writes: > ?> Yeah, I think the reason for that rule is that the majority of people > ?> asking about new software actually start or end up in this category. 
> > I think that the most experienced people have absurdly high standards > for "support" compared to those new to programming. I hope they check > their advice against the real requirements of the new programmer. Maybe. I'm not very sure about this: for example quite a few parts in Twisted are pretty hazy voodoo magic to me ;-) I actually recommend the high standards stuff to newbies specifically because it's high standards. If I meet some bug, I can probably work around it, but I imagine that it'd be much more frustrating for a newbie to come into contact with a bunch of stuff that really isn't very well polished or supported? I could be wrong. > > Usually it's because they want to do something that people have > > already solved, > > If they're new to programming, they're already in adventure mode. Why > not point out the Road Less Traveled? That will make all the > difference. Of course you should point out that it's going to be > bumpier, and of course that is likely to push the majority of > practical folks back to Python 2. Three big reasons I can think of: because it doesn't always exist, because even if it does exist we don't always know about it, and because people actually helping people in #python would be far less adept at helping people with it :-) We have a bunch of people that end up doing their own thing anyway now, that just means we can't be as helpful later when they have more questions. > But some of them are likely to be > willing to endure a bit of frustration, especially if they're told > that their bug reports will be listened to seriously on python-dev > (given help from an experienced hand in formatting them!) Maybe that would help, yeah. We have a bunch of people now that start and then give up. They don't port, because they can't be bothered. They just start from scratch.
> > A possible solution is that we suggest that people, instead of > > rolling their own thing from scratch, help to port an existing good > > 2.x lib to 3.x, or use 2.x? > > Exactly. Don't give them rose-colored glasses about porting, and warn > that some are just plain broken (eg, because of inappropriate > assumptions about bytes vs Unicode). But on the other hand, some will > mostly work for them, and their bug reports on the corner cases will > be helpful. I think that's usually more effort than new programmers are willing to put in, people tend to underestimate the cost of developing something from scratch in my experience. But sure, we all agree it's a good idea, so let's put it in the official thing about 2.x vs 3.x :) > > I don't think it's a good idea to start encouraging NIH in new > > programmers :-) > > Agreed. I think we're kind of getting into the territory of personal preferences here. Thanks for your input, Laurens From lvh at laurensvh.be Sun Jun 20 21:34:43 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 21:34:43 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <4C1E67BE.7050804@v.loewis.de> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <87d3vlfve6.fsf@uwakimon.sk.tsukuba.ac.jp> <4C1E67BE.7050804@v.loewis.de> Message-ID: On Sun, Jun 20, 2010 at 9:10 PM, "Martin v. Löwis" wrote: > Am 20.06.2010 19:48, schrieb Stephen J. Turnbull: >> How about "Python 3 is a work in progress" for the topic? > > I wouldn't say that, either - not more than Python 2 was a work in progress > over the last 10 years. > > Regards, > Martin Yeah, this is why I really like a completely neutral topic.
thanks, Laurens From fuzzyman at voidspace.org.uk Sun Jun 20 21:55:17 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sun, 20 Jun 2010 20:55:17 +0100 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <4C1E3B03.5060802@holdenweb.com> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: <4C1E7225.70305@voidspace.org.uk> On 20/06/2010 17:00, Steve Holden wrote: > [snip...] >> -- >> >> In writing this email to python-dev, I have reviewed my logs of #python >> specifically looking for the phrase 'python 3'. Here are some packages that >> were named in the conversations: >> >> - py2exe >> - cx_Freeze >> - twisted >> - PIL >> - ctypes >> What is the problem with ctypes in Python 3? Are there particular problems with it - it is part of the standard library and available right? >> - email >> >> I present this list because they are what programmers are coming to #python to >> ask about, and that may be relevent to your discussion about python 3 ports. >> >> > Given the amount of interest this thread has generated I can't help > wondering why it isn't more prominent in python.org content. Is the > developer community completely disjoint with the web content editor > community? > The "web content editor community" (the python.org webmasters) is really just a handful of people. I did suggest a few weeks ago (in response to an enquiry about why there was no guide to choosing between Python 2 and 3 easily visible on the website) that we add or prominently link to a page with information like this. There was no response but I do think it would be a good idea. > If there is such a disconnect we should think about remedying it: a > large "Python 2 or 3?" button could link to a reasoned discussion of the > pros and cons as evinced in this thread. 
That way people will end up > with the right version more often (and be writing Python 2 that will > more easily migrate to Python 3, if they cannot yet use 3). > Yep. All the best, Michael Foord > There seems to be a perception that the PSF can help fund developments, > and indeed Jesse Noller has made a small start with his sprint funding > proposal (which now has some funding behind it). I think if it is to do > so the Foundation will have to look for substantial new funding. I do > not currently understand where this funding would come from, and would > like to tap your developer creativity in helping to define how the > Foundation can effectively commit more developer time to Python. > > GSoC and GHOP are great examples, but there is plenty of room for all > sorts of initiatives that result in development opportunities. I'd like > to help. > > regards > Steve > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
From jnoller at gmail.com Sun Jun 20 21:59:49 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sun, 20 Jun 2010 15:59:49 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: <4C1E3B03.5060802@holdenweb.com> References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: On Sun, Jun 20, 2010 at 12:00 PM, Steve Holden wrote: ...snip >> > Given the amount of interest this thread has generated I can't help > wondering why it isn't more prominent in python.org content. Is the > developer community completely disjoint with the web content editor > community? Yes. > If there is such a disconnect we should think about remedying it: a > large "Python 2 or 3?" button could link to a reasoned discussion of the > pros and cons as evinced in this thread. That way people will end up > with the right version more often (and be writing Python 2 that will > more easily migrate to Python 3, if they cannot yet use 3). Yes; the website needs to change. > There seems to be a perception that the PSF can help fund developments, > and indeed Jesse Noller has made a small start with his sprint funding > proposal (which now has some funding behind it). I think if it is to do > so the Foundation will have to look for substantial new funding. I do > not currently understand where this funding would come from, and would > like to tap your developer creativity in helping to define how the > Foundation can effectively commit more developer time to Python. The good news is that I've already had a few potential companies approach me to inquire as to the possibility of sponsoring porting sprints for specific itches they have. I am going to continue to encourage this on my end, as well as redirecting them to direct PSF donations as they arise. 
I suspect; if we were to keep pushing the concept of sponsored sprints / bounties on Python 3 library porting, we could see things pick up donation wise. I've long suspected that there are companies out there who do have funds, but lack a target, and don't see a general PSF donation as directly beneficial to their goals (although we will continue to work to convince them otherwise). > GSoC and GHOP are great examples, but there is plenty of room for all > sorts of initiatives that result in development opportunities. I'd like > to help. Quick, off the seat of my pants idea - let's start by encouraging and advertising sponsored sprints in the vein I've outlined in my existing approved proposal. Once we know how to allow companies to donate directly into a fund for direct improvement / porting, we provide them a target which allows them to see measurable outcomes. Just a thought Jesse From jnoller at gmail.com Sun Jun 20 22:10:00 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sun, 20 Jun 2010 16:10:00 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby wrote: > At 10:57 AM 6/20/2010 -0700, Guido van Rossum wrote: >> >> The problem comes exactly where you find it: when *porting* existing >> code that uses aforementioned ways to alleviate the pain, you find >> that the hacks no longer work and a properly layered design is needed >> that clearly distinguishes between which variables contain bytes and >> which text. 
> > Actually, I would say that it's more that (in the network protocol case) we > *have* bytes, some of which we would like to *treat* as text, yet do not > wish to constantly convert back and forth to full-blown unicode -- > especially since the protocols themselves designate ASCII or latin-1 at the > transport layer (sometimes with odder encodings above, but these already > have to be explicitly dealt with by existing code). > > While reading over this thread, I'm wondering whether at least my > (WSGI-related) problems in this area would be solved by the availability of > a type (say "bstr") that was simply a wrapper providing string-like behavior > over an underlying bytes, byte array, or memoryview, that would produce > objects of compatible type when combined with strings (by encoding them to > match). > > Then, I could wrap bytes with it to pass them to string operations, and then > feed them back into everything else. The bstr type ideally would be > directly compatible with bytes I/O, or at least have a .bytes attribute that > would be. > > It seems like that would reduce WSGI porting issues quite a bit, since it > would mostly consist of throwing extra bstr() calls in where things are > breaking, and maybe grabbing the .bytes attribute for I/O. > > This approach would still be explicit as to what types you're working with, > but would not require O(n) *conversions* at every interaction boundary. It > would be limited, of course, to single-byte encodings with all characters > (0-255) valid. > > OTOH, maybe there should just be a bytestrings module with bytestrings.ascii > and bytestrings.latin1, and between the two that should cover the network > protocol needs quite well. > > Actually, if the Python 3 str() constructor could do O(1) conversion for the > latin-1 case (i.e., just wrapped the underlying bytes), I would just put, > "bstr = lambda x: str(x,'latin-1')" at the top of my programs and have > roughly the same effect.
> > This idea is still a bit half-baked, but a more baked version might be just > the ticket for porting stuff that used str to work with bytes in 2.x, if > only because writing, e.g.: > > newurl = bstr(urljoin(bstr(base), 'subdir')) > > seems so much saner than writing *this* everywhere: > > newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') > > It is perhaps a bit late to propose this idea, since ideally we would also > want to use it in 2.x to aid porting. But I'm curious if any other people > here experiencing byte/unicode woes in relation to network protocols would > find this a solution to their chief frustration. (i.e., that the stdlib > often insists now on strings, where effectively bytes were usable before, > and thus one must do conversions both coming and going.) > I hate to reply with a simple +1 - but I've heard this pain and proposal from a frightening number of people. Something which allowed you to use bytes with some of the string methods would go a really long way to solving a lot of people's Python 3 pain. I don't relish the idea that once people start moving over, there might be a billion implementations of "things like this". jesse From tjreedy at udel.edu Sun Jun 20 22:34:46 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 16:34:46 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On 6/20/2010 6:35 AM, Laurens Van Houtven wrote: > I'm one of the active people in #python that some people dislike for > behavior with respect to Python 3. As I wrote, I disliked the observable, written behavior, now changed. You are obviously a fine person. We both love Python and have both contributed time for years to helping others with Python.
The premise for this branch thread was: IF #python is really #python2 and somewhat anti-Python3, THEN (and only then), maybe we need a #python3. I am delighted that you have already refuted the premise with a new, much improved, splash topic. I now feel free to ask Python3 questions on the existing channel -- things like "Is issue #### applicable to Python3?" -- as I work on reviewing tracker issues. In that respect, this thread is finished for me. But I hope it is just the start of better cooperation and communication. Just a few notes in addition to other responses. > First of all I'd like to defuse the situation. Excellently done. > Also, I'm pretty sure nobody has ever said that Python 3.x was a > "failure", or anything like it. I have no idea what has been said by you or anyone on #python, but people *have* posted on both python-list and here on py-dev things like "Python3 is not ready for use. It is a failure. Do not use it." (any of that sound familiar? ;-) and even "Python3 should be scrapped!". I am relieved that you have disassociated yourself and #python from such sentiments. --- On newbies and version choice: I agree with Nick Efford that people using Python to learn about programming may be better off with Python3. I am using a subset of Python3 in a book on algorithms for the reasons he gave and others. Not even mentioned so far in this thread is the availability of unicode identifiers for people with non-Latin alphabets. Of course, Asian schoolkids are unlikely to request help on #python. And the point about suggesting Python2 because that is what you all are good at helping with, is well taken. I do think people learning Python2 now should have a Python3-aware guide to doing so.
This > In the mean while, we encourage people to write code that will be easy > to port and behave well in 3.x: new-style classes, don't use eager > versions when the Py3k default is lazy and you don't actually need the > eager thing, use as many third party libraries as possible (the idea > being that this would minimize effort needed to make the switch on the > grand scale of things), use absolute imports always (and only explicit > relative, but it's discouraged), always have a full unit test suite. is a good start. I think something like that would be good for the #python web page, or added to python.org somewhere. Terry Jan Reedy From tjreedy at udel.edu Sun Jun 20 22:57:20 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 16:57:20 -0400 Subject: [Python-Dev] Python Library Support in 3.x (Was: email package status in 3.X) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100620153833.GD20639@thorne.id.au> <4C1E3B03.5060802@holdenweb.com> Message-ID: On 6/20/2010 3:59 PM, Jesse Noller wrote: > I suspect; if we were to keep pushing the concept of sponsored sprints > / bounties on Python 3 library porting, we could see things pick up > donation wise. I've long suspected that there are companies out there > who do have funds, but lack a target, and don't see a general PSF > donation as directly beneficial to their goals (although we will > continue to work to convince them otherwise). Universities **love** unrestricted donations to their general fund. But they bow to human nature and accept and even seek all kinds of targeted donations: buildings, rooms, departments, centers, institutes, programs, professorships, scholarships, research projects, curriculum developments projects, and so on. (Of course, the desire to target on the part of donors is also in part a recognition of human nature, that unrestricted funds might be used frivolously or even in a way that the donor considers obnoxious.) 
So I think it good that you/PSF try doing the same. And do not ignore individuals. Terry Jan Reedy From simon at ikanobori.jp Sun Jun 20 23:12:17 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Sun, 20 Jun 2010 23:12:17 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 In reply to the recent post by Laurens and the vow I made to change the text which is presented on the python-commandments domain I have asked Laurens to write a new text on the subject. This message is a heads up to let all of you know that this new article is now available on the following URL: http://python-commandments.org/python3.html This article will probably be the featured article on #python's /topic regarding Python 2 or 3. I also read some remarks about possibly having an official article up on the Python website and in case that happens that will take the place of this article.
Regards, Simon de Vlieger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQIcBAEBAgAGBQJMHoQxAAoJEBBSHP7i+JXfGdUP/3NsUuMAJ2DONJZE4AbQIx5G n7UE/SD0teZpyrYYIzV/PI1m40xz5XBe+zJyNfGN7m+MNoW7lGIxHgBoTB5CU6eE 10LeNy2qR9eqRQ/NZ+t8GJul4zuGIocPglDqCX/M6KtFCmtDsgSgbLaMFEgI4lRs vZr9I9hUX9E1r+9T50uxo/YHQm+QW/HIYVks15nOoeUalkhxlQF67vvzH8/lds/F sl5DxXe/zo287GeOIjpDNI/+0KJtUTLop4S/cpVxxA5eNX9lgGztq1wmKCMQmKcB FS/WfQomyEhZhTk4CtIMQ7HM51bGUHwDeoO8qIOrayTM8ucoruO0QyzmZM0yxoDY G+GVYabTKKp9ICDaUvOMxYpRnuz/Xb10nb9HphutQ03cjR28bJLR8nuLUBmIzcJK ICXVIcV11hD01hzGWBJ7llQeoHl9ykaZu54PqpnZ/gdUrBVJ7VRItb5b4wP/PTwJ frtNvnVwBnuR9wfQmCV9Do1UVTAVUqjFRpoBujIgSaZCa1wyF5U+8eHVD26u8lDj +Hva28S/MggzIbc9x3/yv070204JaZVD1Q6fR5cSWdCMHgEDnwCmRjqlqLRW7zqS al4/JaxDiqa7RrB8+liFDijtqopy7K6a3vDK4BBHuyqWmJ9lGqVJzC0ynRE6DV7N 4+lJCEF9qLW++QgjHXR2 =qRiB -----END PGP SIGNATURE----- From lvh at laurensvh.be Sun Jun 20 23:16:52 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 23:16:52 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: Glad to hear the efforts are so appreciated. Unfortunately not everyone agrees, but I'm beginning to think that's the tragedy of internet politics :) On Sun, Jun 20, 2010 at 10:34 PM, Terry Reedy wrote: > On 6/20/2010 6:35 AM, Laurens Van Houtven wrote: > I have no idea what has been said by you or anyone on #python, but people > *have* posted on both python-list and here on py-dev things like "Python3 is > not ready for use. It is a failure. Do not use it." (any of that sound > familiar? ;-) and even "Python3 should be scrapped!". I am relieve that you > have disassociated yourself and #python from such sentiments. I can understand how people coming to #python might have thought that, in retrospect. 
I just wanted to make that part clear :) As for the "Python 3.x is a failure" people, I just tune those out, and if they're trolling about it on IRC, ban them. > On newbies and version choice: I agree with Nick Efford that people using > Python to learn about programming may be better off with Python3. I am using > a subset of Python3 in a book on algorithms for the reasons he gave and > others. Not even mentioned so far in this thread is the availability of > unicode identifiers for people with non-Latin alphabets. I think the difference here is probably the focus. I think you're more interested in teaching people Python in a more academic context: basically teaching CS through Python. #python, on the other hand, is trying to help people build practical tools where the CS is often an afterthought (though not as much as it is in other programming language channels which I won't name). >> In the mean while, we encourage people to write code that will be easy >> to port and behave well in 3.x: new-style classes, don't use eager >> versions when the Py3k default is lazy and you don't actually need the >> eager thing, use as many third party libraries as possible (the idea >> being that this would minimize effort needed to make the switch on the >> grand scale of things), use absolute imports always (and only explicit >> relative, but it's discouraged), always have a full unit test suite. > > is a good start. I think something like that would be good for the #python > web page, or added to python.org somewhere. Yeah, it's actually extremely prevalent, it's just not voiced anywhere, we could probably put it up somewhere. It's sort of up in the pound-python page but it's well-hidden in tongue-in-cheek, as Antoine pointed out :) > Terry Jan Reedy > Laurens From lvh at laurensvh.be Sun Jun 20 23:18:34 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Sun, 20 Jun 2010 23:18:34 +0200 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: That's not actually up just yet, I'd like people to review it, personally I think it's still a tad bit biased towards Py3k. Until then I'm keeping the Py3.x document by Nick Efford up there. Thanks for your continued participation and seemingly endless patience, Laurens From amk at amk.ca Sun Jun 20 23:22:09 2010 From: amk at amk.ca (A.M. Kuchling) Date: Sun, 20 Jun 2010 17:22:09 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: <20100620212209.GA5319@andrew-kuchlings-macbook.local> On Sun, Jun 20, 2010 at 10:57:05AM -0700, Guido van Rossum wrote: > Education is needed. When you search Google (or Bing, for that matter > :-) for "python unicode" the first hit is > http://www.amk.ca/python/howto/unicode, which is highly detailed but > probably too much information for the typical person faced with a > UnicodeError exception traceback (that page is also focused on Python > 2). What we need is a cookbook on how to deal with various common Eep! That should be directed to http://docs.python.org/howto/unicode.html, the copy that's actually incorporated in the Python docs. I'll fix that immediately. Regarding a smaller document for people who hit a UnicodeError exception: could we write a little Unicode FAQ for python.org? --amk From tjreedy at udel.edu Sun Jun 20 23:30:50 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 17:30:50 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: On 6/20/2010 8:26 AM, Giampaolo Rodol? 
wrote: > I attempted to port pyftpdlib to python 3 several times and the > biggest show stopper has always been the bytes / string difference > introduced by Python 3 which forces you to *know* and *use* Unicode > every time you deal with some text and 2to3 is completely useless > here. I believe the advice in the wiki porting page is to use unicode() and bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do fine. For 2.5-, add 'bytes = str' somewhere. 2to3 still gets patches, I believe, when someone exhibits code that could and ought to be converted but is not. I suspect that if you posted 'Problems porting pyftpdlib to Python3', you would get some help. If it involved inadequacies in the current tools and guides, it would be on-topic here. Or try python-list. > The choice of forcing the user to use Unicode and "think in Unicode" > was a very brave one, and I'm sure it's for the better, but not > everyone wants to deal with that because Unicode is hard to swallow. I felt that way until my daughter decided to switch from Spanish to Japanese for her foreign language. Once I quit fighting it, it became much easier to swallow and learn. As it turns out, thinking in Unicode is a pretty straightforward generalization of thinking in ascii. There are some annoying glitches due to the need to accommodate legacy systems. The plethora of legacy encodings for various subsets, besides ascii, is also a nuisance. > The majority of people who use latin-char alphabets > prefer to stay with bytes and eventually learn > and introduce Unicode only when that is actually needed. The example at http://code.google.com/p/pyftpdlib/ uses names and filenames. Without unicode, these are restricted to ascii, unless you use multiple encodings, which to me would be worse.
Terry Jan Reedy From solipsis at pitrou.net Sun Jun 20 23:47:23 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 20 Jun 2010 23:47:23 +0200 Subject: [Python-Dev] bytes / unicode References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: <20100620234723.600ad4a8@pitrou.net> On Sun, 20 Jun 2010 14:40:56 -0400 "P.J. Eby" wrote: > > Actually, I would say that it's more that (in the network protocol > case) we *have* bytes, some of which we would like to *treat* as > text, yet do not wish to constantly convert back and forth to > full-blown unicode Well, then why don't you just stick with a bytes object? > While reading over this thread, I'm wondering whether at least my > (WSGI-related) problems in this area would be solved by the > availability of a type (say "bstr") that was simply a wrapper > providing string-like behavior over an underlying bytes, byte array, > or memoryview, that would produce objects of compatible type when > combined with strings (by encoding them to match). This really sounds horrible. Python 3 was designed precisely to discourage ad hoc mixing of bytes and unicode. > Actually, if the Python 3 str() constructor could do O(1) conversion > for the latin-1 case (i.e., just wrapped the underlying bytes), I > would just put, "bstr = lambda x: str(x,'latin-1')" at the top of my > programs and have roughly the same effect. Did you do any measurements that show that latin-1 decoding (hardly a complicated task) introduces a performance regression in Web frameworks in 3.x? > seems so much saner than writing *this* everywhere: > > newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') urljoin already returns an str object. Why do you want to decode it again? From ncoghlan at gmail.com Sun Jun 20 23:53:38 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 07:53:38 +1000 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On Mon, Jun 21, 2010 at 7:12 AM, Simon de Vlieger wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > In reply to the recent post by Laurens and the vow I made to change the text > which is presented on the python-commandments domain I have asked Laurens to > write a new text on the subject. > > This message is a heads up to let all of you know that this new article is > now available on the following URL: > http://python-commandments.org/python3.html That's a fairly decent write-up in my opinion. As Laurens pointed, it trends towards the "use Python 3 if you can, Python 2 if you need to" point of view, which I personally think is the right spin to be putting on this issue, but obviously opinions will vary on that front. About the only specific wording tweak I would suggest is that "little regard for backwards compatibility" should be phrased as "less regard for backwards compatibility". There were still quite a few ideas we rejected as gratuitously incompatible, even for Py3k (the eventual decision to retain old-style string formatting comes to mind). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benjamin at python.org Sun Jun 20 23:55:03 2010 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 20 Jun 2010 16:55:03 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100620234723.600ad4a8@pitrou.net> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: 2010/6/20 Antoine Pitrou : > On Sun, 20 Jun 2010 14:40:56 -0400 > "P.J. 
Eby" wrote: >> >> Actually, I would say that it's more that (in the network protocol >> case) we *have* bytes, some of which we would like to *treat* as >> text, yet do not wish to constantly convert back and forth to >> full-blown unicode > > Well, then why don't you just stick with a bytes object? There are not many tools for treating bytes as text. -- Regards, Benjamin From tjreedy at udel.edu Mon Jun 21 00:00:27 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 18:00:27 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <87fx0ikqw5.fsf@uwakimon.sk.tsukuba.ac.jp> <20100620113256.7ba8d86a@pitrou.net> <87fx0hfw7q.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/20/2010 1:30 PM, Stephen J. Turnbull wrote: > The topic on #python seems unlikely to change at this point I just verified that, thanks to Laurens and whoever, it has been. It is now rather good. Terry Jan Reedy From lvh at laurensvh.be Mon Jun 21 00:01:08 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 21 Jun 2010 00:01:08 +0200 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> Message-ID: On Sun, Jun 20, 2010 at 11:30 PM, Terry Reedy wrote: > On 6/20/2010 8:26 AM, Giampaolo Rodol? wrote: > >> I attempted to port pyftpdlib to python 3 several times and the >> biggest show stopper has always been the bytes / string difference >> introduced by Python 3 which forces you to *know* and *use* Unicode >> every time you deal with some text and 2to3 is completely useless >> here. > > I believe the advice in the wiki porting page is to use unicode() and > bytes() but never str(), in a version that runs in 2.6. Then 2to3 should do > fine. For 2.5-, add 'bytes = str' somewhere. Really? 
I thought you were supposed to call encode/decode methods on the appropriate thing, depending if they're coming from a byte source or a character source. The problems arise when you're doing things like paths, which I believe are bytes on *nix and proper Unicode on Windows (which basically just means they enforce an encoding, UTF-16 if I'm not mistaken). I don't actually use Windows so I might be completely wrong here. > 2to3 still gets patches, I believe, when someone exhibits code that could > and ought to be converted but is not. > > I suspect that if you posted 'Problems porting pyftpdlib to Python3', you > would get some help. If it involved inadequacies in the current tools and > guides, it would to be be on-topic here. Or try python-list. > >> The choice of forcing the user to use Unicode and "think in Unicode" >> was a very brave one, and I'm sure it's for the better, but not >> everyone wants to deal with that because Unicode is hard to swallow. > > I felt that way until my daughter decided to switch from Spanish to Japanese > for here foreign language. Once I quit fighting it, it because much easier > to swallow and learn. As it turns out, thinking in Unicode is a pretty > straightforward generalization of thinking in ascii. There are some annoying > glitches due to the need to accomodate legacy systems. The plethora of > legacy encodings for various subsets, besides ascii, is also a nuisance. I think doing unicode/str properly in 2.x is very important, #python stresses it quite often, I think Py3k's strictness is a good idea because people very often write something that appears to work for a long time, and then someone tries it using funny bytes, and everything blows apart. Convincing people their software is wrong when "everything worked five minutes ago" is really hard :-) You'd be surprised how long it can take before some of these problems are found, a couple of weeks ago in #python we had exactly this problem when we were helping Blender folks. 
There was a bug report from a German Blender user, turns out Blender ignores unicode in some critical spot making importing between people who disagree on charsets impossible. And Blender isn't exactly a project that's two weeks old and filled with idiots :) The downside is that *fixing* them then becomes a nontrivial task. The central problem is probably that a lot of people don't understand Unicode. Recently I learned that even Tanenbaum got it wrong in his latest revision of the computer networks book! (Although that might just be my dutch translation of it being bad). > Terry Jan Reedy Laurens From ncoghlan at gmail.com Mon Jun 21 00:08:47 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 08:08:47 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: > I hate to reply with a simple +1 - but I've heard this pain and > proposal from a frightening number of people, something which allowed > you to use bytes with some of the sting methods would go a really long > way to solving a lot of peoples python 3 pain. I don't relish the idea > that once people start moving over, there might be a billion > implementations of "things like this". My concern with it would be creating the temptation to use these new objects that can't tolerate multibyte or variable character length encodings when the general string type was more relevant (thus to some degree perpetuating Python 2.x issues with incomplete Unicode handling). Perhaps if people could identify which specific string methods are causing problems? 
In 3.2, there really aren't that many differences between the available methods for strings and bytes: >>> set(dir(str)) - set(dir(bytes)) {'isprintable', 'format', '__mod__', 'encode', 'isidentifier', '_formatter_field_name_split', 'isnumeric', '__rmod__', 'isdecimal', '_formatter_parser'} >>> set(dir(bytes)) - set(dir(str)) {'decode', 'fromhex'} Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Mon Jun 21 00:21:20 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 18:21:20 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: On 6/20/2010 4:10 PM, Jesse Noller wrote: > On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby wrote: >> While reading over this thread, I'm wondering whether at least my >> (WSGI-related) problems in this area would be solved by the availability of >> a type (say "bstr") that was simply a wrapper providing string-like behavior >> over an underlying bytes, byte array, or memoryview, that would produce >> objects of compatible type when combined with strings (by encoding them to >> match). > I hate to reply with a simple +1 - but I've heard this pain and > proposal from a frightening number of people, something which allowed > you to use bytes with some of the sting methods would go a really long > way to solving a lot of peoples python 3 pain. I don't relish the idea > that once people start moving over, there might be a billion > implementations of "things like this". Given that the 3.x bytes and bytearray classes do retain text methods like .capitalize(), which are meaningless for arbitrary binary data, it is not clear to me what you are asking for or what problem a new class would solve. I am curious though. 
Terry Jan Reedy From jnoller at gmail.com Mon Jun 21 00:28:35 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sun, 20 Jun 2010 18:28:35 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: On Jun 20, 2010, at 6:21 PM, Terry Reedy wrote: > On 6/20/2010 4:10 PM, Jesse Noller wrote: >> On Sun, Jun 20, 2010 at 2:40 PM, P.J. Eby >> wrote: > >>> While reading over this thread, I'm wondering whether at least my >>> (WSGI-related) problems in this area would be solved by the >>> availability of >>> a type (say "bstr") that was simply a wrapper providing string- >>> like behavior >>> over an underlying bytes, byte array, or memoryview, that would >>> produce >>> objects of compatible type when combined with strings (by encoding >>> them to >>> match). > >> I hate to reply with a simple +1 - but I've heard this pain and >> proposal from a frightening number of people, something which allowed >> you to use bytes with some of the sting methods would go a really >> long >> way to solving a lot of peoples python 3 pain. I don't relish the >> idea >> that once people start moving over, there might be a billion >> implementations of "things like this". > > Given that the 3.x bytes and bytearray classes do retain text > methods like .capitalize(), which are meaningless for arbitrary > binary data, it is not clear to me what you are asking for or what > problem a new class would solve. I am curious though. > Ask the web-sig and wsgi folks for starters. I know they've experienced non-zero pain. 
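[Editorial illustration] The exchange above turns on two points: that Python 3 bytes objects already keep ASCII-oriented text methods, and that a latin-1 decode/encode pair round-trips every byte value. The sketch below shows both under stated assumptions. The name bstr comes from the proposal quoted in this thread; to_bytes and the example.com URL are hypothetical names chosen only for illustration. Note that this bstr copies its input (an O(n) conversion), so it is not the O(1) wrapper type the proposal asks for.

```python
from urllib.parse import urljoin

# bytes keeps many ASCII-oriented text methods in Python 3.
assert b"content-type".capitalize() == b"Content-type"
assert b"GET /index.html HTTP/1.1".split() == [b"GET", b"/index.html", b"HTTP/1.1"]

# Methods that exist on only one of the two types
# (the exact sets vary between Python versions).
only_str = set(dir(str)) - set(dir(bytes))
only_bytes = set(dir(bytes)) - set(dir(str))

# Mixing the two types is an error rather than an implicit conversion.
try:
    b"abc" + "def"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False


def bstr(data):
    """Decode bytes as latin-1 (name taken from the thread; copies the data)."""
    return str(data, "latin-1")


def to_bytes(text):
    """Hypothetical inverse of bstr: encode text back to latin-1 bytes."""
    return text.encode("latin-1")


# latin-1 maps each byte 0-255 to the code point with the same number,
# so decoding and re-encoding returns the original bytes unchanged.
every_byte = bytes(range(256))
round_trip_ok = to_bytes(bstr(every_byte)) == every_byte

# The pattern from the thread: treat a bytes URL as text, join, convert back.
base = b"http://example.com/app/"
newurl = to_bytes(urljoin(bstr(base), "subdir"))
```

The helper pair keeps the conversions explicit at the two boundaries, which is the property both sides of the thread seem to agree on; whether the copy is costly enough to justify a dedicated wrapper type is the open question raised above.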
From robertc at robertcollins.net Mon Jun 21 00:41:37 2010 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 21 Jun 2010 10:41:37 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: Also, url's are bytestrings - by definition; if the standard library has made them unicode objects in 3, I expect a lot of pain in the webserver space. -Rob From tjreedy at udel.edu Mon Jun 21 00:57:33 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 18:57:33 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On 6/20/2010 5:53 PM, Nick Coghlan wrote: > On Mon, Jun 21, 2010 at 7:12 AM, Simon de Vlieger wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> In reply to the recent post by Laurens and the vow I made to change the text >> which is presented on the python-commandments domain I have asked Laurens to >> write a new text on the subject. > That's a fairly decent write-up in my opinion. As Laurens pointed, it > trends towards the "use Python 3 if you can, Python 2 if you need to" > point of view, which I personally think is the right spin to be > putting on this issue, but obviously opinions will vary on that front. > > About the only specific wording tweak I would suggest is that "little > regard for backwards compatibility" should be phrased as "less regard > for backwards compatibility". There were still quite a few ideas we > rejected as gratuitously incompatible, even for Py3k (the eventual > decision to retain old-style string formatting comes to mind). 
I have much the same opinion, and the same suggestion, as Nick. People do not usually see the proposals that were rejected and the changes not made in 3.0. For those who *do* wish, there are about 25 items listed at http://www.python.org/dev/peps/pep-3099/ Things that will Not Change in Python 3000 Nick listed one thing not on the list. Eliminating the duplicate method names in the unittest module is another. (In isolation, most everyone was in favor. Guido's reason for leaving the duplication: porting 2 to 3 is much easier with a good (and stable) test suite. Therefore, cleaning up unittest and possibly breaking test suites, even with a 2to3 conversion, would not be a good idea.) Terry Jan Reedy From brett at python.org Mon Jun 21 00:58:42 2010 From: brett at python.org (Brett Cannon) Date: Sun, 20 Jun 2010 15:58:42 -0700 Subject: [Python-Dev] Mercurial In-Reply-To: <20100619144218.4209e881@pitrou.net> References: <20100618050712.GC20639@thorne.id.au> <87tyozcl2m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100619135104.5b0f22ed@pitrou.net> <20100619121302.GB12233@remy> <20100619144218.4209e881@pitrou.net> Message-ID: On Sat, Jun 19, 2010 at 05:42, Antoine Pitrou wrote: > On Sat, 19 Jun 2010 17:43:02 +0530 > Senthil Kumaran wrote: >> On Sat, Jun 19, 2010 at 01:51:04PM +0200, Antoine Pitrou wrote: >> > FWIW, the EOL extension is now part of Mercurial: >> > http://mercurial.selenic.com/wiki/EolExtension >> >> Should we all move soon now? >> Any target date you have in mind, Antoine? > > I should point out that I am in no way responsible for the migration. > I think Dirkjan and Brett said they would tackle this after the 2.7 > release. But they'd better answer by themselves :) With the eol extension dealt with, it means all hold-ups are on python-dev's end, not Mercurial's, which is good. As for what is left exactly, Dirkjan can answer better than I can; at this point I am simply the guy trying to help keep the momentum going while not doing any technical work. 
From simon at ikanobori.jp Mon Jun 21 01:05:39 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Mon, 21 Jun 2010 01:05:39 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On 20 jun 2010, at 23:53, Nick Coghlan wrote: > About the only specific wording tweak I would suggest is that "little > regard for backwards compatibility" should be phrased as "less regard > for backwards compatibility". There were still quite a few ideas we > rejected as gratuitously incompatible, even for Py3k (the eventual > decision to retain old-style string formatting comes to mind). I have changed this text to include the wording tweak. From lvh at laurensvh.be Mon Jun 21 01:06:57 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 21 Jun 2010 01:06:57 +0200 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: Okay cool, we fixed it: http://python-commandments.org/python3.html People are otherwise happy with the text? Thanks for your continued input, Laurens From tjreedy at udel.edu Mon Jun 21 01:33:39 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 19:33:39 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: On 6/20/2010 5:55 PM, Benjamin Peterson wrote: > 2010/6/20 Antoine Pitrou: >> On Sun, 20 Jun 2010 14:40:56 -0400 >> "P.J. Eby" wrote: >>> >>> Actually, I would say that it's more that (in the network protocol >>> case) we *have* bytes, some of which we would like to *treat* as >>> text, yet do not wish to constantly convert back and forth to >>> full-blown unicode >> >> Well, then why don't you just stick with a bytes object? > > There are not many tools for treating bytes as text. If one writes a function (most easily in Python) 1. in terms of the methods and operations shared by unicode and bytes, which is nearly all of them, and 2. does not gratuitously (and dare I say, unpythonically) do a class check to unnecessarily exclude one or the other, and 3. does not specialize by assuming only one of the possible values for type-specific constants, such as number of chars/codes, and 4. does not do something unicode specific such as normalization, then the function should be agnostic and operate generically. I think there was some temptation to be 'pure' and limit text methods to str and enforce the decode-manipulate-encode paradigm (which is extremely common in various forms, and nothing unusual). 
But for practicality and efficiency, that was not done. Do you have in mind any tools that could and should operate on both, but do not? (I realize that at the C level, code is not just specialized to 'unicode', but to 2-byte versus 4-byte representations.) Terry Jan Reedy From steve at pearwood.info Mon Jun 21 02:03:19 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 21 Jun 2010 10:03:19 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <201006211003.19488.steve@pearwood.info> On Mon, 21 Jun 2010 08:01:08 am Laurens Van Houtven wrote: > I think doing unicode/str properly in 2.x is very important, #python > stresses it quite often, I think Py3k's strictness is a good idea > because people very often write something that appears to work for a > long time, and then someone tries it using funny bytes, and > everything blows apart. Convincing people their software is wrong > when "everything worked five minutes ago" is really hard :-) Worse is when you have people who, when faced with their software failing to handle filenames containing non-ASCII characters ("those funny letters"), insist that the problem is the user for giving non-ASCII characters. Even when they're in the user's native (non-Latin) language. Even when the OS supports them. Gah. -- Steven D'Aprano From pje at telecommunity.com Mon Jun 21 03:33:55 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 20 Jun 2010 21:33:55 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: <20100621013405.19DC33A4099@sparrow.telecommunity.com> At 07:33 PM 6/20/2010 -0400, Terry Reedy wrote: >Do you have in mind any tools that could and should operate on both, >but do not? 
From http://mail.python.org/pipermail/web-sig/2009-September/004105.html : """The problem which arises is that unquoting of URLs in Python 3.X stdlib can only be done on unicode strings. If though a string contains non UTF-8 encoded characters it can fail.""" I don't have any direct experience with the specific issue demonstrated in that post, but in the context of the discussion as a whole, I understood the overall issue as "if you pass bytes to certain stdlib functions, you might get back unicode, an explicit error, or (at least in the case shown above) something that's just plain wrong." From pje at telecommunity.com Mon Jun 21 03:58:22 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 20 Jun 2010 21:58:22 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> Message-ID: <20100621015824.6A84E3A4099@sparrow.telecommunity.com> At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote: >Perhaps if people could identify which specific string methods are >causing problems? __getitem__(int) returns an integer rather than a bytestring, so anything that manipulates individual characters can't be given bytes and have it work. That was one of the key differences I had in mind for a bstr type, apart from designing it to coerce normal strings to bstrs in cross-type operations, and to allow O(1) "conversion" to/from bytes. 
Another randomly chosen byte/string incompatibility (Python 3.1; I don't have 3.2 handy at the moment): >>> os.path.join(b'x','y') Traceback (most recent call last): File "", line 1, in File "c:\Python31\lib\ntpath.py", line 161, in join if b[:1] in seps: TypeError: Type str doesn't support the buffer API >>> os.path.join('x',b'y') Traceback (most recent call last): File "", line 1, in File "c:\Python31\lib\ntpath.py", line 161, in join if b[:1] in seps: TypeError: 'in ' requires string as left operand, not bytes Ironically, it seems to me that in trying to make the type distinction more rigid, Py3K fails in this area precisely because it is not a rigidly typed language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I need two stringlike objects of the *same type*", not even in its docstring. At least in Java, you would either implement a "path" type with coercions from bytes and strings, or you'd have a class with overloaded methods for handling join operations on bytes and strings, respectively, thereby avoiding this whole mess. (Alas, this little example on the 'in' operator also shows that my bstr effort would probably fail anyway, because there's no '__rcontains__' (__lcontains__?) to allow it to override the str type's __contains__.) From pje at telecommunity.com Mon Jun 21 04:30:01 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 20 Jun 2010 22:30:01 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100620234723.600ad4a8@pitrou.net> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: <20100621023005.EE17E3A4099@sparrow.telecommunity.com> At 11:47 PM 6/20/2010 +0200, Antoine Pitrou wrote: >On Sun, 20 Jun 2010 14:40:56 -0400 >"P.J. 
Eby" wrote: > > > > Actually, I would say that it's more that (in the network protocol > > case) we *have* bytes, some of which we would like to *treat* as > > text, yet do not wish to constantly convert back and forth to > > full-blown unicode > >Well, then why don't you just stick with a bytes object? Because the stdlib is not consistent in how well it handles bytes objects. > > While reading over this thread, I'm wondering whether at least my > > (WSGI-related) problems in this area would be solved by the > > availability of a type (say "bstr") that was simply a wrapper > > providing string-like behavior over an underlying bytes, byte array, > > or memoryview, that would produce objects of compatible type when > > combined with strings (by encoding them to match). > >This really sounds horrible. Python 3 was designed precisely to >discourage ad hoc mixing of bytes and unicode. Who said ad hoc mixing? The point is to have a simple way to ensure that my bytes don't get implicitly converted to unicode, and (ideally) don't have to get converted *back*, either. The idea that by passing bytes to the stdlib, I randomly get back either bytes or unicode (i.e. undocumentedly and inconsistently between different stdlib APIs, as well as possibly dependent on runtime conditions), is NOT "discouraging ad hoc mixing". > > seems so much saner than writing *this* everywhere: > > > > newurl = str(urljoin(str(base, 'latin-1'), 'subdir'), 'latin-1') > >urljoin already returns an str object. Why do you want to decode it >again? Ugh. I meant: newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1') Which just goes to the point of how ridiculous it is to have to convert things to strings and back again to use APIs that ought to just handle bytes properly in the first place. (I don't know if there are actually any problems in the case of urljoin; I wasn't the person who originally brought up the "stdlib not treating URLs as bytestrings in 3.x" issue on the Web-SIG. 
Somewhere along the line I got the impression that urljoin was one such API, but in researching the issue it looks like maybe the canonical example was parse_qsl.) It's possible that the stdlib situation has improved tremendously since then, of course. I don't know if the bug was reported, or how many remain. And it's precisely the part where I don't know how many remain that keeps me from doing more than idly thinking about porting any of my libraries (let alone apps) to Python 3.x. The fact that the stdlib itself has these sorts of issues raises major red flags to me about whether the One Obvious Way has yet been found. If the stdlib maintainers don't agree on the One Obvious Way, that seems even worse. Or if there is such a Way, but nobody has documented its practices yet, that's almost the same thing. I also find it weird that there seem to be two camps on this subject, one of which claims that All Is Well And There Is No Problem -- but I do not recall seeing anyone who was in the "What do I do; this doesn't seem ready" camp who switched sides and took the time to write down what made them realize that they were wrong about there being a problem, and what steps they had to take. The existence of one or more such documents would certainly ease my mind, and I imagine that of other people who are less waiting for others' libraries, than for the stdlib (and/or language) itself to settle. (Or more precisely, for it to be SEEN to have settled.) 
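[Editorial sketch, not part of the original message.] The decode/call/encode round trip described in the message above can be wrapped once instead of being repeated at every call site. This is a minimal sketch, assuming the base URL arrives as bytes and that latin-1 is used purely as a lossless byte-to-code-point mapping; the helper name urljoin_bytes is hypothetical and is not a stdlib function.

```python
from urllib.parse import urljoin

def urljoin_bytes(base: bytes, ref: str) -> bytes:
    """Hypothetical wrapper: join a bytes base URL with a str reference."""
    # latin-1 maps every byte 0-255 to the code point with the same value,
    # so the base survives the decode/encode round trip unchanged.
    # Assumption: `ref` itself contains only latin-1 encodable characters.
    joined = urljoin(base.decode('latin-1'), ref)
    return joined.encode('latin-1')

new_url = urljoin_bytes(b'http://example.com/app/', 'subdir')
print(new_url)
```

Whether hiding the conversion like this is a good idea is exactly what the thread is debating; the sketch only shows that the boilerplate can live in one place.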
From tjreedy at udel.edu Mon Jun 21 05:56:17 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 20 Jun 2010 23:56:17 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621013405.19DC33A4099@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621013405.19DC33A4099@sparrow.telecommunity.com> Message-ID: <4C1EE2E1.5030105@udel.edu> On 6/20/2010 9:33 PM, P.J. Eby wrote: > At 07:33 PM 6/20/2010 -0400, Terry Reedy wrote: >> Do you have in mind any tools that could and should operate on both, >> but do not? > > From http://mail.python.org/pipermail/web-sig/2009-September/004105.html : Thanks for the concrete examples in this and your other post. I am cc-ing the author of the above. > """The problem which arises is that unquoting of URLs in Python 3.X > stdlib can only be done on unicode strings. Actually, I believe this is an encoding rather than bytes versus unicode issue. > If though a string > contains non UTF-8 encoded characters it can fail.""" Which is to say, I believe, if the ascii text in the (unicode) string has a % encoding of a byte that is not a legal utf-8 encoding of anything. The specific example is >>> urllib.parse.parse_qsl('a=b%e0') [('a', 'b?')] where the character after 'b' is white ? in dark diamond, indicating an error. parse_qsl() splits that input on '=' and sends each piece to urllib.parse.unquote. unquote() attempts to "Replace %xx escapes by their single-character equivalent.". unquote has an encoding parameter that defaults to 'utf-8' in *its* call to .decode. parse_qsl does not have an encoding parameter. 
If it did, and it passed that to unquote, then the above example would become (simulated interaction) >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1') [('a', 'bà')] I got that output by copying the file and adding "encoding='latin-1'" to the unquote call. Does this solve this problem? Has anything like this been added for 3.2? Should it be? > I don't have any direct experience with the specific issue demonstrated > in that post, but in the context of the discussion as a whole, I > understood the overall issue as "if you pass bytes to certain stdlib > functions, you might get back unicode, an explicit error, or (at least > in the case shown above) something that's just plain wrong." As indicated above, I so far think that the problem is with the application of the new model, not the model itself. Just for 'fun', I tried feeding bytes to the function. >>> p.parse_qsl(b'a=b%e0') Traceback (most recent call last): File "", line 1, in p.parse_qsl(b'a=b%e0') File "C:\Programs\Python31\lib\urllib\parse.py", line 377, in parse_qsl pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')] TypeError: Type str doesn't support the buffer API I do not know if that message is correct, but certainly trying to split bytes with unicode is (now, at least) a mistake. This could be 'fixed' by replacing the typed literals with expressions that match the type of the input. But I am not sure if that is sensible since the next step is to unquote and decode to unicode anyway. I just do not know the use case. 
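[Editorial sketch, not part of the original message.] The behaviour discussed above can be reproduced with unquote directly. This sketch assumes Python 3.2 or later, where parse_qsl accepts encoding and errors keyword arguments; that postdates the 3.1 session quoted in the message. ascii() is used only so the output does not depend on the terminal's encoding.

```python
from urllib.parse import parse_qsl, unquote

# Default: the escaped byte 0xE0 is decoded as UTF-8 with errors='replace',
# so it becomes U+FFFD, the replacement character.
print(ascii(unquote('b%e0')))

# Naming the encoding of the escaped bytes yields the intended character.
print(ascii(unquote('b%e0', encoding='latin-1')))

# Assumption: Python >= 3.2, where parse_qsl forwards `encoding` to unquote.
pairs = parse_qsl('a=b%e0', encoding='latin-1')
print(ascii(pairs))
```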
Terry Jan Reedy From regebro at gmail.com Mon Jun 21 06:37:18 2010 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 21 Jun 2010 06:37:18 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: On Sun, Jun 20, 2010 at 23:55, Benjamin Peterson wrote: > There are not many tools for treating bytes as text. Well, what tools would you need that can also be used on bytes? Bytes objects have many of the same methods as strings, and that will cover 99% of the cases. Most text tools assume that the text really is text, and much of what they do doesn't make sense unless you've converted it to Unicode first. But most of the things you would need to do, such as in a web server, don't really involve treating the text as something linguistic; it's a matter of replacing and escaping and such, and that could be done while the text is in bytes form. But the tools for that exist... Is there some specific tool that is missing? -- Lennart Regebro: http://regebro.wordpress.com/ Python 3 Porting: http://python3porting.com/ +33 661 58 14 64 From stephen at xemacs.org Mon Jun 21 13:19:50 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 21 Jun 2010 20:19:50 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> Message-ID: <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Robert Collins writes: > Also, url's are bytestrings - by definition; Eh? RFC 3986 explicitly says A URI is an identifier consisting of a sequence of characters matching the syntax rule named <URI> in Section 3. (where the phrase "sequence of characters" appears in all ancestors I found back to RFC 1738), and 2. 
Characters The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text. > if the standard library has made them unicode objects in 3, I > expect a lot of pain in the webserver space. Yup. But pain is inevitable if people are treating URIs (whether URLs or otherwise) as octet sequences. Then your base URL is gonna be b'mailto:stephen at xemacs.org', but the natural thing the UI will want to do is formurl = baseurl + '?subject=??????????' IMO, the UI is right. "Something" like the above "ought" to work. So the function that actually handles composing the URL should take a string (ie, unicode), and do all escaping. The UI code should not need to know about escaping. If nothing escapes except the function that puts the URL in composed form, and that function always escapes, life is easy. Of course, in real life it's not that easy. But it's possible to make things unnecessarily hard for the users of your URI API(s), and one way to do that is to make URIs into "just bytes" (and "just unicode" is probably nearly as bad, except that at least you know it's not ready for the wire). 
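[Editorial sketch, not part of the original message.] The design described above, in which the UI passes plain text and exactly one function does all the escaping, might look like the following. The helper name compose_mailto, the example address, and the choice of UTF-8 for escaped octets are all assumptions made for illustration.

```python
from urllib.parse import quote

def compose_mailto(address: str, subject: str) -> str:
    """Hypothetical helper: the only place where escaping happens."""
    # quote() encodes non-ASCII text as UTF-8 by default and percent-escapes
    # the resulting octets; safe='@' keeps the address separator readable.
    return ('mailto:' + quote(address, safe='@')
            + '?subject=' + quote(subject, safe=''))

# The caller works with text only and never sees an escape sequence.
url = compose_mailto('user@example.org', 'Grüße & more')
print(url)

# The composed URL is pure ASCII, so producing wire bytes cannot fail.
wire = url.encode('ascii')
```

Because nothing outside compose_mailto escapes anything, there is no risk of double escaping, which is the property the message argues for.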
From regebro at gmail.com Mon Jun 21 14:09:33 2010 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 21 Jun 2010 14:09:33 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2010/6/21 Stephen J. Turnbull : > IMO, the UI is right. "Something" like the above "ought" to work. Right. That said, many times when you want to do urlparse etc they might be binary, and you might want binary. So maybe the methods should work with both? -- Lennart Regebro: http://regebro.wordpress.com/ Python 3 Porting: http://python3porting.com/ +33 661 58 14 64 From ncoghlan at gmail.com Mon Jun 21 14:20:13 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 22:20:13 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621015824.6A84E3A4099@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> Message-ID: On Mon, Jun 21, 2010 at 11:58 AM, P.J. Eby wrote: > At 08:08 AM 6/21/2010 +1000, Nick Coghlan wrote: >> >> Perhaps if people could identify which specific string methods are >> causing problems? > > __getitem__(int) returns an integer rather than a bytestring, so anything > that manipulates individual characters can't be given bytes and have it > work. It can if you use length one slices rather than simple indexing. Depending on the details, such algorithms may still fail for multi-byte codecs though. 
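[Editorial sketch, not part of the original message.] The difference between indexing and length-one slicing mentioned above is easy to see side by side. The function count_leading is a hypothetical example of an algorithm written with slices so that it accepts either type.

```python
data_b = b'abc'
data_s = 'abc'

# Indexing: bytes yields an int, str yields a length-one str.
print(repr(data_b[0]), repr(data_s[0]))

# Length-one slices: each type returns an object of its own type.
print(repr(data_b[0:1]), repr(data_s[0:1]))

def count_leading(data, unit):
    """Count how often the length-one `unit` repeats at the start of `data`.

    Works for str and bytes alike as long as both arguments share a type.
    """
    n = 0
    # Past the end the slice is empty, so the loop always terminates.
    while data[n:n + 1] == unit:
        n += 1
    return n

print(count_leading(b'//srv/share', b'/'), count_leading('//srv/share', '/'))

# The multi-byte caveat: one slice of encoded text may be half a character.
half = 'é'.encode('utf-8')[0:1]
```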
> That was one of the key differences I had in mind for a bstr type, apart > from designing it to coerce normal strings to bstrs in cross-type > operations, and to allow O(1) "conversion" to/from bytes. Erk, that just sounds like a recipe for recreating the problems 2.x has in a new form. > Another randomly chosen byte/string incompatibility (Python 3.1; I don't > have 3.2 handy at the moment): > >>>> os.path.join(b'x','y') > Traceback (most recent call last): > File "", line 1, in > File "c:\Python31\lib\ntpath.py", line 161, in join > if b[:1] in seps: > TypeError: Type str doesn't support the buffer API > >>>> os.path.join('x',b'y') > Traceback (most recent call last): > File "", line 1, in > File "c:\Python31\lib\ntpath.py", line 161, in join > if b[:1] in seps: > TypeError: 'in ' requires string as left operand, not bytes > > Ironically, it seems to me that in trying to make the type distinction more > rigid, Py3K fails in this area precisely because it is not a rigidly typed > language in the Java or Haskell sense: i.e., os.path.join doesn't say, "I > need two stringlike objects of the *same type*", not even in its docstring. I believe it actually needs the objects to be compatible with the type of os.sep, rather than just with each other (i.e. the type restrictions on os.path.join are the same as those on os.sep.join, even though the join algorithm itself is slightly different). This restriction should be mentioned in the Py3k docstring and docs for os.path.join - if it isn't, that would be a doc bug. > At least in Java, you would either implement a "path" type with coercions > from bytes and strings, or you'd have a class with overloaded methods for > handling join operations on bytes and strings, respectively, thereby > avoiding this whole mess. > > (Alas, this little example on the 'in' operator also shows that my bstr > effort would probably fail anyway, because there's no '__rcontains__' > (__lcontains__?) 
to allow it to override the str type's __contains__.) OK, these examples convince me that the incompatibility problem is real. However, I don't think a bstr type can solve them even without the __rcontains__ problem - it would just recreate the pain that we already have in the 2.x world. Something that may make sense to ease the porting process is for some of these "on the boundary" I/O related string manipulation functions (such as os.path.join) to grow "encoding" keyword-only arguments. The recommended approach would be to provide all strings, but bytes could also be accepted if an encoding was specified. (If you want to mix encodings - tough, do the decoding yourself). For the idea of avoiding excess copying of bytes through multiple encoding/decoding calls... isn't that meant to be handled at an architectural level (i.e. decode once on the way in, encode once on the way out)? Optimising the single-byte codec case by minimising data copying (possibly through creative use of PEP 3118) may be something that we want to look at eventually, but it strikes me as something of a premature optimisation at this point in time (i.e. the old adage "first get it working, then get it working fast"). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 21 14:33:08 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 22:33:08 +1000 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On Mon, Jun 21, 2010 at 9:06 AM, Laurens Van Houtven wrote: > Okay cool, we fixed it: http://python-commandments.org/python3.html > > People are otherwise happy with the text? Yep, looks pretty good to me. 
I hope you don't mind, but I actually borrowed your text to seed a corresponding page on the Python wiki: http://wiki.python.org/moin/Python2orPython3 It turns out the beginner's guide on the wiki doesn't even acknowledge the possibility of downloading Python 3.1 rather than 2.6 to start experimenting with Python. The Wiki is probably a good place for this kind of material, anyway - it makes it much easier for people to update as they identify major third party libraries that do and don't have Py3k compatible versions (and, some day, Python2 compatible versions). Cheers, Nick. P.S. (We're going to have a tough decision to make somewhere along the line where docs.python.org is concerned, too - when do we flick the switch and make a 3.x version of the docs the default? We probably won't need to seriously consider that question until the 3.3. time frame though). -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 21 14:51:03 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jun 2010 22:51:03 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621023005.EE17E3A4099@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> Message-ID: On Mon, Jun 21, 2010 at 12:30 PM, P.J. Eby wrote: > I also find it weird that there seem to be two camps on this subject, one of > which claims that All Is Well And There Is No Problem -- but I do not recall > seeing anyone who was in the "What do I do; this doesn't seem ready" camp > who switched sides and took the time to write down what made them realize > that they were wrong about there being a problem, and what steps they had to > take. 
The existence of one or more such documents would certainly ease my > mind, and I imagine that of other people who are less waiting for others' > libraries, than for the stdlib (and/or language) itself to settle. > > (Or more precisely, for it to be SEEN to have settled.) I don't know that the "all is well" camp actually exists. The camp that I do see existing is the one that says "without a bug report, inconsistencies in the standard library's unicode handling won't get fixed". The issues picked up by the regression test suite have already been dealt with, but that suite is unfortunately far from comprehensive. Just like a lot of Python code that is out there, the standard library isn't immune to the poor coding practices that were permitted by the blurry lines between text and octet streams in 2.x. It may be that there are places where we need to rewrite standard library algorithms to be bytes/str neutral (e.g. by using length one slices instead of indexing). It may be that there are more APIs that need to grow "encoding" keyword arguments that they then pass on to the functions they call or use to convert str arguments to bytes (or vice-versa). But without people trying to port affected libraries and reporting bugs when they find issues, the situation isn't going to improve. Now, if these bugs are already being reported against 3.1 and just aren't getting fixed, that's a completely different story... Cheers, Nick. 
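[Editorial sketch, not part of the original message.] One way to make an algorithm "bytes/str neutral", as suggested above, is to pick the literals to match the type of the input. The function below mirrors the splitting step shown in the parse_qsl traceback earlier in the thread; split_pairs is a hypothetical name and this is not the fix that was applied to the stdlib.

```python
def split_pairs(qs):
    """Split a query string into 'name=value' chunks, for str or bytes input."""
    # Choose separators of the same type as the input so that .split()
    # never mixes str and bytes.
    if isinstance(qs, bytes):
        amp, semi = b'&', b';'
    else:
        amp, semi = '&', ';'
    return [s2 for s1 in qs.split(amp) for s2 in s1.split(semi)]

print(split_pairs('a=1&b=2;c=3'))
print(split_pairs(b'a=1&b=2;c=3'))
```

The result has the same type as the input, so no encoding decision is made inside the function; the caller still has to decide when to decode.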
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ben+python at benfinney.id.au Mon Jun 21 15:17:09 2010 From: ben+python at benfinney.id.au (Ben Finney) Date: Mon, 21 Jun 2010 23:17:09 +1000 Subject: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode) References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871vc0plt6.fsf_-_@benfinney.id.au> "Stephen J. Turnbull" writes: > your base URL is gonna be b'mailto:stephen at xemacs.org', but the > natural thing the UI will want to do is > > formurl = baseurl + '?subject=??????????' Incidentally, which irritating person was the topic of this Japanese-language message to you? (The subject in Stephen's example message translates roughly as "(unspecified third person) is an irritating rascal, don't you agree?".) -- \ "The userbase for strong cryptography declines by half with | `\ every additional keystroke or mouseclick required to make it | _o__) work." - Carl Ellison | Ben Finney From arcriley at gmail.com Mon Jun 21 15:37:45 2010 From: arcriley at gmail.com (Arc Riley) Date: Mon, 21 Jun 2010 09:37:45 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: I would suggest that if packages that do not have Python 3 support yet are listed, then their alternatives should also. PyQt has had Py3 support for some time. PostgreSQL and SQLite do (as does SQLAlchemy) CherryPy has had Py3 support for the last release cycle libxml2 does not, but lxml does. Also, under where it mentions that most OS's do not include Python 3, it should be noted which have good support for it. 
Gentoo (for example) has excellent support for Python 3, automatically installing Python packages which have Py3 support for both Py2 and Py3, and the python-based Portage package system runs cleanly on Py2.6, Py3.1 and Py3.2. Give credit where credit is due. :-) On Mon, Jun 21, 2010 at 8:33 AM, Nick Coghlan wrote: > On Mon, Jun 21, 2010 at 9:06 AM, Laurens Van Houtven > wrote: > > Okay cool, we fixed it: http://python-commandments.org/python3.html > > > > People are otherwise happy with the text? > > Yep, looks pretty good to me. > > I hope you don't mind, but I actually borrowed your text to seed a > corresponding page on the Python wiki: > http://wiki.python.org/moin/Python2orPython3 > > It turns out the beginner's guide on the wiki doesn't even acknowledge > the possibility of downloading Python 3.1 rather than 2.6 to start > experimenting with Python. > > The Wiki is probably a good place for this kind of material, anyway - > it makes it much easier for people to update as they identify major > third party libraries that do and don't have Py3k compatible versions > (and, some day, Python2 compatible versions). > > Cheers, > Nick. > > P.S. (We're going to have a tough decision to make somewhere along the > line where docs.python.org is concerned, too - when do we flick the > switch and make a 3.x version of the docs the default? We probably > won't need to seriously consider that question until the 3.3. time > frame though). > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/arcriley%40gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at python.org Mon Jun 21 15:57:30 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 09:57:30 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: <20100621095730.752157d0@heresy> On Jun 21, 2010, at 09:37 AM, Arc Riley wrote: >Also, under where it mentions that most OS's do not include Python 3, it >should be noted which have good support for it. Gentoo (for example) has >excellent support for Python 3, automatically installing Python packages >which have Py3 support for both Py2 and Py3, and the python-based Portage >package system runs cleanly on Py2.6, Py3.1 and Py3.2. We're trying to get there for Ubuntu (driven also by Debian). We have Python 3.1.2 in main for Lucid, though we will probably not get 3.2 into Maverick (the October 2010 release). We're currently concentrating on Python 2.7 as a supported version because it'll be released by then, while 3.2 will still be in beta. If you want to help, or have complaints, kudos, suggestions, etc. for Python support on Ubuntu, you can contact me off-list. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ncoghlan at gmail.com Mon Jun 21 16:21:31 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 00:21:31 +1000 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On Mon, Jun 21, 2010 at 11:37 PM, Arc Riley wrote: > I would suggest that if packages that do not have Python 3 support yet are > listed, then their alternatives should also. > > PyQt has had Py3 support for some time. > PostgreSQL and SQLite do (as does SQLAlchemy) > CherryPy has had Py3 support for the last release cycle > libxml2 does not, but lxml does. > > Also, under where it mentions that most OS's do not include Python 3, it > should be noted which have good support for it.? Gentoo (for example) has > excellent support for Python 3, automatically installing Python packages > which have Py3 support for both Py2 and Py3, and the python-based Portage > package system runs cleanly on Py2.6, Py3.1 and Py3.2. > > Give credit where credit is due. :-) A decent listing of major packages that already support Python 3 would be very handy for the new Python2orPython3 page I created on the wiki, and easier to keep up-to-date. (the old Early2to3Migrations page didn't look particularly up to date, but hopefully we can keep the new list in a happier state). It just ticked past midnight for me, so I'm off to bed, but for anyone with a wiki account, have at it: http://wiki.python.org/moin/Python2orPython3 (Updating the beginner's guide to recognise Python 3 as a valid option would also be helpful: http://wiki.python.org/moin/BeginnersGuide) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 21 16:25:58 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 00:25:58 +1000 Subject: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode) In-Reply-To: <871vc0plt6.fsf_-_@benfinney.id.au> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <871vc0plt6.fsf_-_@benfinney.id.au> Message-ID: On Mon, Jun 21, 2010 at 11:17 PM, Ben Finney wrote: > "Stephen J. Turnbull" writes: > >> your base URL is gonna be b'mailto:stephen at xemacs.org', but the >> natural thing the UI will want to do is >> >> formurl = baseurl + '?subject=??????????' > > Incidentally, which irritating person was the topic of this > Japanese-language message to you? > > (The subject in Stephen's example message translates roughly as > ?(unspecified third person) is an irritating rascal, don't you agree?.) Given what he said about the base URL, it would appear to be a self-deprecating self-description. Nicely done :) (I can pronounce that subject line, but I didn't know what it meant without the translation). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 21 16:27:59 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 00:27:59 +1000 Subject: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode) In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <871vc0plt6.fsf_-_@benfinney.id.au> Message-ID: > Given what he said about the base URL, it would appear to be a > self-deprecating self-description. 
Nicely done :) Gah, no it isn't, you're right, the message leaves it unspecified. OK, no more posting after midnight for me... (well, not tonight, anyway) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Mon Jun 21 16:51:25 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 10:51:25 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> Message-ID: <20100621145133.7F5333A404D@sparrow.telecommunity.com> At 10:20 PM 6/21/2010 +1000, Nick Coghlan wrote: >For the idea of avoiding excess copying of bytes through multiple >encoding/decoding calls... isn't that meant to be handled at an >architectural level (i.e. decode once on the way in, encode once on >the way out)? Optimising the single-byte codec case by minimising data >copying (possibly through creative use of PEP 3118) may be something >that we want to look at eventually, but it strikes me as something of >a premature optimisation at this point in time (i.e. the old adage >"first get it working, then get it working fast"). The issue is, I'd like to have an idempotent incantation that I can use to make the inputs and outputs to stdlib functions behave in a type-safe manner with respect to bytes, in cases where bytes are really what I want operated on. Note too that this is an argument for symmetry in wrapping the inputs and outputs, so that the code doesn't have to "know" what it's dealing with! After all, right now, if a stdlib function might return bytes or unicode depending on runtime conditions, I can't even hardcode an .encode() call -- it would fail if the return type is a bytes. This basically goes against the "tell, don't ask" pattern, and the Pythonically idempotent approach. 
That is, Python builtins normally return you back the same thing if it's already what you want - int(someInt)-> someInt, iter(someIter)->someIter, etc. Since this incantation may need to be used often, and in places that are not known to me in advance, I would like it to not impose new overhead in unexpected places. (i.e., the usual argument brought against making changes to the 'list' type that would change certain operations from O(1) to O(log something)). It's more about predictability, and having One *Obvious* Way To Do It, as opposed to "several ways, which you need to think carefully about and restructure your entire architecture around if necessary". One obvious way means I can focus on the mechanical effort of porting *first*, without having to think. So, the performance issue isn't really about performance *per se*, so much as about the "mental UI" of the language. You could just as easily lie and tell me that your bstr implementation is O(1), and I would probably be happy and never notice, because the issue was never really about performance as such, but about having to *think* about it. (i.e., breaking flow.) Really, the entire issue can presumably be dealt with by some series of incantations - it's just code after all. But having to sit and think about *every* situation where I'm dealing with bytes/unicode distinctions seems like a torture compared to being able to say, "okay, so when dealing with this sort of API and this sort of data, this is the One Obvious Way to do the conversions." It's One Obvious Way that I want, but some people seem to be arguing that the One Obvious Way is to Think Carefully About It Every Time -- and that seems to violate the "Obvious" part, IMO. ;-) From a.badger at gmail.com Mon Jun 21 17:28:05 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 11:28:05 -0400 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: <20100621095730.752157d0@heresy> References: <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> <20100621095730.752157d0@heresy> Message-ID: <20100621152805.GU5787@unaka.lan> On Mon, Jun 21, 2010 at 09:57:30AM -0400, Barry Warsaw wrote: > On Jun 21, 2010, at 09:37 AM, Arc Riley wrote: > > >Also, under where it mentions that most OS's do not include Python 3, it > >should be noted which have good support for it. Gentoo (for example) has > >excellent support for Python 3, automatically installing Python packages > >which have Py3 support for both Py2 and Py3, and the python-based Portage > >package system runs cleanly on Py2.6, Py3.1 and Py3.2. > > We're trying to get there for Ubuntu (driven also by Debian). We have Python > 3.1.2 in main for Lucid, though we will probably not get 3.2 into Maverick > (the October 2010 release). We're currently concentrating on Python 2.7 as a > supported version because it'll be released by then, while 3.2 will still be > in beta. > > If you want to help, or have complaints, kudos, suggestions, etc. for Python > support on Ubuntu, you can contact me off-list. > Fedora 14 is about the same. A nice to have thing that goes along with these would be a table that has packages ported to python3 and which distributions have the python3 version of the package. Once most of the important third party packages are ported to python3 and in the distributions, this table will likely become out-dated and probably should be reaped but right now it's a very useful thing to see. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From arcriley at gmail.com Mon Jun 21 17:31:08 2010 From: arcriley at gmail.com (Arc Riley) Date: Mon, 21 Jun 2010 11:31:08 -0400 Subject: [Python-Dev] #Python3 ! ? 
(was Python Library Support in 3.x) In-Reply-To: <201006211113.06767.stephan.richter@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: Personally, I'd like to celebrate the upcoming Python 3.2 release (which will hopefully include 3to2) with moving all packages which do not have the 'Programming Language :: Python :: 3' classifier to a "Legacy" section of PyPI and offer only Python 3 packages otherwise. Of course put a banner at the top clearly explaining that Python 2 packages can be found in the Legacy section. Radical, I know, but at some point we really need to make this move. PyPI really needs a mechanism to cull out the moribund packages from being displayed next to the actively maintained ones. There's so many packages on there that only work on Python 2.2-2.4 (for example), or with a specific highly outdated version of another package, etc. On Mon, Jun 21, 2010 at 11:13 AM, Stephan Richter wrote: > On Monday, June 21, 2010, Nick Coghlan wrote: > > A decent listing of major packages that already support Python 3 would > > be very handy for the new Python2orPython3 page I created on the wiki, > > and easier to keep up-to-date. (the old Early2to3Migrations page > > didn't look particularly up to date, but hopefully we can keep the new > > list in a happier state). > > I really just want to be able to go to PyPI, Click on "Browse packages" and > then select "Python 3" (it can currently be accomplished by clicking > "Python" > and then "3"). Of course, package developers need to be encouraged to add > these Trove classifiers so that the listings are as complete as possible. > > Regards, > Stephan > -- > Entrepreneur and Software Geek > Google me. "Zope Stephan Richter" > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lvh at laurensvh.be Mon Jun 21 17:33:35 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 21 Jun 2010 17:33:35 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On Mon, Jun 21, 2010 at 3:37 PM, Arc Riley wrote: > I would suggest that if packages that do not have Python 3 support yet are > listed, then their alternatives should also. Okay, this is being worked on. > PyQt has had Py3 support for some time. Added, as well as PySide. > PostgreSQL and SQLite do (as does SQLAlchemy) wrt Postgres: Is that psycopg2? Not sure what that's an alternative to, since the 2.x list doesn't have any ORMs or database APIs at the moment (unless Django counts). > CherryPy has had Py3 support for the last release cycle Okay, going to add it but can't right now because lots of people are editing. > libxml2 does not, but lxml does. That's okay, I don't think many people seriously use python-libxml2 anyway (using lxml instead) :-) Again, not sure what that would be an alternative for though? > Also, under where it mentions that most OS's do not include Python 3, it > should be noted which have good support for it.? Gentoo (for example) has > excellent support for Python 3, automatically installing Python packages > which have Py3 support for both Py2 and Py3, and the python-based Portage > package system runs cleanly on Py2.6, Py3.1 and Py3.2. As Barry has pointed out 3.x is in many distros now, so in order to not make people angry that their distro who also does the Right Thing isn't mentioned (what's Arch do? 
py3k is easily available from AUR, that's not really ArchLinux proper but every Arch user I've ever talked to considers AUR an integral part), I added this: """ Also, quite a few distributions have Python 3.x available already for end-users, even if it's not the default interpreter. """ I think that would make everyone happy, and the wiki article that much more maintainable. Thanks for your input, Laurens From lvh at laurensvh.be Mon Jun 21 17:39:04 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 21 Jun 2010 17:39:04 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: On Mon, Jun 21, 2010 at 5:28 PM, Toshio Kuratomi wrote: > Fedora 14 is about the same. ?A nice to have thing that goes along > with these would be a table that has packages ported to python3 and which > distributions have the python3 version of the package. Yeah, this is exactly why I'd prefer to not have to maintain a specific list. Big distros are making Python 3.x available, it's not the default interpreter yet anywhere (AFAIK?), but that's going to happen in the next few releases of said distributions. On Mon, Jun 21, 2010 at 5:31 PM, Arc Riley wrote: > Personally, I'd like to celebrate the upcoming Python 3.2 release (which > will hopefully include 3to2) with moving all packages which do not have the > 'Programming Language :: Python :: 3' classifier to a "Legacy" section of > PyPI and offer only Python 3 packages otherwise.? Of course put a banner at > the top clearly explaining that Python 2 packages can be found in the Legacy > section. > > Radical, I know, but at some point we really need to make this move. I agree we have to make it at some point but I feel this is way, way too early. 
thanks for your continued input, Laurens From barry at python.org Mon Jun 21 17:43:07 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 11:43:07 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> Message-ID: <20100621114307.48735698@heresy> On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote: >Something that may make sense to ease the porting process is for some >of these "on the boundary" I/O related string manipulation functions >(such as os.path.join) to grow "encoding" keyword-only arguments. The >recommended approach would be to provide all strings, but bytes could >also be accepted if an encoding was specified. (If you want to mix >encodings - tough, do the decoding yourself). This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have "encoding-carrying" bytes and str types? Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion. By default, the .encoding attribute would be some marker to indicated "I have no idea, do it explicitly" and if you combine ebytes or estrs that have incompatible encodings, you'd either throw an exception or reset the .encoding to IAmConfuzzled. 
But say you had an email header like:

=?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=

And code like the following (made less crappy):

-----snip snip-----
class ebytes(bytes):
    encoding = 'ascii'

    def __str__(self):
        s = estr(self.decode(self.encoding))
        s.encoding = self.encoding
        return s


class estr(str):
    encoding = 'ascii'


s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
b = bytes(s, 'euc-jp')

eb = ebytes(b)
eb.encoding = 'euc-jp'
es = str(eb)
print(repr(eb), es, es.encoding)
-----snip snip-----

Running this you get:

b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp

Would it be feasible? Dunno. Would it help ease the bytes/str confusion? Dunno. But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From murman at gmail.com Mon Jun 21 18:03:30 2010
From: murman at gmail.com (Michael Urman)
Date: Mon, 21 Jun 2010 11:03:30 -0500
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621145133.7F5333A404D@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com>
Message-ID:

On Mon, Jun 21, 2010 at 09:51, P.J. Eby wrote:
> The issue is, I'd like to have an idempotent incantation that I can use to
> make the inputs and outputs to stdlib functions behave in a type-safe manner
> with respect to bytes, in cases where bytes are really what I want operated
> on.
>
> Note too that this is an argument for symmetry in wrapping the inputs and
> outputs, so that the code doesn't have to "know" what it's dealing with!
It is somewhat troublesome that there doesn't appear to be an obvious built-in idempotent-when-possible function that gives back the provided bytes/str, or converts to the requested type per the listed encoding (as of 3.1.2). Would it be useful to make the second versions of these work, or would that return us to the confusion of the 2.x era? On the other hand, since these are all TypeErrors instead of UnicodeErrors, it's an easy wrapper to write.

>>> bytes('abc', 'latin-1')
b'abc'
>>> bytes(b'abc', 'latin-1')
TypeError: encoding or errors without a string argument
>>> str(b'abc', 'latin-1')
'abc'
>>> str('abc', 'latin-1')
TypeError: decoding str is not supported

Interestingly the online docs for str say it can decode either a byte string or a character buffer, a term which doesn't yield a definition in a search; apparently either a string is not a character buffer, or the docs are incorrect.
http://docs.python.org/py3k/library/functions.html?highlight=str#str

However it looks like this is consistent with int.

>>> int(4, 0)
TypeError: int() can't convert non-string with explicit base

--
Michael Urman

From stephen at xemacs.org Mon Jun 21 18:08:53 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 22 Jun 2010 01:08:53 +0900
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp>

Lennart Regebro writes:

 > 2010/6/21 Stephen J. Turnbull :
 > > IMO, the UI is right. "Something" like the above "ought" to work.
 >
 > Right. That said, many times when you want to do urlparse etc they
 > might be binary, and you might want binary. So maybe the methods
 > should work with both?

First, a caveat: I'm a Unicode/encodings person, not an experienced web programmer.
My opinions on whether this would work well in practice should be taken with a grain of salt. Speaking for myself, I live in a country where the natives have saddled themselves with no less than 4 encodings in common use, and I would never want "binary" since none of them would display as anything useful in a traceback. Wherever possible, I decode "blobs" into structured objects, I do it as soon as possible, and if for efficiency reasons I want to do this lazily, I store the blob in a separate .raw_object attribute. If they're textual, I decode them to text. I can't see an efficiency argument for decoding URIs lazily in most applications. In the case of structured text like URIs, I would create a separate class for handling them with string-like operations. Internally, all text would be raw Unicode (ie, not url-encoded); repr(uri) would use some kind of readable quoting convention (not url-encoding) to disambiguate random reserved characters from separators, while str(uri) would produce an url-encoded string. Converting to and from wire format is just .encode and .decode, then, and in this country you need to be flexible about which encoding you use. Agreed, this stuff is really annoying. But I think that just comes with the territory. PJE reports that folks don't like doing encoding and decoding all over the place. I understand that, but if they're doing a lot of that, I have to wonder why. Why not define the one line function and get on with life? The thing is, where I live, it's not going to be a one line function. I'm going to be dealing with URLs that are url-encoded representations of UTF-8, Shift-JIS, EUC-JP, and occasionally RFC 2047! So I need an API that explicitly encodes and decodes. And I need an API that presents Japanese as Japanese rather than as line noise. Eg, PJE writes Ugh. 
I meant:

    newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1')

    Which just goes to the point of how ridiculous it is to have to
    convert things to strings and back again to use APIs that ought to
    just handle bytes properly in the first place.

But if you need that "everywhere", what's so hard about

    def urljoin_wrapper (base, subdir):
        return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')

Now, note how that pattern fails as soon as you want to use non-ISO-8859-1 languages for subdir names. In Python 3, the code above is just plain buggy, IMHO. The original author probably will never need the generalization. But her name will be cursed unto the nth generation by people who use her code on a different continent.

The net result is that bytes are *not* a programmer- or user-friendly way to do this, except for the minority of the world for whom Latin-1 is a good approximation to their daily-use unibyte encoding (eg, it's probably usable for debugging in Dansk, but you won't win any popularity contests in Tel Aviv or Shanghai).

From tjreedy at udel.edu Mon Jun 21 18:23:18 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 21 Jun 2010 12:23:18 -0400
Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp>
Message-ID:

On 6/21/2010 8:33 AM, Nick Coghlan wrote:
> P.S. (We're going to have a tough decision to make somewhere along the
> line where docs.python.org is concerned, too - when do we flick the
> switch and make a 3.x version of the docs the default?

Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. Trunk releases always take over docs.python.org. To do otherwise would be to say that 3.2 is not a real trunk release and not yet ready for real use -- a major slam.

Actually, I thought this was already discussed and decided ;-).
Terry Jan Reedy From a.badger at gmail.com Mon Jun 21 18:34:04 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 12:34:04 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621114307.48735698@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> Message-ID: <20100621163404.GV5787@unaka.lan> On Mon, Jun 21, 2010 at 11:43:07AM -0400, Barry Warsaw wrote: > On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote: > > >Something that may make sense to ease the porting process is for some > >of these "on the boundary" I/O related string manipulation functions > >(such as os.path.join) to grow "encoding" keyword-only arguments. The > >recommended approach would be to provide all strings, but bytes could > >also be accepted if an encoding was specified. (If you want to mix > >encodings - tough, do the decoding yourself). > > This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz > for it. > > Would it make sense to have "encoding-carrying" bytes and str types? > Basically, I'm thinking of types (maybe even the current ones) that carry > around a .encoding attribute so that they can be automatically encoded and > decoded where necessary. This at least would simplify APIs that need to do > the conversion. > > By default, the .encoding attribute would be some marker to indicated "I have > no idea, do it explicitly" and if you combine ebytes or estrs that have > incompatible encodings, you'd either throw an exception or reset the .encoding > to IAmConfuzzled. 
But say you had an email header like:
>
> =?euc-jp?b?pc+l7aG8pe+hvKXrpcmhqg==?=
>
> And code like the following (made less crappy):
>
> -----snip snip-----
> class ebytes(bytes):
>     encoding = 'ascii'
>
>     def __str__(self):
>         s = estr(self.decode(self.encoding))
>         s.encoding = self.encoding
>         return s
>
>
> class estr(str):
>     encoding = 'ascii'
>
>
> s = str(b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa', 'euc-jp')
> b = bytes(s, 'euc-jp')
>
> eb = ebytes(b)
> eb.encoding = 'euc-jp'
> es = str(eb)
> print(repr(eb), es, es.encoding)
> -----snip snip-----
>
> Running this you get:
>
> b'\xa5\xcf\xa5\xed\xa1\xbc\xa5\xef\xa1\xbc\xa5\xeb\xa5\xc9\xa1\xaa' ハローワールド！ euc-jp
>
> Would it be feasible? Dunno. Would it help ease the bytes/str confusion?
> Dunno. But I think it would help make APIs easier to design and use because
> it would cut down on the encoding-keyword function signature infection.
>
I like the idea of having encoding information carried with the data. I don't think that an ebytes type that can *optionally* have an encoding attribute makes the situation less confusing, though. To me the biggest problem with python-2.x's unicode/bytes handling was not that it threw exceptions but that it didn't always throw exceptions. You might test this in python2::

    t = u'cafe'
    function(t)

And say, ah my code works. Then a user gives it this::

    t = u'café'
    function(t)

And get a unicode error because the function only works with unicode in the ascii range.

ebytes seems to have the same pitfall where the code path exercised by your tests could work with::

    eb = ebytes(b)
    eb.encoding = 'euc-jp'
    function(eb)

but the user exercises a code path that does this and fails::

    eb = ebytes(b)
    function(eb)

What do you think of making the encoding attribute a mandatory part of creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``).

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From tjreedy at udel.edu Mon Jun 21 18:35:08 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 12:35:08 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: On 6/21/2010 11:31 AM, Arc Riley wrote: > Personally, I'd like to celebrate the upcoming Python 3.2 release (which > will hopefully include 3to2) with moving all packages which do not have > the 'Programming Language :: Python :: 3' classifier to a "Legacy" > section of PyPI and offer only Python 3 packages otherwise. Of course > put a banner at the top clearly explaining that Python 2 packages can be > found in the Legacy section. I do not think 2.x should be dissed any more than 3.x, which is to say, not at all. The impression I got from lurking on #python last night, in between disconnects, is that at least a couple of people feel that there is a move afoot to push people to Python3. Whether that had any connection to discussions here, I could not tell. Having pypi.python.org/py2 and pypi.python.org/py3 though might be a good idea. Inquiries from either url would automatically filter. The counterargument is that there may be people looking for packages available for *both*. > Radical, I know, but at some point we really need to make this move. > > PyPI really needs a mechanism to cull out the moribund packages from > being displayed next to the actively maintained ones. The default ordering for search results is by rating. There's so many > packages on there that only work on Python 2.2-2.4 (for example), or > with a specific highly outdated version of another package, etc. And there are people running those versions. I think better classification and filtering is the answer, though hard to mandate. 
Terry Jan Reedy From pje at telecommunity.com Mon Jun 21 18:46:44 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 12:46:44 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> Message-ID: <20100621164650.16A093A414B@sparrow.telecommunity.com> At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote: >It may be that there are places where we need to rewrite standard >library algorithms to be bytes/str neutral (e.g. by using length one >slices instead of indexing). It may be that there are more APIs that >need to grow "encoding" keyword arguments that they then pass on to >the functions they call or use to convert str arguments to bytes (or >vice-versa). But without people trying to port affected libraries and >reporting bugs when they find issues, the situation isn't going to >improve. > >Now, if these bugs are already being reported against 3.1 and just >aren't getting fixed, that's a completely different story... The overall impression, though, is that this isn't really a step forward. Now, bytes are the special case instead of unicode, but that special case isn't actually handled any better by the stdlib - in fact, it's arguably worse. And, the burden of addressing this seems to have been shifted from the people who made the change, to the people who are going to use it. But those people are not necessarily in a position to tell you anything more than, "give me something that works with bytes". 
What I can tell you is that before, since string constants in the stdlib were ascii bytes, and transparently promoted to unicode, stdlib behavior was *predictable* in the presence of special cases: you got back either bytes or unicode, but either way, you could idempotently upgrade the result to unicode, or just pass it on. APIs were "str safe, unicode aware". If you passed in bytes, you weren't going to get unicode without a warning, and if you passed in unicode, it'd work and you'd get unicode back. Now, the APIs are neither safe nor aware -- if you pass bytes in, you get unpredictable results back. Ironically, it almost *would* have been better if bytes simply didn't work as strings at all, *ever*, but if you could wrap them with a bstr() to *treat* them as text. You could still have restrictions on combining them, as long as it was a restriction on the unicode you mixed with them. That is, if you could combine a bstr and a str if the *str* was restricted to ASCII. If we had the Python 3 design discussions to do over again, I think I would now have stuck with the position of not letting bytes be string-compatible at all, and instead proposed an explicit bstr() wrapper/adapter to use them as strings, that would (in that case) force coercion in the direction of bytes rather than strings. (And bstr need not have been a builtin - it could have been something you import, to help discourage casual usage.) Might this approach lead to some people doing things wrong in the case of porting? Sure. But there'd be little reason to use it in new code that didn't have a real need for bytestring manipulation. It might've been a better balance between practicality and purity, in that it keeps the language pure, while offering a practical way to deal with things in bytes if you really need to. And, bytes wouldn't silently succeed *some* of the time, leading to a trap. An easy inconsistency is worse than a bit of uniform chicken-waving. Is it too late to make that tradeoff? 
Probably. Certainly it's not practical to *implement* outside the language core, and removing string methods would fux0r anybody whose currently-ported code relies on bytes objects having string-like methods. From fuzzyman at voidspace.org.uk Mon Jun 21 18:49:55 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 17:49:55 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621164650.16A093A414B@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> Message-ID: <4C1F9833.2080905@voidspace.org.uk> On 21/06/2010 17:46, P.J. Eby wrote: > At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote: >> It may be that there are places where we need to rewrite standard >> library algorithms to be bytes/str neutral (e.g. by using length one >> slices instead of indexing). It may be that there are more APIs that >> need to grow "encoding" keyword arguments that they then pass on to >> the functions they call or use to convert str arguments to bytes (or >> vice-versa). But without people trying to port affected libraries and >> reporting bugs when they find issues, the situation isn't going to >> improve. >> >> Now, if these bugs are already being reported against 3.1 and just >> aren't getting fixed, that's a completely different story... > > The overall impression, though, is that this isn't really a step > forward. Now, bytes are the special case instead of unicode, but that > special case isn't actually handled any better by the stdlib - in > fact, it's arguably worse. And, the burden of addressing this seems to > have been shifted from the people who made the change, to the people > who are going to use it. 
But those people are not necessarily in a > position to tell you anything more than, "give me something that works > with bytes". > > What I can tell you is that before, since string constants in the > stdlib were ascii bytes, and transparently promoted to unicode, stdlib > behavior was *predictable* in the presence of special cases: you got > back either bytes or unicode, but either way, you could idempotently > upgrade the result to unicode, or just pass it on. APIs were "str > safe, unicode aware". If you passed in bytes, you weren't going to get > unicode without a warning, and if you passed in unicode, it'd work and > you'd get unicode back. > > Now, the APIs are neither safe nor aware -- if you pass bytes in, you > get unpredictable results back. > > Ironically, it almost *would* have been better if bytes simply didn't > work as strings at all, *ever*, but if you could wrap them with a > bstr() to *treat* them as text. You could still have restrictions on > combining them, as long as it was a restriction on the unicode you > mixed with them. That is, if you could combine a bstr and a str if the > *str* was restricted to ASCII. > > If we had the Python 3 design discussions to do over again, I think I > would now have stuck with the position of not letting bytes be > string-compatible at all, and instead proposed an explicit bstr() > wrapper/adapter to use them as strings, that would (in that case) > force coercion in the direction of bytes rather than strings. (And > bstr need not have been a builtin - it could have been something you > import, to help discourage casual usage.) > > Might this approach lead to some people doing things wrong in the case > of porting? Sure. But there'd be little reason to use it in new code > that didn't have a real need for bytestring manipulation. 
> > It might've been a better balance between practicality and purity, in > that it keeps the language pure, while offering a practical way to > deal with things in bytes if you really need to. And, bytes wouldn't > silently succeed *some* of the time, leading to a trap. An easy > inconsistency is worse than a bit of uniform chicken-waving. > > Is it too late to make that tradeoff? Probably. Certainly it's not > practical to *implement* outside the language core, and removing > string methods would fux0r anybody whose currently-ported code relies > on bytes objects having string-like methods. > Why is your proposed bstr wrapper not practical to implement outside the core and use in your own libraries and frameworks? Michael > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From pje at telecommunity.com Mon Jun 21 18:54:53 2010 From: pje at telecommunity.com (P.J. 
Eby) Date: Mon, 21 Jun 2010 12:54:53 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100621165506.26D4C3A404D@sparrow.telecommunity.com> At 01:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: >But if you need that "everywhere", what's so hard about > >def urljoin_wrapper (base, subdir): > return urljoin(str(base, 'latin-1'), subdir).encode('latin-1') > >Now, note how that pattern fails as soon as you want to use >non-ISO-8859-1 languages for subdir names. Bear in mind that the use cases I'm talking about here are WSGI stacks with components written by multiple authors -- each of whom may have to define that function, and still get it right. Sure, there are some things that could go in wsgiref in the stdlib. However, as of this moment, there's only a very uneasy rough consensus in Web-Sig as to how the heck WSGI should actually *work* on Python 3, because of issues like these. That makes it tough to actually say what should happen in the stdlib -- e.g., which things should be classed as stdlib bugs, which things should be worked around with wrappers or new functions, etc. From benjamin at python.org Mon Jun 21 19:14:09 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 21 Jun 2010 12:14:09 -0500 Subject: [Python-Dev] [RELEASED] Python 2.7 release candidate 2 Message-ID: On behalf of the Python development team, I'm tickled pink to announce the second release candidate of Python 2.7. Python 2.7 is scheduled (by Guido and Python-dev) to be the last major version in the 2.x series. However, 2.7 will have an extended period of bugfix maintenance. 2.7 includes many features that were first released in Python 3.1. 
The faster io module, the new nested with statement syntax, improved float repr, set literals, dictionary views, and the memoryview object have been backported from 3.1. Other features include an ordered dictionary implementation, unittests improvements, a new sysconfig module, auto-numbering of fields in the str/unicode format method, and support for ttk Tile in Tkinter. For a more extensive list of changes in 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python distribution. To download Python 2.7 visit: http://www.python.org/download/releases/2.7/ While this is a preview release and is thus not suitable for production use, we strongly encourage Python application and library developers to test the release with their code and report any bugs they encounter to: http://bugs.python.org/ This helps ensure that those upgrading to Python 2.7 will encounter as few bumps as possible. 2.7 documentation can be found at: http://docs.python.org/2.7/ Enjoy! -- Benjamin Peterson Release Manager benjamin at python.org (on behalf of the entire python-dev team and 2.7's contributors) From pje at telecommunity.com Mon Jun 21 19:17:57 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 13:17:57 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621114307.48735698@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> Message-ID: <20100621171803.B35C33A414B@sparrow.telecommunity.com> At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote: >On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote: > >Something that may make sense to ease the porting process is for some > >of these "on the boundary" I/O related string manipulation functions > >(such as os.path.join) to grow "encoding" keyword-only arguments. 
The > >recommended approach would be to provide all strings, but bytes could > >also be accepted if an encoding was specified. (If you want to mix > >encodings - tough, do the decoding yourself). > >This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz >for it. > >Would it make sense to have "encoding-carrying" bytes and str types? It's not a stupid idea, and could potentially work. It also might have a better chance of being able to actually be *implemented* in 3.x than my idea. >Basically, I'm thinking of types (maybe even the current ones) that carry >around a .encoding attribute so that they can be automatically encoded and >decoded where necessary. This at least would simplify APIs that need to do >the conversion. I'm not really sure how much use the encoding is on a unicode object - what would it actually mean? Hm. I suppose it would effectively mean "this string can be represented in this encoding" -- which is useful, in that you could fail operations when combining with bytes of a different encoding. Hm... no, in that case you should just encode the string to the bytes' encoding, and let that throw an error if it fails. So, really, there's no reason for a string to know its encoding. All you need is the bytes type to have an encoding attribute, and when doing mixed-type operations between bytes and strings, coerce to *bytes of the same encoding*. However, if .encoding is None, then coercion would follow the same rules as now -- i.e., convert the bytes to unicode, assuming an ascii encoding. (This would be different than setting an encoding of 'ascii', because in that case, it means you want cross-type operations to result in ascii bytes, rather than a unicode string, and to fail if the unicode part can't be encoded appropriately. The 'None' setting is effectively a nod to compatibility with prior 3.x versions, since I assume we can't just throw out the old coercion behavior.) 
Then, a few more changes to the bytes type would round out the implementation: * Allow .decode() to not specify an encoding, unless .encoding is None * Add back in the missing string methods (e.g. .encode()), since you can transparently upgrade to a string) * Smart __str__, as shown in your proposal. >Would it be feasible? Dunno. Probably, although it might mean adding back in special cases that were previously taken out, and a few new ones. > Would it help ease the bytes/str confusion? Dunno. Not sure what confusion you mean -- Web-SIG and I at least are not confused about the difference between bytes and str, or we wouldn't be having an issue. ;-) Or maybe you mean the stdlib's API confusion? In which case, yes, definitely! > But I think it would help make APIs easier to design and use because >it would cut down on the encoding-keyword function signature infection. Not only that, but I believe it would also retroactively make the stdlib's implementation of those APIs "correct" again, and give us One Obvious Way to work with bytes of a known encoding, while constraining any unicode that gets combined with those bytes to be validly encodable. It also gives you an idempotent constructor for bytes of a specified encoding, that can take either a bytes of unspecified encoding, a bytes of the correct encoding, or a string that can be encoded as such. In short, +1. (I wish it were possible to go back and make bytes non-strings and have only this ebytes or bstr or whatever type have string methods, but I'm pretty sure that ship has already sailed.) From pje at telecommunity.com Mon Jun 21 19:24:10 2010 From: pje at telecommunity.com (P.J. 
Eby) Date: Mon, 21 Jun 2010 13:24:10 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621163404.GV5787@unaka.lan> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> Message-ID: <20100621172413.578853A404D@sparrow.telecommunity.com> At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: >What do you think of making the encoding attribute a mandatory part of >creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). As long as the coercion rules force str+ebytes (or str % ebytes, ebytes % str, etc.) to result in another ebytes (and fail if the str can't be encoded in the ebytes' encoding), I'm personally fine with it, although I really like the idea of tacking the encoding to bytes objects in the first place. OTOH, one potential problem with having the encoding on the bytes object rather than the ebytes object is that then you can't easily take bytes from a socket and then say what encoding they are, without interfering with the sockets API (or whatever other place you get the bytes from). So, on balance, making ebytes a separate type (perhaps one that's just a pointer to the bytes and a pointer to the encoding) would indeed make more sense. It having different coercion rules for interacting with strings would make more sense too in that case. (The ideal, of course, would still be to not let bytes objects be stringlike at all, with only ebytes acting string-like. That way, you'd be forced to be explicit about your encoding when working with bytes, but all you'd need to do was make an ebytes call.) 
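As a rough illustration of the coercion rule described above (str combined with ebytes yields another ebytes in the same encoding, and fails if the str cannot be encoded), here is a minimal sketch. Everything in it is an assumption made for illustration: no such type exists in the stdlib, the constructor follows the suggested ``ebytes(b, 'euc-jp')`` form with a mandatory encoding, and only ``+`` is handled. It says nothing about ``%`` formatting or the ``in`` operator.

```python
class ebytes(bytes):
    """Hypothetical bytes subclass that always carries an encoding."""

    def __new__(cls, data, encoding):
        # Text is encoded up front, so the payload is always bytes.
        if isinstance(data, str):
            data = data.encode(encoding)
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def _coerce(self, other):
        # str operands are forced into this object's encoding; this raises
        # UnicodeEncodeError if the text cannot be represented.
        if isinstance(other, str):
            return other.encode(self.encoding)
        if isinstance(other, ebytes) and other.encoding != self.encoding:
            raise TypeError("cannot mix %s and %s bytes"
                            % (self.encoding, other.encoding))
        if isinstance(other, (bytes, bytearray)):
            return bytes(other)
        return NotImplemented

    def __add__(self, other):
        other = self._coerce(other)
        if other is NotImplemented:
            return NotImplemented
        return ebytes(bytes(self) + other, self.encoding)

    def __radd__(self, other):
        other = self._coerce(other)
        if other is NotImplemented:
            return NotImplemented
        return ebytes(other + bytes(self), self.encoding)

    def __str__(self):
        return self.decode(self.encoding)


eb = ebytes(b'\xa5\xcf\xa5\xed', 'euc-jp')
joined = eb + '\u30fc'   # the str is encoded to euc-jp; the result is ebytes
print(type(joined).__name__, joined.encoding, bytes(joined))
```

With the encoding required at construction time, the "forgot to set .encoding" code path cannot arise, and text that does not fit the encoding fails at the point where it is combined rather than somewhere downstream.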
From a.badger at gmail.com Mon Jun 21 18:56:11 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 12:56:11 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100621165611.GW5787@unaka.lan> On Tue, Jun 22, 2010 at 01:08:53AM +0900, Stephen J. Turnbull wrote: > Lennart Regebro writes: > > > 2010/6/21 Stephen J. Turnbull : > > > IMO, the UI is right. ?"Something" like the above "ought" to work. > > > > Right. That said, many times when you want to do urlparse etc they > > might be binary, and you might want binary. So maybe the methods > > should work with both? > > First, a caveat: I'm a Unicode/encodings person, not an experienced > web programmer. My opinions on whether this would work well in > practice should be taken with a grain of salt. > > Speaking for myself, I live in a country where the natives have > saddled themselves with no less than 4 encodings in common use, and I > would never want "binary" since none of them would display as anything > useful in a traceback. Wherever possible, I decode "blobs" into > structured objects, I do it as soon as possible, and if for efficiency > reasons I want to do this lazily, I store the blob in a separate > .raw_object attribute. If they're textual, I decode them to text. I > can't see an efficiency argument for decoding URIs lazily in most > applications. > > In the case of structured text like URIs, I would create a separate > class for handling them with string-like operations. 
Internally, all > text would be raw Unicode (ie, not url-encoded); repr(uri) would use > some kind of readable quoting convention (not url-encoding) to > disambiguate random reserved characters from separators, while > str(uri) would produce an url-encoded string. Converting to and from > wire format is just .encode and .decode, then, and in this country you > need to be flexible about which encoding you use. > > Agreed, this stuff is really annoying. But I think that just comes > with the territory. PJE reports that folks don't like doing encoding > and decoding all over the place. I understand that, but if they're > doing a lot of that, I have to wonder why. Why not define the one > line function and get on with life? > > The thing is, where I live, it's not going to be a one line function. > I'm going to be dealing with URLs that are url-encoded representations > of UTF-8, Shift-JIS, EUC-JP, and occasionally RFC 2047! So I need an > API that explicitly encodes and decodes. And I need an API that > presents Japanese as Japanese rather than as line noise. > > Eg, PJE writes > > Ugh. I meant: > > newurl = urljoin(str(base, 'latin-1'), 'subdir').encode('latin-1') > > Which just goes to the point of how ridiculous it is to have to > convert things to strings and back again to use APIs that ought to > just handle bytes properly in the first place. > > But if you need that "everywhere", what's so hard about > > def urljoin_wrapper (base, subdir): > return urljoin(str(base, 'latin-1'), subdir).encode('latin-1') > > Now, note how that pattern fails as soon as you want to use > non-ISO-8859-1 languages for subdir names. In Python 3, the code > above is just plain buggy, IMHO. The original author probably will > never need the generalization. But her name will be cursed unto the > nth generation by people who use her code on a different continent. 
> > The net result is that bytes are *not* a programmer- or user-friendly > way to do this, except for the minority of the world for whom Latin-1 > is a good approximation to their daily-use unibyte encoding (eg, it's > probably usable for debugging in Dansk, but you won't win any > popularity contests in Tel Aviv or Shanghai). > One comment here -- you can also have uri's that aren't decodable into their true textual meaning using a single encoding. Apache will happily serve out uris that have utf-8, shift-jis, and euc-jp components inside of their path but the textual representation that was intended will be garbled (or be represented by escaped byte sequences). For that matter, apache will serve requests that have no true textual representation as it is working on the byte level rather than the character level. So a complete solution really should allow the programmer to pass in uris as bytes when the programmer knows that they need it. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From tjreedy at udel.edu Mon Jun 21 19:27:30 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 13:27:30 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C1EE2E1.5030105@udel.edu> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621013405.19DC33A4099@sparrow.telecommunity.com> <4C1EE2E1.5030105@udel.edu> Message-ID: On 6/20/2010 11:56 PM, Terry Reedy wrote: > The specific example is > > >>> urllib.parse.parse_qsl('a=b%e0') > [('a', 'b?')] > > where the character after 'b' is white ? in dark diamond, indicating an > error. 
>
> parse_qsl() splits that input on '=' and sends each piece to
> urllib.parse.unquote
> unquote() attempts to "Replace %xx escapes by their single-character
> equivalent.". unquote has an encoding parameter that defaults to 'utf-8'
> in *its* call to .decode. parse_qsl does not have an encoding parameter.
> If it did, and it passed that to unquote, then
> the above example would become (simulated interaction)
>
> >>> urllib.parse.parse_qsl('a=b%e0', encoding='latin-1')
> [('a', 'bà')]
>
> I got that output by copying the file and adding "encoding='latin-1'" to
> the unquote call.
>
> Does this solve this problem?
> Has anything like this been added for 3.2?
> Should it be?

With a little searching, I found http://bugs.python.org/issue5468 with Miles Kaufmann's year-old comment "parse_qs and parse_qsl should also grow encoding and errors parameters to pass to the underlying unquote()". Patch review is needed.

Terry Jan Reedy

From stephan.richter at gmail.com Mon Jun 21 17:13:06 2010
From: stephan.richter at gmail.com (Stephan Richter)
Date: Mon, 21 Jun 2010 11:13:06 -0400
Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x)
In-Reply-To:
References: <20100618050712.GC20639@thorne.id.au>
Message-ID: <201006211113.06767.stephan.richter@gmail.com>

On Monday, June 21, 2010, Nick Coghlan wrote:
> A decent listing of major packages that already support Python 3 would
> be very handy for the new Python2orPython3 page I created on the wiki,
> and easier to keep up-to-date. (the old Early2to3Migrations page
> didn't look particularly up to date, but hopefully we can keep the new
> list in a happier state).

I really just want to be able to go to PyPI, click on "Browse packages" and then select "Python 3" (it can currently be accomplished by clicking "Python" and then "3"). Of course, package developers need to be encouraged to add these Trove classifiers so that the listings are as complete as possible.
Regards,
Stephan
--
Entrepreneur and Software Geek
Google me. "Zope Stephan Richter"

From guido at python.org Mon Jun 21 19:29:27 2010
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jun 2010 10:29:27 -0700
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100621164650.16A093A414B@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com>
Message-ID:

On Mon, Jun 21, 2010 at 9:46 AM, P.J. Eby wrote:
> At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
>>
>> It may be that there are places where we need to rewrite standard
>> library algorithms to be bytes/str neutral (e.g. by using length one
>> slices instead of indexing). It may be that there are more APIs that
>> need to grow "encoding" keyword arguments that they then pass on to
>> the functions they call or use to convert str arguments to bytes (or
>> vice-versa). But without people trying to port affected libraries and
>> reporting bugs when they find issues, the situation isn't going to
>> improve.
>>
>> Now, if these bugs are already being reported against 3.1 and just
>> aren't getting fixed, that's a completely different story...
>
> The overall impression, though, is that this isn't really a step forward.
> Now, bytes are the special case instead of unicode, but that special case
> isn't actually handled any better by the stdlib - in fact, it's arguably
> worse. And, the burden of addressing this seems to have been shifted from
> the people who made the change, to the people who are going to use it. But
> those people are not necessarily in a position to tell you anything more
> than, "give me something that works with bytes".
>
> What I can tell you is that before, since string constants in the stdlib
> were ascii bytes, and transparently promoted to unicode, stdlib behavior was
> *predictable* in the presence of special cases: you got back either bytes or
> unicode, but either way, you could idempotently upgrade the result to
> unicode, or just pass it on. APIs were "str safe, unicode aware". If you
> passed in bytes, you weren't going to get unicode without a warning, and if
> you passed in unicode, it'd work and you'd get unicode back.

Actually, the big problem with Python 2 is that if you mix str and unicode, things work or crash depending on whether any of the str objects involved contain non-ASCII bytes.

If one API decides to upgrade to Unicode, the result, when passed to another API, may well cause a UnicodeError because not all arguments have had the same treatment.

> Now, the APIs are neither safe nor aware -- if you pass bytes in, you get
> unpredictable results back.

This seems an overgeneralization of a particular bug. There are APIs that are strictly text-in, text-out. There are others that are bytes-in, bytes-out. Let's call all those *pure*. For some operations it makes sense that the API is *polymorphic*, with which I mean that text-in causes text-out, and bytes-in causes byte-out. All of these are fine.

Perhaps there are more situations where a polymorphic API would be helpful. Such APIs are not always so easy to implement, because they have to be careful with literals or other constants (and even more so mutable state) used internally -- but it can be done, and there are plenty of examples in the stdlib.

The real problem apparently lies in (what I believe is only a few rare) APIs that are text-or-bytes-in and always-text-out (or always-bytes-out). Let's call them *hybrid*. Clearly, mixing hybrid APIs in a stream of pure or polymorphic API calls is a problem, because they turn a pure or polymorphic overall operation into a hybrid one.
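To make the polymorphic/hybrid distinction above concrete, here is a minimal sketch. Both function names are made up for illustration and are not stdlib APIs. The polymorphic version picks its internal constant to match the input type and uses a length-one slice rather than indexing, since indexing bytes yields an int. The hybrid version accepts either type but always returns text, silently choosing an encoding along the way.

```python
def add_subdir(base, subdir):
    """Polymorphic: str in gives str out, bytes in gives bytes out."""
    if isinstance(base, str) != isinstance(subdir, str):
        raise TypeError("cannot mix str and bytes arguments")
    # Internal literals must match the argument type.
    sep = '/' if isinstance(base, str) else b'/'
    # base[-1:] is a one-element str or bytes; base[-1] would be an int
    # for bytes and never compare equal to sep.
    if base[-1:] != sep:
        base = base + sep
    return base + subdir


def hybrid_add_subdir(base, subdir):
    """Hybrid: accepts str or bytes, but always returns str."""
    if isinstance(base, bytes):
        base = base.decode('latin-1')      # an encoding is silently assumed
    if isinstance(subdir, bytes):
        subdir = subdir.decode('latin-1')
    return add_subdir(base, subdir)


print(add_subdir('http://example.com/a', 'b'))
print(add_subdir(b'http://example.com/a/', b'b'))
print(hybrid_add_subdir(b'http://example.com/a', 'b'))
```

A caller working purely in bytes who passes through the hybrid function gets str back, and every later step in that pipeline has to cope with the change of type.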
There are also text-in, bytes-out or bytes-in, text-out APIs that are intended for encoding/decoding of course, but these are in a totally different class.

Abstractly, it would be good if there were as few as possible hybrid APIs, many pure or polymorphic APIs (which it should be in a particular case is a pragmatic choice), and a limited number of encoding/decoding APIs, which should generally be invoked at the edges of the program (e.g., I/O).

> Ironically, it almost *would* have been better if bytes simply didn't work
> as strings at all, *ever*, but if you could wrap them with a bstr() to
> *treat* them as text. You could still have restrictions on combining them,
> as long as it was a restriction on the unicode you mixed with them. That
> is, if you could combine a bstr and a str if the *str* was restricted to
> ASCII.

ISTR that we considered something like this and decided to stay away from it. At this point I think that a successful 3rd party bstr implementation would be required before we rush to add one to the stdlib.

> If we had the Python 3 design discussions to do over again, I think I would
> now have stuck with the position of not letting bytes be string-compatible
> at all,

They aren't, unless you consider the presence of some methods with similar behavior (.lower(), .split() and so on) and the existence of some polymorphic APIs (see above) as "compatibility".

> and instead proposed an explicit bstr() wrapper/adapter to use them
> as strings, that would (in that case) force coercion in the direction of
> bytes rather than strings. (And bstr need not have been a builtin - it
> could have been something you import, to help discourage casual usage.)

I'm still unclear on exactly what bstr is supposed to be, but it sounds a bit like one of the rejected proposals for having a single (Unicode-capable) str type that is implemented using different width encodings (Latin-1, UCS-2, UCS-4) underneath.
> Might this approach lead to some people doing things wrong in the case of
> porting? Sure. But there'd be little reason to use it in new code that
> didn't have a real need for bytestring manipulation.
>
> It might've been a better balance between practicality and purity, in that
> it keeps the language pure, while offering a practical way to deal with
> things in bytes if you really need to. And, bytes wouldn't silently succeed
> *some* of the time, leading to a trap. An easy inconsistency is worse than
> a bit of uniform chicken-waving.

I still believe that the instances of bytes silently succeeding *some* of the time refer to specific bugs in specific APIs, either intentional because of misguided compatibility desires, or accidental in the haste of trying to convert the entire stdlib to Python 3 in a finite time.

> Is it too late to make that tradeoff? Probably. Certainly it's not
> practical to *implement* outside the language core, and removing string
> methods would fux0r anybody whose currently-ported code relies on bytes
> objects having string-like methods.
>
>

--
--Guido van Rossum (python.org/~guido)

From pje at telecommunity.com Mon Jun 21 19:29:55 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 21 Jun 2010 13:29:55 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C1F9833.2080905@voidspace.org.uk>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <4C1F9833.2080905@voidspace.org.uk>
Message-ID: <20100621172957.EB55C3A404D@sparrow.telecommunity.com>

At 05:49 PM 6/21/2010 +0100, Michael Foord wrote:
>Why is your proposed bstr wrapper not practical to implement outside
>the core and use in your own libraries and frameworks?
__contains__ doesn't have a converse operation, so you can't code a type that works around this (Python 3.1 shown):

    >>> from os.path import join
    >>> join(b'x','y')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: Type str doesn't support the buffer API
    >>> join('y',b'x')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "c:\Python31\lib\ntpath.py", line 161, in join
        if b[:1] in seps:
    TypeError: 'in <string>' requires string as left operand, not bytes

IOW, only one of these two cases can be worked around by using a bstr (or ebytes) that doesn't have support from the core string type.

I'm not sure if the "in" operator is the only case where implementing such a type would fail, but it's the most obvious one. String formatting, of both the % and .format() varieties is another. (__rmod__ doesn't help if your bytes object is one of several data items in a tuple or dict -- the common case for % formatting.)

From tjreedy at udel.edu Mon Jun 21 19:36:38 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 21 Jun 2010 13:36:38 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621114307.48735698@heresy>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy>
Message-ID:

On 6/21/2010 11:43 AM, Barry Warsaw wrote:
> This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz
> for it.
>
> Would it make sense to have "encoding-carrying" bytes and str types?

On 2009-11-5 I posted 'Add encoding attribute to bytes' to python-ideas. It was shot down at the time.
Terry Jan Reedy From tjreedy at udel.edu Mon Jun 21 19:45:20 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 13:45:20 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> Message-ID: On 6/21/2010 8:51 AM, Nick Coghlan wrote: > > I don't know that the "all is well" camp actually exists. The camp > that I do see existing is the one that says "without a bug report, > inconsistencies in the standard library's unicode handling won't get > fixed". > > The issues picked up by the regression test suite have already been > dealt with, but that suite is unfortunately far from comprehensive. > Just like a lot of Python code that is out there, the standard library > isn't immune to the poor coding practices that were permitted by the > blurry lines between text and octet streams in 2.x. > > It may be that there are places where we need to rewrite standard > library algorithms to be bytes/str neutral (e.g. by using length one > slices instead of indexing). It may be that there are more APIs that > need to grow "encoding" keyword arguments that they then pass on to > the functions they call or use to convert str arguments to bytes (or > vice-versa). But without people trying to port affected libraries and > reporting bugs when they find issues, the situation isn't going to > improve. > > Now, if these bugs are already being reported against 3.1 and just > aren't getting fixed, that's a completely different story... Some of the above have been, over a year ago. See, for instance, http://bugs.python.org/issue5468 I am getting the impression that the people who use the web modules tend, like me, to not have the tools to write and test patches . So they can squeak but not grease. 
Terry Jan Reedy

From pje at telecommunity.com Mon Jun 21 19:46:56 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Mon, 21 Jun 2010 13:46:56 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100621165611.GW5787@unaka.lan>
References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan>
Message-ID: <20100621174659.D65403A404D@sparrow.telecommunity.com>

At 12:56 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
>One comment here -- you can also have uri's that aren't decodable into their
>true textual meaning using a single encoding.
>
>Apache will happily serve out uris that have utf-8, shift-jis, and euc-jp
>components inside of their path but the textual representation that was intended
>will be garbled (or be represented by escaped byte sequences). For that
>matter, apache will serve requests that have no true textual representation
>as it is working on the byte level rather than the character level.
>
>So a complete solution really should allow the programmer to pass in uris as
>bytes when the programmer knows that they need it.

ebytes(somebytes, 'garbage'), perhaps, which would be like ascii, but where combining with non-garbage would result in another 'garbage' ebytes?

From janssen at parc.com Mon Jun 21 19:56:59 2010
From: janssen at parc.com (Bill Janssen)
Date: Mon, 21 Jun 2010 10:56:59 PDT
Subject: [Python-Dev] red buildbots on 2.7
Message-ID: <73196.1277143019@parc.com>

Considering that we've just released 2.7rc2, there are an awful lot of red buildbots for 2.7. In fact, I don't remember having seen a green buildbot for OS X and 2.7. Shouldn't these be fixed?

On OS X Leopard, I'm seeing failures in test_py3kwarn, test_urllib2_localnet, test_uuid.
On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, test_ttk_guionly, and test_urllib2_localnet.

We don't have a buildbot running Snow Leopard, apparently.

Bill

From turnbull at sk.tsukuba.ac.jp Mon Jun 21 19:58:22 2010
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Tue, 22 Jun 2010 02:58:22 +0900
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621145133.7F5333A404D@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com>
Message-ID: <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp>

P.J. Eby writes:

 > Note too that this is an argument for symmetry in wrapping the
 > inputs and outputs, so that the code doesn't have to "know" what
 > it's dealing with!

and

 > After all, right now, if a stdlib function might return bytes or
 > unicode depending on runtime conditions, I can't even hardcode an
 > .encode() call -- it would fail if the return type is a bytes.

I'm lost. What stdlib functions are you talking about whose return type depends on runtime conditions, and what runtime conditions? What do you mean by "wrapping"?

The only times I've run into str/bytes nondeterminacy are when I've mixed str/bytes myself, and passed them into functions that are type-identities (str -> str, bytes -> bytes), which then appear to give a nondeterministic result. It's a deterministic bug, though, always mine.

 > It's One Obvious Way that I want, but some people seem to be arguing
 > that the One Obvious Way is to Think Carefully About It Every Time --
 > and that seems to violate the "Obvious" part, IMO. ;-)

Nick alluded to The One Obvious Way as a change in architecture.

Specifically: Decode all bytes to typed objects (str, images, audio, structured objects) at input.
Do no manipulations on bytes ever except decode and encode (both to text, and to special-purpose objects such as images) in a program that does I/O. (Obviously image manipulation libraries etc will have to operate on bytes, but they should have no functions that consume bytes except constructors a la bytes.decode() for text, and no functions that produce bytes except the output serializers that write files and the like, a la str.encode().) Encode back to bytes on output. Yes, this is tedious if you live in an ASCII world, compared to using bytes as characters. However, it works for the rest of us, which the old style doesn't. As for "Think Carefully About It Every Time", that is required only in Porting Programs That Mix Operation On Bytes With Operation On Str. If you write programs from scratch, however, the decode-process-encode paradigm quickly becomes second nature. From stephen at xemacs.org Mon Jun 21 20:08:42 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 03:08:42 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621114307.48735698@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> Message-ID: <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > Would it make sense to have "encoding-carrying" bytes and str > types? Why limit that to bytes and str? Why not have all objects carry their serializer/deserializer around with them? 
I think the answer is "no", though, because (1) it would constitute an attractive nuisance (the default would be abused, it would work fine in Kansas, and all hell would break loose in Kagoshima, simply delaying the pain and/or passing it on to third parties), and (2) you really want this under control of higher level objects that have access to some knowledge of the environment, rather than the lowest level. From pje at telecommunity.com Mon Jun 21 20:17:47 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 14:17:47 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> Message-ID: <20100621181750.267933A404D@sparrow.telecommunity.com> At 10:29 AM 6/21/2010 -0700, Guido van Rossum wrote: >Perhaps there are more situations where a polymorphic API would be >helpful. Such APIs are not always so easy to implement, because they >have to be careful with literals or other constants (and even more so >mutable state) used internally -- but it can be done, and there are >plenty of examples in the stdlib. What if we could use the time machine to make the APIs that *were* polymorphic, regain their previously-polymorphic status, without needing to actually *change* any of the code of those functions? That's what Barry's ebytes proposal would do, with appropriate coercion rules. Passing ebytes into such a function would yield back ebytes, even if the function used strings internally, as long as those strings could be encoded back to bytes using the ebytes' encoding. (Which would normally be the case, since stdlib constants are almost always ASCII, and the main use cases for ebytes would involve ascii-extended encodings.) 
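
A rough sketch of the coercion rule described above, assuming a pure-Python subclass of bytes. The name ebytes comes from this thread, but the implementation below is only an illustration written for this example, not code from any actual proposal. When combined with a str, the str is encoded with the carried encoding, and the result is again an ebytes.

```python
class ebytes(bytes):
    """Illustrative 'bytes plus encoding' type; not a real stdlib class."""

    def __new__(cls, data, encoding="ascii"):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def _as_bytes(self, other):
        # str operands are *encoded* with our encoding (this may raise
        # UnicodeEncodeError); we never decode ourselves to text.
        if isinstance(other, str):
            return other.encode(self.encoding)
        if isinstance(other, ebytes) and other.encoding != self.encoding:
            raise TypeError("cannot mix %s and %s ebytes"
                            % (self.encoding, other.encoding))
        if isinstance(other, bytes):
            return bytes(other)
        return None  # unsupported operand type

    def __add__(self, other):
        raw = self._as_bytes(other)
        if raw is None:
            return NotImplemented
        return ebytes(bytes(self) + raw, self.encoding)

    def __radd__(self, other):
        # Reached for str + ebytes and bytes + ebytes.
        raw = self._as_bytes(other)
        if raw is None:
            return NotImplemented
        return ebytes(raw + bytes(self), self.encoding)


base = ebytes(b"/srv/", "latin-1")
joined = base + "caf\u00e9"   # the str is encoded to latin-1, not the reverse
prefixed = "x" + base         # handled by __radd__
```

Concatenation works here because "+" has a reflected hook. An expression such as base[:1] in "/" would still raise TypeError, since str.__contains__ gives the left operand no say; that is the part a user-level type cannot supply without help from the core str type.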
>I'm still unclear on exactly what bstr is supposed to be, but it sounds
>a bit like one of the rejected proposals for having a single
>(Unicode-capable) str type that is implemented using different width
>encodings (Latin-1, UCS-2, UCS-4) underneath.

Not quite - as modified by Barry's proposal (which I like better than mine) it'd be an object that just combines bytes with an attribute indicating the underlying encoding. When it interacts with strings, the strings are *encoded* to bytes, rather than upgrading the bytes to text.

This is actually a big advantage for error-detection in any application where you're working with data that *must* be encodable in a specific encoding for output, as it allows you to catch errors much *earlier* than you would if you only did the encoding at your output boundary.

Anyway, this would not be the normal bytes type or string type; it's "bytes with an encoding". It's also more general than Unicode, in the sense that it allows you to work with character sets that don't really *have* a proper Unicode mapping.

One issue I remember from my "enterprise" days is some of the Asian-language developers at NTT/Verio explaining to me that unicode doesn't actually solve certain issues -- that there are use cases where you really *do* need "bytes plus encoding" in order to properly express something. Unfortunately, I never quite wrapped my head around the idea, I just remember it had something to do with the fact that Unicode has single character codes that mean different things in different languages, such that you were actually losing information by converting to unicode, or something like that. (Or maybe the characters were expressed differently in certain encodings according to what language they came from, so you couldn't roundtrip them through unicode without losing information. I think that's probably what it was; maybe somebody here can chime in more on that point.)
Anyway, a type like this would need to have at least a bit of support from the core language, because the str type would need to be able to handle at least the __contains__ and %/.format() coercion cases, since these functions don't have __r*__ equivalents that a user-implemented type could provide... and strings don't have anything like a '__coerce__' either. If sufficient hooks existed, then an ebytes could be implemented outside the stdlib, and still used within it. From benjamin at python.org Mon Jun 21 20:23:57 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 21 Jun 2010 13:23:57 -0500 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <73196.1277143019@parc.com> References: <73196.1277143019@parc.com> Message-ID: 2010/6/21 Bill Janssen : > Considering that we've just released 2.7rc2, there are an awful lot of > red buildbots for 2.7. ?In fact, I don't remember having seen a green > buildbot for OS X and 2.7. ?Shouldn't these be fixed? It seems most of them are off line and there last run was just a failure. > > On OS X Leopard, I'm seeing failures in test_py3kwarn, > test_urllib2_localnet, test_uuid. > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, > test_ttk_guionly, and test_urllib2_localnet. File bug reports. > > We don't have a buildbot running Snow Leopard, apparently. -- Regards, Benjamin From stephen at xemacs.org Mon Jun 21 20:20:43 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 03:20:43 +0900 Subject: [Python-Dev] [OT] carping about irritating people (was: bytes / unicode) In-Reply-To: <871vc0plt6.fsf_-_@benfinney.id.au> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <871vc0plt6.fsf_-_@benfinney.id.au> Message-ID: <87wrts2qo4.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > "Stephen J. 
Turnbull" writes: > > > your base URL is gonna be b'mailto:stephen at xemacs.org', but the > > natural thing the UI will want to do is > > > > formurl = baseurl + '?subject=??????????' > > Incidentally, which irritating person was the topic of this > Japanese-language message to you? (Kudos to Nick.) "Urusai" is also used to refer to the finicky. So, the RFC-toting pedant. Ie, me. > (The subject in Stephen's example message translates roughly as > "(unspecified third person) Not quite. The subject of the copula, if omitted, is entirely context-dependent. From pje at telecommunity.com Mon Jun 21 20:24:27 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 14:24:27 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> Message-ID: <20100621182430.6213D3A404D@sparrow.telecommunity.com> At 01:36 PM 6/21/2010 -0400, Terry Reedy wrote: >On 6/21/2010 11:43 AM, Barry Warsaw wrote: > >>This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz >>for it. >> >>Would it make sense to have "encoding-carrying" bytes and str types? > >On 2009-11-5 I posted 'Add encoding attribute to bytes' to >python-ideas. It was shot down at the time. AFAICT, that's mainly for lack of apparent use cases, and also for confusion. Here, the use case (restoring the polymorphy of stdlib APIs) is pretty clear. However, if we had the string equivalent of a coercion protocol (that core strings and bytes would co-operate with), then it would enable people to write their own versions of either your idea or Barry's idea (or other things altogether), and still get the stdlib to play along. 
Personally, I think ebytes() would do the trick and it'd be nice to see it in stdlib, but gaining a string coercion protocol instead might not be a bad tradeoff. ;-) From solipsis at pitrou.net Mon Jun 21 20:37:56 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 21 Jun 2010 20:37:56 +0200 Subject: [Python-Dev] red buildbots on 2.7 References: <73196.1277143019@parc.com> Message-ID: <20100621203756.2f99757f@pitrou.net> On Mon, 21 Jun 2010 10:56:59 PDT Bill Janssen wrote: > Considering that we've just released 2.7rc2, there are an awful lot of > red buildbots for 2.7. In fact, I don't remember having seen a green > buildbot for OS X and 2.7. Shouldn't these be fixed? > > On OS X Leopard, I'm seeing failures in test_py3kwarn, > test_urllib2_localnet, test_uuid. > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, > test_ttk_guionly, and test_urllib2_localnet. I'm afraid they can only be fixed by whoever is competent on OS X issues. If you want to tackle them, you're more than welcome. There also seem to be a couple of failures left with test_gdb... Regards Antoine. From p.f.moore at gmail.com Mon Jun 21 20:39:59 2010 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Jun 2010 19:39:59 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <73196.1277143019@parc.com> References: <73196.1277143019@parc.com> Message-ID: On 21 June 2010 18:56, Bill Janssen wrote: > Considering that we've just released 2.7rc2, there are an awful lot of > red buildbots for 2.7. ?In fact, I don't remember having seen a green > buildbot for OS X and 2.7. ?Shouldn't these be fixed? Ack! My buildbot has looked fine, but on closer inspection, it was the same build that's been running (more accurately, stuck in a test) for 5 days :-( The main buildslave page looked fine - except for the dates, which I didn't spot. Thanks for the alert. I've killed the stuck test and should see some runs going through now. 
Shame, really, I was getting used to seeing a nice page of all green results... Paul. From pje at telecommunity.com Mon Jun 21 20:46:57 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 14:46:57 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: >Nick alluded to the The One Obvious Way as a change in architecture. > >Specifically: Decode all bytes to typed objects (str, images, audio, >structured objects) at input. Do no manipulations on bytes ever >except decode and encode (both to text, and to special-purpose objects >such as images) in a program that does I/O. This ignores the existence of use cases where what you have is text that can't be properly encoded in unicode. I know, it's a hard thing to wrap one's head around, since on the surface it sounds like unicode is the programmer's savior. Unfortunately, real-world text data exists which cannot be safely roundtripped to unicode, and must be handled in "bytes with encoding" form for certain operations. I personally do not have to deal with this *particular* use case any more -- I haven't been at NTT/Verio for six years now. But I do know it exists for e.g. Asian language email handling, which is where I first encountered it. At the time (this *may* have changed), many popular email clients did not actually support unicode, so you couldn't necessarily just send off an email in UTF-8. 
It drove us nuts on the project where this was involved (an i18n of an existing Python app), and I think we had to compromise a bit in some fashion (because we couldn't really avoid unicode roundtripping due to database issues), but the use case does actually exist. My current needs are simpler, thank goodness. ;-) However, they *do* involve situations where I'm dealing with *other* encoding-restricted legacy systems, such as software for interfacing with the US Postal Service that only works with a restricted subset of latin1, while receiving mangled ASCII from an ecommerce provider, and storing things in what's effectively a latin-1 database. Being able to easily assert what kind of bytes I've got would actually let me catch errors sooner, *if* those assertions were being checked when different kinds of strings or bytes were being combined. i.e., at coercion time). >Yes, this is tedious if you live in an ASCII world, compared to using >bytes as characters. However, it works for the rest of us, which the >old style doesn't. I'm not trying to go back to the old style -- ideally, I want something that would actually improve on the "it's not really unicode" use cases above if it were available in 2.x. I don't want to be "encoding agnostic" or "encoding implicit", -- I want to make it possible to be even *more* explicit and restrictive than it is currently possible to be in either 2.x OR 3.x. It's just that 3.x affords greater opportunity for doing this, and is an ideal place to make the switch -- i.e., at a point where you now have to get explicit about your encodings, anyway! >As for "Think Carefully About It Every Time", that is required only in >Porting Programs That Mix Operation On Bytes With Operation On Str. >If you write programs from scratch, however, the decode-process-encode >paradigm quickly becomes second nature. Which works if and only if your outputs are truly unicode-able. If you work with legacy systems (e.g. 
those Asian email clients and US postal software), you are really working with a *character set*, not unicode, and so putting your data in unicode form is actually *wrong* -- an expedient lie. Heresy, I know, but there you go. ;-) From robertc at robertcollins.net Mon Jun 21 20:59:26 2010 From: robertc at robertcollins.net (Robert Collins) Date: Tue, 22 Jun 2010 06:59:26 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: 2010/6/21 Stephen J. Turnbull : > Robert Collins writes: > > ?> Also, url's are bytestrings - by definition; > > Eh? ?RFC 3896 explicitly says ?Definitions of Managed Objects for the DS3/E3 Interface Type Perhaps you mean 3986 ? :) > ? ?A URI is an identifier consisting of a sequence of characters > ? ?matching the syntax rule named in Section 3. > > (where the phrase "sequence of characters" appears in all ancestors I > found back to RFC 1738), and Sure, ok, let me unpack what I meant just a little. An abstract URI is neither unicode nor bytes per se - see section 1.2.1 " A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters. " URI interpretation is fairly strictly separated between producers and consumers. A consumer can manipulate a url with other url fragments - e.g. doing urljoin. But it needs to keep the url as a url and not try to decode it to a unicode representation. The producer of the url however, can decode via whatever heuristics it wants - because it defines the encoding used to go from unicode to URL encoding. As an example, if I give the uri "http://server/%c3%83", rendering that as http://server/? 
is able to lead to transcription errors and reinterpretation problems unless you know - out of band - that the server is using utf8 to encode. Conversely if someone enters in http://server/? in their browser window, choosing utf8 or their local encoding is quite arbitrary and able to not match how the server would represent that resource. Beyond that, producers can do odd things - like when there are a series of servers stacked and forwarding requests amongst themselves - where they generate different parts of the same URL using different encodings. > ? ?2. ?Characters > > ? ?The URI syntax provides a method of encoding data, presumably for > ? ?the sake of identifying a resource, as a sequence of characters. > ? ?The URI characters are, in turn, frequently encoded as octets for > ? ?transport or presentation. ?This specification does not mandate any > ? ?particular character encoding for mapping between URI characters > ? ?and the octets used to store or transmit those characters. ?When a > ? ?URI appears in a protocol element, the character encoding is > ? ?defined by that protocol; without such a definition, a URI is > ? ?assumed to be in the same character encoding as the surrounding > ? ?text. Thats true, but its been taken out of context; the set of characters permitted in a URL is a strict subset of characters found in ASCII; there is a BNF that defines it and it is quite precise. While it doesn't define a set of octets, it also doesn't define support for unicode characters - individual schemes need to define the mapping used between characters define as safe and those that get percent encoded. E.g. unicode (abstract) -> utf8 -> percent encoded. See also the section on comparing URL's - Unicode isn't at all relevant. > ?> if the standard library has made them unicode objects in 3, I > ?> expect a lot of pain in the webserver space. > > Yup. ?But pain is inevitable if people are treating URIs (whether URLs > or otherwise) as octet sequences. 
Then your base URL is gonna be
> b'mailto:stephen at xemacs.org', but the natural thing the UI will want
> to do is
>
>    formurl = baseurl + '?subject=??????????'
>
> IMO, the UI is right. "Something" like the above "ought" to work.

I wish it would. The problem is not in Python here though - and casually handwaving will exacerbate it, not fix it. Modelling URLs as string-like things is great from a convenience perspective, but, like file paths, they are much more complex.

For your particular case, subject contains characters outside the URL specification, so someone needs to choose an encoding to get them into a sequence-of-bytes-that-can-be-percent-escaped.

Section 2.5, "Identifying Data", goes into this to some degree. Note a trap - the last paragraph says 'when a *NEW* URI scheme...' (emphasis mine). Existing schemes do not mandate UTF8, which is why the producer/consumer split matters. I spent a few minutes looking, but it's lost in the minutiae somewhere - HTTP does not specify UTF8 (though I wish it would) for its URIs, and std66 is the generic definition and rules for new URI schemes, preserving intact the mistake of HTTP.

> So the function that actually handles composing the URL should take a
> string (ie, unicode), and do all escaping. The UI code should not
> need to know about escaping. If nothing escapes except the function
> that puts the URL in composed form, and that function always escapes,
> life is easy.

Arg. The problem is very similar to the file system problem:
 - We get given a sequence of bytes
 - we have some rules that will let us manipulate the sequence to get hostnames, query parameters and so forth - and others to let us walk a directory structure
 - and no guarantee that any of the data is in any particular encoding other than 'URL'.

In terms of sequence datatypes then, we can consider a few:
 - bytes
 - unicode
 - list-of-numbers
 - ...
unicode is a problem because the system we're talking to is defined to be a superset of unicode. People can shove stuff that fits into the unused unicode plane, and its OK by the URL standard (for all that it would be ugly). Having a part-unicode, part-bytes representation would be pretty ugly IMO; certainly decoding only part of the URL would be prone to the sorts of issues Python 2 had with str/unicode. lists of numbers are really awkward to manipulate. bytes doesn't suffer the unicode problem, it can represent everything we receive, but it doesn't offer any particular support for getting a unicode string *when one is available*. > Of course, in real life it's not that easy. ?But it's possible to make > things unnecessarily hard for the users of your URI API(s), and one > way to do that is to make URIs into "just bytes" (and "just unicode" > is probably nearly as bad, except that at least you know it's not > ready for the wire). If Unicode was relevant to HTTP, I'd agree, but its not; we should put fragile heuristics at the outer layer of the API and work as robustly and mechanically as possible at the core. Where we need to guess, we need worker functions that won't guess at all - for the sanity of folk writing servers and protocol implementations. -Rob From janssen at parc.com Mon Jun 21 21:13:05 2010 From: janssen at parc.com (Bill Janssen) Date: Mon, 21 Jun 2010 12:13:05 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> Message-ID: <75635.1277147585@parc.com> Benjamin Peterson wrote: > 2010/6/21 Bill Janssen : > > Considering that we've just released 2.7rc2, there are an awful lot of > > red buildbots for 2.7. ?In fact, I don't remember having seen a green > > buildbot for OS X and 2.7. ?Shouldn't these be fixed? > > It seems most of them are off line and there last run was just a failure. No, the three OS X buildbots are all online and reporting failures. 
As far as I can remember, they haven't been green for weeks. They are at the end of the buildbot list, so off-screen if you are using a normal browser. You have to scroll to see them. > > On OS X Leopard, I'm seeing failures in test_py3kwarn, > > test_urllib2_localnet, test_uuid. > > > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, > > test_ttk_guionly, and test_urllib2_localnet. Um -- saying what, the buildbots are red? Shouldn't having green buildbots be a part of the release process? In fact, it is -- but none of the OS X buildbots are part of the "stable" set. Why is that? Bill From pje at telecommunity.com Mon Jun 21 21:14:29 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 15:14:29 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100621191432.710993A404D@sparrow.telecommunity.com> At 03:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: >Barry Warsaw writes: > > > Would it make sense to have "encoding-carrying" bytes and str > > types? > > >I think the answer is "no", though, because (1) it would constitute an >attractive nuisance (the default would be abused, it would work fine >in Kansas, and all hell would break loose in Kagoshima, simply >delaying the pain and/or passing it on to third parties), You have the proposal exactly backwards, actually. In Kagoshima, you'd use pass in an ebytes with your encoding to a stdlib API, and *get back an ebytes with the right encoding*, rather than an (incorrect and useless) unicode object which has lost data you need. >Why limit that to bytes and str? 
Why not have all objects carry their >serializer/deserializer around with them? Because it's not a serialization or deserialization. Your conceptual framework here implies that unicode objects are the real thing, and that bytes are "just" a way of transporting unicode around. But this is not the case at all, for use cases where "no, really, you *have to* work with bytes-encoded text streams". The mere release of Python 3.x will not cause all the world's applications, libraries, and protocols to suddenly work with unicode, where they did not before. Being explicit about the encoding of the bytes you're flinging around is actually an *increase* in specificity, explicitness, robustness, and error-checking ability over the status quo for either 2.x *or* 3.x... *and* it improves these qualities for essentially *all* string-handling code, without requiring that code to be rewritten to do so. It's like getting to use the time machine, really. >and (2) you >really want this under control of higher level objects that have >access to some knowledge of the environment, rather than the lowest >level. This proposal actually has such a higher-level object: an ebytes. And it passes that information *through* the lowest level, in such a way as to permit the stringlike operations to be fully polymorphic, without the information being lost inside somebody else's API. From barry at python.org Mon Jun 21 21:22:38 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 15:22:38 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <73196.1277143019@parc.com> References: <73196.1277143019@parc.com> Message-ID: <71F08437-B687-4AC2-8FC2-856BE0DE50FA@python.org> On Jun 21, 2010, at 1:56 PM, Bill Janssen wrote: > Considering that we've just released 2.7rc2, there are an awful lot of > red buildbots for 2.7. In fact, I don't remember having seen a green > buildbot for OS X and 2.7. Shouldn't these be fixed? 
> > On OS X Leopard, I'm seeing failures in test_py3kwarn, > test_urllib2_localnet, test_uuid. > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, > test_ttk_guionly, and test_urllib2_localnet. > > We don't have a buildbot running Snow Leopard, apparently. On my OS X 10.6.4 box, only test_py3kwarn and test_urllib2_localnet fail. -Barry From solipsis at pitrou.net Mon Jun 21 21:29:04 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 21 Jun 2010 21:29:04 +0200 Subject: [Python-Dev] red buildbots on 2.7 References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> Message-ID: <20100621212904.7bec83f6@pitrou.net> On Mon, 21 Jun 2010 12:13:05 PDT Bill Janssen wrote: > > > > On OS X Leopard, I'm seeing failures in test_py3kwarn, > > > test_urllib2_localnet, test_uuid. > > > > > > On OS X Tiger, I'm seeing failures in test_pep277, test_py3kwarn, > > > test_ttk_guionly, and test_urllib2_localnet. > > Um -- saying what, the buildbots are red? Shouldn't having green > buildbots be a part of the release process? In fact, it is -- but none > of the OS X buildbots are part of the "stable" set. Why is that? Benjamin is not qualified to fix OS X bugs AFAIK (if you are, Benjamin, then sorry for misrepresenting you :-)). Actually, neither are most of us. Apparently some of these buildbots belong to you. Why don't you step up and investigate? Thanks, Antoine. From a.badger at gmail.com Mon Jun 21 21:29:52 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 15:29:52 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621172413.578853A404D@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> Message-ID: <20100621192952.GZ5787@unaka.lan> On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. 
Eby wrote: > At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: > >What do you think of making the encoding attribute a mandatory part of > >creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). > > As long as the coercion rules force str+ebytes (or str % ebytes, > ebytes % str, etc.) to result in another ebytes (and fail if the str > can't be encoded in the ebytes' encoding), I'm personally fine with > it, although I really like the idea of tacking the encoding to bytes > objects in the first place. > I wouldn't like this. It brings us back to the python2 problem where sometimes you pass an ebyte into a function and it works and other times you pass an ebyte into the function and it issues a traceback. The coercion must end up with a str and no traceback (this assumes that we've checked that the ebyte and the encoding "match" when we create the ebyte). If you want bytes out the other end, you should either have a different function or explicitly transform the output from str to bytes. So, what's the advantage of using ebytes instead of bytes? * It keeps together the text and encoding information when you're taking bytes in and want to give bytes back under the same encoding. * It takes some of the boilerplate that people are supposed to do (checking that bytes are legal in a specific encoding) and writes it into the initialization of the object. That forces you to think about the issue at two points in the code: when converting into ebytes and when converting out to bytes. For data that's going to be used with both str and bytes, this is the accepted best practice. (For exceptions, the byte type remains which you can do conversion on when you want to). -Toshio
From benjamin at python.org Mon Jun 21 21:30:15 2010 From: benjamin at python.org (Benjamin Peterson) Date: Mon, 21 Jun 2010 14:30:15 -0500 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <75635.1277147585@parc.com> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> Message-ID: 2010/6/21 Bill Janssen : > They are at the end of the buildbot list, so off-screen if you are using > a normal browser. You have to scroll to see them. But not on the "stable" view and that's the only one I look at. -- Regards, Benjamin From barry at python.org Mon Jun 21 21:39:59 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 15:39:59 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <201006211113.06767.stephan.richter@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: <20100621153959.01fee007@heresy> On Jun 21, 2010, at 11:13 AM, Stephan Richter wrote: >I really just want to be able to go to PyPI, Click on "Browse packages" and >then select "Python 3" (it can currently be accomplished by clicking "Python" >and then "3"). Of course, package developers need to be encouraged to add >these Trove classifiers so that the listings are as complete as possible. Trove classifiers are not particularly user friendly. I wonder if we can help with a (partially) automated or guided tool to help? Maybe something on the web page for packages w/o classifications, kind of like a Linked-in progress meter... -Barry
From fuzzyman at voidspace.org.uk Mon Jun 21 21:45:08 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 20:45:08 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> Message-ID: <4C1FC144.70600@voidspace.org.uk> On 21/06/2010 20:30, Benjamin Peterson wrote: > 2010/6/21 Bill Janssen: > >> They are at the end of the buildbot list, so off-screen if you are using >> a normal browser. You have to scroll to see them. >> > But not on the "stable" view and that's the only one I look at. > > What are the requirements for moving the OS X buildbots into the stable view? Are the builders themselves stable enough? (If the requirement is that the buildbots be green then it is something of a catch-22.) All the best, Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From barry at python.org Mon Jun 21 21:55:50 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 15:55:50 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621163404.GV5787@unaka.lan> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> Message-ID: <20100621155550.643d27b8@heresy> On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote: >I like the idea of having encoding information carried with the data. >I don't think that an ebytes type that can *optionally* have an encoding >attribute makes the situation less confusing, though. Agreed. I think the attribute should always be there, but there probably needs to be a magic value (perhaps None) that indicates an unknown, manual, garbage, error, broken encoding. Examples: you read bytes off a socket and don't know what the encoding is; you concatenate two ebytes that have incompatible encodings. >To me the biggest >problem with python-2.x's unicode/bytes handling was not that it threw >exceptions but that it didn't always throw exceptions. You might test this >in python2:: > t = u'cafe' > function(t) > >And say, ah my code works. Then a user gives it this:: > t = u'café' > function(t) > >And get a unicode error because the function only works with unicode in the >ascii range. That's an excellent point. >ebytes seems to have the same pitfall where the code path exercised by your >tests could work with:: > eb = ebytes(b) > eb.encoding = 'euc-jp' > function(eb) > >but the user exercises a code path that does this and fails:: > eb = ebytes(b) > function(eb) > >What do you think of making the encoding attribute a mandatory part of >creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). If ebytes is a separate type, then definitely +1.
If 'ebytes is bytes' then I'd probably want to default the second argument to the magical "i-don't-know" marker. -Barry From steve at holdenweb.com Mon Jun 21 21:55:55 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 22 Jun 2010 04:55:55 +0900 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: <4C1FC3CB.5080604@holdenweb.com> Laurens Van Houtven wrote: > On Mon, Jun 21, 2010 at 5:28 PM, Toshio Kuratomi wrote: >> Fedora 14 is about the same. A nice to have thing that goes along >> with these would be a table that has packages ported to python3 and which >> distributions have the python3 version of the package. > > Yeah, this is exactly why I'd prefer to not have to maintain a > specific list. Big distros are making Python 3.x available, it's not > the default interpreter yet anywhere (AFAIK?), but that's going to > happen in the next few releases of said distributions. > > On Mon, Jun 21, 2010 at 5:31 PM, Arc Riley wrote: >> Personally, I'd like to celebrate the upcoming Python 3.2 release (which >> will hopefully include 3to2) with moving all packages which do not have the >> 'Programming Language :: Python :: 3' classifier to a "Legacy" section of >> PyPI and offer only Python 3 packages otherwise. Of course put a banner at >> the top clearly explaining that Python 2 packages can be found in the Legacy >> section. >> >> Radical, I know, but at some point we really need to make this move. > > I agree we have to make it at some point but I feel this is way, way too early. > > thanks for your continued input, > Laurens But it's never too early to plan for something you know to be inevitable. More planning might have helped earlier on.
I don't think it's likely to hurt now. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From janssen at parc.com Mon Jun 21 21:57:22 2010 From: janssen at parc.com (Bill Janssen) Date: Mon, 21 Jun 2010 12:57:22 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <20100621212904.7bec83f6@pitrou.net> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> Message-ID: <77297.1277150242@parc.com> Antoine Pitrou wrote: > Benjamin is not qualified to fix OS X bugs AFAIK (if you are, Benjamin, > then sorry for misrepresenting you :-)). Actually, neither are most of > us. Right. I was thinking that the release manager should however be responsible for not releasing while there are red buildbots. But it's not his fault, either; there are no OS X buildbots on the "stable" list, and that's the list PEP 101 says to look at. The real problem here is that a major platform doesn't have a "stable" buildbot, I think. I've logged an issue to that effect. > Apparently some of these buildbots belong to you. Why don't you step > up and investigate? The fact that I'm running some buildbots doesn't mean I have to fix the problems that they reveal, I think. I did look at the py3kwarn failure, and couldn't figure out the various twisty passages of deprecation warning as further snarled by the test package. I think that one needs someone who's intimately familiar with the testing framework.
Bill From janssen at parc.com Mon Jun 21 21:57:53 2010 From: janssen at parc.com (Bill Janssen) Date: Mon, 21 Jun 2010 12:57:53 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> Message-ID: <77310.1277150273@parc.com> Benjamin Peterson wrote: > 2010/6/21 Bill Janssen : > > They are at the end of the buildbot list, so off-screen if you are using > > a normal browser. ?You have to scroll to see them. > > But not on the "stable" view and that's the only one I look at. Right, and properly so. Bill From barry at python.org Mon Jun 21 22:01:05 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 16:01:05 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100621160105.25ae602f@heresy> On Jun 22, 2010, at 03:08 AM, Stephen J. Turnbull wrote: >Barry Warsaw writes: > > > Would it make sense to have "encoding-carrying" bytes and str > > types? > >Why limit that to bytes and str? Why not have all objects carry their >serializer/deserializer around with them? Only because the .encoding attribute isn't really a serializer/deserializer. That's still bytes() and str() or the equivalent. This is just a hint to a specific serializer for parameters to that action. 
>I think the answer is "no", though, because (1) it would constitute an >attractive nuisance (the default would be abused, it would work fine >in Kansas, and all hell would break loose in Kagoshima, simply >delaying the pain and/or passing it on to third parties), and (2) you >really want this under control of higher level objects that have >access to some knowledge of the environment, rather than the lowest >level. I'm still not sure ebytes solves the problem, but it avoids one I'm most concerned about seeing proposed. I really really do not want to add encoding=blah arguments to boatloads of function signatures. -Barry From solipsis at pitrou.net Mon Jun 21 22:02:50 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 21 Jun 2010 22:02:50 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <77297.1277150242@parc.com> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> Message-ID: <1277150570.3369.1.camel@localhost.localdomain> On Monday 21 June 2010 at 12:57 -0700, Bill Janssen wrote: > > > Apparently some of these buildbots belong to you. Why don't you step > > up and investigate? > > The fact that I'm running some buildbots doesn't mean I have to fix the > problems that they reveal, I think. You certainly don't have to. But please don't ask others to do it for you, *especially* if the failure can't be reproduced under anything else than OS X, and if no useful diagnosis is available. Regards Antoine.
From barry at python.org Mon Jun 21 22:04:20 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 16:04:20 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621172413.578853A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> Message-ID: <20100621160420.63037f1c@heresy> On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote: >OTOH, one potential problem with having the encoding on the bytes object >rather than the ebytes object is that then you can't easily take bytes from a >socket and then say what encoding they are, without interfering with the >sockets API (or whatever other place you get the bytes from). Unless the default was the "I don't know" marker and you were able to set it after you've done whatever kind of application-level calculation you needed to do. -Barry From steve at holdenweb.com Mon Jun 21 21:59:53 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 22 Jun 2010 04:59:53 +0900 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: Terry Reedy wrote: > On 6/21/2010 8:33 AM, Nick Coghlan wrote: > >> P.S. (We're going to have a tough decision to make somewhere along the >> line where docs.python.org is concerned, too - when do we flick the >> switch and make a 3.x version of the docs the default? > > Easy.
When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. > Trunk releases always take over docs.python.org. To do otherwise would > be to say that 3.2 is not a real trunk release and not yet ready for > real use -- a major slam. > > Actually, I thought this was already discussed and decided ;-). > This also gives the 2.7 release its day in the sun before relegation to maintenance status. The Python 3 documents, when they become the default, should contain an every-page link to the Python 2 documentation (though linkages may be a problem - they could probably be done at a gross level). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From srichter at cosmos.phy.tufts.edu Mon Jun 21 21:44:35 2010 From: srichter at cosmos.phy.tufts.edu (Stephan Richter) Date: Mon, 21 Jun 2010 15:44:35 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <20100621153959.01fee007@heresy> References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> <20100621153959.01fee007@heresy> Message-ID: <201006211544.35518.srichter@cosmos.phy.tufts.edu> On Monday, June 21, 2010, Barry Warsaw wrote: > On Jun 21, 2010, at 11:13 AM, Stephan Richter wrote: > >I really just want to be able to go to PyPI, Click on "Browse packages" > >and then select "Python 3" (it can currently be accomplished by clicking > >"Python" and then "3"). Of course, package developers need to be > >encouraged to add these Trove classifiers so that the listings are as > >complete as possible. > > Trove classifiers are not particularly user friendly. I wonder if we can > help with a (partially) automated or guided tool to help?
Maybe something > on the web page for packages w/o classifications, kind of like a Linked-in > progress meter... Yeah that would be good. I thought the "Score" was something like that, but it is not transparent enough. It would be great if PyPI would tell me how I can improve my package meta-data. (The Linked-in progress meter worked for me too. ;-) Regards, Stephan -- Entrepreneur and Software Geek Google me. "Zope Stephan Richter" From pengyu.ut at gmail.com Mon Jun 21 22:07:53 2010 From: pengyu.ut at gmail.com (Peng Yu) Date: Mon, 21 Jun 2010 15:07:53 -0500 Subject: [Python-Dev] Adding additional level of bookmarks and section numbers in python pdf documents. Message-ID: Hi, The current PDF versions of the Python documents don't have bookmarks for subsubsections. For example, there is no bookmark for the following section in python_2.6.5_reference.pdf. Also, the bookmarks don't have section numbers in them. I suggest including the section numbers. Could these features be added in a future release of the Python documentation? 3.4.1 Basic customization -- Regards, Peng From barry at python.org Mon Jun 21 22:09:04 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 16:09:04 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621192952.GZ5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621192952.GZ5787@unaka.lan> Message-ID: <20100621160904.166ad082@heresy> On Jun 21, 2010, at 03:29 PM, Toshio Kuratomi wrote: >I wouldn't like this. It brings us back to the python2 problem where >sometimes you pass an ebyte into a function and it works and other times you >pass an ebyte into the function and it issues a traceback.
The coercion >must end up with a str and no traceback (this assumes that we've checked >that the ebyte and the encoding "match" when we create the ebyte). Doing this at ebyte construction time does have the nice benefit of getting the exception early, and because the ebyte is immutable, you could cache the results in an attribute on the ebyte. Well, immutable if the .encoding is also immutable. If that can change, then you'd have to re-run the cached decoding whenever the attribute was set, and there would be a penalty paid each time this was done. That, plus the socket use case, does argue for a separate ebytes type. -Barry From pje at telecommunity.com Mon Jun 21 22:09:52 2010 From: pje at telecommunity.com (P.J. Eby) Date: Mon, 21 Jun 2010 16:09:52 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621192952.GZ5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621192952.GZ5787@unaka.lan> Message-ID: <20100621201006.5A3223A404D@sparrow.telecommunity.com> At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote: >On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote: > > At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: > > >What do you think of making the encoding attribute a mandatory part of > > >creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). > > > > As long as the coercion rules force str+ebytes (or str % ebytes, > > ebytes % str, etc.)
to result in another ebytes (and fail if the str > > can't be encoded in the ebytes' encoding), I'm personally fine with > > it, although I really like the idea of tacking the encoding to bytes > > objects in the first place. > > >I wouldn't like this. It brings us back to the python2 problem where >sometimes you pass an ebyte into a function and it works and other times you >pass an ebyte into the function and it issues a traceback. For stdlib functions, this isn't going to happen unless your ebytes' encoding is not compatible with the ascii subset of unicode, or the stdlib function is working with dynamic data... in which case you really *do* want to fail early! I don't see this as a repeat of the 2.x situation; rather, it allows you to cause errors to happen much *earlier* than they would otherwise show up if you were using unicode for your encoded-bytes data. For example, if your program's intent is to end up with latin-1 output, then it would be better for an error to show up at the very *first* point where non-latin1 characters are mixed with your data, rather than only showing up at the output boundary! However, if you promoted mixed-type operation results to unicode instead of ebytes, then you: 1) can't preserve data that doesn't have a 1:1 mapping to unicode, and 2) can't detect an error until your data reaches the output point in your application -- forcing you to defensively insert ebytes calls everywhere (vs. simply wrapping them around a handful of designated inputs), or else have to go right back to tracing down where the unusable data showed up in the first place. One thing that seems like a bit of a blind spot for some folks is that having unicode is *not* everybody's goal. Not because we don't believe unicode is generally a good thing or anything like that, but because we have to work with systems that flat out don't *do* unicode, thereby making the presence of (fully-general) unicode an error condition that has to be stamped out! 
IOW, if you're producing output that has to go into another system that doesn't take unicode, it doesn't matter how theoretically-correct it would be for your app to process the data in unicode form. In that case, unicode is not a feature: it's a bug. And as it really *is* an error in that case, it should not pass silently, unless explicitly silenced. >So, what's the advantage of using ebytes instead of bytes? > >* It keeps together the text and encoding information when you're taking > bytes in and want to give bytes back under the same encoding. >* It takes some of the boilerplate that people are supposed to do (checking > that bytes are legal in a specific encoding) and writes it into the > initialization of the object. That forces you to think about the issue > at two points in the code: when converting into ebytes and when > converting out to bytes. For data that's going to be used with both > str and bytes, this is the accepted best practice. (For exceptions, the > byte type remains which you can do conversion on when you want to). Hm. For the output case, I suppose that means you might also want the text I/O wrappers to be able to be strict about ebytes' encoding. From fuzzyman at voidspace.org.uk Mon Jun 21 22:13:26 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 21:13:26 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1277150570.3369.1.camel@localhost.localdomain> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> Message-ID: <4C1FC7E6.5070707@voidspace.org.uk> On 21/06/2010 21:02, Antoine Pitrou wrote: > On Monday 21 June 2010 at 12:57 -0700, Bill Janssen wrote: > >> >>> Apparently some of these buildbots belong to you. Why don't you step >>> up and investigate? >>> >> The fact that I'm running some buildbots doesn't mean I have to fix the >> problems that they reveal, I think.
>> > You certainly don't have to. But please don't ask others to do it for > you, *especially* if the failure can't be reproduced under anything else > than OS X, and if no useful diagnosis is available. > If OS X is a supported and important platform for Python then fixing all problems that it reveals (or being willing to) should definitely not be a pre-requisite of providing a buildbot (which is already a service to the Python developer community). Fixing bugs / failures revealed by Bill's buildbot is not fixing them "for Bill" it is fixing them for Python. All the best, Michael > Regards > > Antoine. > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From pje at telecommunity.com Mon Jun 21 22:16:13 2010 From: pje at telecommunity.com (P.J.
Eby) Date: Mon, 21 Jun 2010 16:16:13 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621160420.63037f1c@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621160420.63037f1c@heresy> Message-ID: <20100621201616.EADEF3A404D@sparrow.telecommunity.com> At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote: >On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote: > > >OTOH, one potential problem with having the encoding on the bytes object > >rather than the ebytes object is that then you can't easily take > bytes from a > >socket and then say what encoding they are, without interfering with the > >sockets API (or whatever other place you get the bytes from). > >Unless the default was the "I don't know" marker and you were able to set it >after you've done whatever kind of application-level calculation you needed to >do. True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object. (In principle, you could then drop *all* the stringlike methods from plain-old-bytes objects. If it's really text-in-bytes you want, you should use an ebytes with the encoding specified.) 
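
A minimal sketch of the "separate type with a required encoding" idea from the message above, for illustration only. No `ebytes` type exists in Python; the class name and the `.encoding` and `.string` attributes are taken from this thread, while normalizing the encoding name through `codecs.lookup()` and the exact coercion rules in `__add__` are assumptions made here, not anything the participants specified.

```python
import codecs


class ebytes(bytes):
    """Hypothetical 'bytes with a required encoding' (not a real builtin)."""

    def __new__(cls, data, encoding):
        if encoding is None:
            # The "I don't know" case is just a plain old bytes object.
            raise TypeError("ebytes requires an encoding; use bytes instead")
        # Assumption: normalize aliases so 'latin-1' and 'latin1' compare equal.
        encoding = codecs.lookup(encoding).name
        if isinstance(data, str):
            text = data
            raw = data.encode(encoding)   # UnicodeEncodeError if not representable
        else:
            raw = bytes(data)
            text = raw.decode(encoding)   # UnicodeDecodeError if not valid
        self = super().__new__(cls, raw)
        self.encoding = encoding
        self.string = text                # cached decoded value
        return self

    def __add__(self, other):
        if isinstance(other, ebytes):
            if other.encoding != self.encoding:
                raise TypeError("cannot combine %s and %s bytes"
                                % (self.encoding, other.encoding))
            return ebytes(self.string + other.string, self.encoding)
        if isinstance(other, str):
            # Mixed operation: coerce to bytes of the same encoding.
            return ebytes(self.string + other, self.encoding)
        return NotImplemented

    def __str__(self):
        return self.string


# Usage: the encoding is checked once, when the object is created.
price = ebytes(b'caf\xe9', 'latin-1')
label = price + ' au lait'
print(str(label), label.encoding)
```

Doing the check in the constructor is what lets errors surface at the boundary (socket, file, database) rather than deep inside later string handling, which is the benefit argued for in the thread.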
From barry at python.org Mon Jun 21 22:19:17 2010
From: barry at python.org (Barry Warsaw)
Date: Mon, 21 Jun 2010 16:19:17 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621171803.B35C33A414B@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp>
 <201006201204.30795.steve@pearwood.info>
 <20100620184120.10EFB3A4099@sparrow.telecommunity.com>
 <20100621015824.6A84E3A4099@sparrow.telecommunity.com>
 <20100621114307.48735698@heresy>
 <20100621171803.B35C33A414B@sparrow.telecommunity.com>
Message-ID: <20100621161917.13efe49a@heresy>

On Jun 21, 2010, at 01:17 PM, P.J. Eby wrote:

>I'm not really sure how much use the encoding is on a unicode object - what
>would it actually mean?
>
>Hm. I suppose it would effectively mean "this string can be represented in
>this encoding" -- which is useful, in that you could fail operations when
>combining with bytes of a different encoding.

That's basically what I was thinking.

>Hm... no, in that case you should just encode the string to the bytes'
>encoding, and let that throw an error if it fails. So, really, there's no
>reason for a string to know its encoding. All you need is the bytes type to
>have an encoding attribute, and when doing mixed-type operations between
>bytes and strings, coerce to *bytes of the same encoding*.

If ebytes were a separate type, and it did the encoding check at constructor
time, and the results of the decoding were cached, then I think you would not
need the equivalent of an estr type. If you had a string and knew what it
could be encoded to, then you could just coerce it to an ebytes and use the
cached decoded value wherever you needed it. E.g.

    >>> mystring = 'some unicode string'
    >>> myencoding = 'iso-9999-foo'
    >>> myebytes = ebytes(mystring, myencoding)
    >>> myebytes.encoding == myencoding
    True
    >>> myebytes.string == mystring
    True

So ebytes() could accept a str or bytes as its first argument.
    >>> mybytes = b'some encoded string'
    >>> myebytes = ebytes(mybytes, myencoding)
    >>> mybytes == myebytes
    True
    >>> myebytes.encoding == myencoding
    True

In the first example ebytes() encodes mystring to set the internal bytes
representation. In the second example, ebytes() decodes the bytes to get the
.string attribute value. In both cases, an exception is raised if the
encoding/decoding fails.

>However, if .encoding is None, then coercion would follow the same rules as
>now -- i.e., convert the bytes to unicode, assuming an ascii encoding. (This
>would be different than setting an encoding of 'ascii', because in that case,
>it means you want cross-type operations to result in ascii bytes, rather than
>a unicode string, and to fail if the unicode part can't be encoded
>appropriately. The 'None' setting is effectively a nod to compatibility with
>prior 3.x versions, since I assume we can't just throw out the old coercion
>behavior.)
>
>Then, a few more changes to the bytes type would round out the implementation:
>
>* Allow .decode() to not specify an encoding, unless .encoding is None
>
>* Add back in the missing string methods (e.g. .encode()), since you can transparently upgrade to a string)
>
>* Smart __str__, as shown in your proposal.

If my example above isn't nonsense, then __str__() would just return the
.string attribute.

>In short, +1. (I wish it were possible to go back and make bytes non-strings
>and have only this ebytes or bstr or whatever type have string methods, but
>I'm pretty sure that ship has already sailed.)

Maybe it's PEP time? No, I'm not volunteering. ;)

-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Mon Jun 21 22:24:47 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 16:24:47 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621201616.EADEF3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621160420.63037f1c@heresy> <20100621201616.EADEF3A404D@sparrow.telecommunity.com> Message-ID: <20100621162447.697af8da@heresy> On Jun 21, 2010, at 04:16 PM, P.J. Eby wrote: >At 04:04 PM 6/21/2010 -0400, Barry Warsaw wrote: >>On Jun 21, 2010, at 01:24 PM, P.J. Eby wrote: >> >> >OTOH, one potential problem with having the encoding on the bytes object >> >rather than the ebytes object is that then you can't easily take > bytes from a >> >socket and then say what encoding they are, without interfering with the >> >sockets API (or whatever other place you get the bytes from). >> >>Unless the default was the "I don't know" marker and you were able to set it >>after you've done whatever kind of application-level calculation you needed to >>do. > >True, but making it a separate type with a required encoding gets rid of the magical "I don't know" - the "I don't know" encoding is just a plain old bytes object. > >(In principle, you could then drop *all* the stringlike methods from plain-old-bytes objects. If it's really text-in-bytes you want, you should use an ebytes with the encoding specified.) Yep, agreed! -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL:

From solipsis at pitrou.net Mon Jun 21 22:25:26 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 21 Jun 2010 22:25:26 +0200
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <4C1FC7E6.5070707@voidspace.org.uk>
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com>
 <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com>
 <1277150570.3369.1.camel@localhost.localdomain>
 <4C1FC7E6.5070707@voidspace.org.uk>
Message-ID: <1277151926.3369.6.camel@localhost.localdomain>

On Monday, 21 June 2010 at 21:13 +0100, Michael Foord wrote:
> >
> If OS X is a supported and important platform for Python then fixing all
> problems that it reveals (or being willing to) should definitely not be
> a pre-requisite of providing a buildbot (which is already a service to
> the Python developer community). Fixing bugs / failures revealed by
> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python.

I didn't say it was a prerequisite. I was merely pointing out that when
platform-specific bugs appear, people using the specific platform should
be helping if they want to actually encourage the fixing of these bugs.

OS X is only "a supported and important platform" if we have dedicated
core developers diagnosing or even fixing issues for it (like we
obviously have for Windows and Linux). Otherwise, I don't think we have
any moral obligation to support it.

Regards

Antoine.
From a.badger at gmail.com Mon Jun 21 22:28:39 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 16:28:39 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: <20100621202839.GA5787@unaka.lan> On Mon, Jun 21, 2010 at 02:46:57PM -0400, P.J. Eby wrote: > At 02:58 AM 6/22/2010 +0900, Stephen J. Turnbull wrote: > >Nick alluded to the The One Obvious Way as a change in architecture. > > > >Specifically: Decode all bytes to typed objects (str, images, audio, > >structured objects) at input. Do no manipulations on bytes ever > >except decode and encode (both to text, and to special-purpose objects > >such as images) in a program that does I/O. > > This ignores the existence of use cases where what you have is text > that can't be properly encoded in unicode. I know, it's a hard thing > to wrap one's head around, since on the surface it sounds like > unicode is the programmer's savior. Unfortunately, real-world text > data exists which cannot be safely roundtripped to unicode, and must > be handled in "bytes with encoding" form for certain operations. > > I personally do not have to deal with this *particular* use case any > more -- I haven't been at NTT/Verio for six years now. But I do know > it exists for e.g. Asian language email handling, which is where I > first encountered it. At the time (this *may* have changed), many > popular email clients did not actually support unicode, so you > couldn't necessarily just send off an email in UTF-8. 
It drove us > nuts on the project where this was involved (an i18n of an existing > Python app), and I think we had to compromise a bit in some fashion > (because we couldn't really avoid unicode roundtripping due to > database issues), but the use case does actually exist. > > My current needs are simpler, thank goodness. ;-) However, they > *do* involve situations where I'm dealing with *other* > encoding-restricted legacy systems, such as software for interfacing > with the US Postal Service that only works with a restricted subset > of latin1, while receiving mangled ASCII from an ecommerce provider, > and storing things in what's effectively a latin-1 database. Being > able to easily assert what kind of bytes I've got would actually let > me catch errors sooner, *if* those assertions were being checked when > different kinds of strings or bytes were being combined. i.e., at > coercion time). > While it's certainly possible that you have a grapheme that has no corresponding unicode codepoint, it doesn't sound like this is the case you're dealing with here. You talk about "restricted subset of latin1" but all of latin1's graphemes have unicode codepoints. You also talk about not being able to "send off an email in UTF-8" but UTF-8 is an encoding of unicode, not unicode itself. Similarly, the statement that some email clients don't support unicode isn't very clear as to actual problem. The email client supports displaying graphemes using glyphs present on the computer. As long as the graphemes needed have a unicode codepoint, using unicode inside of your application and then encoding to bytes on the way out works fine. Even in cases where there's no unicode codepoint for the grapheme that you're receiving unicode gives you a way out. It provides you a private use area where you can map the graphemes to unused codepoints. Your application keeps a mapping from that codepoint to the particular byte sequence that you want. 
Then write a codec that converts from unicode w/ these private codepoints
into your particular encoding (and from bytes into unicode).

-Toshio

From mal at egenix.com Mon Jun 21 22:29:13 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 21 Jun 2010 22:29:13 +0200
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621155550.643d27b8@heresy>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp>
 <201006201204.30795.steve@pearwood.info>
 <20100620184120.10EFB3A4099@sparrow.telecommunity.com>
 <20100621015824.6A84E3A4099@sparrow.telecommunity.com>
 <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan>
 <20100621155550.643d27b8@heresy>
Message-ID: <4C1FCB99.1090102@egenix.com>

Barry Warsaw wrote:
> On Jun 21, 2010, at 12:34 PM, Toshio Kuratomi wrote:
>
>> I like the idea of having encoding information carried with the data.
>> I don't think that an ebytes type that can *optionally* have an encoding
>> attribute makes the situation less confusing, though.
>
> Agreed. I think the attribute should always be there, but there probably
> needs to be a magic value (perhaps None) that indicates an unknown, manual,
> garbage, error, broken encoding.
>
> Examples: you read bytes off a socket and don't know what the encoding is; you
> concatenate two ebytes that have incompatible encodings.

Such extra information tends to be lost whenever you pass the bytes
data through a C level API or some other function that doesn't know
about the special nature of those objects, treating them just like
any bytes object. It may sound nice in theory, but in practice it
doesn't work out.

Besides, if you do know the encoding, you can easily carry the data
around in a Unicode str object.
The problem lies elsewhere: What to do with a piece of text for which
you don't know the encoding and how to combine that piece of text with
other pieces of text for which you do know the encoding.

There are a few options at hand:

* you keep working on the bytes data and only convert things to
  Unicode when needed and where the encoding is known

* you decode the bytes data for which you don't have the encoding
  information into some special Unicode form (eg. using the
  surrogateescape error handler) and hope that when the time comes
  to encode the Unicode data back into bytes, the codec supports
  reversing the conversion

* you manage the data as a list of Unicode str and bytes objects
  and don't even try to be clever about encodings of text with
  unknown encoding

It depends a lot on the use case, which of these options fits best.

>> To me the biggest
>> problem with python-2.x's unicode/bytes handling was not that it threw
>> exceptions but that it didn't always throw exceptions. You might test this
>> in python2::
>>     t = u'cafe'
>>     function(t)
>>
>> And say, ah my code works. Then a user gives it this::
>>     t = u'café'
>>     function(t)
>>
>> And get a unicode error because the function only works with unicode in the
>> ascii range.
>
> That's an excellent point.

Here's a little known fact: by changing the Python2 default
encoding to 'undefined' (yes, that's a real codec !), you can disable
all automatic string coercion in Python2.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Jun 21 2010)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2010-07-19: EuroPython 2010, Birmingham, UK 27 days to go

::: Try our new mxODBC.Connect Python Database Interface for free !
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From janssen at parc.com Mon Jun 21 22:36:20 2010 From: janssen at parc.com (Bill Janssen) Date: Mon, 21 Jun 2010 13:36:20 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1277150570.3369.1.camel@localhost.localdomain> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> Message-ID: <94209.1277152580@parc.com> Antoine Pitrou wrote: > Le lundi 21 juin 2010 ? 12:57 -0700, Bill Janssen a ?crit : > > > > > Apparently some of these buildbots belong to you. Why don't you step > > > up and investigate? > > > > The fact that I'm running some buildbots doesn't mean I have to fix the > > problems that they reveal, I think. > > You certainly don't have to. But please don't ask others to do it for > you, *especially* if the failure can't be reproduced under anything else > than OS X, and if no useful diagnosis is available. I'm more concerned about doing it for *us*, rather than for *me*. Yes, an OS X machine would be required to poke at it, but I doubt I'm the only one here with an OS X machine :-). If I am, that's a problem, and we as a community should do something about that. I downloaded 2.7rc2 and built it on my Intel OS X 10.5.8 machine. It still fails the test_uuid test: % make test [...] test_uuid test test_uuid failed -- Traceback (most recent call last): File "/private/tmp/Python-2.7rc2/Lib/test/test_uuid.py", line 472, in testIssue8621 self.assertNotEqual(parent_value, child_value) AssertionError: '8395a08e40454895be537a180539b7fb' == '8395a08e40454895be537a180539b7fb' [...] 
However, when I run it directly: % ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py -v test_uuid == CPython 2.7rc2 (r27rc2:82137, Jun 21 2010, 12:50:22) [GCC 4.0.1 (Apple Inc. build 5493)] == Darwin-9.8.0-i386-32bit little-endian == /private/tmp/Python-2.7rc2/build/test_python_58012 test_uuid testIssue8621 (test.test_uuid.TestUUID) ... ok test_UUID (test.test_uuid.TestUUID) ... ok test_exceptions (test.test_uuid.TestUUID) ... ok test_getnode (test.test_uuid.TestUUID) ... ok test_ifconfig_getnode (test.test_uuid.TestUUID) ... ok test_ipconfig_getnode (test.test_uuid.TestUUID) ... ok test_netbios_getnode (test.test_uuid.TestUUID) ... ok test_random_getnode (test.test_uuid.TestUUID) ... ok test_unixdll_getnode (test.test_uuid.TestUUID) ... ok test_uuid1 (test.test_uuid.TestUUID) ... ok test_uuid3 (test.test_uuid.TestUUID) ... ok test_uuid4 (test.test_uuid.TestUUID) ... ok test_uuid5 (test.test_uuid.TestUUID) ... ok test_windll_getnode (test.test_uuid.TestUUID) ... ok ---------------------------------------------------------------------- Ran 14 tests in 0.087s OK 1 test OK. % So I don't know what to think. The same thing happens with the py3kwarn test: % ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py -v test_py3kwarn == CPython 2.7rc2 (r27rc2:82137, Jun 21 2010, 12:50:22) [GCC 4.0.1 (Apple Inc. build 5493)] == Darwin-9.8.0-i386-32bit little-endian == /private/tmp/Python-2.7rc2/build/test_python_58057 test_py3kwarn test_backquote (test.test_py3kwarn.TestPy3KWarnings) ... ok test_buffer (test.test_py3kwarn.TestPy3KWarnings) ... ok test_builtin_function_or_method_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_cell_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_code_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_dict_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_file_xreadlines (test.test_py3kwarn.TestPy3KWarnings) ... 
ok test_forbidden_names (test.test_py3kwarn.TestPy3KWarnings) ... ok test_frame_attributes (test.test_py3kwarn.TestPy3KWarnings) ... ok test_hash_inheritance (test.test_py3kwarn.TestPy3KWarnings) ... ok test_methods_members (test.test_py3kwarn.TestPy3KWarnings) ... ok test_object_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_operator (test.test_py3kwarn.TestPy3KWarnings) ... ok test_paren_arg_names (test.test_py3kwarn.TestPy3KWarnings) ... ok test_slice_methods (test.test_py3kwarn.TestPy3KWarnings) ... ok test_softspace (test.test_py3kwarn.TestPy3KWarnings) ... ok test_sort_cmp_arg (test.test_py3kwarn.TestPy3KWarnings) ... ok test_sys_exc_clear (test.test_py3kwarn.TestPy3KWarnings) ... ok test_tuple_parameter_unpacking (test.test_py3kwarn.TestPy3KWarnings) ... ok test_type_inequality_comparisons (test.test_py3kwarn.TestPy3KWarnings) ... ok test_mutablestring_removal (test.test_py3kwarn.TestStdlibRemovals) ... ok test_optional_module_removals (test.test_py3kwarn.TestStdlibRemovals) ... ok test_os_path_walk (test.test_py3kwarn.TestStdlibRemovals) ... ok test_platform_independent_removals (test.test_py3kwarn.TestStdlibRemovals) ... ok test_platform_specific_removals (test.test_py3kwarn.TestStdlibRemovals) ... /private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:303: SyntaxWarning: tuple parameter unpacking has been removed in 3.x def _setlocation(object_alias, (x, y)): /private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:445: SyntaxWarning: tuple parameter unpacking has been removed in 3.x def _setwindowsize(folder_alias, (w, h)): /private/tmp/Python-2.7rc2/Lib/plat-mac/findertools.py:496: SyntaxWarning: tuple parameter unpacking has been removed in 3.x def _setwindowposition(folder_alias, (x, y)): ok test_reduce_move (test.test_py3kwarn.TestStdlibRemovals) ... ok ---------------------------------------------------------------------- Ran 26 tests in 0.343s OK 1 test OK. 
% The only failing test remaining, when run as a singleton, is test_urllib_localnet: % ./python.exe -Wd -3 -E -tt ./Lib/test/regrtest.py -v test_urllib2_localnet == CPython 2.7rc2 (r27rc2:82137, Jun 21 2010, 12:50:22) [GCC 4.0.1 (Apple Inc. build 5493)] == Darwin-9.8.0-i386-32bit little-endian == /private/tmp/Python-2.7rc2/build/test_python_58063 test_urllib2_localnet test_proxy_qop_auth_int_works_or_throws_urlerror (test.test_urllib2_localnet.ProxyAuthTests) ... ok test_proxy_qop_auth_works (test.test_urllib2_localnet.ProxyAuthTests) ... ok test_proxy_with_bad_password_raises_httperror (test.test_urllib2_localnet.ProxyAuthTests) ... FAIL test_proxy_with_no_password_raises_httperror (test.test_urllib2_localnet.ProxyAuthTests) ... FAIL test_200 (test.test_urllib2_localnet.TestUrlopen) ... ok test_200_with_parameters (test.test_urllib2_localnet.TestUrlopen) ... ok test_404 (test.test_urllib2_localnet.TestUrlopen) ... ok test_bad_address (test.test_urllib2_localnet.TestUrlopen) ... ok test_basic (test.test_urllib2_localnet.TestUrlopen) ... ok test_geturl (test.test_urllib2_localnet.TestUrlopen) ... ok test_info (test.test_urllib2_localnet.TestUrlopen) ... ok test_redirection (test.test_urllib2_localnet.TestUrlopen) ... ok test_sending_headers (test.test_urllib2_localnet.TestUrlopen) ... 
ok ====================================================================== FAIL: test_proxy_with_bad_password_raises_httperror (test.test_urllib2_localnet.ProxyAuthTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/private/tmp/Python-2.7rc2/Lib/test/test_urllib2_localnet.py", line 264, in test_proxy_with_bad_password_raises_httperror self.URL) AssertionError: HTTPError not raised ====================================================================== FAIL: test_proxy_with_no_password_raises_httperror (test.test_urllib2_localnet.ProxyAuthTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/private/tmp/Python-2.7rc2/Lib/test/test_urllib2_localnet.py", line 270, in test_proxy_with_no_password_raises_httperror self.URL) AssertionError: HTTPError not raised ---------------------------------------------------------------------- Ran 13 tests in 9.050s FAILED (failures=2) test test_urllib2_localnet failed -- multiple errors occurred 1 test failed: test_urllib2_localnet % Bill From regebro at gmail.com Mon Jun 21 22:55:41 2010 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 21 Jun 2010 22:55:41 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> Message-ID: On Sun, Jun 20, 2010 at 02:02, Terry Reedy wrote: > After reading the discussion in the previous thread, signed in to #python > and verified that the intro message starts with a lie about python3. I also > verified that the official #python site links to "Python Commandment Don't > use Python 3? yet". Well, it *should* say: "If you need to ask if you should use Python 2 or Python 3, you probably are better off with Python 2 for the moment". But that's a bit long. 
:-) -- Lennart Regebro: http://regebro.wordpress.com/ Python 3 Porting: http://python3porting.com/ +33 661 58 14 64 From regebro at gmail.com Mon Jun 21 23:03:08 2010 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 21 Jun 2010 23:03:08 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven wrote: > 2.x or 3.x? http://tinyurl.com/py2or3 Wow. That's almost not an improvement... That link doesn't really help anyone choose at all. -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64 From martin at v.loewis.de Mon Jun 21 23:12:54 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 21 Jun 2010 23:12:54 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FC7E6.5070707@voidspace.org.uk> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> Message-ID: <4C1FD5D6.7070007@v.loewis.de> > If OS X is a supported and important platform for Python then fixing all > problems that it reveals (or being willing to) should definitely not be > a pre-requisite of providing a buildbot (which is already a service to > the Python developer community). Fixing bugs / failures revealed by > Bill's buildbot is not fixing them "for Bill" it is fixing them for Python. I wish people would stop using the word "supported" when they talk about free software. *No* system is "supported" by Python - not even in the sense "we strive to pass the test suite". "We" don't. Now, one may argue whether failing buildbots should be an unconditional reason to defer the release. 
I personally would say "no", despite what some PEP may say. People proposing that a release is postponed typically hope that somebody gets frustrated enough to step up and fix the bug, just so that the software gets released. Instead, I would propose that the only way to delay a release is by proposing to take some specific action to remedy the situation that should cause the delay. Otherwise, releasing is at the discretion of the release manager, who has the ultimate say to whether the problem is important or not. As for OSX, it seems that the only test that is failing is the ctypes test suite, and there only a single test. I don't think this is sufficient reason to block the release. Regards, Martin From janssen at parc.com Mon Jun 21 23:13:47 2010 From: janssen at parc.com (Bill Janssen) Date: Mon, 21 Jun 2010 14:13:47 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1277151926.3369.6.camel@localhost.localdomain> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <1277151926.3369.6.camel@localhost.localdomain> Message-ID: <95708.1277154827@parc.com> Antoine Pitrou wrote: > OS X is only "a supported and important platform" if we have dedicated > core developers diagnosing or even fixing issues for it (like we > obviously have for Windows and Linux). Otherwise, I don't think we have > any moral obligation to support it. Fair enough. That being said, there are two classes of OS X issues. The first is the kind of thing that Ronald Oussoren and Ned Deily keep fixing for us, which require a knowledge of OS X frameworks and SDKs and various other deeply-Apple oddnesses. But the second class is a set of UNIX issues, where OS X is just a variant of UNIX with minor differences from other UNIX platforms. 
It looks to me as if we don't really need Apple geeks for the second
class of issues, we just need developers who have a Mac to test on.

It looks to me, for instance, as if the failures in test_py3kwarn and
test_uuid on Leopard are bugs in the Python testing framework that
happen to be exercised on OS X, rather than bugs caused in some way by
the platform. There, the requisite knowledge is, how does regrtest.py
really work?

Bill

From martin at v.loewis.de Mon Jun 21 23:16:31 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 21 Jun 2010 23:16:31 +0200
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <4C1FC144.70600@voidspace.org.uk>
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com>
 <4C1FC144.70600@voidspace.org.uk>
Message-ID: <4C1FD6AF.6050804@v.loewis.de>

On 21.06.2010 21:45, Michael Foord wrote:
> On 21/06/2010 20:30, Benjamin Peterson wrote:
>> 2010/6/21 Bill Janssen:
>>> They are at the end of the buildbot list, so off-screen if you are using
>>> a normal browser. You have to scroll to see them.
>> But not on the "stable" view and that's the only one I look at.
>>
>
> What are the requirements for moving the OS X buildbots into the stable
> view? Are the builders themselves stable enough? (If the requirement is
> that the buildbots be green then it is something of a catch-22.)

It is indeed the latter (at least, how I understand it). The builder
should "usually" give green, which means it should have done so over
some extended period of time. If it then gets broken it means that
somebody actually broke the code, rather than the system showing one of
its glitches.

So asking for addition to the stable list *while* the slave is red is a
bad idea.

FWIW, nobody has requested changing any of the build slaves to "stable"
for the last two years or so.
Regards, Martin From fuzzyman at voidspace.org.uk Mon Jun 21 23:23:23 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 22:23:23 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FD5D6.7070007@v.loewis.de> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> Message-ID: <4C1FD84B.3030202@voidspace.org.uk> On 21/06/2010 22:12, "Martin v. L?wis" wrote: >> If OS X is a supported and important platform for Python then fixing all >> problems that it reveals (or being willing to) should definitely not be >> a pre-requisite of providing a buildbot (which is already a service to >> the Python developer community). Fixing bugs / failures revealed by >> Bill's buildbot is not fixing them "for Bill" it is fixing them for >> Python. > > I wish people would stop using the word "supported" when they talk > about free software. *No* system is "supported" by Python - not even > in the sense "we strive to pass the test suite". "We" don't. > Well, for better or for worse I think "we" do. We certainly *strive* to support these platforms and having the buildbots is a big part of this. > Now, one may argue whether failing buildbots should be an > unconditional reason to defer the release. I personally would say > "no", despite what some PEP may say. People proposing that a release > is postponed typically hope that somebody gets frustrated enough to > step up and fix the bug, just so that the software gets released. > > Instead, I would propose that the only way to delay a release is by > proposing to take some specific action to remedy the situation that > should cause the delay. Otherwise, releasing is at the discretion of > the release manager, who has the ultimate say to whether the problem > is important or not. 
> I would agree with leaving it to the discretion of the release manager and we should aim for rather than hard require all stable buildbots to be green. I would still *expect* that a release manager would look at the stable buildbots before cutting a release. > As for OSX, it seems that the only test that is failing is the ctypes > test suite, and there only a single test. I don't think this is > sufficient reason to block the release. > Bill listed several other failures he saw on the buildbots and I see the same set, plus test_posix. All the best, Michael > Regards, > Martin -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From simon at ikanobori.jp Mon Jun 21 23:26:13 2010 From: simon at ikanobori.jp (Simon de Vlieger) Date: Mon, 21 Jun 2010 23:26:13 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: <4557027D-4EF9-4A4B-B816-3004DFB78F2A@ikanobori.jp> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 21 jun 2010, at 23:03, Lennart Regebro wrote: > On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven > wrote: >> 2.x or 3.x? http://tinyurl.com/py2or3 > > Wow. That's almost not an improvement... 
That link doesn't really help > anyone choose at all. Lennart, That part of the topic will be replaced after all feedback is gathered on the new article Laurens provided at: http://python-commandments.org/python3.html as stated earlier in this thread. Regards, Simon de Vlieger -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQIcBAEBAgAGBQJMH9j1AAoJEBBSHP7i+JXf8qQP/1w6Esl/x6S5+4lDqykx0R7w M9v6x8G2JvnthTkzh2hF76vruLc4e3SNs1QVCmirh5vjdkRHneJQ/2w/dRVKLi2b /tayYg5QyzjPL37wiAarRnsr7SSiwFgEUCHWZVAAw0dRvszYF/CoLmxTs8TQWs8o KnRuwO4UHuXvtarqO8JeY6gMR4bwcdEXHVNqdRK+PSoRXH9IVJky6IcqwtTC0bzf vyLlQZmVdiXIXvjYOxNQgoufmsC74daqqodzhxtCn2WTHSN2s1ws/gkxBqe+NZPz zYlAukVSiLz/YMcK3NGZYukseT8ZBGiNMuhPVt3lb4SY2LnKVRUiYqNCp9wpWCr/ ASmjaZDU0Dz5I+PHSNCWC4NHyTNClPy3b4b9y3LJ/6hpNZaC3wGHTX5IDxQKjt5u ajEgzstM2wuZDtVNQhcADHk2KWBsCoaE9c0tXKz40T7nIq15zbbGqhyTXjmyouLB JoonSPbS5Ap1UY6RGWEt6t3ZdVDDnMwJzL/DBMOiMgWZIVf7B6/VPy0j9jV9U0WV Sx+U5WnaYqKYo+ZkRTg1iI6dPuK5GTGph+2gzjdTHRVMFFPETxkFz/pBZJG4DOHq bkaKG2IFMWB+Ua9GrTJTbfmTP3YzgJwBG34ZWRLFSQu7zJaY1JdQqQK7z+SCJ5Lg toMEpj7z8KxfUAF84xBG =hTod -----END PGP SIGNATURE----- From foom at fuhm.net Mon Jun 21 22:54:06 2010 From: foom at fuhm.net (James Y Knight) Date: Mon, 21 Jun 2010 16:54:06 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C1FCB99.1090102@egenix.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621155550.643d27b8@heresy> <4C1FCB99.1090102@egenix.com> Message-ID: <1E87B24F-FE0A-4C97-B895-FB15022DA2A0@fuhm.net> On Jun 21, 2010, at 4:29 PM, M.-A. Lemburg wrote: > Here's a little known fact: by changing the Python2 default > encoding to 'undefined' (yes, that's a real codec !), you can disable > all automatic string coercion in Python2. 
I tried that once: half the stdlib stops working if you do (for example, the re module), so it's not particularly useful for checking if your own code is unicode-safe. James From regebro at gmail.com Mon Jun 21 23:29:31 2010 From: regebro at gmail.com (Lennart Regebro) Date: Mon, 21 Jun 2010 23:29:31 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <4557027D-4EF9-4A4B-B816-3004DFB78F2A@ikanobori.jp> References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> <4557027D-4EF9-4A4B-B816-3004DFB78F2A@ikanobori.jp> Message-ID: On Mon, Jun 21, 2010 at 23:26, Simon de Vlieger wrote: > That part of the topic will be replaced after all feedback is gathered on > the new article Laurens provided at: > http://python-commandments.org/python3.html as stated earlier in this > thread. OK, great, I missed that! -- Lennart Regebro: Python, Zope, Plone, Grok http://regebro.wordpress.com/ +33 661 58 14 64 From martin at v.loewis.de Mon Jun 21 23:36:37 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 21 Jun 2010 23:36:37 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FD84B.3030202@voidspace.org.uk> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> Message-ID: <4C1FDB65.4020503@v.loewis.de> > Bill listed several other failures he saw on the buildbots and I see the > same set, plus test_posix. Still, the question would be whether any of these failures can manage to block a release. Are they regressions from 2.6? That would make them good candidates for release blockers. 
Except that I still would like to see commitment from somebody to fix them or else they can't block the release: if "we" don't mean that supporting a platform also means volunteering to fix bugs, then I guess "we" should stop declaring the platform supported. Just wishing that it was supported actually doesn't make it so. If the test failure *isn't* a regression, I think it shouldn't block the release. Regards, Martin From a.badger at gmail.com Mon Jun 21 23:41:19 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 17:41:19 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621201006.5A3223A404D@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621192952.GZ5787@unaka.lan> <20100621201006.5A3223A404D@sparrow.telecommunity.com> Message-ID: <20100621214119.GB5787@unaka.lan> On Mon, Jun 21, 2010 at 04:09:52PM -0400, P.J. Eby wrote: > At 03:29 PM 6/21/2010 -0400, Toshio Kuratomi wrote: > >On Mon, Jun 21, 2010 at 01:24:10PM -0400, P.J. Eby wrote: > >> At 12:34 PM 6/21/2010 -0400, Toshio Kuratomi wrote: > >> >What do you think of making the encoding attribute a mandatory part of > >> >creating an ebyte object? (ex: ``eb = ebytes(b, 'euc-jp')``). > >> > >> As long as the coercion rules force str+ebytes (or str % ebytes, > >> ebytes % str, etc.) to result in another ebytes (and fail if the str > >> can't be encoded in the ebytes' encoding), I'm personally fine with > >> it, although I really like the idea of tacking the encoding to bytes > >> objects in the first place. > >> > >I wouldn't like this. It brings us back to the python2 problem where > >sometimes you pass an ebyte into a function and it works and other times you > >pass an ebyte into the function and it issues a traceback. 
> > For stdlib functions, this isn't going to happen unless your ebytes' > encoding is not compatible with the ascii subset of unicode, or the > stdlib function is working with dynamic data... in which case you > really *do* want to fail early! > The ebytes encoding will often be incompatible with the ascii subset. It's the reason that people were so often tempted to change the defaultencoding on python2 to utf8. > I don't see this as a repeat of the 2.x situation; rather, it allows > you to cause errors to happen much *earlier* than they would > otherwise show up if you were using unicode for your encoded-bytes > data. > > For example, if your program's intent is to end up with latin-1 > output, then it would be better for an error to show up at the very > *first* point where non-latin1 characters are mixed with your data, > rather than only showing up at the output boundary! > That highly depends on your usage. If you're formatting a comment on a web page, checking at output and replacing with '?' is better than a traceback. If you're entering key values into a database, then you likely want to know where the non-latin1 data is entering your program, not where it's mixed with your data or the output boundary. > However, if you promoted mixed-type operation results to unicode > instead of ebytes, then you: > > 1) can't preserve data that doesn't have a 1:1 mapping to unicode, and > ebytes should be immutable like bytes and str. So you shouldn't lose the data if you keep a reference to it. > 2) can't detect an error until your data reaches the output point in > your application -- forcing you to defensively insert ebytes calls > everywhere (vs. simply wrapping them around a handful of designated > inputs), or else have to go right back to tracing down where the > unusable data showed up in the first place. > Usually, you don't want to know where you are combining two incompatible strings. 
Instead, you want to know where the incompatible strings are being set in the first place. If function(a, b) tracebacks with certain combinations of a and b I need to know where a and b are being set, not where function(a, b) is in the source code. So you need to be making input values ebytes() (or str in current python3) no matter what. > One thing that seems like a bit of a blind spot for some folks is > that having unicode is *not* everybody's goal. Not because we don't > believe unicode is generally a good thing or anything like that, but > because we have to work with systems that flat out don't *do* > unicode, thereby making the presence of (fully-general) unicode an > error condition that has to be stamped out! > I think that sometimes as well. However, here I think you're in a bit of a blind spot yourself. I'm saying that making ebytes + str coerce to ebytes will only yield a traceback some of the time, which is the python2 behaviour. Having ebytes + str coerce to str will never throw a traceback as long as our implementation checks that the bytes and encoding work together from the start. Throwing an error in code, only on some input is one of the main reasons that debugging unicode vs byte issues sucks on python2. On my box, with my dataset, everything works. Toss it up on pypi and suddenly I have a user in Japan who reports that he gets a traceback with his dataset that he can't give to me because it's proprietary, overly large, or transient. > IOW, if you're producing output that has to go into another system > that doesn't take unicode, it doesn't matter how > theoretically-correct it would be for your app to process the data in > unicode form. In that case, unicode is not a feature: it's a bug. > This is not always true. If you read a webpage, chop it up so you get a list of words, create a histogram of word length, and then write the output as utf8 to a database. Should you do all your intermediate string operations on utf8 encoded byte strings?
No, you should do them on unicode strings as otherwise you need to know about the details of how utf8 encodes characters. > And as it really *is* an error in that case, it should not pass > silently, unless explicitly silenced. > This is very true -- although the python3 stdlib does explicitly silence errors related to unicode in some cases. Anyhow -- IMHO, you should get a TypeError when you attempt to pass a unicode value into a function that is meant to work with bytes. (You can accept an ebytes object as well since it has a known bytes representation). -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From lvh at laurensvh.be Mon Jun 21 23:41:59 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 21 Jun 2010 23:41:59 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <4A9DD603-3A5F-4A8A-A37C-21C88ABC2A43@twistedmatrix.com> Message-ID: On Mon, Jun 21, 2010 at 11:03 PM, Lennart Regebro wrote: > On Sun, Jun 20, 2010 at 18:20, Laurens Van Houtven wrote: >> 2.x or 3.x? http://tinyurl.com/py2or3 > > Wow. That's almost not an improvement... That link doesn't really help > anyone choose at all. > > -- > Lennart Regebro: Python, Zope, Plone, Grok > http://regebro.wordpress.com/ > +33 661 58 14 64 > Please read the rest of the thread: that's ancient information and no longer the latest work. We just removed the thing that offended people, so that the situation could be defused instantly and then we could work towards something everyone liked in a calm and productive environment. 
Laurens From john.arbash.meinel at gmail.com Mon Jun 21 23:52:08 2010 From: john.arbash.meinel at gmail.com (John Arbash Meinel) Date: Mon, 21 Jun 2010 16:52:08 -0500 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621214119.GB5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621192952.GZ5787@unaka.lan> <20100621201006.5A3223A404D@sparrow.telecommunity.com> <20100621214119.GB5787@unaka.lan> Message-ID: <4C1FDF08.3010401@gmail.com> ... >> IOW, if you're producing output that has to go into another system >> that doesn't take unicode, it doesn't matter how >> theoretically-correct it would be for your app to process the data in >> unicode form. In that case, unicode is not a feature: it's a bug. >> > This is not always true. If you read a webpage, chop it up so you get > a list of words, create a histogram of word length, and then write the output as > utf8 to a database. Should you do all your intermediate string operations > on utf8 encoded byte strings? No, you should do them on unicode strings as > otherwise you need to know about the details of how utf8 encodes characters. > You'd still have problems in Unicode given stuff like å =~ å even though u'\xe5' vs u'a\u030a' (those will look the same depending on your Unicode system. IDLE shows them pretty much the same, T-Bird on Windows with my current font shows the second as 2 characters.) I realize this was a toy example, but it does point out that Unicode complicates the idea of 'equality' as well as the idea of 'what is a character'. And just saying "decode it to Unicode" isn't really sufficient.
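[Editorial illustration, not part of the original message.] The equality problem described above can be reproduced in a few lines. This is a minimal sketch in Python 3 syntax (the thread itself uses Python 2 u'' literals); the helper name same_text is made up for this example, while unicodedata.normalize() is the standard library function.

```python
import unicodedata

# The two spellings from the message: one precomposed code point versus
# a plain 'a' followed by U+030A COMBINING RING ABOVE.
composed = "\u00e5"
decomposed = "a\u030a"


def same_text(left, right, form="NFC"):
    """Compare two strings after bringing both to one normalization form.

    'same_text' is a hypothetical helper name used only in this sketch.
    """
    return unicodedata.normalize(form, left) == unicodedata.normalize(form, right)


# A naive comparison treats the two spellings as different strings,
# and they do not even have the same length.
naive_equal = composed == decomposed
lengths = (len(composed), len(decomposed))

# After normalization they compare equal.
normalized_equal = same_text(composed, decomposed)
```

Which form to normalize to (NFC, NFD, or the compatibility forms) is an application decision; the sketch only shows that some normalization step is needed before comparing.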
John =:-> From fuzzyman at voidspace.org.uk Mon Jun 21 23:52:28 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 22:52:28 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FDB65.4020503@v.loewis.de> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> Message-ID: <4C1FDF1C.2060308@voidspace.org.uk> On 21/06/2010 22:36, "Martin v. Löwis" wrote: >> Bill listed several other failures he saw on the buildbots and I see the >> same set, plus test_posix. > > Still, the question would be whether any of these failures can manage > to block a release. Are they regressions from 2.6? The test_posix failure is a regression from 2.6 (but it only shows up on some machines - it is caused by a fairly braindead implementation of a couple of posix apis by Apple apparently). http://bugs.python.org/issue7900 There are various patches available and a lot of work that has gone into diagnosing it - but there was some disagreement on what is the *best* way to fix it. Two of the other failures I'm pretty sure are problems in the test suite rather than bugs (as Bill said) and I'm not sure about the ctypes issue. Just starting a full build here. Michael > That would make them good candidates for release blockers. Except that > I still would like to see commitment from somebody to fix them or else > they can't block the release: if "we" don't mean that supporting a > platform also means volunteering to fix bugs, then I guess "we" should > stop declaring the > platform supported. Just wishing that it was supported actually > doesn't make it so. > > If the test failure *isn't* a regression, I think it shouldn't block > the release.
> > Regards, > Martin > > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From fuzzyman at voidspace.org.uk Mon Jun 21 23:57:04 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 21 Jun 2010 22:57:04 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FDF1C.2060308@voidspace.org.uk> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> Message-ID: <4C1FE030.7020700@voidspace.org.uk> On 21/06/2010 22:52, Michael Foord wrote: > On 21/06/2010 22:36, "Martin v. L?wis" wrote: >>> Bill listed several other failures he saw on the buildbots and I see >>> the >>> same set, plus test_posix. >> >> Still, the question would be whether any of these failures can manage >> to block a release. Are they regressions from 2.6? > > The test_posix failure is a regression from 2.6 (but it only shows up > on some machines - it is caused by a fairly braindead implementation > of a couple of posix apis by Apple apparently). 
> > http://bugs.python.org/issue7900 > > There are various patches available and a lot of work that has gone > into diagnosing it - but there was some disagreement on what is the > *best* way to fix it. > > Two of the other failures I'm pretty sure are problems in the test > suite rather than bugs (as Bill said) and I'm not sure about the > ctypes issue. Just starting a full build here. Right now I'm *only* seeing these two failures on Mac OS X (10.6.4): test_posix test_urllib2_localnet All the best, Michael > > Michael >> That would make them good candidates for release blockers. Except >> that I still would like to see commitment from somebody to fix them >> or else they can't block the release: if "we" don't mean that >> supporting a platform also means volunteering to fix bugs, then I >> guess "we" should stop declaring the >> platform supported. Just wishing that it was supported actually >> doesn't make it so. >> >> If the test failure *isn't* a regression, I think it shouldn't block >> the release. >> >> Regards, >> Martin >> >> >> > > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies (?BOGUS AGREEMENTS?) that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. 
From ncoghlan at gmail.com Tue Jun 22 00:03:58 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 08:03:58 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621201616.EADEF3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621160420.63037f1c@heresy> <20100621201616.EADEF3A404D@sparrow.telecommunity.com> Message-ID: On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby wrote: > True, but making it a separate type with a required encoding gets rid of the > magical "I don't know" - the "I don't know" encoding is just a plain old > bytes object. So, to boil down the ebytes idea, it is basically a request for a second string type that holds an octet stream plus an encoding name, rather than a Unicode character stream. Calling it "ebytes" seems to emphasise the wrong parallel in that case (you have a 'str' object with a different internal structure, not any kind of bytes object). For now I'll call it an "altstr". Then the idea can be described as - altstr would expose the same API as str, NOT the same API as bytes - explicit conversion via "str" would use the altstr's __str__ method - explicit conversion via "bytes" would use the altstr's __bytes__ method - implicit interaction with str would convert the str to an altstr object according to the altstr's rules. This may be best handled via a coercion method on altstr, rather than str actually needing to know the details (i.e. an altrstr.__coerce_str__() method). For the 'ebytes' model, this would do something like "type(self)(other.encode(self.encoding), self.encoding))". The operation would then be handled by the corresponding method on the coerced object. 
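[Editorial illustration, not part of the original message.] The coercion rule described above can be tried out with ordinary operator overloading. This is a rough sketch under stated assumptions: the class name AltStr and the helper _coerce are made up, __coerce_str__ is only the hypothetical hook named in the message (the interpreter does not call it), and __add__/__radd__ stand in for the implicit interaction with str. Python 3 syntax.

```python
class AltStr:
    """Hypothetical 'altstr': an octet stream plus the name of its encoding."""

    def __init__(self, data, encoding):
        data.decode(encoding)  # fail early if bytes and encoding disagree
        self.data = bytes(data)
        self.encoding = encoding

    def __coerce_str__(self, other):
        # The rule quoted in the message: encode the str with our encoding.
        # Raises UnicodeEncodeError if 'other' cannot be represented.
        return type(self)(other.encode(self.encoding), self.encoding)

    def _coerce(self, other):
        # Hypothetical helper: bring the other operand to our own type.
        if isinstance(other, str):
            return self.__coerce_str__(other)
        if isinstance(other, AltStr) and other.encoding == self.encoding:
            return other
        return None

    def __add__(self, other):
        coerced = self._coerce(other)
        if coerced is None:
            return NotImplemented
        return type(self)(self.data + coerced.data, self.encoding)

    def __radd__(self, other):
        # Reached for "str + AltStr", since str does not know about AltStr.
        coerced = self._coerce(other)
        if coerced is None:
            return NotImplemented
        return type(self)(coerced.data + self.data, self.encoding)

    def __str__(self):
        return self.data.decode(self.encoding)

    def __bytes__(self):
        return self.data


cafe = AltStr("caf\u00e9".encode("latin-1"), "latin-1")
order = cafe + " au lait"   # the str operand is encoded to latin-1
prefixed = "un " + cafe     # same rule, with the operands swapped
```

Mixing in a str that the encoding cannot represent (for example cafe + "\u65e5") raises UnicodeEncodeError at the point of the operation, which is the "fail early" behaviour argued for elsewhere in the thread, and also the "works only for some inputs" behaviour argued against.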
A new type could then override operations such as __contains__, __mod__, format() and join(). This is still smelling an awful lot like the 2.x str type to me, but supporting a __coerce_str__ method may allow some useful experimentation in this space (as PJE suggested). There's a chance it would be abused, but it offers a greater chance of success than trying to come up with a concrete altstr type without providing a means for experimentation first. > (In principle, you could then drop *all* the stringlike methods from > plain-old-bytes objects. If it's really text-in-bytes you want, you should > use an ebytes with the encoding specified.) Except that a lot of those string-like methods are just plain useful, even when you *know* you're dealing with an octet stream rather than latin-1 encoded text. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From a.badger at gmail.com Tue Jun 22 00:06:57 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 21 Jun 2010 18:06:57 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C1FDF08.3010401@gmail.com> References: <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621192952.GZ5787@unaka.lan> <20100621201006.5A3223A404D@sparrow.telecommunity.com> <20100621214119.GB5787@unaka.lan> <4C1FDF08.3010401@gmail.com> Message-ID: <20100621220657.GC5787@unaka.lan> On Mon, Jun 21, 2010 at 04:52:08PM -0500, John Arbash Meinel wrote: > > ... > >> IOW, if you're producing output that has to go into another system > >> that doesn't take unicode, it doesn't matter how > >> theoretically-correct it would be for your app to process the data in > >> unicode form. In that case, unicode is not a feature: it's a bug. > >> > > This is not always true.
If you read a webpage, chop it up so you get > > a list of words, create a histogram of word length, and then write the output as > > utf8 to a database. Should you do all your intermediate string operations > > on utf8 encoded byte strings? No, you should do them on unicode strings as > > otherwise you need to know about the details of how utf8 encodes characters. > > > > You'd still have problems in Unicode given stuff like å =~ å even though > u'\xe5' vs u'a\u030a' (those will look the same depending on your > Unicode system. IDLE shows them pretty much the same, T-Bird on Windows > with my current font shows the second as 2 characters.) > > I realize this was a toy example, but it does point out that Unicode > complicates the idea of 'equality' as well as the idea of 'what is a > character'. And just saying "decode it to Unicode" isn't really sufficient. > Ah -- but if you're dealing with unicode objects you can use the unicodedata.normalize() function on them to come out with the right values. If you're using bytes, it's yet another case where you, the programmer, have to know what byte sequences represent combining characters in the particular encoding that you're dealing with. -Toshio -------------- next part -------------- A non-text attachment was scrubbed...
Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From martin at v.loewis.de Tue Jun 22 00:16:15 2010 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 22 Jun 2010 00:16:15 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FDF1C.2060308@voidspace.org.uk> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> Message-ID: <4C1FE4AF.80009@v.loewis.de> > The test_posix failure is a regression from 2.6 (but it only shows up on > some machines - it is caused by a fairly braindead implementation of a > couple of posix apis by Apple apparently). > > http://bugs.python.org/issue7900 Ah, that one. I definitely think this should *not* block the release: a) there is no clear solution in sight. So if we wait for it resolved, it could take months until we get a 2.7 release. b) it's only about getgroups - a fairly minor API. c) IIUC, it only occurs to users which are member of more than 16 groups - a fairly uncommon setup. Regards, Martin From ncoghlan at gmail.com Tue Jun 22 00:18:20 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 08:18:20 +1000 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <20100621203756.2f99757f@pitrou.net> References: <73196.1277143019@parc.com> <20100621203756.2f99757f@pitrou.net> Message-ID: > There also seem to be a couple of failures left with test_gdb... Do you mean the compiler and debugger specific issues reported in http://bugs.python.org/issue8482? Fixing that properly is messy, and according to Victor's last message, even the correct conditions for skipping the test aren't completely clear. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Tue Jun 22 00:19:33 2010 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Jun 2010 23:19:33 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <4C1FE030.7020700@voidspace.org.uk> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> Message-ID: On 21 June 2010 22:57, Michael Foord wrote: >> Two of the other failures I'm pretty sure are problems in the test suite >> rather than bugs (as Bill said) and I'm not sure about the ctypes issue. >> Just starting a full build here. > > Right now I'm *only* seeing these two failures on Mac OS X (10.6.4): > > test_posix test_urllib2_localnet I'm still seeing a test_ctypes failure (on Windows XP). Not sure if it's the same one Bill was seeing: FAIL: test_issue_8959_b (ctypes.test.test_callbacks.SampleCallbacksTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\buildslave\trunk.moore-windows\build\lib\ctypes\test\test_callbacks.py", line 208, in test_issue_8959_b self.assertFalse(windowCount == 0) AssertionError: True is not False Looks like this test was added today, and counts the windows. As my buildbot is running as a service, and I generally leave it running when logged off, a window count of 0 may well be correct - I can't be sure. So my view is that it's possibly a bug in the test - but it could do with someone more expert to confirm this.
I've got a build running at the moment, when it's finished I'll rerun the trunk build (I currently have a disconnected session with a window open, I'll see if that makes it pass). Paul. From p.f.moore at gmail.com Tue Jun 22 00:39:56 2010 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 21 Jun 2010 23:39:56 +0100 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> Message-ID: On 21 June 2010 23:19, Paul Moore wrote: > On 21 June 2010 22:57, Michael Foord wrote: >>> Two of the other failures I'm pretty sure are problems in the test suite >>> rather than bugs (as Bill said) and I'm not sure about the ctypes issue. >>> Just starting a full build here. >> >> Right now I'm *only* seeing these two failures on Mac OS X (10.6.4): >> >> test_posix test_urllib2_localnet > > I'm still seeing a test_ctypes failure (on Windows XP). Not sure if > it's the same one Bill was seeing: > > FAIL: test_issue_8959_b (ctypes.test.test_callbacks.SampleCallbacksTestCase) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\buildslave\trunk.moore-windows\build\lib\ctypes\test\test_callbacks.py", > line 208, in test_issue_8959_b > self.assertFalse(windowCount == 0) > AssertionError: True is not False > > Looks like this test was added today, and counts the windows. As my > buildbot is running as a service, and I generally leave it running > when logged off, a window count of 0 may well be correct - I can't be > sure. So my view is that it's possibly a bug in the test - but it > could do with someone more expert to confirm this.
> > I've got a build running at the moment, when it's finished I'll rerun > the trunk build (I currently have a disconnected session with a window > open, I'll see if that makes it pass). Yes, looks like it's a bug in the test. http://bugs.python.org/issue9055 raised. Paul. From stefan at bytereef.org Tue Jun 22 00:37:11 2010 From: stefan at bytereef.org (Stefan Krah) Date: Tue, 22 Jun 2010 00:37:11 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <94209.1277152580@parc.com> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <94209.1277152580@parc.com> Message-ID: <20100621223711.GA25865@yoda.bytereef.org> Bill Janssen wrote: > % make test > [...] > test_uuid > test test_uuid failed -- Traceback (most recent call last): > File "/private/tmp/Python-2.7rc2/Lib/test/test_uuid.py", line 472, in testIssue8621 > self.assertNotEqual(parent_value, child_value) > AssertionError: '8395a08e40454895be537a180539b7fb' == '8395a08e40454895be537a180539b7fb' > > [...] I reopened http://bugs.python.org/issue8621 . Could you comment there and help resolve the test failure? Stefan Krah From tjreedy at udel.edu Tue Jun 22 00:48:12 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 18:48:12 -0400 Subject: [Python-Dev] Adding additional level of bookmarks and section numbers in python pdf documents. In-Reply-To: References: Message-ID: On 6/21/2010 4:07 PM, Peng Yu wrote: > Hi, > > Current pdf version of python documents don't have bookmarks for > sussubsection. For example, there is no bookmark for the following > section in python_2.6.5_reference.pdf. Also the bookmarks don't have > section numbers in them. I suggest to include the section numbers. > Could these features be added in future release of python document. 
> > 3.4.1 Basic customization Search doc issues on the tracker for this topic and file a feature request doc issue if there is not one. -- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 22 01:01:09 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 19:01:09 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On 6/21/2010 3:59 PM, Steve Holden wrote: > Terry Reedy wrote: >> On 6/21/2010 8:33 AM, Nick Coghlan wrote: >> >>> P.S. (We're going to have a tough decision to make somewhere along the >>> line where docs.python.org is concerned, too - when do we flick the >>> switch and make a 3.x version of the docs the default? >> >> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. >> Trunk released always take over docs.python.org. To do otherwise would >> be to say that 3.2 is not a real trunk release and not yet ready for >> real use -- a major slam. >> >> Actually, I thought this was already discussed and decided ;-). >> > This also gives the 2.7 release it's day in the sun before relegation to > maintenance status. Every new version (except 3.0 and 3.1) has gone to maintenance status *and* becomes the featured release on docs.python.org the day it was released. 2.7 would just spend less time as the featured release on that page. > The Python 3 documents, when they become the default, should contain an > every-page link to the Python 2 documentation (though linkages may be a > problem - they could probably be done at a gross level). docs.python.org contains links to docs to other releases, both past and future. There is no reason to treat 3.2 specially, or to junk up its pages. The 3.x docs have intentionally been cleaned of nearly all references to 2.x. 
The current 2.6 and 2.7 pages have no references to corresponding 3.1 pages. Terry Jan Reedy From barry at python.org Tue Jun 22 01:12:57 2010 From: barry at python.org (Barry Warsaw) Date: Mon, 21 Jun 2010 19:12:57 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <20100621163404.GV5787@unaka.lan> <20100621172413.578853A404D@sparrow.telecommunity.com> <20100621160420.63037f1c@heresy> <20100621201616.EADEF3A404D@sparrow.telecommunity.com> Message-ID: <20100621191257.698ae6cc@heresy> On Jun 22, 2010, at 08:03 AM, Nick Coghlan wrote: >On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby wrote: >> True, but making it a separate type with a required encoding gets rid of the >> magical "I don't know" - the "I don't know" encoding is just a plain old >> bytes object. > >So, to boil down the ebytes idea, it is basically a request for a >second string type that holds an octet stream plus an encoding name, >rather than a Unicode character stream. Calling it "ebytes" seems to >emphasise the wrong parallel in that case (you have a 'str' object >with a different internal structure, not any kind of bytes object). >For now I'll call it an "altstr". Then the idea can be described as Actually no. We're introducing a second bytes type that holds an octet stream plus an encoding name. See the toy implementation I included in a previous message. As opposed to say a bytes object that represented an image, which would make almost no sense to decode to a unicode, this ebytes type would help bridge the gap between a pure bytes object and a pure unicode object. It would know how to accurately convert to a unicode (i.e. __str__()) because it would know the encoding of the bytes. Obviously, it could convert to a pure bytes object. 
Because it can be accurately stringified, it can have most if not all of the str API.

-Barry

From db3l.net at gmail.com Tue Jun 22 01:17:48 2010
From: db3l.net at gmail.com (David Bolen)
Date: Mon, 21 Jun 2010 19:17:48 -0400
Subject: [Python-Dev] red buildbots on 2.7
References: <73196.1277143019@parc.com>
Message-ID:

Paul Moore writes:

> Thanks for the alert. I've killed the stuck test and should see some
> runs going through now. Shame, really, I was getting used to seeing a
> nice page of all green results...

In my experience, my OSX and Windows buildbots need some manual TLC on an ongoing basis. I kill off stranded python processes several times a week on both platforms. OSX actually seems as bad as Windows in this regard, which is strange given its *nix heritage, but perhaps it's how some of the test processes are created.

Most of the time the stranded processes aren't hurting anything but local resources, but sometimes they can lock directories, or hang a build/test for a particular builder.

My Windows buildbots also have a tendency to fill up temp, or even if there's room, get sluggish due to all the cruft left in that directory, so I periodically clean that out manually as well.

-- David

From steve at pearwood.info Tue Jun 22 01:23:28 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Jun 2010 09:23:28 +1000
Subject: [Python-Dev] email package status in 3.X
In-Reply-To: <20100621201006.5A3223A404D@sparrow.telecommunity.com>
References: <20100621192952.GZ5787@unaka.lan> <20100621201006.5A3223A404D@sparrow.telecommunity.com>
Message-ID: <201006220923.28378.steve@pearwood.info>

On Tue, 22 Jun 2010 06:09:52 am P.J.
Eby wrote:
> However, if you promoted mixed-type operation results to unicode
> instead of ebytes, then you:
>
> 1) can't preserve data that doesn't have a 1:1 mapping to unicode,

Sounds like exactly the sort of thing the Unicode private codepoints were invented for, as Toshio suggests.

In any case, if there are use-cases for text that aren't solved by Unicode, and I'm not convinced that there are, Python doesn't need to solve them. At the very least, such a solution should start off as a third-party package to prove itself before being made a part of the standard library, let alone a built-in.

-- Steven D'Aprano

From steve at pearwood.info Tue Jun 22 01:27:31 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Jun 2010 09:27:31 +1000
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621201616.EADEF3A404D@sparrow.telecommunity.com>
Message-ID: <201006220927.31875.steve@pearwood.info>

On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote:
> On Tue, Jun 22, 2010 at 6:16 AM, P.J. Eby wrote:
> > True, but making it a separate type with a required encoding gets
> > rid of the magical "I don't know" - the "I don't know" encoding is
> > just a plain old bytes object.
>
> So, to boil down the ebytes idea, it is basically a request for a
> second string type that holds an octet stream plus an encoding name,
> rather than a Unicode character stream.

Do any other languages have any equivalent to this ebytes type? If not, how do they deal with this issue?

[...]
> This is still smelling an awful lot like the 2.x str type to me

Yes. Virtually the only difference I can see is that it lets the user set a per-object default encoding to use when coercing strings to and from bytes. If this is not the case, can somebody please explain what I'm missing?
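
For readers trying to picture the proposal, here is a minimal sketch of a bytes-plus-encoding wrapper along the lines described in this thread. It is not the toy implementation Barry refers to, which is not reproduced here. The class name `EBytes` and every method on it are hypothetical, and routing mixed operations through text is only one possible design, assumed for illustration.

```python
class EBytes:
    """Hypothetical 'bytes plus encoding' wrapper (illustration only)."""

    def __init__(self, data, encoding):
        if not isinstance(data, bytes):
            raise TypeError("data must be bytes")
        # Fail early if the payload is not valid in the claimed encoding.
        data.decode(encoding)
        self.data = data
        self.encoding = encoding

    def __bytes__(self):
        # Conversion to a pure bytes object simply drops the encoding.
        return self.data

    def __str__(self):
        # Accurate conversion to text is possible because the encoding is known.
        return self.data.decode(self.encoding)

    def __add__(self, other):
        # Assumed design: combine as text, then re-encode in our own encoding.
        if isinstance(other, EBytes):
            other_text = str(other)
        elif isinstance(other, str):
            other_text = other
        else:
            return NotImplemented
        combined = str(self) + other_text
        return EBytes(combined.encode(self.encoding), self.encoding)

    def upper(self):
        # One str method, delegated through a decode/encode round trip.
        return EBytes(str(self).upper().encode(self.encoding), self.encoding)


name = EBytes("caf\u00e9".encode("latin-1"), "latin-1")
print(str(name + "!"))
```

Every operation in this sketch decodes, works on text, and re-encodes. That makes it easy to see why some posters regard such a type as a str with a different internal representation.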
-- Steven D'Aprano From tjreedy at udel.edu Tue Jun 22 01:48:46 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 19:48:46 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621172957.EB55C3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <4C1F9833.2080905@voidspace.org.uk> <20100621172957.EB55C3A404D@sparrow.telecommunity.com> Message-ID: On 6/21/2010 1:29 PM, P.J. Eby wrote: > At 05:49 PM 6/21/2010 +0100, Michael Foord wrote: >> Why is your proposed bstr wrapper not practical to implement outside >> the core and use in your own libraries and frameworks? > > __contains__ doesn't have a converse operation, so you can't code a type > that works around this (Python 3.1 shown): > > >>> from os.path import join > >>> join(b'x','y') > >>> join('y',b'x') I am really unclear what result you intend for such mixed pairs, for all possible mixed pairs, sensible or not. It would seem to me best to write your own pjoin function that did exactly what you want over the whole input domain. -- Terry Jan Reedy From nyamatongwe at gmail.com Tue Jun 22 01:49:52 2010 From: nyamatongwe at gmail.com (Neil Hodgson) Date: Tue, 22 Jun 2010 09:49:52 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <201006220927.31875.steve@pearwood.info> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621201616.EADEF3A404D@sparrow.telecommunity.com> <201006220927.31875.steve@pearwood.info> Message-ID: Steven D'Aprano: > Do any other languages have any equivalent to this ebtyes type? The String type in Ruby 1.9 is a byte string with an encoding attribute. 
Most online Ruby documentation is for 1.8 but the API can be examined here: http://ruby-doc.org/ruby-1.9/index.html Here's something more explanatory: http://blog.grayproductions.net/articles/ruby_19s_string My view is that this actually makes things much more complex by making encoding combination an n*n problem (where n is the number of encodings) rather an n sized problem when you have a single core string type Neil From tjreedy at udel.edu Tue Jun 22 01:55:59 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 19:55:59 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> Message-ID: On 6/21/2010 1:29 PM, Guido van Rossum wrote: > Actually, the big problem with Python 2 is that if you mix str and > unicode, things work or crash depending on whether any of the str > objects involved contain non-ASCII bytes. > > If one API decides to upgrade to Unicode, the result, when passed to > another API, may well cause a UnicodeError because not all arguments > have had the same treatment. > >> Now, the APIs are neither safe nor aware -- if you pass bytes in, you get >> unpredictable results back. > > This seems an overgeneralization of a particular bug. There are APIs > that are strictly text-in, text-out. There are others that are > bytes-in, bytes-out. Let's call all those *pure*. For some operations > it makes sense that the API is *polymorphic*, with which I mean that > text-in causes text-out, and bytes-in causes byte-out. All of these > are fine. > > Perhaps there are more situations where a polymorphic API would be > helpful. 
Such APIs are not always so easy to implement, because they
> have to be careful with literals or other constants (and even more so
> mutable state) used internally -- but it can be done, and there are
> plenty of examples in the stdlib.
>
> The real problem apparently lies in (what I believe is only a few
> rare) APIs that are text-or-bytes-in and always-text-out (or
> always-bytes-out). Let's call them *hybrid*. Clearly, mixing hybrid
> APIs in a stream of pure or polymorphic API calls is a problem,
> because they turn a pure or polymorphic overall operation into a
> hybrid one.
>
> There are also text-in, bytes-out or bytes-in, text-out APIs that are
> intended for encoding/decoding of course, but these are in a totally
> different class.
>
> Abstractly, it would be good if there were as few as possible hybrid
> APIs, many pure or polymorphic APIs (which it should be in a
> particular case is a pragmatic choice), and a limited number of
> encoding/decoding APIs, which should generally be invoked at the edges
> of the program (e.g., I/O).

Nice summary of part of the 'why' for Python3.

> I still believe that the instances of bytes silently
> succeeding *some* of the time refers to specific bugs in specific
> APIs, either intentional because of misguided compatibility desires,
> or accidental in the haste of trying to convert the entire stdlib to
> Python 3 in a finite time.

I think http://bugs.python.org/issue5468 reports one aspect of haste, missing encoding and errors parameters. But it has not gotten much attention.
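
To make the pure / polymorphic / hybrid distinction quoted above concrete, here is a small sketch of a polymorphic helper in Python 3. The function `add_suffix` is hypothetical and not part of the stdlib. It shows the care with internal literals that a polymorphic API needs, and it rejects mixed arguments instead of guessing, which would make it hybrid.

```python
def add_suffix(name, suffix):
    """Polymorphic helper: str in gives str out, bytes in gives bytes out."""
    if isinstance(name, str) and isinstance(suffix, str):
        dot = "."       # the internal literal must match the argument type
    elif isinstance(name, bytes) and isinstance(suffix, bytes):
        dot = b"."
    else:
        # Refusing mixed input keeps the function from becoming "hybrid".
        raise TypeError("name and suffix must both be str or both be bytes")
    return name + dot + suffix.lstrip(dot)


print(add_suffix("report", "txt"))
print(add_suffix(b"report", b".txt"))
```

Passing `"report"` together with `b"txt"` raises `TypeError` in this sketch. That is the same behaviour Python 3 gives for `"a" + b"b"`.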
-- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 22 02:46:03 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 20:46:03 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6/21/2010 1:58 PM, Stephen J. Turnbull wrote: > As for "Think Carefully About It Every Time", that is required only in > Porting Programs That Mix Operation On Bytes With Operation On Str. The 2.x anti-pattern > If you write programs from scratch, however, the decode-process-encode > paradigm quickly becomes second nature. Except in this particular arena, it already should be to anyone reading this list. Decorate-sort-undecorate is another example of the same idea. Transform-compute-untransform is the basis of NP-complete theory. Frequency domain processing sandwiched between forward and reverse Fourier transforms is a third example. And so on. -- Terry Jan Reedy From jess.austin at gmail.com Tue Jun 22 02:59:03 2010 From: jess.austin at gmail.com (Jess Austin) Date: Mon, 21 Jun 2010 19:59:03 -0500 Subject: [Python-Dev] email package status in 3.X Message-ID: On Mon, Jun 22, 2010 at 7:27:31 PM, Steven D'Aprano wrote: > On Tue, 22 Jun 2010 08:03:58 am Nick Coghlan wrote: >> So, to boil down the ebytes idea, it is basically a request for a >> second string type that holds an octet stream plus an encoding name, >> rather than a Unicode character stream. > > Do any other languages have any equivalent to this ebtyes type? 
Ruby seems to do this: http://yokolet.blogspot.com/2009/07/design-and-implementation-of-ruby-m17n.html I don't use ruby myself, and I'm probably missing some subtle flaws, but the exposition at that link makes sense to me. cheers, Jess From alexander.belopolsky at gmail.com Tue Jun 22 03:05:54 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Jun 2010 21:05:54 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> Message-ID: On Mon, Jun 21, 2010 at 6:39 PM, Paul Moore wrote: .. > Yes, looks like it's a bug in the test. http://bugs.python.org/issue9055 raised. I concur. I've updated the issue with a proposed fix. (The problem is that proxy host names should have a '.' in them on OSX.) I am trying to decide whether the fix should be applied for all platforms or conditionally for darwin. Can someone test the fix on Windows? From alexander.belopolsky at gmail.com Tue Jun 22 03:08:19 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Jun 2010 21:08:19 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> Message-ID: Oh, I thought that was about http://bugs.python.org/issue8455 . 
On Mon, Jun 21, 2010 at 9:05 PM, Alexander Belopolsky wrote:
> On Mon, Jun 21, 2010 at 6:39 PM, Paul Moore wrote:
> ..
>> Yes, looks like it's a bug in the test. http://bugs.python.org/issue9055 raised.
>
> I concur. I've updated the issue with a proposed fix. (The problem
> is that proxy host names should have a '.' in them on OSX.) I am
> trying to decide whether the fix should be applied for all platforms
> or conditionally for darwin. Can someone test the fix on Windows?
>

From janssen at parc.com Tue Jun 22 03:26:59 2010
From: janssen at parc.com (Bill Janssen)
Date: Mon, 21 Jun 2010 18:26:59 PDT
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To:
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk>
Message-ID: <1180.1277170019@parc.com>

Alexander Belopolsky wrote:

> Oh, I thought that was about http://bugs.python.org/issue8455 .
>
> On Mon, Jun 21, 2010 at 9:05 PM, Alexander Belopolsky
> wrote:
> > On Mon, Jun 21, 2010 at 6:39 PM, Paul Moore wrote:
> > ..
> >> Yes, looks like it's a bug in the test. http://bugs.python.org/issue9055 raised.
> >
> > I concur. I've updated the issue with a proposed fix. (The problem
> > is that proxy host names should have a '.' in them on OSX.) I am
> > trying to decide whether the fix should be applied for all platforms
> > or conditionally for darwin. Can someone test the fix on Windows?

Ah, thanks for tracking that one down. I'll bet it's the same problem I'm seeing with proxy authentication with bad credentials unexpectedly succeeding.

Though, isn't that behavior of urllib.proxy_bypass another bug?
Bill

From alexander.belopolsky at gmail.com Tue Jun 22 03:38:43 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Mon, 21 Jun 2010 21:38:43 -0400
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <4C1FE4AF.80009@v.loewis.de>
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de>
Message-ID:

On Mon, Jun 21, 2010 at 6:16 PM, "Martin v. Löwis" wrote:
>> The test_posix failure is a regression from 2.6 (but it only shows up on
>> some machines - it is caused by a fairly braindead implementation of a
>> couple of posix apis by Apple apparently).
>>
>> http://bugs.python.org/issue7900
>
> Ah, that one. I definitely think this should *not* block the release:

I agree that this is nowhere near being a release blocker, but I think it would be nice to do something about it before the final release.

> a) there is no clear solution in sight. So if we wait for it resolved,
>    it could take months until we get a 2.7 release.

The ideal solution will have to wait until Apple gets its act together and fixes the problem on their end. I would say "months" is an overly optimistic time estimate for that. However, the issue is a regression from prior versions. In 2.5 getgroups would truncate the list to 16 groups, but wouldn't crash. More importantly, the 16 groups returned would be correct per-process groups and not something immune to setgroups changes. I proposed a very simple fix:

http://bugs.python.org/file16326/no-darwin-ext.diff

which simply minimally reverts the change that introduced the regression.

> b) it's only about getgroups - a fairly minor API.
Agreed, but a failing regression test is an annoyance, particularly in this case where the diagnostic from the test is very vague. Short of fixing the problem, we can skip the failing test on OSX if getgroups raises an exception.

> c) IIUC, it only occurs to users which are member of more than 16
>    groups - a fairly uncommon setup.
>

Unfortunately it is fairly common. The default root account on OSX is a member of 18 groups. Given that many os tests require root privileges, people will run these tests as root.

From stephen at xemacs.org Tue Jun 22 03:41:02 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 22 Jun 2010 10:41:02 +0900
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To: <4C1FDB65.4020503@v.loewis.de>
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de>
Message-ID: <87tyov3kup.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

> Still, the question would be whether any of these failures can manage to
> block a release.

Exactly. Personally, I would say that in a volunteer-maintained project, "Platform X is supported" means that "There is a bug that seems to affect only Platform X" is a candidate for release blocker, or other standardized action to get things fixed (call for volunteers, etc). That's a matter for agreement among the volunteers, not an objective definition.

I think statements of support for certain platforms are useful to users, and that they cause very little additional friction or misunderstanding. (Users who think that "support" implies "support contract" are usually capable of finding an excuse to ignore *any* disclaimer of warranty; simply refusing to use the word "support" won't save you from them!)
If a distinction needs to be made, we can say "Python *support* for a platform does not imply that any particular issue will receive concentrated attention from the core developers in any time frame. When and how to address issues is up to the judgment of the development community. *Support contracts* are available from the businesses listed on the Wiki under 'Python Consultancies' for those who need a higher level of support." From tjreedy at udel.edu Tue Jun 22 03:46:27 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 21 Jun 2010 21:46:27 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: On 6/21/2010 2:46 PM, P.J. Eby wrote: > This ignores the existence of use cases where what you have is text that > can't be properly encoded in unicode. I think it depends on what you mean by 'properly'. I will try to explain with English examples. 1. Unicode represents a finite set of characters and symbols and a few control or markup operators. The potential set is unbounded, so unicode includes a user area. I include use of that area in 'properly'. I kind of suspect that the statement above does not since any byte or short byte sequence that does not translate can instead use the user area. 2. Unicode disclaims direct representation of font and style information, leaving that to markup either in or out of the text stream. (It made an exception for japanese narrow and wide ascii chars, which I consider to essentially be duplicate font variations of the normal ascii codes.) Html uses both in-band and out-of-band (css) markup. 
Stripping markup information is a loss of information. If one wants it, one must keep it in one form or another. I believe that some early editors like Wordstar used high-bit-set bytes for bold, underline, italic on and off. Assuming I have the example right, can Wordstar text be 'properly encoded in unicode'? If one insists that that mean replacement of each of the format markup chars with a single defined char in the Basic Multilingual Plane, then 'no'. If one allows replacement by , , and so on, then 'yes'. 3. Unicode disclaims direct representation of glyphic variants (though again, exceptions were made for asian acceptance). For example, in English, mechanically printed 'a' and 'g' are different from manually printed 'a' and 'g'. Representing both by the same codepoint, in itself, loses information. One who wishes to preserve the distinction must instead use a font tag or perhaps a tag. Similarly, older English had a significantly different glyph for 's', which looks more like a modern 'f'. If IBM's EBCDIC had codes for these glyph variants, IBM might have insisted that unicode also have such so char for char round-tripping would be possible. It does not and unicode does not. (Wordstar and other 1980s editor publishers were mostly defunct or weak and not in a position to make such demands.) If one wants to write on the history of glyph evolution, say of latin chars, one much either number the variants 'e-0', 'e-1', etc, or resort to the user area. In either case, proprietary software would be needed to actually print the variations with other text. > I know, it's a hard thing to wrap > one's head around, since on the surface it sounds like unicode is the > programmer's savior. Unfortunately, real-world text data exists which > cannot be safely roundtripped to unicode, I do not believe that. Digital information can always be recoded one way or another. 
As it is, the rules were bent for Japanese, in a way that they were not for English, to aid round-tripping of the major public encodings. I can, however, believe that there were private encodings for which round-tripping is more difficult. But there are also difficulties for old proprietary and even private English encodings. -- Terry Jan Reedy From stephen at xemacs.org Tue Jun 22 04:06:36 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 11:06:36 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621160105.25ae602f@heresy> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621160105.25ae602f@heresy> Message-ID: <87sk4f3jo3.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > I'm still not sure ebytes solves the problem, I don't see how it can. If you have an encoding to stuff into ebytes, you could just convert to Unicode and guarantee that all internal string operations will succeed. If you use ebytes instead, every string operation has to be wrapped in "try ... except EBytesError", to no gain that I can see. If you don't have an encoding, then you just have bytes, which strictly speaking shouldn't be operated on (in the sense of slicing, dicing, or stir-frying) at all if you're in an environment where they are a carrier for formatted information such as non-ASCII characters or PNG images. > but it avoids one I'm most concerned about seeing proposed. I > really really do not want to add encoding=blah arguments to > boatloads of function signatures. Agreed. But ebytes isn't a solution to that; it's a regression to one of the hardest problems in Python 2. OTOH, it seems to me that there's only one boatload to worry about. 
That's the boatload containing protocol-less APIs, ie, Unix OS data (names in the filesystem, content of environment variables). Other platforms (Windows, Mac) are standardizing on protocols for these things and enforcing them in the OS, and free Unices are going to the convention that everything is non-normalized UTF-8. What other boats are you worried about? From alexander.belopolsky at gmail.com Tue Jun 22 04:21:36 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 21 Jun 2010 22:21:36 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1180.1277170019@parc.com> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> <1180.1277170019@parc.com> Message-ID: On Mon, Jun 21, 2010 at 9:26 PM, Bill Janssen wrote: .. > Though, isn't that behavior of urllib.proxy_bypass another bug? I don't know. Ask Ronald. From stephen at xemacs.org Tue Jun 22 04:58:57 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 11:58:57 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621165611.GW5787@unaka.lan> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> Message-ID: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Toshio Kuratomi writes: > One comment here -- you can also have uri's that aren't decodable into their > true textual meaning using a single encoding. 
> > Apache will happily serve out uris that have utf-8, shift-jis, and > euc-jp components inside of their path but the textual > representation that was intended will be garbled (or be represented > by escaped byte sequences). For that matter, apache will serve > requests that have no true textual representation as it is working > on the byte level rather than the character level. Sure. I've never seen that combination, but I have seen Shift JIS and KOI8-R in the same path. But in that case, just using 'latin-1' as the encoding allows you to use the (unicode) string operations internally, and then spew your mess out into the world for someone else to clean up, just as using bytes would. > So a complete solution really should allow the programmer to pass > in uris as bytes when the programmer knows that they need it. Other than passing bytes into a constructor, I would argue if a complete solution requires, eg, an interface that allows urljoin(base,subdir) where the types of base and subdir are not required to match, then it doesn't belong in the stdlib. For stdlib usage, that's premature optimization IMO. The RFC says that URIs are text, and therefore they can (and IMO should) be operated on as text in the stdlib. It's not just a matter of manipulating the URIs themselves, where working directly on bytes will work just as well and and with the same string operations (as long as everything is bytes). It's also a question of API complexity (eg, Barry's bugaboo of proliferation of encoding= parameters) and of debugging (if URIs are internally str, then they will display sanely in tracebacks and the interpreter). The cases where URIs can't be sanely treated as text are garbage input, and the stdlib should not try to provide a solution. Just passing in bytes and getting out bytes is GIGO. Trying to do "some" error-checking is going to be insufficient much of the time and overly strict most of the rest of the time. 
The programmer in the trenches is going to need to decide what to allow and what not; I don't think there are general answers because we know that allowing random URLs on the web leads to various kinds of problems. Some sites will need to address some of them. Note also that the "complete solution" argument cuts both ways. Eg, a "complete" solution should implement UTS 39 "confusables detection"[1] and IDNA[2]. Good luck doing that with bytes! If you *need* bytes (rather than simply trying to avoid conversion overhead), you're in a hazmat handling situation. Passing bytes in to stdlib APIs here is the equivalent of carrying around kilograms of fissionables in an open bucket. While the Tokaimura comparison is hyperbole, it can't be denied that use of bytes here shortcuts a lot of processing strongly suggested by the RFCs, and prevents use of various programming conveniences (such as reasonable display of URI values in debugging). Does the efficiency really justify including that in the stdlib? I dunno, I'm not a web programmer in the trenches. But I take my cue from MvL and MAL who don't seem real enthusiastic about this. And as Martin says, there is as yet no evidence offered that the overhead of conversion is a general problem. Footnotes: [1] http://www.unicode.org/reports/tr39/ [2] http://www.rfc-editor.org/rfc/rfc3490.txt From stephen at xemacs.org Tue Jun 22 06:15:19 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 13:15:19 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87ocf33dpk.fsf@uwakimon.sk.tsukuba.ac.jp> Robert Collins writes: > Perhaps you mean 3986 ? :) Thank you for the correction. > > ? ?A URI is an identifier consisting of a sequence of characters > > ? 
matching the syntax rule named in Section 3. > > > > (where the phrase "sequence of characters" appears in all ancestors I > > found back to RFC 1738), and > > Sure, ok, let me unpack what I meant just a little. An abstract URI is > neither unicode nor bytes per se - see section 1.2.1 " A URI is a > sequence of characters from a very limited set: the letters of the > basic Latin alphabet, digits, and a few special characters. " My position is that this describes the network protocol, not the abstract URI. It in no way suggests that uri-encoded forms should be handled internally. And the RFC explicitly says this is text, and therefore sanctions the user- and programmer-friendly practice of doing internal processing as text. Note that in a hypothetical bytes-oriented API

    base = convert_uri_to_wire_format('http://www.example.org/')
    formuri = uri_join(base,b'home/steve/public_html')

the bytes literal b'home/steve/public_html' clearly is intended as readable text. This is mixing types in the programmer's mind, even though base is internally in bytes format and the relative URI is also in bytes format. This is un-Pythonic IMO. > URI interpretation is fairly strictly separated between producers and > consumers. A consumer can manipulate a url with other url fragments - > e.g. doing urljoin. But it needs to keep the url as a url and not try > to decode it to a unicode representation. Unfortunately, outside of Kansas and Canberra, it don't work that way. How do you propose to uri_join base as above and '/home/?????/public_html'? Encoding and/or decoding must be done somewhere, and it would be damn unfriendly to make the browser user do it! In the bytes-oriented API, the programmer must be continually making decisions about whether and how to handle non-ASCII components from "outside" (or, more likely, cursing the existence of the damned foreigners, and then ignoring the possibility ... let them eat UnicodeException!)
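To make the "encoding must be done somewhere" point concrete, here is a minimal sketch using Python 3's urllib.parse. It is not code from the thread: the path component and the helper name to_wire are made up for illustration, and the helper only percent-encodes the path (a non-ASCII host name would need IDNA, which is not handled here). The join itself is done on text; the choice of encoding appears exactly once, at the point where wire-format bytes are produced.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

base = "http://www.example.org/"
relative = "home/caf\u00e9/public_html"   # hypothetical non-ASCII path

# Join as text: no encoding decision is needed at this point.
joined = urljoin(base, relative)

def to_wire(uri, encoding="utf-8"):
    """Percent-encode the path of a text URI and return ASCII bytes.

    Hypothetical helper: the encoding is an explicit parameter because
    nothing in the URI itself says which one the server expects.
    """
    parts = urlsplit(uri)
    path = quote(parts.path, safe="/", encoding=encoding)
    return urlunsplit(parts._replace(path=path)).encode("ascii")

print(to_wire(joined))             # b'http://www.example.org/home/caf%C3%A9/public_html'
print(to_wire(joined, "latin-1"))  # b'http://www.example.org/home/caf%E9/public_html'
```

The same text URI yields two different byte strings depending on the encoding chosen, which is the decision a bytes-only API would push onto every caller.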
> As an example, if I give the uri "http://server/%c3%83", rendering > that as http://server/Ã is able to lead to transcription errors and > reinterpretation problems unless you know - out of band - that the > server is using utf8 to encode. Conversely if someone enters in > http://server/Ã in their browser window, choosing utf8 or their local > encoding is quite arbitrary and able to not match how the server would > represent that resource. Sure. Using bytes doesn't solve either problem. It just allows you to wash your hands of it and pass it on to someone else, who probably has even less information than you do. Eg, passing the uri "http://server/%c3%83" to someone else without telling them the encoding means that effectively they're limited to ASCII if they want to append meaningful relative paths without guessing the encoding. In the case of the user entering "http://server/Ã", you have to do *something* to produce bytes eventually. When was the last time you typed "%c3%83" at the end of a URL in a browser address field? > > 2. Characters > > > > The URI syntax provides a method of encoding data, presumably for > > the sake of identifying a resource, as a sequence of characters. > > The URI characters are, in turn, frequently encoded as octets for > > transport or presentation. This specification does not mandate any > > particular character encoding for mapping between URI characters > > and the octets used to store or transmit those characters. When a > > URI appears in a protocol element, the character encoding is > > defined by that protocol; without such a definition, a URI is > > assumed to be in the same character encoding as the surrounding > > text. > > Thats true, but its been taken out of context; the set of characters > permitted in a URL is a strict subset of characters found in ASCII; No. Again, you're confounding "the URL" with its network format.
There's no question that the network format is in bytes, and before putting the URI into a wire protocol, you need to encode non-URI characters. However, the abstract URI is text, and may not even be represented by octets or Unicode at all (eg, represented by carbon residue on recycled wood pulp). > See also the section on comparing URL's - Unicode isn't at all relevant. Not to the RFC, which talks about *characters* and gives examples that imply transcoding (eg, between EBCDIC and UTF-16), see the section you cite. However, Unicode is the canonical representation of text inside Python, and therefore TOOWTDI for URL comparison in Python. Thank you for that killer argument for my position; I hadn't thought of it. > I wish it would. The problem is not in Python here though - and > casually handwaving will exacerbate it, not fix it. Using bytes "because we just don't know" is exactly casual handwaving. Well, maybe not casual; I'm aware that many programmers are driven to it by the recognition that only the extremes (all bytes vs. all text) make sense, and they choose bytes for efficiency reasons. I believe that focus on efficiency is un-Pythonic; that in Python 3 text should be chosen (in the stdlib) because it makes writing programs more fun (you can use literal notation for non-ASCII string constants, for example) and debuggable. Sure, in some cases you'll need to punt to 'latin-1' (ie, 'binary') or perhaps PEP 383 lone surrogates (this would require special handling to get reasonably friendly presentation to users and debuggers, I suppose), but for the many cases where you know that everything is in the same encoding life is a lot better. And of course I have no objection to an additional API for efficiency for those who want it, and maybe that even belongs in the stdlib. But IMO the TOOWTDI should use text (ie, Python 3 str = Unicode) by default. 
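The two fallbacks mentioned above ('latin-1' as a "binary" codec, and PEP 383 lone surrogates) can be sketched as follows. This is an illustration only, not code from the thread; the byte string is a made-up path that mixes a Shift JIS segment with a UTF-8 segment, so no single strict decode can succeed. Both fallbacks keep the str operations available and both round-trip to the exact original bytes; they differ in what the undecodable parts look like while they are text.

```python
# Hypothetical path: one Shift JIS segment, one UTF-8 segment.
raw = b"/docs/\x93\xfa\x96\x7b/\xe6\x97\xa5"

# A strict decode has to fail: the input is not valid UTF-8.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Fallback 1: 'latin-1' maps every byte to one code point. Nothing is
# lost, but non-ASCII segments are mojibake rather than meaningful text.
as_latin1 = raw.decode("latin-1")
segments = as_latin1.split("/")          # str operations still work

# Fallback 2: PEP 383 'surrogateescape' decodes what it can and smuggles
# the undecodable bytes through as lone surrogates (U+DC80..U+DCFF).
as_escaped = raw.decode("utf-8", errors="surrogateescape")

# Both representations encode back to the original bytes.
back_latin1 = as_latin1.encode("latin-1")
back_escaped = as_escaped.encode("utf-8", errors="surrogateescape")
```

With surrogateescape the valid UTF-8 segment comes out as real text while the Shift JIS segment is carried as escapes; with latin-1 neither segment is readable. Either way, the encoding question is deferred rather than answered.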
> Modelling URL's as string like things is great from a convenience > perspective, but, like file paths, they are much more complex > difficult. No. Like file paths, it is the key to any real solution to the problem. Users, both server admins, URN specifiers, and browsers, think about the URI as text and expect inputting text to work. As does the RFC. Machines, on the other hand, think of both as bytes (at least in the general Unix world). It is the programmer's job to do the best she can to identify the correct encoding to bridge the mismatch. She can abdicate that job, of course, but if she chooses *not* to abdicate, (1) treating the URI as text encourages her to confront the issue early, and (2) ensures that to the extent possible the URI will maintain its quality of intelligible text. With bytes, your only sane choice is to abdicate. N.B. STD 66 refrains from redefining HTTP URLs to be UTF-8 because *it would not work*. Practically, Nippon Tel & Tel will continue to use Shift JIS URIs for cellphone-oriented sites because its handset browsers only understand Shift JIS (or some such nonsense). > If Unicode was relevant to HTTP, Again, Unicode is relevant not because of the wire protocols, but because of Python's and because of the intent of the RFCs. > I'd agree, but its not; we should put fragile heuristics at the > outer layer of the API and work as robustly and mechanically as > possible at the core. Where we need to guess, we need worker > functions that won't guess at all - for the sanity of folk writing > servers and protocol implementations. A worker function that doesn't guess must error in the absence of out-of-band information about the encoding. This is true whether you represent URIs internally as bytes or as text. 
Refusing to error constitutes a guess, because in a bytes-internal system, eventually text from outside will find its way into the system, and must be encoded to bytes, and in the case of a text-internal system, obviously bytes from outside are coming in and must be decoded to text. From stephen at xemacs.org Tue Jun 22 07:17:10 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 14:17:10 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621191432.710993A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621114307.48735698@heresy> <871vc045sl.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621191432.710993A404D@sparrow.telecommunity.com> Message-ID: <87mxun3auh.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > In Kagoshima, you'd use pass in an ebytes with your encoding to a > stdlib API, and *get back an ebytes with the right encoding*, > rather than an (incorrect and useless) unicode object which has > lost data you need. How does the stdlib do that? Unless it guesses which encoding for Japanese is being used? And even if this ebytes uses Shift JIS, what makes that the "right" encoding for anything? On the other hand, I know when *I* need some encoding, and when I figure it out I will store it in an appropriate place in my program. The problem is that for some programs it is not unlikely that I will see all of Shift JIS, EUC-JP, ISO-2022-JP, UTF-8, and UTF-16, and on a very bad day, RFC 2047, GB 2312, and Big5, too, used to encode Japanese. It's not totally unlikely for a browser to send URLs to a server expecting UTF-8 to recover a message/rfc822 object containing ISO-2022-JP in the mail header and EUC-JP in the body. 
So I need to know which encoding was used by the server that sent the reply, but the ebytes can't tell me that if it fishes an URL in EUC-JP out of the message body. I need to convert that URL to UTF-8, or most servers will 404. > But this is not the case at all, for use cases where "no, really, you > *have to* work with bytes-encoded text streams". The mere release of > Python 3.x will not cause all the world's applications, libraries, > and protocols to suddenly work with unicode, where they did not before. Sure. That's what .encode() and .decode() are for. The problem is what to do when you don't know what to put in the parentheses, and I can't think of a use case offhand where ebytes(stuff,'garbage') does better than PEP 383-enabled str for: > Being explicit about the encoding of the bytes you're flinging > around is actually an *increase* in specificity, explicitness, > robustness, and error-checking ability over the status quo for > either 2.x *or* 3.x... *and* it improves these qualities for > essentially *all* string-handling code, without requiring that code > to be rewritten to do so. A well-spoken piece. But, you see, most of those encodings are *only* interesting so that you can transcode characters to the encoding of interest. What's the e.o.i.? That is easily found in the context or has an obvious default, if you're lucky, or otherwise a hard problem that ebytes does nothing to help solve as far as I can see. Cf. Robert Collins' post , where he makes it quite explicit that a bytes interface is all about punting in the face of missing encoding information. > >and (2) you really want this under control of higher level objects > >that have access to some knowledge of the environment, rather than > >the lowest level. > > This proposal actually has such a higher-level object: an > ebytes. I don't see how that can be true. 
An ebytes is a very low-level object that has no idea whether its encoding is interesting (eg, the one that an RFC or a server specifies), or a technical detail of use only until the ebytes is decoded, then can be thrown away. I just don't see, in the case where there is a real encoding in the ebytes, what harm is done by decoding the ebytes to str. If context indicates that the encoding is an interesting one (eg, it should be the default for encoding on output), then you want to save that in an appropriate place that preserves not just the encoding itself, but the context that gives it its importance. From glyph at twistedmatrix.com Tue Jun 22 07:22:22 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 01:22:22 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100621181750.267933A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com> Message-ID: On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote: > One issue I remember from my "enterprise" days is some of the Asian-language developers at NTT/Verio explaining to me that unicode doesn't actually solve certain issues -- that there are use cases where you really *do* need "bytes plus encoding" in order to properly express something. The thing that I have heard in passing from a couple of folks with experience in this area is that some older software in asia would present characters differently if they were originally encoded in a "japanese" encoding versus a "chinese" encoding, even though they were really "the same" characters. 
I do know that Han Unification is a giant political mess ( makes for some interesting reading), but my understanding is that it has handled enough of the cases by now that one can write software to display Asian languages and it will basically work with a modern version of Unicode. (And of course, there's always the private use area, as Stephen Turnbull pointed out.) Regardless, this is another example where keeping around a string isn't really enough. If you need to display a Japanese character in a distinct way because you are operating in the Japanese *script*, you need a tag surrounding your data that is a hint to its presentation. The fact that these presentation hints were sometimes determined by their encoding is an unfortunate historical accident. From glyph at twistedmatrix.com Tue Jun 22 07:31:16 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 01:31:16 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: > The RFC says that URIs are text, and therefore they can (and IMO > should) be operated on as text in the stdlib. No, *blue* is the best color for a shed. Oops, wait, let me try that again. While I broadly agree with this statement, it is really an oversimplification.
An URI is a structured object, with many different parts, which are transformed from bytes to ASCII (or something latin1-ish, which is really just bytes with a nice face on them) to real, honest-to-goodness text via the IRI specification: . > Note also that the "complete solution" argument cuts both ways. Eg, a > "complete" solution should implement UTS 39 "confusables detection"[1] > and IDNA[2]. Good luck doing that with bytes! And good luck doing that with just characters, too. You need a parsed representation of the URI that you can encode different parts of in different ways. (My understanding is that you should only really implement confusables detection in the netloc... while that may be a bogus example, you're certainly only supposed to do IDNA in the netloc!) You can just call urlsplit() all over the place to emulate this, but this does not give you the ability to go back to the original bytes, and thereby preserve things like brokenly-encoded segments, which seems to be what a lot of this hand-wringing is about. To put it another way, there is no possible information-preserving string or bytes type that will make everyone happy as a result from urljoin(). The only return-type that gives you *everything* is "URI". > just using 'latin-1' as the encoding allows you to > use the (unicode) string operations internally, and then spew your > mess out into the world for someone else to clean up, just as using > bytes would. This is the limitation that everyone seems to keep dancing around. If you are using the stdlib, with functions that operate on sequences like 'str' or 'bytes', you need to choose from one of three options: 1. 
"decode" everything to latin1 (although I prefer to call it "charmap" when used in this way) so that you can have some mojibake that will fool a function that needs a unicode object, but not lose any information about your input so that it can be transformed back into exact bytes (and be very careful to never pass it somewhere that it will interact with real text!), 2. actually decode things to an appropriate encoding to be displayed to the user and manipulated with proper text-manipulation tools, and throw away information about the bytes, 3. keep both the bytes and the characters together (perhaps in a data structure) so that you can both display the data and encode it in situationally-appropriate ways. The stdlib as it is today is not going to handle the 3rd case for anyone. I think that's fine; it is not the stdlib's job to solve everyone's problems. I've been happy with it providing correctly-functioning pieces that can be used to build more elaborate solutions. This is what I meant when I said I agree with Stephen's first point: the stdlib *should* just keep operating entirely on strings, because URIs are defined, by the spec, to be sequences of ASCII characters. But that's not the whole story. PJE's "bstr" and "ebytes" proposals set my teeth on edge. I can totally understand the motivation for them, but I think it would be a big step backwards for python 3 to succumb to that temptation, even in the form of a third-party library. It is really trying to cram more information into a pile of bytes than truly exists there. (Also, if we're going to have encodings attached to bytes objects, I would very much like to add "JPEG" and "FLAC" to the list of possibilities.) The real tension there is that WSGI is desperately trying to avoid defining any data structures (i.e. classes), while still trying to work with structured data. An URI class with a 'child' method could handily solve this problem. 
You could happily call IRI(...).join(some bytes).join(some text) and then just say "give me some bytes, it's time to put this on the network", or "give me some characters, I have to show something to the user", or even "give me some characters appropriate for an 'href=' target in some HTML I'm generating" - although that last one could be left to the HTML generator, provided it could get enough information from the URI/IRI object's various parts itself. I don't mean to pick on WSGI, either. This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. From stephen at xemacs.org Tue Jun 22 07:28:57 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 14:28:57 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> Message-ID: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Urman writes: > It is somewhat troublesome that there doesn't appear to be an obvious > built-in idempotent-when-possible function that gives back the > provided bytes/str, If you want something idempotent, it's already the case that bytes(b'abc') => b'abc'. What might be desirable is to make bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII (or maybe ISO 8859/1). Unfortunately, str(b'abc') already does work, but

    steve at uwakimon ~ $ python3.1
    Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
    [GCC 4.3.4] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> str(b'abc')
    "b'abc'"
    >>>

Oops.
You can see why that probably "should" be the case. From a.badger at gmail.com Tue Jun 22 07:50:40 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 22 Jun 2010 01:50:40 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100622055040.GE5787@unaka.lan> On Tue, Jun 22, 2010 at 11:58:57AM +0900, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > > One comment here -- you can also have uri's that aren't decodable into their > > true textual meaning using a single encoding. > > > > Apache will happily serve out uris that have utf-8, shift-jis, and > > euc-jp components inside of their path but the textual > > representation that was intended will be garbled (or be represented > > by escaped byte sequences). For that matter, apache will serve > > requests that have no true textual representation as it is working > > on the byte level rather than the character level. > > Sure. I've never seen that combination, but I have seen Shift JIS and > KOI8-R in the same path. > > But in that case, just using 'latin-1' as the encoding allows you to > use the (unicode) string operations internally, and then spew your > mess out into the world for someone else to clean up, just as using > bytes would. > This is true. I'm giving this as a real-world counter example to the assertion that URIs are "text". In fact, I think you're confusing things a little by asserting that the RFC says that URIs are text. I'll address that in two sections down. > > So a complete solution really should allow the programmer to pass > > in uris as bytes when the programmer knows that they need it. 
> > Other than passing bytes into a constructor, I would argue if a > complete solution requires, eg, an interface that allows > urljoin(base,subdir) where the types of base and subdir are not > required to match, then it doesn't belong in the stdlib. For stdlib > usage, that's premature optimization IMO. > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given other options, I'd rather see two separate functions, though. It seems more discoverable and less prone to taking bad input some of the time to have two functions that clearly only take one type of data apiece.) > The RFC says that URIs are text, and therefore they can (and IMO > should) be operated on as text in the stdlib. If I'm reading the RFC correctly, you're actually operating on two different levels here. Here's the section 2 that you quoted earlier, now in its entirety:: 2. Characters The URI syntax provides a method of encoding data, presumably for the sake of identifying a resource, as a sequence of characters. The URI characters are, in turn, frequently encoded as octets for transport or presentation. This specification does not mandate any particular character encoding for mapping between URI characters and the octets used to store or transmit those characters. When a URI appears in a protocol element, the character encoding is defined by that protocol; without such a definition, a URI is assumed to be in the same character encoding as the surrounding text. The ABNF notation defines its terminal values to be non-negative integers (codepoints) based on the US-ASCII coded character set [ASCII]. Because a URI is a sequence of characters, we must invert that relation in order to understand the URI syntax. Therefore, the integer values used by the ABNF must be mapped back to their corresponding characters via US-ASCII in order to complete the syntax rules. 
A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.

So here's some data that matches those terms up to actual steps in the process::

    # We start off with some arbitrary data that defines a resource. This is
    # not necessarily text. It's the data from the first sentence:
    data = b"\xff\xf0\xef\xe0"
    # We encode that into text and combine it with the scheme and host to form
    # a complete uri. This is the "URI characters" mentioned in section #2.
    # It's also the "sequence of characters mentioned in 1.1" as it is not
    # until this point that we actually have a URI.
    uri = b"http://host/" + percentencoded(data)
    #
    # Note1: percentencoded() needs to take any bytes or characters outside of
    # the characters listed in section 2.3 (ALPHA / DIGIT / "-" / "." / "_"
    # / "~") and percent encode them. The URI can only consist of characters
    # from this set and the reserved character set (2.2).
    #
    # Note2: in this simplistic example, we're only dealing with one piece of
    # data. With multiple pieces, we'd need to combine them with separators,
    # for instance like this:
    #   uri = b'http://host/' + percentencoded(data1) + b'/'
    #         + percentencoded(data2)
    #
    # Note3: at this point, the uri could be stored as unicode or bytes in
    # python3. It doesn't matter. It will be a subset of ASCII in either
    # case.
    # Then we take this and encode it for presentation inside of a data
    # file. If we're saving in any encoding that has ASCII as a subset and we
    # had bytes returned from the previous step, all we need to do is save to
    # a file. If we had unicode from the previous step, we need to transform
    # to the encoding we're using and output it.
    u_uri.encode('utf8')

With all this in mind...
URIs are text according to the RFC if you want to deal with URIs that are percent encoded. In other words, things like this::

    http://host/%ff%f0%ef%e0

If you want to deal with things like this::

    http://host/café

Then you are going one step further; back to the original data that was encoded in the RFC. At that point you are no longer dealing with the sequence of characters talked about in the RFC. You are dealing with data which may or may not be text. As Robert Collins says, this is bytes by definition which I pretty much agree with. It's very very convenient to work with this data as text most of the time but the RFC does not mandate that it is text so operating on it as bytes is perfectly reasonable. > It's not just a matter > of manipulating the URIs themselves, where working directly on bytes > will work just as well and and with the same string operations (as > long as everything is bytes). It's also a question of API complexity > (eg, Barry's bugaboo of proliferation of encoding= parameters) and of > debugging (if URIs are internally str, then they will display sanely > in tracebacks and the interpreter). The proliferation of encoding I agree is a thing that is ugly. Although, if I'm thinking correctly, that only matters when you want to allow mixing bytes and unicode, correct? One of these cases:

* I take in some mix of parameters with at least one unicode and output bytes
* I take in some mix of parameters with at least one bytes and output unicode
* I take in either bytes or unicode and transform them internally to the other type before operating on them. Then I transform the output to the input type before returning.

For debugging, I'm either not understanding or you're wrong. If I'm given an arbitrary sequence of bytes how do I sanely store them as str internally?
If I transform them using an encoding that anticipates the full range of bytes I may be able to display some representation of them but it's not necessarily the sanest method of display (for instance, if I know that path element 1 is always going to be a utf8 encoded string and path element 2 is always shift-jis encoded, and path element 3 is binary data, I could construct a much saner display method than treating the whole thing as latin1). > The cases where URIs can't be sanely treated as text are garbage > input, and the stdlib should not try to provide a solution. Just > passing in bytes and getting out bytes is GIGO. Trying to do "some" > error-checking is going to be insufficient much of the time and overly > strict most of the rest of the time. The programmer in the trenches > is going to need to decide what to allow and what not; I don't think > there are general answers because we know that allowing random URLs on > the web leads to various kinds of problems. Some sites will need to > address some of them. > What is your basis for asserting that URIs that aren't sanely treated as text are garbage? It's definitely not in the RFC. > Note also that the "complete solution" argument cuts both ways. Eg, a > "complete" solution should implement UTS 39 "confusables detection"[1] > and IDNA[2]. Good luck doing that with bytes! > Note that IDNA and confusables detection operate on a different portion of the uri than the need for bytes. Those operate on the domain name (looks like it's called the authority in the rfc) whereas bytes are useful for the path, query, and fragment portions. Note: I'm not sure precisely what Philip is looking to do but the little I've read sounds like its contrary to the design principles of the python3 unicode handling redesign. 
I'm stating my reading of the RFC not to defend the use case Philip has, but because I think that the outlook that non-text uris (before being percent-encoded) are violations of the RFC is wrong and will lead to interoperability problems/warts (since you could turn them into latin1 and from there into bytes and from there into the proper values) if allowed to predominate the thinking. -Toshio From raymond.hettinger at gmail.com Tue Jun 22 08:21:51 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 21 Jun 2010 23:21:51 -0700 Subject: [Python-Dev] UserDict in 2.7 Message-ID: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is now a new-style class". I had thought there was a conscious decision to not change any existing classes from old-style to new-style. IIRC, Martin had championed this idea and had rejected all proposals to make existing classes inherit from object. Raymond From ronaldoussoren at mac.com Tue Jun 22 08:39:19 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 22 Jun 2010 08:39:19 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <1277151926.3369.6.camel@localhost.localdomain> References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <1277151926.3369.6.camel@localhost.localdomain> Message-ID: <30F79991-F933-44C6-A884-5A8D5671DB8C@mac.com> On 21 Jun, 2010, at 22:25, Antoine Pitrou wrote: > Le lundi 21 juin 2010 à
21:13 +0100, Michael Foord a écrit : >> >> If OS X is a supported and important platform for Python then fixing all >> problems that it reveals (or being willing to) should definitely not be >> a pre-requisite of providing a buildbot (which is already a service to >> the Python developer community). Fixing bugs / failures revealed by >> Bill's buildbot is not fixing them "for Bill" it is fixing them for Python. > > I didn't say it was a prerequisite. I was merely pointing out that when > platform-specific bugs appear, people using the specific platform should > be helping if they want to actually encourage the fixing of these bugs. > > OS X is only "a supported and important platform" if we have dedicated > core developers diagnosing or even fixing issues for it (like we > obviously have for Windows and Linux). Otherwise, I don't think we have > any moral obligation to support it. I look into and fix OSX issues, but do so in my spare time. This means it can take a while until I get around to doing so. Ronald P.S. Please file bugs for issues on OSX and set the component to Macintosh instead of discussing them on python-dev. I don't read python-dev on a daily basis and almost missed this thread. From raymond.hettinger at gmail.com Tue Jun 22 08:47:46 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 21 Jun 2010 23:47:46 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> Message-ID: <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: > This is a common pain-point for porting software to 3.x - you had a string, it kinda worked most of the time before, but now you need to keep track of text too and the functions which seemed to work on bytes no longer do. Thanks Glyph. That is a nice summary of one kind of challenge facing programmers. Raymond From stephen at xemacs.org Tue Jun 22 08:49:01 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 15:49:01 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp> P.J.
Eby writes: > I know, it's a hard thing to wrap one's head around, since on the > surface it sounds like unicode is the programmer's savior. I don't need to wrap my head around it. It's been deeply embedded, point first, and the nasty barbs ensure that I have no desire to pull it back out. To wit, I've been dealing with Japanese encoding issues on a daily basis for 20 years, and I'm well aware that programmers have several good reasons (and a lot more bad ones) for avoiding them, and even for avoiding Unicode when they must deal with encodings at all. I don't think any of the good reasons have been offered here yet, that's all. > Unfortunately, real-world text data exists which cannot be safely > roundtripped to unicode, and must be handled in "bytes with > encoding" form for certain operations. Or "Unicode with encoding" form. See below for why this makes sense in the context of Python. > I personally do not have to deal with this *particular* use case any > more -- I haven't been at NTT/Verio for six years now. As mentioned, I have a bit of understanding of the specific problems of Japanese-language computing. In particular, roundtripping Japanese from *any* encoding to *any other* encoding is problematic, because the national standards provide a proper subset of the repertoire actually used by the Japanese people. (Even JIS X 0213.) > My current needs are simpler, thank goodness. ;-) However, they > *do* involve situations where I'm dealing with *other* > encoding-restricted legacy systems, such as software for interfacing > with the US Postal Service that only works with a restricted subset > of latin1, while receiving mangled ASCII from an ecommerce provider, > and storing things in what's effectively a latin-1 database. Yes, I know of similar issues in other applications. For example, TeX error messages do not respect UTF-8 character boundaries, so Emacs has to handle them specially (basically a mechanism similar in spirit to PEP 383 is used). 
> Being able to easily assert what kind of bytes I've got would > actually let me catch errors sooner, *if* those assertions were > being checked when different kinds of strings or bytes were being > combined. i.e., at coercion time). I see that this would make life a little easier for you in maintaining without refactoring. I'd say it's a kludge, but without a full list of requirements I'm in no position to claim any authority . Eg, for a non-kludgey suggestion, how about defining a codec which takes Latin-1 bytes, checks (with error on failure) for the restricted subset, and converts to str? Then you can manipulate these things as str with abandon internally. Finally you get another check in the outgoing codec which converts from str to "effective Latin-1 bytes", however that is defined. But OK, maybe I'm just being naive. You need this unlovely artifice so you can put in asserts in appropriate places. Now, does it belong in the stdlib? It seems to me that in the case of Japanese roundtripping, *most* of the time encoding back to a standard Japanese encoding will work. If you run into one of the problematic characters that JIS doesn't allow but Japanese like to use because they prefer the glyph to the JIS-standard glyph, you get an occasional error on encoding to a standard Japanese encoding, which you handle specially with a database of such characters. Knowing the specific encoding originally used *normally does not help unless you're replying to that person and **only** that person*, because the extended repertoires vary widely and the only standard is Japanese. I conclude ebytes does *no* good here. For the ecommerce/USPS case, well, actually you need special-purpose encodings anyway (ISTM). 'latin-1' loses, the USPS is allergic to some valid 'latin-1' characters. 'ascii' loses, apparently you need some of the Latin-1 repertoire, and anyway AIUI the ecommerce provider munges the ASCII. So what does ebytes actually buy you here, unless you write the codecs? 
If you've got the codecs, what additional benefit do you get from ebytes? Note that you would *also* need to do explicit transcoding anyway if you were dealing with Japan Post instead of the USPS, although I grant your code is probably general enough to deal with Deutsche Telecom (but the German equivalent of your ecommerce provider probably has its own ways of munging Latin-1). I conclude that there may be genuine benefits to ebytes here, but they're probably not general enough to put in the stdlib (or the Python language). > Which works if and only if your outputs are truly unicode-able. With PEP 383, they always are, as long as you allow Unicode to be decoded to the same garbage your bytes-based program would have produced anyway. > If you work with legacy systems (e.g. those Asian email clients and > US postal software), you are really working with a *character set*, > not unicode, I think you're missing something. Namely, Unicode is a standard for handling character objects as integers, and a registry for mapping characters to integers. It includes over 100,000 points for making up your own mappings, and recent Python also provides (as an internal extension) for embedding non-characters in a str. Unicode does not define a repertoire, however. That's up to the application, and Python 2+ provides a convenient way to restrict repertoires by defining special purpose codecs in Python. It is then up to the program to ensure that all candidates claiming to be text pass through the cleansing fire of a codec before being allowed into the Pure Land of str. This can be something of a problem; there are a few ways for textual data to get into Python, and not all of them were obvious to me. But this problem would be even worse for mechanisms like ebytes, where it's up to the programmer to decide which things are put into ebytes. > and so putting your data in unicode form is actually *wrong* > -- an expedient lie. > > Heresy, I know, but there you go. 
;-) It's not heresy, it's simply assuming a restriction on use of Unicode that just isn't true. It *is* true that mapping the data to Unicode according to some encoding is not always sufficient. It *is* often the case that further information must be provided to ensure semantic correctness. However, given the mapping (== properly defined codecs), roundtripping *is* always possible, at least up to the size of private space, which is big enough to hold the Post Office's repertoire, for sure. And that mapping is a Python object which will fit into a variable for later use. From stephen at xemacs.org Tue Jun 22 09:33:53 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 16:33:53 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> Message-ID: <87hbkv34im.fsf@uwakimon.sk.tsukuba.ac.jp> Glyph Lefkowitz writes: > On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote: > > Note also that the "complete solution" argument cuts both ways. Eg, a > > "complete" solution should implement UTS 39 "confusables detection"[1] > > and IDNA[2]. Good luck doing that with bytes! > > And good luck doing that with just characters, too. I agree with you, sorry. I meant to cast doubt on the idea of complete solutions, or at least claims that completeness is an excuse for putting it in the stdlib. > This is the limitation that everyone seems to keep dancing around. 
> If you are using the stdlib, with functions that operate on > sequences like 'str' or 'bytes', you need to choose from one of > three options: There's a *fourth* way: specially designed codecs to preserve as much metainformation as you need, while always using the str format internally. This can be done for at least 100,000 separate (character, encoding) pairs by multiplexing into private space with an auxiliary table of encodings and equivalences. That's probably overkill. In many cases, adding simple PEP 383 mechanism (to preserve uninterpreted bytes) might be enough though, and that's pretty plausible IMO. From lesni.bleble at gmail.com Tue Jun 22 11:08:56 2010 From: lesni.bleble at gmail.com (lesni bleble) Date: Tue, 22 Jun 2010 11:08:56 +0200 Subject: [Python-Dev] adding new function Message-ID: hello, how can i simply add new functions to module after its initialization (Py_InitModule())? I'm missing something like PyModule_AddCFunction(). thank you L. From fetchinson at googlemail.com Tue Jun 22 11:44:38 2010 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Tue, 22 Jun 2010 11:44:38 +0200 Subject: [Python-Dev] adding new function In-Reply-To: References: Message-ID: > how can i simply add new functions to module after its initialization > (Py_InitModule())? I'm missing something like > PyModule_AddCFunction(). This type of question really belongs to python-list aka comp.lang.python which I CC-d now. Please keep the discussion on that list. Cheers, Daniel -- Psss, psss, put it down! 
- http://www.cafepress.com/putitdown From ncoghlan at gmail.com Tue Jun 22 12:41:39 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 22 Jun 2010 20:41:39 +1000 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull wrote: > > Which works if and only if your outputs are truly unicode-able. > > With PEP 383, they always are, as long as you allow Unicode to be > decoded to the same garbage your bytes-based program would have > produced anyway. Could it be that part of the problem here is that we need to better advertise "errors='surrogateescape'" as a mechanism for decoding incorrectly encoded data according to a nominal codec without throwing UnicodeDecode and UnicodeEncode errors all over the place? Currently it only garners a mention in the docs in the context of the os module, the list of error handlers in the codecs module and as a default error handler argument in the tarfile module. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Jun 22 12:52:39 2010 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 22 Jun 2010 20:52:39 +1000 Subject: [Python-Dev] [OT] glyphs [was Re: email package status in 3.X] In-Reply-To: References: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> Message-ID: <201006222052.39734.steve@pearwood.info> On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote: > 3.
Unicode disclaims direct representation of glyphic variants > (though again, exceptions were made for asian acceptance). For > example, in English, mechanically printed 'a' and 'g' are different > from manually printed 'a' and 'g'. Representing both by the same > codepoint, in itself, loses information. One who wishes to preserve > the distinction must instead use a font tag or perhaps a > tag. Similarly, older English had a significantly > different glyph for 's', which looks more like a modern 'f'. An unfortunate example, as the old English long-s gets its own Unicode codepoint. http://en.wikipedia.org/wiki/Long_s -- Steven D'Aprano From stephen at xemacs.org Tue Jun 22 13:31:13 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 20:31:13 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100622055040.GE5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Toshio Kuratomi writes: > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and > urljoin(u_base, u_subdir) => unicode be acceptable though? Probably. But it doesn't matter what I say, since Guido has defined that as "polymorphism" and approved it in principle. > (I think, given other options, I'd rather see two separate > functions, though. Yes. > If you want to deal with things like this:: > http://host/café Yes. > At that point you are no longer dealing with the sequence of > characters talked about in the RFC. You are dealing with data > which may or may not be text. That's right, and I think that in most cases that is what programmers want to be dealing with. Let the library make sure that what goes on the wire conforms to the RFC.
I don't want to know about it, I want to work with the content of the URI. > The proliferation of encoding I agree is a thing that is ugly. > Although, if I'm thinking correctly, that only matters when you > want to allow mixing bytes and unicode, correct? Well you need to know a fair amount about the encoding: that the reserved bytes are used as defined in the RFC, for example. > For debugging, I'm either not understanding or you're wrong. If I'm given > an arbitrary sequence of bytes how do I sanely store them as str internally? If it's really arbitrary, you use either a mapping to private space or PEP 383, and accept that it won't make sense. But in most cases you should be able to achieve a fair degree of sanity. > If I transform them using an encoding that anticipates the full range of > bytes I may be able to display some representation of them but it's not > necessarily the sanest method of display (for instance, if I know that path > element 1 is always going to be a utf8 encoded string and path element 2 is > always shift-jis encoded, and path element 3 is binary data, I could > construct a much saner display method than treating the whole thing as > latin1). And I think in most cases you will know, although the cases where you'll know will be because of a system-wide encoding. > What is your basis for asserting that URIs that aren't sanely treated as > text are garbage? I don't mean we can throw them away, I mean we can't do any sensible processing on them. You at least need to know about the reserved delimiters. In the same way that Philip used 'garbage' for the "unknown" encoding. And in the sense of "garbage in, garbage out". > unicode handling redesign. I'm stating my reading of the RFC not to defend > the use case Philip has, but because I think that the outlook that non-text > uris (before being percent-encoded) are violations of the RFC That's not what I'm saying.
What I'm trying to point out is that manipulating a bytes object as a URI sort of presumes a lot about its encoding as text. Since many of the URIs we deal with are more or less textual, why not take advantage of that? From stephen at xemacs.org Tue Jun 22 13:55:41 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 22 Jun 2010 20:55:41 +0900 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <8739wg469t.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <87k4pr36le.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87aaqn2sea.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > On Tue, Jun 22, 2010 at 4:49 PM, Stephen J. Turnbull wrote: > > > Which works if and only if your outputs are truly unicode-able. > > > > With PEP 383, they always are, as long as you allow Unicode to be > > decoded to the same garbage your bytes-based program would have > > produced anyway. > > Could it be that part of the problem here is that we need to better > advertise "errors='surrogateescape'" as a mechanism for decoding > incorrectly encoded data according to a nominal codec without throwing > UnicodeDecode and UnicodeEncode errors all over the place? Yes, I think that would make the "use str internally to urllib" strategy a lot more palatable. But it still needs to be combined with a program architecture of decode-process-encode, which might require substantial refactoring for some existing modules.
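For readers following the thread, the decode-process-encode pattern with errors='surrogateescape' (PEP 383) discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the sample bytes, the choice of UTF-8 as the nominal codec, and the variable names are made up here and do not come from any message in the thread.

```python
# A minimal sketch of decode -> process -> encode using the
# 'surrogateescape' error handler (PEP 383, Python 3.1+).
# The sample data and variable names are illustrative assumptions.

raw = b"caf\xe9/menu"  # Latin-1 encoded bytes; NOT valid UTF-8

# Decode: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9
# instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", errors="surrogateescape")

# Process: ordinary str operations work on the result.
parts = text.split("/")
joined = "/".join(reversed(parts))

# Encode: the same error handler turns the surrogate back into the
# original byte, so nothing is lost on the way out.
out = joined.encode("utf-8", errors="surrogateescape")

# Encoding *without* the handler fails, which is how the escaped
# bytes are prevented from leaking out silently.
try:
    joined.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The point of the sketch is only that the escaped data round-trips through str; whether the escaped characters display sanely is a separate question, as noted earlier in the thread.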
From fdrake at acm.org Tue Jun 22 14:40:29 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 22 Jun 2010 08:40:29 -0400 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger wrote: > I had thought there was a conscious decision to not change any existing > classes from old-style to new-style. I thought so as well. Changing any class from old-style to new-style risks breaking applications in obscure & mysterious ways. (Yes, we've been bitten by this before; it's a real problem.) -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From benjamin at python.org Tue Jun 22 14:48:25 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 22 Jun 2010 07:48:25 -0500 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: 2010/6/22 Raymond Hettinger : > There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is > now a new-style class". > I had thought there was a conscious decision to not change any existing > classes from old-style to new-style. IIRC, Martin had championed this idea > and had rejected all of proposals to make existing classes inherit from > object. IIRC this was because UserDict tries to be a MutableMapping but abcs require new style classes. 
-- Regards, Benjamin From lvh at laurensvh.be Tue Jun 22 15:23:36 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Tue, 22 Jun 2010 15:23:36 +0200 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: On Tue, Jun 22, 2010 at 2:40 PM, Fred Drake wrote: > On Tue, Jun 22, 2010 at 2:21 AM, Raymond Hettinger > wrote: >> I had thought there was a conscious decision to not change any existing >> classes from old-style to new-style. > > I thought so as well. Changing any class from old-style to new-style > risks breaking applications in obscure & mysterious ways. (Yes, we've > been bitten by this before; it's a real problem.) > > > -Fred +1. I've been bitten by this more than once in some of the more obscure old(-style) classes in twisted.python. Laurens From murman at gmail.com Tue Jun 22 15:24:28 2010 From: murman at gmail.com (Michael Urman) Date: Tue, 22 Jun 2010 08:24:28 -0500 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull wrote: > Michael Urman writes: > > > It is somewhat troublesome that there doesn't appear to be an obvious > > built-in idempotent-when-possible function that gives back the > > provided bytes/str, > > If you want something idempotent, it's already the case that > bytes(b'abc') => b'abc'. What might be desirable is to make > bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII > (or maybe ISO 8859/1).
By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, errors) that would pass an instance of bytes through, or encode an instance of str. And of course a to_str that performs similarly, passing str through and decoding bytes. While bytes(b'abc') will give me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me the b'abc' I want to see. These are trivial functions; I just don't fully understand why the capability isn't baked in. A one argument call is idempotent capable; a two argument call isn't as it only converts. It's not a completely made-up requirement either. A cross-platform piece of software may need to present to a user items that are sometimes str and sometimes bytes - particularly filenames.
> Unfortunately, str(b'abc') already does work, but
>
> steve at uwakimon ~ $ python3.1
> Python 3.1.2 (release31-maint, May 12 2010, 20:15:06)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> str(b'abc')
> "b'abc'"
> >>>
>
> Oops. You can see why that probably "should" be the case
Sure, and I love having this there for debugging. But this is hardly good enough for presenting to a user once you leave ascii.
>>> u = '日本語'
>>> sjis = bytes(u, 'shift-jis')
>>> utf8 = bytes(u, 'utf-8')
>>> str(sjis), str(utf8)
("b'\\x93\\xfa\\x96{\\x8c\\xea'", "b'\\xe6\\x97\\xa5\\xe6\\x9c\\xac\\xe8\\xaa\\x9e'")
When I happen to know the encoding, I can reverse it much more cleanly.
>>> str(sjis, 'shift-jis'), str(utf8, 'utf-8')
('日本語', '日本語')
But I can't mix this approach with str instances without writing a different invocation.
>>> str(u, 'argh')
TypeError: decoding str is not supported
-- Michael Urman From guido at python.org Tue Jun 22 18:17:31 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:17:31 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100622055040.GE5787@unaka.lan> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: [Just addressing one little issue here; generally I'm just happy that we're discussing this issue in such detail from so many points of view.] On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote: >[...] Would urljoin(b_base, b_subdir) => bytes and > urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given > other options, I'd rather see two separate functions, though. It seems more > discoverable and less prone to taking bad input some of the time to have two > functions that clearly only take one type of data apiece.) Hm. I'd rather see a single function (it would be "polymorphic" in my earlier terminology). After all a large number of string method calls (and some other utility function calls) already look the same regardless of whether they are handling bytes or text (as long as it's uniform). If the building blocks are all polymorphic it's easier to create additional polymorphic functions. FWIW, there are two problems with polymorphic functions, though they can be overcome: (1) Literals. If you write something like x.split('&') you are implicitly assuming x is text. I don't see a very clean way to overcome this; you'll have to implement some kind of type check e.g.
    x.split('&') if isinstance(x, str) else x.split(b'&')

A handy helper function can be written:

    def literal_as(constant, variable):
        if isinstance(variable, str):
            return constant
        else:
            return constant.encode('utf-8')

So now you can write x.split(literal_as('&', x)). (2) Data sources. These can be functions that produce new data from non-string data, e.g. str(), read it from a named file, etc. An example is read() vs. write(): it's easy to create a (hypothetical) polymorphic stream object that accepts both f.write('booh') and f.write(b'booh'); but you need some other hack to make read() return something that matches a desired return type. I don't have a generic suggestion for a solution; for streams in particular, the existing distinction between binary and text streams works, of course, but there are other situations where this doesn't generalize (I think some XML interfaces have this awkwardness in their API for converting a tree to a string). -- --Guido van Rossum (python.org/~guido) From tseaver at palladion.com Tue Jun 22 18:37:14 2010 From: tseaver at palladion.com (Tres Seaver) Date: Tue, 22 Jun 2010 12:37:14 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Jesse Noller wrote: > > On Jun 19, 2010, at 10:13 AM, Tres Seaver wrote: >>> Nothing is set in stone; if something is incredibly painful, or worse >>> yet broken, then someone needs to file a bug, bring it to this list, >>> or bring up a patch. >> Or walk away. >> > > Ok. If you want. I specifically said I *didn't* want to walk away.
I'm pointing out that in the general case, the ordinary user who finds something incredibly painful or broken is far more likely to walk away from the platform than try to fix it, especially if there are available alternatives (e.g., Ruby, Python 2) where the pain level for that user's application is lower. >>> I guess tutorial welcome, rather than patch welcome then ;) >> The only folks who can write the tutorial are the ones who have >> already drunk the koolaid. Note that I've been making my living with Python >> for about twelve years now, and would *like* to use Python3, but can't, >> yet, and therefore haven't taken the first sip. > > Why can't you? Is it a bug? It's not *a* bug, it is that I do my day to day work on very large applications which depend on a large number of not-yet-ported libraries. This barrier is the negative "network effect" which is the whole point of this thread: there is nothing wrong with Python3 except that, to use it, I have to stop doing the work which pays to do an indeterminately-large amount of "hobby" work (of which I already do quite a lot). > Let's file it and fix it. Is it that you > need a dependency ported? I need dozens of them ported, and am working on some of them in the aforementioned "copious spare time." > Cool - let's bring it up to the maintainers, > or this list, or ask the PSF to push resources into helping port. > Anything but nothing. Nothing is the default: I am already successful with Python 2, and can't be successfulwith Python 3 (in the sense of delivering timely, cost-effective solutions to my customers) until *all* those dependencies are ported and stable there. > If what you're saying is that python 3 is a completely unsuitable > platform, well, then yeah - we can all "fix" it or walk away. I didn't say that: I said that Python 3 is unsuitable *today* for the work I'm doing, and that the relative wins it provides over Python 2 are dwarfed by the effort required to do all those ports myself. 
>>>> IOW, 3.x has broken TOOOWTDI for me in some areas. There may >>>> be obvious ways to do it, but, as per the Zen of Python, "that >>>> way may not be obvious at first unless you're Dutch". ;-) OT: The Dutch smiley there doesn't actually help anything but undercut any point to having TOOOWTDI in the list at all. >>> What areas. We need specifics which can either be: >>> >>> 1> Shot down. >>> 2> Turned into bugs, so they can be fixed >>> 3> Documented in the core documentation. >> That's bloody ironic in a thread which had pointed at reasons why >> people are not even considering Py3 for their projects: those folks won't >> even find the issues due to the lack of confidence in the suitability of >> the platform. > > What I saw was a thread about some issues in email, and cgi. We have > some work being done to address the issue. This will help resolve some > of the issues. > > If there are other issues, then we should step up and either help, or > get out of the way. Arguing about the viability of a platform we knew > would take a bit for adoption is silly and breeds ill will. I'm not arguing about viability: there are obviously users for whom Python 3 is not only viable, but superior to Python 2. However, I am quite confident that many pro-Python 3 folks arguing here underestimate the scope of the issues which have generated the (self-fulfilling) "not yet" perception. > It's not a turd, and it's not hopeless, in fact rumor has it NumPy > will be ported soon which is a major stepping stone. Sure, for the (far from trivial) subset of the community doing numerical work. > The only way to counteract this meme that python 3 is horribly > broken is to prove that it's not, fix bugs, and move on. There's no > point debating relative turdiness here.
Any "turdiness" (which I am *not* arguing for) is a natural consequence of the kinds of backward incompatibilities which were *not* ruled out for Python 3, along with the (early, now waning) "build it and they will come" optimism about adoption rates. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwg5rIACgkQ+gerLs4ltQ6J7wCdFkQL7XeKtBM407Z5D2rSKk8n EWYAoJUfW+JgURUz7NJcWmqFw3PkNYde =WZEv -----END PGP SIGNATURE----- From ronaldoussoren at mac.com Tue Jun 22 18:39:03 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 22 Jun 2010 18:39:03 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On 22 Jun, 2010, at 3:38, Alexander Belopolsky wrote: > On Mon, Jun 21, 2010 at 6:16 PM, "Martin v. Löwis" wrote: >>> The test_posix failure is a regression from 2.6 (but it only shows up on >>> some machines - it is caused by a fairly braindead implementation of a >>> couple of posix apis by Apple apparently). >>> >>> http://bugs.python.org/issue7900 >> >> Ah, that one. I definitely think this should *not* block the release: > > I agree that this is nowhere near being a release blocker, but I think > it would be nice to do something about it before the final release. > >> a) there is no clear solution in sight.
So if we wait for it resolved, >> it could take months until we get a 2.7 release. > > The ideal solution will have to wait until Apple gets its act together > and fixed the problem on their end. I would say "months" is an overly > optimistic time estimate for that. I'd say there is no chance at all that this will be fixed in OSX 10.6, with some luck they'll change this in 10.7. > However, the issue is a regression > from prior versions. In 2.5 getgroups would truncate the list to 16 > groups, but won't crash. More importantly the 16 groups returned > would be correct per-process groups and not something immune to > setgroup changes. > > I proposed a very simple fix: > > http://bugs.python.org/file16326/no-darwin-ext.diff > > which simply minimally reverts the change that introduced the regression. That is one way to fix it, another just as valid fix is to change posix.getgroups to be able to return more than 16 groups on OSX (see my patch in issue7900). Both are valid fixes, both have both advantages and disadvantages. Your proposal: * Reverts to the behavior in 2.6 * Ensures that posix.getgroups and posix.setgroups are internally consistent My proposal: * Uses the newer ABI, which is more likely to be the one Apple wants you to use * Is compatible with system tools (that is, posix.getgroups() agrees with id(1)) * Is compatible with /usr/bin/python * results in posix.getgroups not reflecting results of posix.setgroups What I haven't done yet, and probably should, is to check how either implementation of getgroups interacts with groups in the System Preferences panel and with groups in managed environment (using OSX Server). My gut feeling is that second option (my proposal) would give more useful semantics, but that said: I almost never write code where I need os.setgroups. Ronald -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From dirkjan at ochtman.nl Tue Jun 22 18:54:21 2010 From: dirkjan at ochtman.nl (Dirkjan Ochtman) Date: Tue, 22 Jun 2010 18:54:21 +0200 Subject: [Python-Dev] State of json in 2.7 Message-ID: It looks like simplejson 2.1.0 and 2.1.1 have been released: http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ It looks like any changes that didn't come from the Python tree didn't go into the Python tree, either. I guess we can't put these changes into 2.7 anymore? How can we make this better next time? Cheers, Dirkjan From benjamin at python.org Tue Jun 22 18:56:09 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 22 Jun 2010 11:56:09 -0500 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: 2010/6/22 Dirkjan Ochtman : > I guess we can't put these changes into 2.7 anymore? How can we make > this better next time? Never have externally maintained packages. -- Regards, Benjamin From raymond.hettinger at gmail.com Tue Jun 22 18:24:42 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 09:24:42 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: > 2010/6/22 Raymond Hettinger : >> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is >> now a new-style class". >> I had thought there was a conscious decision to not change any existing >> classes from old-style to new-style. IIRC, Martin had championed this idea >> and had rejected all of proposals to make existing classes inherit from >> object. > > IIRC this was because UserDict tries to be a MutableMapping but abcs > require new style classes. 
ISTM, this change should be reverted to the way it was in 2.6. The registration was already working fine:

Python 2.6.4 (r264:75821M, Oct 27 2009, 19:48:32)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
>>> import UserDict
>>> import collections
>>> collections.MutableMapping.register(UserDict.UserDict)
>>> issubclass(UserDict.UserDict, collections.MutableMapping)
True

We didn't have any problems with this registration nor did there seem to be an issue with UserDict not implementing dictviews. Please revert this change. UserDicts have a long history and are used by a lot of code, so we need to avoid unnecessary breakage. Thank you, Raymond From ianb at colorstudy.com Tue Jun 22 19:03:29 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 22 Jun 2010 12:03:29 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 22, 2010 at 6:31 AM, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and > > urljoin(u_base, u_subdir) => unicode be acceptable though? > > Probably. > > But it doesn't matter what I say, since Guido has defined that as > "polymorphism" and approved it in principle. > > > (I think, given other options, I'd rather see two separate > > functions, though. > > Yes. > > > If you want to deal with things like this:: > > http://host/café > > Yes. > Just for perspective, I don't know if I've ever wanted to deal with a URL like that.
I know how it is supposed to work, and I know what a browser does with that, but so many tools will clean that URL up *or* won't be able to deal with it at all that it's not something I'll be passing around. So from a practical point of view this really doesn't come up, and if it did it would be in a situation where you could easily do something ad hoc (though there is not currently a routine to quote unsafe characters in a URL... that would be helpful, though maybe urllib.quote(url.encode('utf8'), '%/:') would do it). Also while it is problematic to treat the URL-unquoted value as text (because it has an unknown encoding, no encoding, or regularly a mixture of encodings), the URL-quoted value is pretty easy to pass around, and normalization (in this case to http://host/caf%C3%A9) is generally fine. While it's nice to be correct about encodings, sometimes it is impractical. And it is far nicer to avoid the situation entirely. That is, decoding content you don't care about isn't just inefficient, it's complicated and can introduce errors. The encoding of the underlying bytes of a %-decoded URL is largely uninteresting. Browsers (whose behavior drives a lot of convention) don't touch any of that encoding except lately occasionally to *display* some data in a more friendly way. But it's only display, and errors just make it revert to the old encoded display. Similarly I'd expect (from experience) a programmer using Python to want to take the same approach, sticking with unencoded data in nearly all situations. -- Ian Bicking | http://blog.ianbicking.org
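For what it's worth, the ad hoc quoting suggested in the message above can be sketched in Python 3 roughly as follows. This is only an illustration, not an existing stdlib routine: normalize_url is a made-up name, and the sketch assumes Python 3's urllib.parse.quote (which encodes str input as UTF-8 by default) in place of the Python 2 urllib.quote call. It would also escape '?', '&' and '=', so it only suits URLs without a query string.

```python
from urllib.parse import quote


def normalize_url(url):
    """Percent-encode unsafe characters in a URL given as str.

    Hypothetical helper, named here for illustration only.
    """
    # safe='%/:' leaves '%', '/' and ':' alone, so text that is
    # already percent-encoded passes through unchanged.
    return quote(url, safe='%/:')


print(normalize_url('http://host/caf\u00e9'))   # http://host/caf%C3%A9
print(normalize_url('http://host/caf%C3%A9'))   # unchanged
```

Because '%' is treated as safe, calling the helper twice gives the same result as calling it once, which is the property that makes the quoted form easy to pass around.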
From alexander.belopolsky at gmail.com Tue Jun 22 19:05:38 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 22 Jun 2010 13:05:38 -0400 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren wrote: .. > Both are valid fixes, both have both advantages and disadvantages. > > Your proposal: > * Reverts to the behavior in 2.6 > * Ensures that posix.getgroups and posix.setgroups are internally consistent > It is also very simple and since the posix module worked fine on OSX for years without _DARWIN_C_SOURCE, I think this is a very low risk change. > My proposal: > * Uses the newer ABI, which is more likely to be the one Apple wants you to use I don't think so. In getgroups(2) I see LEGACY DESCRIPTION If _DARWIN_C_SOURCE is defined, getgroups() can return more than {NGROUPS_MAX} groups. This suggests that this is legacy behavior. Newer applications should use getgrouplist instead. > * Is compatible with system tools (that is, posix.getgroups() agrees with id(1)) I have not tested this recently, but I think if you exec id from a program after a call to setgroups(), it will return process groups, not user groups. > * Is compatible with /usr/bin/python I am sure that once this issue is fixed upstream, Apple will pick it up with the next version. > * results in posix.getgroups not reflecting results of posix.setgroups > This effectively substitutes getgrouplist called on the current user for getgroups.
In 3.x, I believe the correct action will be to provide direct access to getgrouplist, which, while not POSIX (yet?), is widely available. From benjamin at python.org Tue Jun 22 19:08:02 2010 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 22 Jun 2010 12:08:02 -0500 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> Message-ID: 2010/6/22 Raymond Hettinger : > > On Jun 22, 2010, at 5:48 AM, Benjamin Peterson wrote: > >> 2010/6/22 Raymond Hettinger : >>> There's an entry in whatsnew for 2.7 to the effect of "The UserDict class is >>> now a new-style class". >>> I had thought there was a conscious decision to not change any existing >>> classes from old-style to new-style. IIRC, Martin had championed this idea >>> and had rejected all of the proposals to make existing classes inherit from >>> object. >> >> IIRC this was because UserDict tries to be a MutableMapping but abcs >> require new style classes. > > ISTM, this change should be reverted to the way it was in 2.6. > > The registration was already working fine: Actually I believe it was an error that it could. There was a typo in abc.py which prevented it from raising errors when non new-style class objects were passed in.
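As a rough illustration of the registration mechanism under discussion, here is a minimal sketch in Python 3. It cannot reproduce the old-style class question, since every class in Python 3 is new-style; it only shows what register() does and does not do. LegacyDict is a hypothetical stand-in, not UserDict itself.

```python
from collections.abc import MutableMapping


class LegacyDict:
    """Hypothetical dict-like class that does not inherit from the ABC."""

    def __init__(self):
        self.data = {}


# Register LegacyDict as a "virtual subclass" of MutableMapping.
MutableMapping.register(LegacyDict)

# issubclass/isinstance now answer True ...
print(issubclass(LegacyDict, MutableMapping))
print(isinstance(LegacyDict(), MutableMapping))

# ... but registration adds no methods and checks nothing:
# LegacyDict still has no keys(), items(), and so on.
print(hasattr(LegacyDict, 'keys'))
```

Registration is purely a declaration, which is why code relying on it keeps working only as long as the registered class really behaves like a mapping.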
-- Regards, Benjamin From janssen at parc.com Tue Jun 22 19:17:01 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 22 Jun 2010 10:17:01 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE030.7020700@voidspace.org.uk> <1180.1277170019@parc.com> Message-ID: <1422.1277227021@parc.com> Alexander Belopolsky wrote: > On Mon, Jun 21, 2010 at 9:26 PM, Bill Janssen wrote: > .. > > Though, isn't that behavior of urllib.proxy_bypass another bug? > > I don't know. Ask Ronald. Hmmm. I brought up the System Preferences panel on my Mac, and sure enough, there's a checkbox, "Exclude simple hostnames". So I guess it's not a bug, though none of my Macs are configured that way. Bill From a.badger at gmail.com Tue Jun 22 19:21:23 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 22 Jun 2010 13:21:23 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100622172123.GG5787@unaka.lan> On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote: > Toshio Kuratomi writes: > > unicode handling redesign. I'm stating my reading of the RFC not to defend > > the use case Philip has, but because I think that the outlook that non-text > > uris (before being percentencoded) are violations of the RFC > > That's not what I'm saying. 
What I'm trying to point out is that > manipulating a bytes object as an URI sort of presumes a lot about its > encoding as text.

I think we're more or less in agreement now but here I'm not sure. What manipulations are you thinking about? Which stage of URI construction are you considering? I've just taken a quick look at python3.1's urllib module and I see that there is a bit of confusion there. But it's not about unicode vs bytes but about whether a URI should be operated on at the real URI level or the data-that-makes-a-uri level.

* all functions I looked at take python3 str rather than bytes so there's no confusing stuff here
* urllib.request.urlopen takes a strict uri. That means that you must have a percent encoded uri at this point
* urllib.parse.urljoin takes regular string values
* urllib.parse and urllib.unparse take regular string values

> Since many of the URIs we deal with are more or > less textual, why not take advantage of that? >

Cool, so to summarize what I think we agree on:

* Percent encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the RFC.
* However, it is very often text in an unspecified encoding
* It is extremely convenient for programmers to be able to treat the data that is used to form a URI as text in nearly all common cases.

-Toshio
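The "polymorphic" urljoin discussed in this thread can be tried out in later Python 3 releases (3.2 onward, as far as I know), where the urllib.parse functions accept either all-str or all-bytes arguments. A minimal sketch; the host and path values are made up for illustration:

```python
from urllib.parse import urljoin

# str in, str out
text_url = urljoin('http://host/base/', 'sub')
print(text_url)

# bytes in, bytes out (the bytes are expected to be ASCII)
byte_url = urljoin(b'http://host/base/', b'sub')
print(byte_url)

# Mixing the two is rejected rather than silently coerced.
try:
    urljoin('http://host/base/', b'sub')
except TypeError as exc:
    print('mixing str and bytes fails:', exc)
```

The result type follows the argument type, so callers that only pass URLs along never have to pick an encoding themselves.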
From guido at python.org Tue Jun 22 18:53:00 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:53:00 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> Message-ID: On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger wrote: > > On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: > > This is a common pain-point for porting software to 3.x - you had a > string, it kinda worked most of the time before, but now you need to keep > track of text too and the functions which seemed to work on bytes no longer > do. > > Thanks Glyph. That is a nice summary of one kind of challenge facing > programmers. Ironically, Glyph also described the pain in 2.x: it only "kinda" worked. -- --Guido van Rossum (python.org/~guido) From raymond.hettinger at gmail.com Tue Jun 22 19:31:36 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 10:31:36 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4FBE15CB-2397-46B2-8417-588F8785BA20@gmail.com> Message-ID: On Jun 22, 2010, at 10:08 AM, Benjamin Peterson wrote: > . There was a typo in > abc.py which prevented it from raising errors when non new-style class > objects were passed in. For 2.x, that was probably a good thing, a happy accident that made it possible to register existing mapping classes as a MutableMapping.
"Fixing" that typo will break code that currently uses ABCs with old-style classes. I believe we are better-off leaving this as it was released in 2.6. Raymond From guido at python.org Tue Jun 22 18:49:27 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jun 2010 09:49:27 -0700 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 21, 2010 at 10:28 PM, Stephen J. Turnbull wrote: > Michael Urman writes: > > ?> It is somewhat troublesome that there doesn't appear to be an obvious > ?> built-in idempotent-when-possible function that gives back the > ?> provided bytes/str, > > If you want something idempotent, it's already the case that > bytes(b'abc') => b'abc'. ?What might be desirable is to make > bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII > (or maybe ISO 8859/1). No, no, no! That's just what Python 2 did. > Unfortunately, str(b'abc') already does work, but > > steve at uwakimon ~ $ python3.1 > Python 3.1.2 (release31-maint, May 12 2010, 20:15:06) > [GCC 4.3.4] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> str(b'abc') > "b'abc'" >>>> > > Oops. ?You can see why that probably "should" be the case. There is a near-contract that str() of pretty much anything returns a "printable" version of that thing. 
-- --Guido van Rossum (python.org/~guido) From foom at fuhm.net Tue Jun 22 20:07:18 2010 From: foom at fuhm.net (James Y Knight) Date: Tue, 22 Jun 2010 14:07:18 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: > Similarly I'd expect (from experience) that a programmer using > Python to want to take the same approach, sticking with unencoded > data in nearly all situations. Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having being decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) This means that Python3 programs can become *more* fragile in the face of random data you encounter out in the real world, rather than less fragile, which was the goal of the whole exercise. The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) James -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Tue Jun 22 20:09:24 2010 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 22 Jun 2010 20:09:24 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <4C20FC54.9000608@egenix.com> Guido van Rossum wrote: > [Just addressing one little issue here; generally I'm just happy that > we're discussing this issue in such detail from so many points of > view.] > > On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote: >> [...] Would urljoin(b_base, b_subdir) => bytes and >> urljoin(u_base, u_subdir) => unicode be acceptable though? (I think, given >> other options, I'd rather see two separate functions, though. It seems more >> discoverable and less prone to taking bad input some of the time to have two >> functions that clearly only take one type of data apiece.) > > Hm. I'd rather see a single function (it would be "polymorphic" in my > earlier terminology). After all a large number of string method calls > (and some other utility function calls) already look the same > regardless of whether they are handling bytes or text (as long as it's > uniform). If the building blocks are all polymorphic it's easier to > create additional polymorphic functions. > > FWIW, there are two problems with polymorphic functions, though they > can be overcome: > > (1) Literals. > > If you write something like x.split('&') you are implicitly assuming x > is text. I don't see a very clean way to overcome this; you'll have to > implement some kind of type check e.g. 
> > x.split('&') if isinstance(x, str) else x.split(b'&') > > A handy helper function can be written: > > def literal_as(constant, variable): > if isinstance(variable, str): > return constant > else: > return constant.encode('utf-8') > > So now you can write x.split(literal_as('&', x)). This polymorphism is what we used in Python2 a lot to write code that works for both Unicode and 8-bit strings. Unfortunately, this no longer works as easily in Python3 due to the literals sometimes having the wrong type and using such a helper function slows things down a lot. It would be great if we could have something like the above as builtin method: x.split('&'.as(x)) Perhaps something to discuss on the language summit at EuroPython. Too bad we can't add such porting enhancements to Python2 anymore. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 22 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 26 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From a.badger at gmail.com Tue Jun 22 20:44:44 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Tue, 22 Jun 2010 14:44:44 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100622184444.GJ5787@unaka.lan> On Tue, Jun 22, 2010 at 08:24:28AM -0500, Michael Urman wrote: > On Tue, Jun 22, 2010 at 00:28, Stephen J. Turnbull wrote: > > Michael Urman writes: > > > > ?> It is somewhat troublesome that there doesn't appear to be an obvious > > ?> built-in idempotent-when-possible function that gives back the > > ?> provided bytes/str, > > > > If you want something idempotent, it's already the case that > > bytes(b'abc') => b'abc'. ?What might be desirable is to make > > bytes('abc') work and return b'abc', but only if 'abc' is pure ASCII > > (or maybe ISO 8859/1). > > By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding, > errors) that would pass an instance of bytes through, or encode an > instance of str. And of course a to_str that performs similarly, > passing str through and decoding bytes. While bytes(b'abc') will give > me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me > the b'abc' I want to see. > A month or so ago, I finally broke down and wrote a python2 library that had these functions in it (along with a bunch of other trivial boilerplate functions that I found myself writing over and over in different projects) https://fedorahosted.org/releases/k/i/kitchen/docs/api-text-converters.html#unicode-and-byte-str-conversion I suppose I could port this to python3 and we could see if it gains adoption as a thirdparty addon. 
I have been hesitating over doing that since I don't use python3 for everyday work and I have a vague feeling that 2to3 won't understand what that code needs to do. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: not available URL: From brett at python.org Tue Jun 22 21:27:49 2010 From: brett at python.org (Brett Cannon) Date: Tue, 22 Jun 2010 12:27:49 -0700 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: [cc'ing Bob on his gmail address; didn't have any other address handy so I don't know if this will actually get to him] On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman wrote: > It looks like simplejson 2.1.0 and 2.1.1 have been released: > > http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ > http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ > > It looks like any changes that didn't come from the Python tree didn't > go into the Python tree, either. Has anyone asked Bob why he did this? There might be a logical reason. -Brett From bob at redivi.com Tue Jun 22 22:11:10 2010 From: bob at redivi.com (Bob Ippolito) Date: Tue, 22 Jun 2010 13:11:10 -0700 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: On Tuesday, June 22, 2010, Brett Cannon wrote: > [cc'ing Bob on his gmail address; didn't have any other address handy > so I don't know if this will actually get to him] > > On Tue, Jun 22, 2010 at 09:54, Dirkjan Ochtman wrote: >> It looks like simplejson 2.1.0 and 2.1.1 have been released: >> >> http://bob.pythonmac.org/archives/2010/03/10/simplejson-210/ >> http://bob.pythonmac.org/archives/2010/03/31/simplejson-211/ >> >> It looks like any changes that didn't come from the Python tree didn't >> go into the Python tree, either. > > Has anyone asked Bob why he did this? There might be a logical reason. I've just been busy. 
It's not trivial to move patches from one to the other, so it's not something that has been easy for me to get around to actually doing. It seems that more often than not when I have had time to look at something, it didn't line up well with python's release schedule. (and speaking of busy I'm en route for a week long honeymoon so don't expect much else from me on this thread) -bob From tjreedy at udel.edu Tue Jun 22 22:19:45 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 22 Jun 2010 16:19:45 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com> Message-ID: On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote: > The thing that I have heard in passing from a couple of folks with > experience in this area is that some older software in asia would > present characters differently if they were originally encoded in a > "japanese" encoding versus a "chinese" encoding, even though they were > really "the same" characters. As I tried to say in another post, that to me is similar to wanting to present English text is different fonts depending on whether spoken by an American or Brit, or a modern person versus a Renaissance person. > I do know that Han Unification is a giant political mess > ( makes for some Thanks, I will take a look. > interesting reading), but my understanding is that it has handled enough > of the cases by now that one can write software to display asian > languages and it will basically work with a modern version of unicode. > (And of course, there's always the private use area, as Stephen Turnbull > pointed out.) 
>
> Regardless, this is another example where keeping around a string isn't
> really enough. If you need to display a japanese character in a distinct
> way because you are operating in the japanese *script*, you need a tag
> surrounding your data that is a hint to its presentation. The fact that
> these presentation hints were sometimes determined by their encoding is
> an unfortunate historical accident.

Yes. The asian languages I know anything about seem to natively have almost none of the symbols English has, many borrowed from math, that have been pressed into service for text markup.

-- Terry Jan Reedy

From tjreedy at udel.edu Tue Jun 22 22:32:40 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 22 Jun 2010 16:32:40 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On 6/22/2010 9:24 AM, Michael Urman wrote:

> By idempotent-when-possible, I mean to_bytes(str_or_bytes, encoding,
> errors) that would pass an instance of bytes through, or encode an
> instance of str. And of course a to_str that performs similarly,
> passing str through and decoding bytes. While bytes(b'abc') will give
> me b'abc', neither bytes('abc') nor bytes(b'abc', 'latin-1') get me
> the b'abc' I want to see.
>
> These are trivial functions;
> I just don't fully understand why the capability isn't baked in.

Possible reasons: They are special purpose functions easily built on the basic functions provided. Fine for a 3rd party library. Most people do not need them. Some might be misled by them. As others have said, "Not every one-liner should be builtin".
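
For reference, the idempotent-when-possible helpers described in the quoted message might look roughly like this on Python 3. This is only a sketch: the names to_bytes and to_str come from the quoted text, while the default encoding, the default error handler, and the TypeError for other types are assumptions made for illustration.

```python
def to_bytes(str_or_bytes, encoding='utf-8', errors='strict'):
    """Pass bytes through unchanged; encode str with the given encoding."""
    if isinstance(str_or_bytes, bytes):
        return str_or_bytes
    if isinstance(str_or_bytes, str):
        return str_or_bytes.encode(encoding, errors)
    # Assumption: anything else is rejected rather than converted silently.
    raise TypeError('expected str or bytes, got %s' % type(str_or_bytes).__name__)


def to_str(str_or_bytes, encoding='utf-8', errors='strict'):
    """Pass str through unchanged; decode bytes with the given encoding."""
    if isinstance(str_or_bytes, str):
        return str_or_bytes
    if isinstance(str_or_bytes, bytes):
        return str_or_bytes.decode(encoding, errors)
    raise TypeError('expected str or bytes, got %s' % type(str_or_bytes).__name__)
```

With these, to_bytes('abc') and to_bytes(b'abc', 'latin-1') both give b'abc', which is the behaviour the bare bytes() constructor refuses to provide.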

-- Terry Jan Reedy

From tjreedy at udel.edu Tue Jun 22 22:41:54 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 22 Jun 2010 16:41:54 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com>
Message-ID:

On 6/22/2010 12:53 PM, Guido van Rossum wrote:
> On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger
> wrote:
>>
>> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
>>
>> This is a common pain-point for porting software to 3.x - you had a
>> string, it kinda worked most of the time before, but now you need to keep
>> track of text too and the functions which seemed to work on bytes no longer
>> do.
>>
>> Thanks Glyph. That is a nice summary of one kind of challenge facing
>> programmers.
>
> Ironically, Glyph also described the pain in 2.x: it only "kinda" worked.

The people with problematic code to convert must include some who managed to tolerate and perhaps suppress the pain. I suspect that conversion attempts bring it back to the surface. It is natural to blame the re-surfacer rather than the original source. (As in 'blame the messenger').

-- Terry Jan Reedy

From tjreedy at udel.edu Tue Jun 22 22:47:58 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 22 Jun 2010 16:47:58 -0400
Subject: [Python-Dev] [OT] glyphs [was Re: email package status in 3.X]
In-Reply-To: <201006222052.39734.steve@pearwood.info>
References: <20100621184700.BAD7F3A404D@sparrow.telecommunity.com> <201006222052.39734.steve@pearwood.info>
Message-ID:

On 6/22/2010 6:52 AM, Steven D'Aprano wrote:
> On Tue, 22 Jun 2010 11:46:27 am Terry Reedy wrote:
>> 3.
Unicode disclaims direct representation of glyphic variants >> (though again, exceptions were made for asian acceptance). For >> example, in English, mechanically printed 'a' and 'g' are different >> from manually printed 'a' and 'g'. Representing both by the same >> codepoint, in itself, loses information. One who wishes to preserve >> the distinction must instead use a font tag or perhaps a >> tag. Similarly, older English had a significantly >> different glyph for 's', which looks more like a modern 'f'. > > An unfortunate example, as the old English long-s gets its own Unicode > codepoint. Whoops. I suppose I should thank you for the correction so I never make the same error again. Thank you. > http://en.wikipedia.org/wiki/Long_s Very interesting to find out the source of both the integral sign and shilling symbols. -- Terry Jan Reedy From cyounkins at gmail.com Tue Jun 22 23:14:45 2010 From: cyounkins at gmail.com (Craig Younkins) Date: Tue, 22 Jun 2010 17:14:45 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities Message-ID: Hello, The method in question: http://docs.python.org/library/cgi.html#cgi.escape http://svn.python.org/view/python/tags/r265/Lib/cgi.py?view=markup # at the bottom "Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character ('"') is also translated; this helps for inclusion in an HTML attribute value, as in . If the value to be quoted might include single- or double-quote characters, or both, consider using the quoteattr() function in the xml.sax.saxutils module instead." cgi.escape never escapes single quote characters, which can easily lead to a Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, but a quick search reveals many are using cgi.escape for HTML attribute escaping. The intended use of this method is unclear to me. 
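
A minimal sketch of the gap, runnable on Python 3. Everything here is illustrative: escape_like_cgi is a hypothetical stand-in that follows the documented behaviour quoted above rather than the stdlib source, the body tag is made up, and html.escape (a later Python 3 stdlib function that also escapes single quotes when quote is true) appears only for comparison.

```python
import html


def escape_like_cgi(s, quote=False):
    """Hypothetical stand-in for the documented cgi.escape behaviour."""
    s = s.replace("&", "&amp;")  # must run first so later entities survive
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    if quote:
        s = s.replace('"', "&quot;")  # double quotes only; ' is left alone
    return s


payload = "' onload='alert(1);' id='"

# Single-quoted attribute: the payload closes the attribute and adds its own.
unsafe = "<body class='%s'>" % escape_like_cgi(payload, quote=True)

# html.escape also rewrites the single quote, so the attribute stays closed.
safe = "<body class='%s'>" % html.escape(payload, quote=True)

print(unsafe)
print(safe)
```

The first line printed carries a live onload attribute; in the second the quotes arrive as character references and the value stays inside class.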
Up to and including the latest published version of Mako (0.3.3), this method was the HTML escaping method. Used in this manner, single-quoted attributes with user-supplied data are easily susceptible to cross-site scripting vulnerabilities.

Proof of concept in Mako:

>>> from mako.template import Template
>>> print Template("
", default_filters=['h']).render(data="' onload='alert(1);' id='")
I've emailed Michael Bayer, the creator of Mako, and this will be fixed in version 0.3.4. While the documentation says "if the value to be quoted might include single- or double-quote characters... [use the] xml.sax.saxutils module instead," it also implies that this method will make input safe for HTML. Because this method escapes 4 of the 5 key XML characters, it is reasonable to expect some will use it in the manner Mako did. I suggest rewording the documentation for the method making it more clear what it should and should not be used for. I would like to see the method changed to properly escape single-quotes, but if it is not changed, the documentation should explicitly say this method does not make input safe for inclusion in HTML. Shameless plug: http://www.PythonSecurity.org/ Craig Younkins -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianb at colorstudy.com Tue Jun 22 22:46:45 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 22 Jun 2010 15:46:45 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: On Tue, Jun 22, 2010 at 1:07 PM, James Y Knight wrote: > The surrogateescape method is a nice workaround for this, but I can't help > thinking that it might've been better to just treat stuff as > possibly-invalid-but-probably-utf8 byte-strings from input, through > processing, to output. It seems kinda too late for that, though: next time > someone designs a language, they can try that. 
:)
>

surrogateescape does help a lot; my only problem with it is that it's out-of-band information. That is, if you have data that went through data.decode('utf8', 'surrogateescape') you can restore it to bytes or transcode it to another encoding, but you have to know that it was decoded specifically that way. And of course if you did have to transcode it (e.g., text.encode('utf8', 'surrogateescape').decode('latin1')) then if you had actually handled the text in any way you may have broken it; you don't *really* have valid text. A lazier solution feels like it would be easier and more transparent to work with.

But... I also don't see any major language constraint to having another kind of string that is bytes+encoding. I think PJE brought up a problem with a couple of coercion aspects.

-- Ian Bicking | http://blog.ianbicking.org

From tjreedy at udel.edu Tue Jun 22 23:21:53 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 22 Jun 2010 17:21:53 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com>
Message-ID:

Tres, I am a Python3 enthusiast and realist. I did not expect major adoption for about 3 years (more optimistic than the 5 years of some). If you are feeling pressured to 'move' to Python3, it is not from me. I am sure you will do so on your own, perhaps even with enthusiasm, when it will be good for *you* to do so.

If someone wants to contribute while sticking to Python2, it's easy. The tracker has perhaps 2000 open 2.x issues, hundreds with no responses. If more Python2 people worked on making 2.7 as bug-free as possible, the developers would be freer to make 3.2 as good as possible (which is what *I* want).

The porting of numpy (which I suspect has gotten some urging) will not just benefit 'numerical' computing.
For instance, there cannot be a 3.x version of pygame until there is a 3.x version of numpy, its main Python dependency. (The C Simple DirectMedia Layer library it also wraps and builds upon does not care.)

-- Terry Jan Reedy

From guido at python.org Tue Jun 22 19:03:29 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 22 Jun 2010 10:03:29 -0700
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com>
Message-ID:

On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote:
> Any "turdiness" (which I am *not* arguing for) is a natural consequence
> of the kinds of backward incompatibilities which were *not* ruled out
> for Python 3, along with the (early, now waning) "build it and they will
> come" optimism about adoption rates.

FWIW, my optimism is *not* waning. I think it's good that we're having this discussion and I expect something useful will come out of it; I also expect in general that the (admittedly serious) problem of having to port all dependencies will be solved in the next few years. Not by magic, but because many people are taking small steps in the right direction, and there will be light eventually. In the mean time I don't blame anyone for sticking with 2.x or being too busy to help port stuff to 3.x. Python 3 has been a long time in the making -- it will be a bit longer still, which was expected.

-- --Guido van Rossum (python.org/~guido)

From janssen at parc.com Tue Jun 22 23:29:50 2010
From: janssen at parc.com (Bill Janssen)
Date: Tue, 22 Jun 2010 14:29:50 PDT
Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities
In-Reply-To:
References:
Message-ID: <10286.1277242190@parc.com>

Craig Younkins wrote:

> cgi.escape never escapes single quote characters, which can easily lead to a
> Cross-Site Scripting (XSS) vulnerability.
This seems to be known by many,
> but a quick search reveals many are using cgi.escape for HTML attribute
> escaping.

Did you file a bug report?

Bill

From robertc at robertcollins.net Tue Jun 22 23:40:45 2010
From: robertc at robertcollins.net (Robert Collins)
Date: Wed, 23 Jun 2010 09:40:45 +1200
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C20FC54.9000608@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com>
Message-ID:

On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote:
>>           return constant.encode('utf-8')
>>
>> So now you can write x.split(literal_as('&', x)).
>
> This polymorphism is what we used in Python2 a lot to write
> code that works for both Unicode and 8-bit strings.
>
> Unfortunately, this no longer works as easily in Python3 due
> to the literals sometimes having the wrong type and using
> such a helper function slows things down a lot.

It didn't work in 2 either - see for instance the traceback module with an Exception with unicode args and a non-ascii file path - the file path is in its bytes form, the string joining logic triggers an implicit upcast and *boom*.

> Too bad we can't add such porting enhancements to Python2 anymore

Perhaps a 'py3compat' module on pypi, with things like the py._builtin reraise helper and so forth?
-Rob From martin at v.loewis.de Tue Jun 22 23:50:49 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 22 Jun 2010 23:50:49 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: <4C213039.5090300@v.loewis.de> > This effectively substitutes getgrouplist called on the current user > for getgroups. In 3.x, I believe the correct action will be to > provide direct access to getgrouplist which is while not POSIX (yet?), > is widely available. As a policy, adding non-POSIX functions to the posix module is perfectly fine, as long as there is an autoconf test for it (plain ifdefs are gruntingly accepted also). Regards, Martin From fdrake at acm.org Tue Jun 22 21:23:13 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 22 Jun 2010 15:23:13 -0400 Subject: [Python-Dev] State of json in 2.7 In-Reply-To: References: Message-ID: On Tue, Jun 22, 2010 at 12:56 PM, Benjamin Peterson wrote: > Never have externally maintained packages. Seriously! I concur with this. Fortunately, it's not a real problem in this case. There's the (maintained) simplejson package, and the unmaintained json package. And simplejson works with older versions of Python, too, :-) -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." 
--Albert Einstein

From ncoghlan at gmail.com Tue Jun 22 23:41:51 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jun 2010 07:41:51 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To:
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan>
Message-ID:

On Wed, Jun 23, 2010 at 2:17 AM, Guido van Rossum wrote:
> (1) Literals.
>
> If you write something like x.split('&') you are implicitly assuming x
> is text. I don't see a very clean way to overcome this; you'll have to
> implement some kind of type check e.g.
>
>     x.split('&') if isinstance(x, str) else x.split(b'&')
>
> A handy helper function can be written:
>
>   def literal_as(constant, variable):
>       if isinstance(variable, str):
>           return constant
>       else:
>           return constant.encode('utf-8')
>
> So now you can write x.split(literal_as('&', x)).

I think this is a key point. In checking the behaviour of the os module bytes APIs (see below), I used a simple filter along the lines of:

    [x for x in seq if x.endswith("b")]

It would be nice if code along those lines could easily be made polymorphic.
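
The quoted helper already runs unchanged on Python 3, so the filter above can be made polymorphic today at the cost of one call per literal. A minimal sketch, assuming ASCII-compatible data; split_query and names_ending_in_b are hypothetical wrappers added only to exercise it.

```python
def literal_as(constant, variable):
    """Return the str literal in the same type (str or bytes) as variable."""
    if isinstance(variable, str):
        return constant
    return constant.encode('utf-8')


def split_query(x):
    # Works for both 'a=1&b=2' and b'a=1&b=2'.
    return x.split(literal_as('&', x))


def names_ending_in_b(seq):
    # Polymorphic version of the filter: each item picks its own literal type.
    return [x for x in seq if x.endswith(literal_as('b', x))]


print(split_query('a=1&b=2'))
print(split_query(b'a=1&b=2'))
print(names_ending_in_b(['getcwd', 'getcwdb', 'environb']))
```

The cost noted elsewhere in the thread still applies: every literal now goes through a function call and, for bytes input, an encode.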
Maybe what we want is a new class method on bytes and str (this idea is similar to what MAL suggests later in the thread):

    def coerce(cls, obj, encoding=None, errors='surrogateescape'):
        if isinstance(obj, cls):
            return obj
        if encoding is None:
            encoding = sys.getdefaultencoding()
        # This is the str version, bytes.coerce would use obj.encode() instead
        return obj.decode(encoding, errors)

Then my example above could be made polymorphic (for ASCII compatible encodings) by writing:

    [x for x in seq if x.endswith(x.coerce("b"))]

I'm trying to see downsides to this idea, and I'm not really seeing any (well, other than 2.7 being almost out the door and the fact we'd have to grant ourselves an exception to the language moratorium)

> (2) Data sources.
>
> These can be functions that produce new data from non-string data,
> e.g. str(), read it from a named file, etc. An example is read()
> vs. write(): it's easy to create a (hypothetical) polymorphic stream
> object that accepts both f.write('booh') and f.write(b'booh'); but you
> need some other hack to make read() return something that matches a
> desired return type. I don't have a generic suggestion for a solution;
> for streams in particular, the existing distinction between binary and
> text streams works, of course, but there are other situations where
> this doesn't generalize (I think some XML interfaces have this
> awkwardness in their API for converting a tree to a string).

We may need to use the os and io modules as the precedents here:

os: normal API is text using the surrogateescape error handler, parallel bytes API exposes raw bytes. Parallel API is polymorphic if possible (e.g. os.listdir), but appends a 'b' to the name if the polymorphic approach isn't practical (e.g. os.environb, os.getcwdb, os.getenvb).

io: layered API, where both the raw bytes of the wire protocol and the decoded bytes of the text layer are available

Regards,
Nick.
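
The two precedents can be seen side by side in a short Python 3 session. This is a sketch only: it assumes a readable current directory, and the in-memory stream stands in for a real file or socket.

```python
import io
import os

# os: a text API plus a parallel bytes API.
cwd_text = os.getcwd()            # str
cwd_bytes = os.getcwdb()          # bytes, the 'b'-suffixed twin
names_text = os.listdir('.')      # polymorphic: str argument gives str names
names_bytes = os.listdir(b'.')    # bytes argument gives bytes names

# io: a text layer stacked on a bytes layer, with both reachable.
raw = io.BytesIO('caf\u00e9\n'.encode('utf-8'))
text = io.TextIOWrapper(raw, encoding='utf-8')
line = text.readline()            # decoded str from the text layer
underlying = text.buffer          # the bytes layer underneath

print(type(cwd_text).__name__, type(cwd_bytes).__name__)
print(repr(line), underlying is raw)
```

os.fsencode and os.fsdecode convert between the two os views using the filesystem encoding and the surrogateescape handler, so undecodable names survive the round trip.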

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Wed Jun 23 00:07:07 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jun 2010 08:07:07 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C20FC54.9000608@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com>
Message-ID:

On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg wrote:
> It would be great if we could have something like the above as
> builtin method:
>
> x.split('&'.as(x))

As per my other message, another possible (and reasonably intuitive) spelling would be:

    x.split(x.coerce('&'))

Writing it as a helper function is also possible, although it would be trickier to remember the correct argument ordering:

    def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
        if isinstance(obj, type(target)):
            return obj
        if encoding is None:
            encoding = sys.getdefaultencoding()
        try:
            convert = obj.decode
        except AttributeError:
            convert = obj.encode
        return convert(encoding, errors)

    x.split(coerce_to(x, '&'))

> Perhaps something to discuss on the language summit at EuroPython.
>
> Too bad we can't add such porting enhancements to Python2 anymore.

Well, we can if we really want to, it just entails convincing Benjamin to reschedule the 2.7 final release. Given the UserDict/ABC/old-style classes issue, there's a fair chance there's going to be at least one more 2.7 RC anyway.
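
A runnable Python 3 sketch of the helper, included mainly to show what the surrogateescape default buys: bytes that are not valid in the chosen encoding survive a trip through str and back. It assumes the interpreter default encoding is UTF-8, as it is on Python 3; the sample values are made up.

```python
import sys


def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
    """Return obj converted to the type of target (str or bytes)."""
    if isinstance(obj, type(target)):
        return obj
    if encoding is None:
        encoding = sys.getdefaultencoding()
    try:
        convert = obj.decode      # bytes -> str
    except AttributeError:
        convert = obj.encode      # str -> bytes
    return convert(encoding, errors)


# Literals follow the type of the data they are used with.
text_parts = 'a&b'.split(coerce_to('a&b', '&'))
byte_parts = b'a&b'.split(coerce_to(b'a&b', '&'))

# b'\xff' is not valid UTF-8; surrogateescape smuggles it through as U+DCFF.
smuggled = coerce_to('', b'name\xff')
restored = coerce_to(b'', smuggled)

print(text_parts, byte_parts)
print(repr(smuggled), restored)
```

The round trip only works if both directions use the same encoding and error handler, which is the out-of-band knowledge raised as a concern earlier in the thread.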
That said, since this kind of coercion can be done in a helper function, that should be adequate for the 2.x to 3.x conversion case (for 2.x, the helper function can be defined to just return the second argument since bytes and str are the same type, while the 3.x version would look something like the code above) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Wed Jun 23 01:03:06 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 23 Jun 2010 11:03:06 +1200 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> Message-ID: <4C21412A.9030709@canterbury.ac.nz> Benjamin Peterson wrote: > IIRC this was because UserDict tries to be a MutableMapping but abcs > require new style classes. Are there any use cases for UserList and UserDict in new code, now that list and dict can be subclassed? If not, I don't think it would be a big problem if they were left out of the ABC ecosystem. No worse than what happens to any other existing user-defined class that predates ABCs -- if people want them to inherit from ABCs, they have to update their code. In this case, the update would consist of changing subclasses to inherit from list or dict instead. -- Greg From fuzzyman at voidspace.org.uk Wed Jun 23 00:59:12 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Tue, 22 Jun 2010 23:59:12 +0100 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4C21412A.9030709@canterbury.ac.nz> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> Message-ID: <4C214040.20304@voidspace.org.uk> On 23/06/2010 00:03, Greg Ewing wrote: > Benjamin Peterson wrote: > >> IIRC this was because UserDict tries to be a MutableMapping but abcs >> require new style classes. > > Are there any use cases for UserList and UserDict in new > code, now that list and dict can be subclassed? 
Inheriting from list or dict isn't very useful as you have to override *every* method to control behaviour. (For example with the dict if you override __setitem__ then update and setdefault (etc) don't go through your new __setitem__ and if you override __getitem__ then pop and friends don't go through your new __getitem__.)

In 2.6+ you can of course use the collections.MutableMapping abc, but if you want to write cross-Python version code UserDict is still useful. If you want abc support then you are *already* on 2.6+ though I guess.

All the best,

Michael

>
> If not, I don't think it would be a big problem if they
> were left out of the ABC ecosystem. No worse than what
> happens to any other existing user-defined class that
> predates ABCs -- if people want them to inherit from
> ABCs, they have to update their code. In this case, the
> update would consist of changing subclasses to inherit
> from list or dict instead.
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
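
The point about overriding __setitem__ can be checked with a small sketch, written for Python 3, where the ABC lives at collections.abc.MutableMapping rather than collections.MutableMapping. The class names are made up, and the dict behaviour shown is what CPython does; other implementations may differ.

```python
from collections.abc import MutableMapping


class RecordingDict(dict):
    """dict subclass that records keys passed to __setitem__."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def __setitem__(self, key, value):
        self.seen.append(key)
        super().__setitem__(key, value)


class RecordingMapping(MutableMapping):
    """Same idea built on the ABC: only five methods need writing."""

    def __init__(self):
        self._data = {}
        self.seen = []

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self.seen.append(key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = RecordingDict()
m = RecordingMapping()
for target in (d, m):
    target['a'] = 1
    target.update(b=2)          # dict.update bypasses the override
    target.setdefault('c', 3)   # so does dict.setdefault

print(d.seen)
print(m.seen)
```

The dict subclass only sees the direct assignment, while the ABC mixin methods route update and setdefault through the overridden __setitem__.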
From fuzzyman at voidspace.org.uk Wed Jun 23 01:04:15 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 23 Jun 2010 00:04:15 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> Message-ID: <4C21416F.2040009@voidspace.org.uk> On 22/06/2010 22:40, Robert Collins wrote: > On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote: > > >>> return constant.encode('utf-8') >>> >>> So now you can write x.split(literal_as('&', x)). >>> >> This polymorphism is what we used in Python2 a lot to write >> code that works for both Unicode and 8-bit strings. >> >> Unfortunately, this no longer works as easily in Python3 due >> to the literals sometimes having the wrong type and using >> such a helper function slows things down a lot. >> > I didn't work in 2 either - see for instance the traceback module with > an Exception with unicode args and a non-ascii file path - the file > path is in its bytes form, the string joining logic triggers an > implicit upcast and *boom*. > > Yeah, there are still a few places in unittest where a unicode exception can cause the whole test run to bomb out. No-one has *yet* reported these as bugs and I try and ferret them out as I find them. All the best, Michael >> Too bad we can't add such porting enhancements to Python2 anymore >> > Perhaps a 'py3compat' module on pypi, with things like the py._builtin > reraise helper and so forth ? 
> > -Rob From raymond.hettinger at gmail.com Wed Jun 23 01:17:54 2010 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Tue, 22 Jun 2010 16:17:54 -0700 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: <4C214040.20304@voidspace.org.uk> References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> <4C214040.20304@voidspace.org.uk> Message-ID: On Jun 22, 2010, at 3:59 PM, Michael Foord wrote: > On 23/06/2010 00:03, Greg Ewing wrote: >> Benjamin Peterson wrote: >> >>> IIRC this was because UserDict tries to be a MutableMapping but abcs >>> require new style classes. >> >> Are there any use cases for UserList and UserDict in new >> code, now that list and dict can be subclassed? > > Inheriting from list or dict isn't very useful as you to have to override *every* method to control behaviour. Benjamin fixed the UserDict and ABC problem earlier today in r82155. It is now the same as it was in Py2.6. Nothing to see here. Move along.
Raymond From fuzzyman at voidspace.org.uk Wed Jun 23 01:18:29 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Wed, 23 Jun 2010 00:18:29 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <4C2144C5.2070902@voidspace.org.uk> On 22/06/2010 19:07, James Y Knight wrote: > > On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote: >> Similarly I'd expect (from experience) that a programmer using Python >> to want to take the same approach, sticking with unencoded data in >> nearly all situations. > > Yeah. This is a real issue I have with the direction Python3 went: it > pushes you into decoding everything to unicode early, Well, both .NET and Java take this approach as well. I wonder how they cope with the particular issues that have been mentioned for web applications - both platforms are used extensively for web apps. Having used IronPython, which has .NET unicode strings (although it does a lot of magic to *allow* you to store binary data in strings for compatibility with CPython), I have to say that this approach makes a lot of programming *so* much more pleasant. We did a lot of I/O (can you do useful programming without I/O?) including working with databases, but I didn't work *much* with wire protocols (fetching a fair bit of data from the web though now I think about it). I think wire protocols can present particular problems; sometimes having mixed encodings in the same data it seems. 
Where you don't have these problems keeping bytes data and all Unicode text data separate and encoding / decoding at the boundaries is really much more sane and pleasant. It would be a real shame if we decided that the way forward for Python 3 was to try and move closer to how bytes/text was handled in Python 2. All the best, Michael > even when you don't care -- all you really wanted to do is pass it > from one API to another, with some well-defined transformations, which > don't actually depend on it having being decoded properly. (For > example, extracting the path from the URL and attempting to open it as > a file on the filesystem.) > > This means that Python3 programs can become *more* fragile in the face > of random data you encounter out in the real world, rather than less > fragile, which was the goal of the whole exercise. > > The surrogateescape method is a nice workaround for this, but I can't > help thinking that it might've been better to just treat stuff as > possibly-invalid-but-probably-utf8 byte-strings from input, through > processing, to output. It seems kinda too late for that, though: next > time someone designs a language, they can try that. :) > > James > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. 
By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianb at colorstudy.com Wed Jun 23 01:23:40 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Tue, 22 Jun 2010 18:23:40 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: On Tue, Jun 22, 2010 at 11:17 AM, Guido van Rossum wrote: > (2) Data sources. > > These can be functions that produce new data from non-string data, > e.g. str(), read it from a named file, etc. An example is read() > vs. write(): it's easy to create a (hypothetical) polymorphic stream > object that accepts both f.write('booh') and f.write(b'booh'); but you > need some other hack to make read() return something that matches a > desired return type. I don't have a generic suggestion for a solution; > for streams in particular, the existing distinction between binary and > text streams works, of course, but there are other situations where > this doesn't generalize (I think some XML interfaces have this > awkwardness in their API for converting a tree to a string). 
> This reminds me of the optimization ElementTree and lxml made in Python 2 (not sure what they do in Python 3?) where they use str when a string is ASCII to avoid the memory and performance overhead of unicode. Also at least lxml is also dealing with the divide between the internal libxml2 string representation and the Python representation. This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pje at telecommunity.com Wed Jun 23 01:55:11 2010 From: pje at telecommunity.com (P.J. Eby) Date: Tue, 22 Jun 2010 19:55:11 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <20100622235514.7B3FC3A4099@sparrow.telecommunity.com> At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote: >Then my example above could be made polymorphic (for ASCII compatible >encodings) by writing: > > [x for x in seq if x.endswith(x.coerce("b"))] > >I'm trying to see downsides to this idea, and I'm not really seeing >any (well, other than 2.7 being almost out the door and the fact we'd >have to grant ourselves an exception to the language moratorium) Notice, however, that if multi-string operations used a coercion protocol (they currently have to do type checks already for byte/unicode mixes), then you could make the entire stdlib polymorphic by default, even for other kinds of strings that don't exist yet. 
If you invent a new numeric type, generally speaking you can pass it to existing stdlib functions taking numbers, as long as it implements the appropriate protocols. Why not do the same for strings? From glyph at twistedmatrix.com Wed Jun 23 02:23:56 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:23:56 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <39CFC9B3-E55A-41BB-9718-1457E20ACECC@twistedmatrix.com> <89AE7ED6-FB94-45DA-9432-7FCBA25A56BF@gmail.com> Message-ID: On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote: > On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger > wrote: >> >> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote: >> >> This is a common pain-point for porting software to 3.x - you had a >> string, it kinda worked most of the time before, but now you need to keep >> track of text too and the functions which seemed to work on bytes no longer >> do. >> >> Thanks Glyph. That is a nice summary of one kind of challenge facing >> programmers. > > Ironically, Glyph also described the pain in 2.x: it only "kinda" worked. It was not my intention to be ironic about it - that was exactly what I meant :). 3.x is forcing you to confront an issue that you _should_ have confronted for 2.x anyway. (And, I hope, most libraries doing a 3.x migration will take the opportunity to make their 2.x APIs unicode-clean while still in 2to3 mode, and jump ship to 3.x source only _after_ there's a nice transition path for their clients that can be taken in 2 steps.) 
From glyph at twistedmatrix.com Wed Jun 23 02:25:31 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:25:31 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com> On Jun 22, 2010, at 2:07 PM, James Y Knight wrote: > Yeah. This is a real issue I have with the direction Python3 went: it pushes you into decoding everything to unicode early, even when you don't care -- all you really wanted to do is pass it from one API to another, with some well-defined transformations, which don't actually depend on it having being decoded properly. (For example, extracting the path from the URL and attempting to open it as a file on the filesystem.) But you _do_ need to decode it in this case. If you got your URL from some funky UTF-32 datasource, b"\x00\x00\x00/" is not a path separator, "/" is. Plus, you should really be separating path segments and looking at them individually so that you don't fall victim to "%2F" bugs. And if you want your code to be portable, you need a Unicode representation of your pathname anyway for Windows; plus, there, you need to care about "\" as well as "/". The fact that your wire-bytes were probably ASCII(-ish) and your filesystem probably encodes pathnames as UTF-8 and so everything looks like it lines up is no excuse not to be explicit about your expectations there. 
You may want to transcode your characters into some other characters later, but that shouldn't stop you from treating them as characters of some variety in the meanwhile. > The surrogateescape method is a nice workaround for this, but I can't help thinking that it might've been better to just treat stuff as possibly-invalid-but-probably-utf8 byte-strings from input, through processing, to output. It seems kinda too late for that, though: next time someone designs a language, they can try that. :) I can think of lots of optimizations that might be interesting for Python (or perhaps some other runtime less concerned with cleverness overload, like PyPy) to implement, like a UTF-8 combining-characters overlay that would allow for fast indexing, lazily populated as random access dictates. But this could all be implemented as smartness inside .encode() and .decode() and the str and bytes types without changing the way the API works. I realize that there are implications at the C level, but as long as you can squeeze a function call in to certain places, it could still work. I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things. 
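[Editor's illustration] The "%2F" point made earlier in this message can be sketched in a few lines. This is only a minimal sketch, assuming Python 3's urllib.parse; the helper name path_segments and the example URL are invented for this note and do not come from the thread. The idea is to split the still-encoded path on "/" first and percent-decode each segment afterwards, so an encoded slash stays inside its segment.

```python
from urllib.parse import urlsplit, unquote


def path_segments(url):
    """Return the decoded path segments of `url` (hypothetical helper).

    The path is split on "/" while it is still percent-encoded, and each
    segment is decoded afterwards, so "%2F" never becomes a separator.
    """
    raw_path = urlsplit(url).path
    return [unquote(segment) for segment in raw_path.split("/") if segment]


def naive_segments(url):
    """Decode first, split second: the ordering that causes "%2F" bugs."""
    return [s for s in unquote(urlsplit(url).path).split("/") if s]


url = "http://example.com/files/a%2Fb/report.txt"
safe = path_segments(url)    # "a/b" stays one segment
naive = naive_segments(url)  # "a" and "b" become two segments
```

Comparing the two results on the same URL shows why the order of decoding and splitting matters before the segments are mapped onto a filesystem path.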
From glyph at twistedmatrix.com Wed Jun 23 02:34:31 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 20:34:31 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com> On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote: > This is a place where bytes+encoding might also have some benefit. XML is someplace where you might load a bunch of data but only touch a little bit of it, and the amount of data is frequently large enough that the efficiencies are important. Different encodings have different characteristics, though, which makes them amenable to different types of optimizations. If you've got an ASCII string or a latin1 string, the optimizations of unicode are pretty obvious; if you've got one in UTF-16 with no multi-code-unit sequences, you could also hypothetically cheat for a while if you're on a UCS4 build of Python. I suspect the practical problem here is that there's no CharacterString ABC in the collections module for third-party libraries to provide their own peculiarly-optimized implementations that could lazily turn into real 'str's as needed. I'd volunteer to write a PEP if I thought I could actually get it done :-\. If someone else wants to be the primary author though, I'll try to help out. 
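[Editor's illustration] The message above notes that no CharacterString ABC exists. The following is a hypothetical sketch of what such an ABC and one lazily decoding implementation might look like; the names CharacterString and LazyDecoded, and the choice of required methods, are assumptions made for this note, not an existing stdlib API or a proposal from the thread.

```python
from abc import ABC, abstractmethod

# Encodings where one byte is one character, so the length is known
# without decoding (assumption: only these spellings are recognised).
_SINGLE_BYTE = {"ascii", "latin-1", "latin1", "iso-8859-1"}


class CharacterString(ABC):
    """Hypothetical ABC for text-like objects (not part of the stdlib)."""

    @abstractmethod
    def __len__(self):
        ...

    @abstractmethod
    def __str__(self):
        ...


# The builtin str is registered as a virtual subclass.
CharacterString.register(str)


class LazyDecoded(CharacterString):
    """Keeps bytes plus an encoding and decodes only when really needed."""

    def __init__(self, data, encoding):
        self._data = data
        self._encoding = encoding
        self._text = None  # filled in on first real decode

    def __str__(self):
        if self._text is None:
            self._text = self._data.decode(self._encoding)
        return self._text

    def __len__(self):
        if self._encoding.lower() in _SINGLE_BYTE:
            return len(self._data)  # no decode required
        return len(str(self))


s = LazyDecoded(b"caf\xe9", "latin-1")
length_before_decode = len(s)      # answered from the bytes alone
still_lazy = s._text is None
text = str(s)                      # triggers the one-time decode
```

Whether a real ABC would require more of the str interface (slicing, searching, and so on) is exactly the kind of question a PEP would have to settle.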
From murman at gmail.com Wed Jun 23 02:38:00 2010 From: murman at gmail.com (Michael Urman) Date: Tue, 22 Jun 2010 19:38:00 -0500 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100621015824.6A84E3A4099@sparrow.telecommunity.com> <20100621145133.7F5333A404D@sparrow.telecommunity.com> <87lja73aau.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, Jun 22, 2010 at 15:32, Terry Reedy wrote: > On 6/22/2010 9:24 AM, Michael Urman wrote: >> These are trivial functions; >> I just don't fully understand why the capability isn't baked in. > > Possible reasons: They are special purpose functions easily built on the > basic functions provided. Fine for a 3rd party library. Most people do not > need them. Some might be mislead by them. As other have said, "Not every > one-liner should be builtin". Perhaps the two-argument constructions on bytes and str should have been removed in favor of the .decode and .encode methods on their respective classes. Or vice versa; I don't have the history to know in which order they originated, and which is theoretically preferred these days. -- Michael Urman From mike.klaas at gmail.com Wed Jun 23 02:39:04 2010 From: mike.klaas at gmail.com (Mike Klaas) Date: Tue, 22 Jun 2010 17:39:04 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> Message-ID: On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking wrote: > This reminds me of the optimization ElementTree and lxml made in Python 2 > (not sure what they do in Python 3?) 
where they use str when a string is > ASCII to avoid the memory and performance overhead of unicode. An optimization that forces me to typecheck the return value of the function and that I only discovered after code started breaking. I can't say I was enthused about that decision when I discovered it. -Mike From robertc at robertcollins.net Wed Jun 23 02:57:48 2010 From: robertc at robertcollins.net (Robert Collins) Date: Wed, 23 Jun 2010 12:57:48 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com> Message-ID: On Wed, Jun 23, 2010 at 12:25 PM, Glyph Lefkowitz wrote: > I can also appreciate what's been said in this thread a bunch of times: to my knowledge, nobody has actually shown a profile of an application where encoding is significant overhead. I believe that encoding _will_ be a significant overhead for some applications (and actually I think it will be very significant for some applications that I work on), but optimizations should really be implemented once that's been demonstrated, so that there's a better understanding of what the overhead is, exactly. Is memory a big deal? Is CPU? Is it both? Do you want to tune for the tradeoff? etc, etc. Clever data-structures seem premature until someone has a good idea of all those things. bzr has a cache of decoded strings in it precisely because decode is slow.
We accept slowness encoding to the user's locale because that's typically much less data to examine than we've examined while generating the commit/diff/whatever. We also face memory pressure on a regular basis, and that has been, at least partly, due to UCS4 - our translation cache helps there because we have fewer duplicate UCS4 strings. You're welcome to dig deeper into this, but I don't have more detail paged into my head at the moment. -Rob From janssen at parc.com Wed Jun 23 03:56:51 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 22 Jun 2010 18:56:51 PDT Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: <73196.1277143019@parc.com> References: <73196.1277143019@parc.com> Message-ID: <14929.1277258211@parc.com> Bill Janssen wrote: > Considering that we've just released 2.7rc2, there are an awful lot of > red buildbots for 2.7. In fact, I don't remember having seen a green > buildbot for OS X and 2.7. Shouldn't these be fixed? Thanks to some action by Ronald, my two PPC OS X buildbots are now showing green for the trunk. Bill From fdrake at acm.org Wed Jun 23 03:58:07 2010 From: fdrake at acm.org (Fred Drake) Date: Tue, 22 Jun 2010 21:58:07 -0400 Subject: [Python-Dev] UserDict in 2.7 In-Reply-To: References: <58CEF265-1B25-4FD6-9C45-88353A0AF0E7@gmail.com> <4C21412A.9030709@canterbury.ac.nz> <4C214040.20304@voidspace.org.uk> Message-ID: On Tue, Jun 22, 2010 at 7:17 PM, Raymond Hettinger wrote: > Benjamin fixed the UserDict and ABC problem earlier today in r82155. > It is now the same as it was in Py2.6. Thanks, Benjamin! -Fred -- Fred L. Drake, Jr. "A storm broke loose in my mind." --Albert Einstein From stephen at xemacs.org Wed Jun 23 08:44:28 2010 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Wed, 23 Jun 2010 15:44:28 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871vbyp7sj.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > Just for perspective, I don't know if I've ever wanted to deal with a URL > like that. Ditto, I do many times a day for Japanese media sites and Wikipedia. > I know how it is supposed to work, and I know what a browser does > with that, but so many tools will clean that URL up *or* won't be > able to deal with it at all that it's not something I'll be passing > around. I'm not suggesting that is something you want to be "passing around"; it's a presentation form, and I prefer that the internal form use Unicode. > While it's nice to be correct about encodings, sometimes it is > impractical. And it is far nicer to avoid the situation entirely. But you cannot avoid it entirely. Processing bytes mean you are assuming ASCII compatibility. Granted, this is a pretty good assumption, especially if you got the bytes off the wire, but it's not universally so. Maybe it's a YAGNI, but one reason I prefer the decode-process-encode paradigm is that choice of codec is a specification of the assumptions you're making about encoding. So the Know-Nothing codec described above assumes just enough ASCII compatibility to parse the scheme. You could also have codecs which assume just enough ASCII compatibility to parse a hierarchical scheme, etc. > That is, decoding content you don't care about isn't just > inefficient, it's complicated and can introduce errors. That depends on the codec(s) used. 
> Similarly I'd expect (from experience) that a programmer using > Python to want to take the same approach, sticking with unencoded > data in nearly all situations. Indeed, a programmer using Python 2 would want to do so, because all her literal strings are bytes by default (ie, if she doesn't mark them with `u'), and interactive input is, too. This is no longer so obvious in Python 3 which takes the attitude that things that are expected to be human-readable should be processed as str. The obvious example in URI space is the file:/// URL, which you'll typically build up from a user string or a file browser, which will call the os.path stuff which returns str. Text editors and viewers will also use str for their buffers, and if they provide a way to fish out URIs for their users, they'll probably return str. I won't pretend to judge the relative importance of such use cases. But use cases for urllib which naturally favor str until you put the URI on the wire do exist, as does the debugging presentation aspect. From ronaldoussoren at mac.com Wed Jun 23 08:08:13 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 23 Jun 2010 08:08:13 +0200 Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de> Message-ID: On 22 Jun, 2010, at 19:05, Alexander Belopolsky wrote: > On Tue, Jun 22, 2010 at 12:39 PM, Ronald Oussoren > wrote: > .. >> Both are valid fixes, both have both advantages and disadvantages. 
>> >> Your proposal: >> * Reverts to the behavior in 2.6 >> * Ensures that posix.getgroups and posix.setgroups are internally consistent >> > It is also very simple and since posix module worked fine on OSX for > years without _DARWIN_C_SOURCE, I think this is a very low risk > change. I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the compile-time environment in configure would be different than during the compilation of posixmodule. That is, in functions that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when _DARWIN_C_SOURCE active). And man compat(5) says: 32-BIT COMPILATION Defining _NONSTD_SOURCE causes library and kernel calls to behave as closely to Mac OS X 10.3's library and kernel calls as possible. Any behavioral changes in this mode are documented in the LEGACY sections of the individual function calls. Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3. Defining _POSIX_C_SOURCE also removes functions, types, and other interfaces that are not part of SUSv3 from the normal C namespace, unless _DARWIN_C_SOURCE is also defined (i.e., _DARWIN_C_SOURCE is _POSIX_C_SOURCE with non-POSIX exten- sions). In any of these cases, the _DARWIN_FEATURE_UNIX_CONFORMANCE feature macro will be defined to the SUS conformance level (it is unde- fined otherwise). 
Starting in Mac OS X 10.5, if none of the macros _NONSTD_SOURCE, _POSIX_C_SOURCE or _DARWIN_C_SOURCE are defined, and the environment variable MACOSX_DEPLOYMENT_TARGET is either undefined or set to 10.5 or greater (or equivalently, the gcc(1) option -mmacosx-version-min is either not specified or set to 10.5 or greater), then UNIX conformance will be on by default, and non-POSIX extensions will also be available (this is the equivalent of defining _DARWIN_C_SOURCE). For version values less than 10.5, UNIX conformance will be off (the equivalent of defining _NONSTD_SOURCE). My interpretation of that is that _DARWIN_C_SOURCE should be used to get SUSv3 APIs while keeping access to Darwin-specific APIs as well. When you deploy to 10.5 or later the compiler will set _DARWIN_C_SOURCE for you unless you set one of the other feature-selecting defines. > >> My proposal: >> * Uses the newer ABI, which is more likely to be the one Apple wants you to use > > I don't think so. In getgroups(2) I see > > LEGACY DESCRIPTION > If _DARWIN_C_SOURCE is defined, getgroups() can return more than > {NGROUPS_MAX} groups. > > This suggests that this is legacy behavior. Newer applications should > use getgrouplist instead. I honestly don't know why this is in the LEGACY DESCRIPTION. But as the functionality you get with _DARWIN_C_SOURCE was added later I'd say that the behavior is intentional and not legacy. By not defining _DARWIN_C_SOURCE we don't necessarily get full UNIX behavior for other APIs. > >> * Is compatible with system tools (that is, posix.getgroups() agrees with id(1)) > > I have not tested this recently, but I think if you exec id from a > program after a call to setgroups(), it will return process groups, > not user groups. > >> * Is compatible with /usr/bin/python > > I am sure that once this issue is fixed upstream, Apple will pick it up > with the next version. Haha.
Apple explicitly added patches to get the current behavior instead of the default; what makes you think that they'll revert to the older behavior? > >> * results in posix.getgroups not reflecting results of posix.setgroups >> > > This effectively substitutes getgrouplist called on the current user > for getgroups. In 3.x, I believe the correct action will be to > provide direct access to getgrouplist which is while not POSIX (yet?), > is widely available. I don't mind adding getgrouplist, but that issue is separate from this one. BTW, apparently getgrouplist is posix (), although this isn't a requirement for being added to the posix module. It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch is more complicated and the library function we use can be considered to be broken. Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3567 bytes Desc: not available URL: From stephen at xemacs.org Wed Jun 23 09:07:50 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 23 Jun 2010 16:07:50 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> James Y Knight writes: > The surrogateescape method is a nice workaround for this, but I can't > help thinking that it might've been better to just treat stuff as > possibly-invalid-but-probably-utf8 byte-strings from input, through > processing, to output.
This is the world we already have, modulo s/utf8/ascii + random GR charset/. It doesn't work, and it can't, in Japan or China or Korea, and probably not in Russia or Kazakhstan, for some time yet. That's not to say that byte-oriented processing doesn't have its place. And in many cases it's reasonable (but not secure or bulletproof!) to assume ASCII compatibility of the byte stream, passing through syntactically unimportant bytes verbatim. Syntactic analysis of such streams will surely have a lot in common with that for text streams, so the same tools should be available. (That's the point of Guido's endorsement of polymorphism, AIUI.) But it's just not reasonable to assume that will work in a context where text streams from various sources are mixed with byte streams. In that case, the byte streams need to be converted to text before mixing. (You can't do it the other way around because there is no guarantee that the text is compatible with the current encoding of the byte stream, nor that all the byte streams have the same encoding.) We do need str-based implementations of modules like urllib. From mal at egenix.com Wed Jun 23 11:18:23 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 23 Jun 2010 11:18:23 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> Message-ID: <4C21D15F.8070304@egenix.com> Nick Coghlan wrote: > On Wed, Jun 23, 2010 at 4:09 AM, M.-A. 
Lemburg wrote:
>> It would be great if we could have something like the above as
>> builtin method:
>>
>> x.split('&'.as(x))
>
> As per my other message, another possible (and reasonably intuitive)
> spelling would be:
>
> x.split(x.coerce('&'))

You are right: there are two ways to adapt one object to another. You can either adapt object 1 to object 2 or object 2 to object 1. This is what the Python2 coercion protocol does for operators. I just wanted to avoid using that term, since Python3 removes the coercion protocol.

> Writing it as a helper function is also possible, although it would be
> trickier to remember the correct argument ordering:
>
>     def coerce_to(target, obj, encoding=None, errors='surrogateescape'):
>         if isinstance(obj, type(target)):
>             return obj
>         if encoding is None:
>             encoding = sys.getdefaultencoding()
>         try:
>             convert = obj.decode
>         except AttributeError:
>             convert = obj.encode
>         return convert(encoding, errors)
>
>     x.split(coerce_to(x, '&'))
>
>> Perhaps something to discuss on the language summit at EuroPython.
>>
>> Too bad we can't add such porting enhancements to Python2 anymore.
>
> Well, we can if we really want to, it just entails convincing Benjamin
> to reschedule the 2.7 final release. Given the UserDict/ABC/old-style
> classes issue, there's a fair chance there's going to be at least one
> more 2.7 RC anyway.
>
> That said, since this kind of coercion can be done in a helper
> function, that should be adequate for the 2.x to 3.x conversion case
> (for 2.x, the helper function can be defined to just return the second
> argument since bytes and str are the same type, while the 3.x version
> would look something like the code above)

True. Note that the point of using a builtin method was to get better performance. Such type adaptations are often needed in loops, so adding a few extra Python function calls just to convert a str object to a bytes object or vice-versa is a bit much overhead.
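[Editor's illustration] For readers who want to try the helper quoted above, here is a self-contained Python 3 sketch. It is an adaptation, not Nick's exact code: the attribute probing with try/except is swapped for an explicit isinstance check, the sys import is added, and split_query is a made-up wrapper used only to show the call site.

```python
import sys


def coerce_to(target, obj, encoding=None, errors="surrogateescape"):
    """Return `obj` converted to the string type of `target` (str or bytes)."""
    if isinstance(obj, type(target)):
        return obj  # already the right type, nothing to do
    if encoding is None:
        encoding = sys.getdefaultencoding()
    if isinstance(obj, str):
        return obj.encode(encoding, errors)  # str -> bytes
    return obj.decode(encoding, errors)      # bytes -> str


def split_query(x):
    """Hypothetical caller: split on '&' whether `x` is str or bytes."""
    return x.split(coerce_to(x, "&"))


text_parts = split_query("a=1&b=2")
byte_parts = split_query(b"a=1&b=2")
```

The same function body serves both string types, at the cost of one extra Python-level call per use, which is the overhead the message above is concerned about.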
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 23 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 25 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From cesare.di.mauro at gmail.com Wed Jun 23 12:12:36 2010 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Wed, 23 Jun 2010 12:12:36 +0200 Subject: [Python-Dev] WPython 1.1 was released Message-ID: I've released WPython 1.1, which brings many optimizations and refactorings. The project is hosted at Google Code: http://code.google.com/p/wpython2/ and available as a Mercurial repository http://code.google.com/p/wpython2/source/checkout?repo=wpython11 . In the download section http://code.google.com/p/wpython2/downloads/list there are the slides from the last Italian PyCon, where I presented the project and illustrated the changes. You can also download the binaries for Windows (compressed in 7-Zip format: http://www.7-zip.org/ ) and sources (for Unix users, Parser/Python.asdl and configure files need to be chmod +x ). Attached are some benchmarks with the Unladen Swallow test suite (against Python 2.6.4). Regards Cesare -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- ?Report on Darwin iMac-di-Mirco.local 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:57:13 PST 2010; root:xnu-1504.3.12~1/RELEASE_X86_64 x86_64 i386 Total CPU cores: 2 ### 2to3 ### 29.085133 -> 25.601404: 1.1361x faster ### bzr_startup ### Min: 0.204419 -> 0.096856: 2.1105x faster Avg: 0.213686 -> 0.113666: 1.8799x faster Significant (t=71.767819) Stddev: 0.01277 -> 0.00559: 2.2833x smaller Timeline: http://tinyurl.com/y7qgndp ### call_method ### Min: 0.644754 -> 0.622001: 1.0366x faster Avg: 0.806862 -> 0.725472: 1.1122x faster Significant (t=11.301638) Stddev: 0.07300 -> 0.04951: 1.4744x smaller Timeline: http://tinyurl.com/y3rfsnu ### call_method_slots ### Min: 0.626559 -> 0.589525: 1.0628x faster Avg: 0.761122 -> 0.680558: 1.1184x faster Significant (t=11.706336) Stddev: 0.06496 -> 0.05371: 1.2093x smaller Timeline: http://tinyurl.com/y7kkg9m ### call_method_unknown ### Min: 0.669814 -> 0.593711: 1.1282x faster Avg: 0.883463 -> 0.746100: 1.1841x faster Significant (t=8.601215) Stddev: 0.13619 -> 0.14039: 1.0308x larger Timeline: http://tinyurl.com/y6u5qut ### call_simple ### Min: 0.486911 -> 0.435191: 1.1188x faster Avg: 0.700634 -> 0.590928: 1.1857x faster Significant (t=9.030587) Stddev: 0.12218 -> 0.08491: 1.4390x smaller Timeline: http://tinyurl.com/y2pnbfz ### float ### Min: 0.126226 -> 0.097072: 1.3003x faster Avg: 0.174486 -> 0.164656: 1.0597x faster Significant (t=2.822244) Stddev: 0.02922 -> 0.04668: 1.5976x larger Timeline: http://tinyurl.com/y3o7gko ### hg_startup ### Min: 0.057444 -> 0.042930: 1.3381x faster Avg: 0.067769 -> 0.050515: 1.3416x faster Significant (t=109.019677) Stddev: 0.00293 -> 0.00199: 1.4687x smaller Timeline: http://tinyurl.com/y5ss3l9 ### html5lib ### Min: 16.410586 -> 15.971322: 1.0275x faster Avg: 16.579096 -> 16.119135: 1.0285x faster Significant (t=5.554462) Stddev: 0.13844 -> 0.12297: 1.1258x smaller Timeline: http://tinyurl.com/yya44oj ### html5lib_warmup ### Min: 17.765242 
-> 15.582871: 1.1400x faster Avg: 17.968972 -> 16.065290: 1.1185x faster Significant (t=10.236030) Stddev: 0.28980 -> 0.29826: 1.0292x larger Timeline: http://tinyurl.com/y7osmkp ### iterative_count ### Min: 0.156827 -> 0.084917: 1.8468x faster Avg: 0.166389 -> 0.090218: 1.8443x faster Significant (t=26.855602) Stddev: 0.01766 -> 0.00950: 1.8586x smaller Timeline: http://tinyurl.com/y2kz25f ### nbody ### Min: 0.498760 -> 0.427710: 1.1661x faster Avg: 0.515754 -> 0.445318: 1.1582x faster Significant (t=22.964790) Stddev: 0.01500 -> 0.01566: 1.0442x larger Timeline: http://tinyurl.com/y7b92bm ### normal_startup ### Min: 0.534059 -> 0.817747: 1.5312x slower Avg: 0.547493 -> 0.838141: 1.5309x slower Significant (t=-127.297104) Stddev: 0.00799 -> 0.01403: 1.7567x larger Timeline: http://tinyurl.com/y5tfkm3 ### nqueens ### Min: 0.583106 -> 0.573619: 1.0165x faster Avg: 0.611182 -> 0.595222: 1.0268x faster Significant (t=3.893252) Stddev: 0.02367 -> 0.01674: 1.4142x smaller Timeline: http://tinyurl.com/y79zhpz ### pickle ### Min: 1.660705 -> 1.576223: 1.0536x faster Avg: 1.757750 -> 1.672262: 1.0511x faster Significant (t=9.284162) Stddev: 0.04774 -> 0.04427: 1.0785x smaller Timeline: http://tinyurl.com/y2f3eee ### pickle_dict ### Min: 1.389026 -> 1.468648: 1.0573x slower Avg: 1.479180 -> 1.551554: 1.0489x slower Significant (t=-7.056610) Stddev: 0.05664 -> 0.04529: 1.2507x smaller Timeline: http://tinyurl.com/y2kl4no ### pickle_list ### Min: 0.802236 -> 0.780976: 1.0272x faster Avg: 0.843450 -> 0.822717: 1.0252x faster Significant (t=3.353898) Stddev: 0.02861 -> 0.03305: 1.1554x larger Timeline: http://tinyurl.com/y2csxb9 ### pybench ### Min: 4906 -> 4344: 1.1294x faster Avg: 5235 -> 4618: 1.1336x faster ### regex_compile ### Min: 0.757385 -> 0.663902: 1.1408x faster Avg: 0.807480 -> 0.698190: 1.1565x faster Significant (t=20.304562) Stddev: 0.03027 -> 0.02308: 1.3116x smaller Timeline: http://tinyurl.com/y5vmu5y ### regex_effbot ### Min: 0.102901 -> 0.095138: 1.0816x 
faster Avg: 0.109344 -> 0.102460: 1.0672x faster Significant (t=5.515715) Stddev: 0.00574 -> 0.00670: 1.1678x larger Timeline: http://tinyurl.com/yyhbuzh ### regex_v8 ### Min: 0.123948 -> 0.106031: 1.1690x faster Avg: 0.128534 -> 0.111830: 1.1494x faster Significant (t=16.677634) Stddev: 0.00436 -> 0.00558: 1.2787x larger Timeline: http://tinyurl.com/y2zrssn ### richards ### Min: 0.354665 -> 0.287113: 1.2353x faster Avg: 0.381205 -> 0.306374: 1.2442x faster Significant (t=23.417400) Stddev: 0.01926 -> 0.01182: 1.6294x smaller Timeline: http://tinyurl.com/yyzqb7v ### slowpickle ### Min: 0.753230 -> 0.664495: 1.1335x faster Avg: 0.801162 -> 0.708291: 1.1311x faster Significant (t=17.994391) Stddev: 0.02267 -> 0.02860: 1.2612x larger Timeline: http://tinyurl.com/y4z6poh ### slowspitfire ### Min: 0.868708 -> 0.872393: 1.0042x slower Avg: 0.971014 -> 0.919428: 1.0561x faster Significant (t=4.503573) Stddev: 0.07780 -> 0.02253: 3.4529x smaller Timeline: http://tinyurl.com/y64sn8p ### slowunpickle ### Min: 0.337317 -> 0.299357: 1.1268x faster Avg: 0.353311 -> 0.313929: 1.1254x faster Significant (t=19.034627) Stddev: 0.01066 -> 0.01002: 1.0629x smaller Timeline: http://tinyurl.com/y3symau ### startup_nosite ### Min: 0.317232 -> 0.224719: 1.4117x faster Avg: 0.333151 -> 0.235118: 1.4170x faster Significant (t=95.671333) Stddev: 0.00851 -> 0.00571: 1.4919x smaller Timeline: http://tinyurl.com/yyvr8m5 ### threaded_count ### Min: 0.194147 -> 0.116080: 1.6725x faster Avg: 0.216559 -> 0.139140: 1.5564x faster Significant (t=50.972602) Stddev: 0.00765 -> 0.00753: 1.0162x smaller Timeline: http://tinyurl.com/y38bz5h ### unpack_sequence ### Min: 0.000093 -> 0.000082: 1.1337x faster Avg: 0.000098 -> 0.000086: 1.1343x faster Significant (t=25.434812) Stddev: 0.00007 -> 0.00008: 1.1129x larger Timeline: http://tinyurl.com/y5hv9ck ### unpickle ### Min: 1.102754 -> 1.015811: 1.0856x faster Avg: 1.138448 -> 1.052802: 1.0814x faster Significant (t=18.018135) Stddev: 0.02248 -> 0.02499: 
1.1118x larger Timeline: http://tinyurl.com/y49x4pk ### unpickle_list ### Min: 0.990238 -> 0.881112: 1.1239x faster Avg: 1.043900 -> 0.933968: 1.1177x faster Significant (t=21.205782) Stddev: 0.02977 -> 0.02139: 1.3913x smaller Timeline: http://tinyurl.com/y49pm9p Report on Linux cionci-desktop 2.6.27-17-generic #1 SMP Fri Mar 12 02:08:25 UTC 2010 x86_64 Total CPU cores: 2 ### 2to3 ### 27.729733 -> 25.521595: 1.0865x faster ### bzr_startup ### Min: 0.072004 -> 0.068004: 1.0588x faster Avg: 0.094326 -> 0.091926: 1.0261x faster Not significant Stddev: 0.00883 -> 0.00958: 1.0851x larger Timeline: http://tinyurl.com/y5zc5ca ### call_method ### Min: 0.630349 -> 0.566228: 1.1132x faster Avg: 0.655913 -> 0.574280: 1.1421x faster Significant (t=54.712328) Stddev: 0.01462 -> 0.01096: 1.3344x smaller Timeline: http://tinyurl.com/y6eg77c ### call_method_slots ### Min: 0.635804 -> 0.511669: 1.2426x faster Avg: 0.660014 -> 0.528936: 1.2478x faster Significant (t=69.342882) Stddev: 0.01859 -> 0.01380: 1.3470x smaller Timeline: http://tinyurl.com/y7p9esb ### call_method_unknown ### Min: 0.766309 -> 0.562713: 1.3618x faster Avg: 0.774030 -> 0.585773: 1.3214x faster Significant (t=90.713925) Stddev: 0.00759 -> 0.02426: 3.1937x larger Timeline: http://tinyurl.com/y6y6w7a ### call_simple ### Min: 0.498106 -> 0.451661: 1.1028x faster Avg: 0.502283 -> 0.460072: 1.0917x faster Significant (t=62.530336) Stddev: 0.00738 -> 0.00373: 1.9763x smaller Timeline: http://tinyurl.com/y5gt8qa ### float ### Min: 0.117934 -> 0.102821: 1.1470x faster Avg: 0.129057 -> 0.117482: 1.0985x faster Significant (t=12.577691) Stddev: 0.00811 -> 0.01208: 1.4897x larger Timeline: http://tinyurl.com/y2pc4wj ### hg_startup ### Min: 0.012000 -> 0.012001: 1.0001x slower Avg: 0.033594 -> 0.032258: 1.0414x faster Significant (t=3.596547) Stddev: 0.00597 -> 0.00578: 1.0320x smaller Timeline: http://tinyurl.com/y449a8r ### html5lib ### Min: 16.581036 -> 15.668980: 1.0582x faster Avg: 16.823451 -> 15.946597: 1.0550x 
faster Significant (t=4.738181) Stddev: 0.22787 -> 0.34542: 1.5159x larger Timeline: http://tinyurl.com/y3wx52k ### html5lib_warmup ### Min: 16.436294 -> 15.664941: 1.0492x faster Avg: 16.810495 -> 15.983748: 1.0517x faster Significant (t=2.827967) Stddev: 0.43953 -> 0.48388: 1.1009x larger Timeline: http://tinyurl.com/y74vue8 ### iterative_count ### Min: 0.189088 -> 0.083317: 2.2695x faster Avg: 0.191612 -> 0.088073: 2.1756x faster Significant (t=65.385891) Stddev: 0.00501 -> 0.01001: 1.9975x larger Timeline: http://tinyurl.com/y65yy5c ### nbody ### Min: 0.568523 -> 0.426052: 1.3344x faster Avg: 0.580190 -> 0.428620: 1.3536x faster Significant (t=72.626477) Stddev: 0.01450 -> 0.00273: 5.3178x smaller Timeline: http://tinyurl.com/y5hbwsy ### normal_startup ### Min: 0.420100 -> 0.408876: 1.0275x faster Avg: 0.475876 -> 0.489076: 1.0277x slower Not significant Stddev: 0.04082 -> 0.05543: 1.3579x larger Timeline: http://tinyurl.com/y5jdfgq ### nqueens ### Min: 0.585605 -> 0.577289: 1.0144x faster Avg: 0.603038 -> 0.594904: 1.0137x faster Significant (t=2.026307) Stddev: 0.01851 -> 0.02152: 1.1629x larger Timeline: http://tinyurl.com/yydzdhw ### pickle ### Min: 1.592286 -> 1.584492: 1.0049x faster Avg: 1.611001 -> 1.606726: 1.0027x faster Not significant Stddev: 0.01343 -> 0.03570: 2.6586x larger Timeline: http://tinyurl.com/yyax7wc ### pickle_dict ### Min: 1.316577 -> 1.298239: 1.0141x faster Avg: 1.320249 -> 1.311228: 1.0069x faster Significant (t=3.270732) Stddev: 0.00367 -> 0.01915: 5.2196x larger Timeline: http://tinyurl.com/y2smb8n ### pickle_list ### Min: 0.734164 -> 0.727414: 1.0093x faster Avg: 0.749225 -> 0.738023: 1.0152x faster Significant (t=3.523434) Stddev: 0.01996 -> 0.01035: 1.9291x smaller Timeline: http://tinyurl.com/yybbuct ### pybench ### Min: 5133 -> 4264: 1.2038x faster Avg: 5370 -> 4448: 1.2073x faster ### regex_compile ### Min: 0.783521 -> 0.706420: 1.1091x faster Avg: 0.805385 -> 0.743189: 1.0837x faster Significant (t=14.697890) Stddev: 
0.01900 -> 0.02312: 1.2168x larger Timeline: http://tinyurl.com/y4ng9oz ### regex_effbot ### Min: 0.106946 -> 0.108064: 1.0105x slower Avg: 0.108937 -> 0.112714: 1.0347x slower Significant (t=-4.189386) Stddev: 0.00158 -> 0.00618: 3.9173x larger Timeline: http://tinyurl.com/y2xs6yp ### regex_v8 ### Min: 0.114305 -> 0.110961: 1.0301x faster Avg: 0.119100 -> 0.113885: 1.0458x faster Significant (t=6.210478) Stddev: 0.00525 -> 0.00278: 1.8876x smaller Timeline: http://tinyurl.com/y5q2nlh ### richards ### Min: 0.376030 -> 0.309641: 1.2144x faster Avg: 0.389031 -> 0.314998: 1.2350x faster Significant (t=29.499544) Stddev: 0.01745 -> 0.00325: 5.3669x smaller Timeline: http://tinyurl.com/y5rh4av ### slowpickle ### Min: 0.800369 -> 0.711095: 1.1255x faster Avg: 0.824734 -> 0.735770: 1.1209x faster Significant (t=19.434640) Stddev: 0.02554 -> 0.01989: 1.2842x smaller Timeline: http://tinyurl.com/y79lh35 ### slowspitfire ### Min: 0.813913 -> 0.761560: 1.0687x faster Avg: 0.829754 -> 0.841118: 1.0137x slower Not significant Stddev: 0.01202 -> 0.05522: 4.5958x larger Timeline: http://tinyurl.com/y4y6f4x ### slowunpickle ### Min: 0.369238 -> 0.296829: 1.2439x faster Avg: 0.384044 -> 0.300151: 1.2795x faster Significant (t=32.788791) Stddev: 0.01766 -> 0.00391: 4.5186x smaller Timeline: http://tinyurl.com/y84c2bp ### startup_nosite ### Min: 0.173227 -> 0.183291: 1.0581x slower Avg: 0.234029 -> 0.235226: 1.0051x slower Not significant Stddev: 0.02222 -> 0.01951: 1.1389x smaller Timeline: http://tinyurl.com/y2esfmd ### threaded_count ### Min: 0.203453 -> 0.084667: 2.4030x faster Avg: 0.263979 -> 0.105661: 2.4984x faster Significant (t=26.001645) Stddev: 0.03833 -> 0.01960: 1.9552x smaller Timeline: http://tinyurl.com/y74qvbf ### unpack_sequence ### Min: 0.000116 -> 0.000108: 1.0728x faster Avg: 0.000121 -> 0.000118: 1.0261x faster Significant (t=13.346440) Stddev: 0.00004 -> 0.00004: 1.0544x larger Timeline: http://tinyurl.com/y6rld7k ### unpickle ### Min: 0.919231 -> 0.922668: 
1.0037x slower Avg: 0.936096 -> 0.947798: 1.0125x slower Significant (t=-3.379601) Stddev: 0.01505 -> 0.01931: 1.2834x larger Timeline: http://tinyurl.com/y3ymn85 ### unpickle_list ### Min: 0.690399 -> 0.690025: 1.0005x faster Avg: 0.729519 -> 0.698789: 1.0440x faster Significant (t=11.660568) Stddev: 0.01430 -> 0.01195: 1.1965x smaller Timeline: http://tinyurl.com/y38lfuh Report on Linux sauron 2.6.33-ARCH #1 SMP PREEMPT Sun Apr 4 10:27:30 CEST 2010 x86_64 AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ Total CPU cores: 2 ### 2to3 ### 29.598071 -> 23.691789: 1.2493x faster ### bzr_startup ### Min: 0.083328 -> 0.076661: 1.0870x faster Avg: 0.100727 -> 0.094061: 1.0709x faster Significant (t=5.464159) Stddev: 0.00863 -> 0.00863: 1.0000x larger Timeline: http://tinyurl.com/y6mng7k ### call_method ### Min: 0.796609 -> 0.538237: 1.4800x faster Avg: 0.816184 -> 0.547101: 1.4918x faster Significant (t=92.212665) Stddev: 0.03177 -> 0.01636: 1.9417x smaller Timeline: http://tinyurl.com/yygle37 ### call_method_slots ### Min: 0.780177 -> 0.535730: 1.4563x faster Avg: 0.797951 -> 0.544117: 1.4665x faster Significant (t=104.627536) Stddev: 0.02414 -> 0.01733: 1.3926x smaller Timeline: http://tinyurl.com/y76hawm ### call_method_unknown ### Min: 0.808852 -> 0.610603: 1.3247x faster Avg: 0.821008 -> 0.614395: 1.3363x faster Significant (t=109.946891) Stddev: 0.02158 -> 0.00800: 2.6994x smaller Timeline: http://tinyurl.com/y43e5fl ### call_simple ### Min: 0.602984 -> 0.484837: 1.2437x faster Avg: 0.627628 -> 0.508925: 1.2332x faster Significant (t=56.792486) Stddev: 0.02009 -> 0.01587: 1.2658x smaller Timeline: http://tinyurl.com/yyrerh8 ### float ### Min: 0.145489 -> 0.120753: 1.2048x faster Avg: 0.157275 -> 0.131557: 1.1955x faster Significant (t=29.200486) Stddev: 0.01020 -> 0.00948: 1.0763x smaller Timeline: http://tinyurl.com/y5h4frq ### hg_startup ### Min: 0.013332 -> 0.016666: 1.2501x slower Avg: 0.030811 -> 0.033631: 1.0915x slower Significant (t=-7.625262) Stddev: 0.00610 
-> 0.00558: 1.0933x smaller Timeline: http://tinyurl.com/y7c2vbv ### html5lib ### Min: 16.772239 -> 13.632444: 1.2303x faster Avg: 17.400199 -> 13.809100: 1.2601x faster Significant (t=19.710438) Stddev: 0.35648 -> 0.19722: 1.8075x smaller Timeline: http://tinyurl.com/y52q84h ### html5lib_warmup ### Min: 17.155307 -> 13.597860: 1.2616x faster Avg: 17.758442 -> 14.069391: 1.2622x faster Significant (t=12.638530) Stddev: 0.58006 -> 0.29922: 1.9386x smaller Timeline: http://tinyurl.com/y5ragx4 ### iterative_count ### Min: 0.272019 -> 0.144380: 1.8841x faster Avg: 0.321844 -> 0.155405: 2.0710x faster Significant (t=23.655896) Stddev: 0.04319 -> 0.02469: 1.7493x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.46044492722&chco=FF0000,0000FF&chdl=/usr/bin/python|../wpython11/python&chds=0,1.46044492722&chd=t:0.28,0.28,0.28,0.28,0.33,0.33,0.31,0.31,0.29,0.3,0.32,0.35,0.29,0.3,0.29,0.28,0.27,0.27,0.27,0.29,0.32,0.35,0.31,0.28,0.27,0.3,0.35,0.3,0.29,0.28,0.3,0.29,0.31,0.31,0.33,0.32,0.34,0.41,0.34,0.33,0.33,0.34,0.34,0.36,0.4,0.43,0.46,0.41,0.38,0.35|0.3,0.15,0.15,0.15,0.17,0.15,0.15,0.15,0.16,0.15,0.14,0.14,0.2,0.16,0.17,0.19,0.16,0.16,0.16,0.2,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.15,0.16,0.16,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.14,0.15,0.14,0.14,0.15,0.14,0.17,0.15&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=iterative_count ### nbody ### Min: 0.639303 -> 0.496505: 1.2876x faster Avg: 0.663221 -> 0.507123: 1.3078x faster Significant (t=42.102614) Stddev: 0.01815 -> 0.01892: 1.0424x larger Timeline: http://tinyurl.com/y64lglq ### normal_startup ### Min: 0.374472 -> 0.461435: 1.2322x slower Avg: 0.413358 -> 0.515210: 1.2464x slower Significant (t=-17.591195) Stddev: 0.02972 -> 0.02815: 1.0558x smaller Timeline: http://tinyurl.com/y7qj6zz ### nqueens ### Min: 0.698012 -> 0.507417: 1.3756x faster Avg: 0.748165 -> 0.559723: 1.3367x faster Significant (t=21.603119) Stddev: 0.03138 -> 
0.05310: 1.6921x larger Timeline: http://tinyurl.com/y3xv95e ### pickle ### Min: 1.584518 -> 1.526627: 1.0379x faster Avg: 1.673835 -> 1.658376: 1.0093x faster Not significant Stddev: 0.06500 -> 0.07568: 1.1644x larger Timeline: http://tinyurl.com/y4224pp ### pickle_dict ### Min: 1.568636 -> 1.498363: 1.0469x faster Avg: 1.618752 -> 1.575946: 1.0272x faster Significant (t=4.120055) Stddev: 0.04758 -> 0.05598: 1.1767x larger Timeline: http://tinyurl.com/yyzl6b5 ### pickle_list ### Min: 0.771403 -> 0.752089: 1.0257x faster Avg: 0.797367 -> 0.778438: 1.0243x faster Significant (t=3.157783) Stddev: 0.02620 -> 0.03332: 1.2721x larger Timeline: http://tinyurl.com/yyp5cjx ### pybench ### Min: 5994 -> 4470: 1.3409x faster Avg: 6250 -> 4781: 1.3073x faster ### regex_compile ### Min: 0.838116 -> 0.664657: 1.2610x faster Avg: 0.846488 -> 0.691629: 1.2239x faster Significant (t=31.710076) Stddev: 0.01236 -> 0.03224: 2.6085x larger Timeline: http://tinyurl.com/y65ceh8 ### regex_effbot ### Min: 0.169898 -> 0.152830: 1.1117x faster Avg: 0.179772 -> 0.158301: 1.1356x faster Significant (t=13.100118) Stddev: 0.00746 -> 0.00887: 1.1895x larger Timeline: http://tinyurl.com/yyazgxh ### regex_v8 ### Min: 0.152255 -> 0.134914: 1.1285x faster Avg: 0.159778 -> 0.144822: 1.1033x faster Significant (t=10.310186) Stddev: 0.00598 -> 0.00834: 1.3944x larger Timeline: http://tinyurl.com/y4znhxx ### richards ### Min: 0.361250 -> 0.281802: 1.2819x faster Avg: 0.384307 -> 0.294562: 1.3047x faster Significant (t=27.621845) Stddev: 0.02043 -> 0.01052: 1.9419x smaller Timeline: http://tinyurl.com/y3hx8w2 ### slowpickle ### Min: 0.826115 -> 0.610384: 1.3534x faster Avg: 0.872314 -> 0.627799: 1.3895x faster Significant (t=43.041072) Stddev: 0.03384 -> 0.02165: 1.5626x smaller Timeline: http://tinyurl.com/y4dr42c ### slowspitfire ### Min: 0.820168 -> 0.697804: 1.1754x faster Avg: 0.840062 -> 0.736274: 1.1410x faster Significant (t=20.687150) Stddev: 0.02540 -> 0.02477: 1.0256x smaller Timeline: 
http://tinyurl.com/y6cn2c7 ### slowunpickle ### Min: 0.423866 -> 0.306436: 1.3832x faster Avg: 0.431624 -> 0.308273: 1.4001x faster Significant (t=103.485543) Stddev: 0.00781 -> 0.00318: 2.4556x smaller Timeline: http://tinyurl.com/y7p5ugb ### startup_nosite ### Min: 0.182274 -> 0.166099: 1.0974x faster Avg: 0.201290 -> 0.185015: 1.0880x faster Significant (t=8.405736) Stddev: 0.01255 -> 0.01474: 1.1748x larger Timeline: http://tinyurl.com/y26jqjm ### threaded_count ### Min: 0.292005 -> 0.174754: 1.6710x faster Avg: 0.345331 -> 0.191805: 1.8004x faster Significant (t=48.856578) Stddev: 0.02041 -> 0.00877: 2.3267x smaller Timeline: http://tinyurl.com/y6dl2e6 ### unpack_sequence ### Min: 0.000106 -> 0.000091: 1.1684x faster Avg: 0.000114 -> 0.000099: 1.1433x faster Significant (t=21.367174) Stddev: 0.00009 -> 0.00012: 1.2958x larger Timeline: http://tinyurl.com/y2sujno ### unpickle ### Min: 0.908351 -> 0.803020: 1.1312x faster Avg: 0.984448 -> 0.856525: 1.1494x faster Significant (t=19.812585) Stddev: 0.03248 -> 0.03209: 1.0122x smaller Timeline: http://tinyurl.com/y4zmlaj ### unpickle_list ### Min: 0.754476 -> 0.719254: 1.0490x faster Avg: 0.802729 -> 0.759628: 1.0567x faster Significant (t=6.699951) Stddev: 0.03771 -> 0.02544: 1.4821x smaller Timeline: http://tinyurl.com/y6tv2us Report on Linux raffaello 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686 athlon Total CPU cores: 1 ### 2to3 ### 43.432397 -> 43.283420: 1.0034x faster ### bzr_startup ### Min: 0.140979 -> 0.144978: 1.0284x slower Avg: 0.159606 -> 0.157596: 1.0128x faster Significant (t=2.709326) Stddev: 0.00578 -> 0.00465: 1.2418x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.175973&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.175973&chd=t:0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.18,0.16,0.16,0.17,0.17,0.17,0.16,0.16,0.17,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.15,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.17,0.16,0.15,0.16,0.16,0.16,0.16,0.15,0.16,0.15,0.16,0.16,0.16,0.17,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.17,0.16,0.15,0.15,0.14,0.16,0.16,0.16,0.16,0.17,0.16,0.15,0.16,0.16,0.17,0.16,0.16,0.16,0.15,0.15,0.17,0.16,0.16,0.16,0.15,0.15,0.16,0.14,0.15,0.17,0.16,0.15,0.16,0.16,0.15,0.16,0.15,0.15|0.16,0.15,0.15,0.15,0.16,0.15,0.16,0.16,0.16,0.15,0.15,0.15,0.16,0.15,0.16,0.15,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.15,0.15,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.16,0.16,0.16,0.16,0.16,0.16,0.16,0.17,0.16,0.16,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.15,0.15,0.15,0.16,0.16,0.15,0.15,0.16,0.16,0.15,0.15,0.14,0.16,0.17,0.16,0.15,0.15,0.16,0.16,0.16,0.16,0.15,0.16,0.16,0.16,0.15,0.15,0.16,0.16,0.16,0.16,0.15,0.15,0.15,0.16,0.16,0.16,0.16,0.17&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=bzr_startup ### call_method ### Min: 1.158909 -> 1.059187: 1.0941x faster Avg: 1.161172 -> 1.113055: 1.0432x faster Significant (t=22.522125) Stddev: 0.00131 -> 0.02613: 19.9944x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.07623100281,2.1763420105&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.07623100281,2.1763420105&chd=t:1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.17,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.17,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16,1.16|1.13,1.16,1.13,1.11,1.11,1.14,1.16,1.08,1.1,1.14,1.14,1.14,1.1,1.13,1.15,1.15,1.13,1.14,1.17,1.11,1.1,1.11,1.14,1.11,1.13,1.11,1.14,1.11,1.11,1.15,1.13,1.18,1.16,1.1,1.1,1.1,1.12,1.08,1.11,1.09,1.09,1.09,1.12,1.16,1.08,1.1,1.08,1.12,1.13,1.15,1.14,1.16,1.13,1.14,1.16,1.09,1.14,1.15,1.13,1.11,1.1,1.09,1.1,1.1,1.11,1.11,1.11,1.14,1.15,1.12,1.13,1.14,1.16,1.14,1.09&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method ### call_method_slots ### Min: 1.149059 -> 1.078626: 1.0653x faster Avg: 1.151797 -> 1.143283: 1.0074x faster Significant (t=3.330294) Stddev: 0.00124 -> 0.03128: 25.1750x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.09424901009,2.22079586983&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.09424901009,2.22079586983&chd=t:1.15,1.16,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.16,1.16,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15,1.15|1.16,1.13,1.2,1.16,1.16,1.15,1.17,1.13,1.14,1.13,1.16,1.19,1.13,1.17,1.17,1.14,1.11,1.14,1.19,1.2,1.17,1.22,1.2,1.14,1.14,1.15,1.21,1.16,1.19,1.1,1.15,1.13,1.15,1.13,1.09,1.18,1.18,1.14,1.13,1.13,1.12,1.15,1.18,1.17,1.19,1.21,1.19,1.19,1.22,1.18,1.18,1.17,1.16,1.16,1.18,1.18,1.16,1.16,1.16,1.16,1.17,1.16,1.14,1.13,1.12,1.14,1.15,1.19,1.14,1.15,1.15,1.15,1.15,1.13,1.1&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method_slots ### call_method_unknown ### Min: 1.170848 -> 1.155544: 1.0132x faster Avg: 1.180379 -> 1.201501: 1.0179x slower Significant (t=-9.479015) Stddev: 0.01125 -> 0.02487: 2.2110x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.17149400711,2.26189613342&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.17149400711,2.26189613342&chd=t:1.19,1.2,1.21,1.19,1.2,1.2,1.18,1.17,1.18,1.17,1.17,1.19,1.21,1.2,1.17,1.17,1.17,1.17,1.19,1.19,1.17,1.18,1.17,1.18,1.17,1.2,1.17,1.22,1.2,1.19,1.2,1.19,1.2,1.19,1.19,1.21,1.2,1.19,1.2,1.2,1.19,1.19,1.19,1.19,1.19,1.2,1.21,1.19,1.17,1.19,1.18,1.17,1.19,1.17,1.17,1.19,1.17,1.18,1.17,1.17,1.17,1.17,1.18,1.17,1.19,1.17,1.18,1.18,1.17,1.18,1.17,1.17,1.17,1.19,1.2|1.26,1.24,1.21,1.2,1.22,1.23,1.22,1.21,1.22,1.24,1.23,1.2,1.21,1.25,1.23,1.2,1.19,1.19,1.2,1.19,1.2,1.24,1.2,1.19,1.2,1.21,1.24,1.22,1.24,1.19,1.18,1.2,1.21,1.18,1.2,1.21,1.2,1.17,1.19,1.19,1.22,1.2,1.2,1.19,1.2,1.2,1.18,1.2,1.23,1.24,1.25,1.23,1.21,1.19,1.2,1.24,1.24,1.21,1.23,1.24,1.23,1.24,1.18,1.2,1.19,1.21,1.23,1.24,1.25,1.24,1.23,1.24,1.23,1.2,1.2&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_method_unknown ### call_simple ### Min: 0.905800 -> 0.908177: 1.0026x slower Avg: 0.911217 -> 0.942381: 1.0342x slower Significant (t=-18.575059) Stddev: 0.00579 -> 0.01972: 3.4054x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.98918581009&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.98918581009&chd=t:0.91,0.91,0.92,0.91,0.92,0.91,0.91,0.91,0.91,0.92,0.92,0.92,0.91,0.93,0.91,0.91,0.91,0.92,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.93,0.91,0.92,0.93,0.92,0.91,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.93,0.91,0.93,0.91,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.92,0.91,0.93,0.92,0.91,0.91,0.91,0.91,0.91,0.91,0.92,0.93,0.93,0.93,0.91,0.91,0.91,0.91,0.91,0.91|0.99,0.95,0.93,0.95,0.95,0.94,0.94,0.96,0.93,0.97,0.96,0.95,0.97,0.94,0.96,0.95,0.95,0.94,0.97,0.96,0.94,0.96,0.98,0.93,0.94,0.96,0.97,0.94,0.97,0.97,0.95,0.95,0.94,0.96,0.96,0.93,0.92,0.95,0.96,0.97,0.92,0.95,0.96,0.94,0.91,0.96,0.97,0.95,0.94,0.95,0.92,0.95,0.95,0.97,0.93,0.94,0.95,0.96,0.97,0.94,0.96,0.96,0.95,0.94,0.99,0.99,0.97,0.94,0.97,0.97,0.96,0.95,0.96,0.98,0.95&chxl=0:|1|15|30|45|60|75|2:||Iteration|3:||Time+(secs)&chtt=call_simple ### float ### Min: 0.222201 -> 0.224009: 1.0081x slower Avg: 0.232227 -> 0.239783: 1.0325x slower Significant (t=-9.550820) Stddev: 0.00855 -> 0.00913: 1.0688x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.26341700554&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.26341700554&chd=t:0.24,0.24,0.25,0.25,0.24,0.24,0.25,0.24,0.25,0.24,0.24,0.24,0.24,0.23,0.25,0.24,0.25,0.23,0.25,0.23,0.24,0.25,0.24,0.23,0.24,0.25,0.25,0.25,0.24,0.25,0.24,0.24,0.23,0.25,0.24,0.25,0.23,0.25,0.24,0.24,0.25,0.25,0.23,0.24,0.24,0.24,0.25,0.24,0.24,0.24,0.25,0.23,0.25,0.24,0.24,0.23,0.25,0.23,0.24,0.25,0.24,0.23,0.24,0.24,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.25,0.23,0.25,0.24,0.25,0.25,0.25,0.23,0.24,0.25,0.24|0.25,0.25,0.26,0.26,0.24,0.25,0.26,0.25,0.26,0.25,0.25,0.25,0.25,0.24,0.26,0.25,0.25,0.24,0.25,0.24,0.25,0.26,0.25,0.24,0.25,0.25,0.26,0.26,0.25,0.25,0.25,0.26,0.24,0.26,0.25,0.25,0.25,0.26,0.25,0.26,0.26,0.25,0.23,0.24,0.25,0.24,0.25,0.24,0.25,0.24,0.24,0.23,0.25,0.24,0.26,0.24,0.25,0.25,0.25,0.25,0.25,0.23,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.26,0.24,0.26,0.25,0.25,0.24,0.26,0.24,0.25,0.26,0.25,0.24,0.25,0.25,0.25&chxl=0:|1|17|34|51|68|84|2:||Iteration|3:||Time+(secs)&chtt=float ### hg_startup ### Min: 0.045993 -> 0.048992: 1.0652x slower Avg: 0.057321 -> 0.056441: 1.0156x faster Significant (t=4.488042) Stddev: 0.00319 -> 0.00301: 1.0620x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.06599&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.06599&chd=t:0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06|0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.07,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.05,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06,0.06&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=hg_startup ### html5lib ### Min: 26.507970 -> 25.616106: 1.0348x faster Avg: 26.597557 -> 25.732688: 1.0336x faster Significant (t=9.827764) Stddev: 0.09216 -> 0.17386: 1.8865x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,24.616106,27.70594&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=24.616106,27.70594&chd=t:26.51,26.71,26.54,26.69,26.55|25.68,25.62,25.68,26.04,25.65&chxl=0:|1|2|3|4|5|2:||Iteration|3:||Time+(secs)&chtt=html5lib ### html5lib_warmup ### Min: 25.655162 -> 25.466228: 1.0074x faster Avg: 26.110781 -> 25.898441: 1.0082x faster Not significant Stddev: 0.26144 -> 0.25576: 1.0222x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,24.4662280083,27.2955319881&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=24.4662280083,27.2955319881&chd=t:25.66,26.3,26.27,26.18,26.16|25.47,26.04,26.12,25.89,25.97&chxl=0:|1|2|3|4|5|2:||Iteration|3:||Time+(secs)&chtt=html5lib_warmup ### iterative_count ### Min: 0.369361 -> 0.223053: 1.6559x faster Avg: 0.371506 -> 0.240774: 1.5430x faster Significant (t=72.130793) Stddev: 0.00198 -> 0.01266: 6.3935x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.38339400291&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.38339400291&chd=t:0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.38,0.37|0.25,0.25,0.23,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.23,0.22,0.23,0.23,0.23,0.23,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.24,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.24,0.25&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=iterative_count ### nbody ### Min: 0.935157 -> 0.931795: 1.0036x faster Avg: 0.946445 -> 0.943684: 1.0029x faster Significant (t=2.384189) Stddev: 0.00409 -> 0.00709: 1.7332x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.95390200615&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.95390200615&chd=t:0.94,0.95,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.95,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.95,0.95,0.95,0.95,0.94,0.94,0.94,0.94,0.94,0.94,0.94,0.94,0.95,0.94,0.95,0.94,0.95,0.95,0.95,0.95|0.94,0.93,0.94,0.95,0.94,0.95,0.94,0.93,0.93,0.94,0.95,0.93,0.94,0.94,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.95,0.94,0.94,0.93,0.93,0.95,0.94,0.93,0.94,0.94,0.94,0.95,0.95,0.95,0.94,0.94,0.95,0.93,0.94,0.94,0.95,0.95,0.95,0.93,0.95,0.95,0.94,0.95,0.94&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=nbody ### normal_startup ### Min: 0.685616 -> 0.676500: 1.0135x faster Avg: 0.686916 -> 0.678582: 1.0123x faster Significant (t=31.273550) Stddev: 0.00078 -> 0.00171: 2.1897x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.69004797935&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.69004797935&chd=t:0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69,0.69|0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68,0.68&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=normal_startup ### nqueens ### Min: 0.980723 -> 0.947436: 1.0351x faster Avg: 0.989169 -> 0.954421: 1.0364x faster Significant (t=46.434070) Stddev: 0.00394 -> 0.00353: 1.1181x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.99711680412&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.99711680412&chd=t:0.99,0.99,0.99,0.99,0.99,0.99,1.0,0.99,0.99,0.99,0.99,0.99,1.0,0.99,0.98,0.99,0.99,0.99,0.99,0.99,0.98,0.98,0.98,0.98,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.98,0.98,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.99,0.98,0.99,0.99,0.99|0.95,0.96,0.95,0.96,0.95,0.96,0.95,0.96,0.95,0.96,0.96,0.95,0.95,0.95,0.95,0.95,0.96,0.95,0.96,0.96,0.95,0.95,0.95,0.95,0.96,0.96,0.96,0.95,0.95,0.95,0.95,0.96,0.96,0.95,0.96,0.96,0.95,0.96,0.95,0.95,0.95,0.96,0.96,0.96,0.95,0.96,0.95,0.96,0.95,0.96&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=nqueens ### pickle ### Min: 3.346728 -> 3.398232: 1.0154x slower Avg: 3.367508 -> 3.415437: 1.0142x slower Significant (t=-28.797501) Stddev: 0.00840 -> 0.00824: 1.0186x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,2.34672808647,4.43019509315&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=2.34672808647,4.43019509315&chd=t:3.37,3.37,3.38,3.37,3.36,3.38,3.36,3.35,3.37,3.37,3.37,3.36,3.38,3.36,3.38,3.37,3.36,3.37,3.37,3.37,3.35,3.37,3.36,3.38,3.37,3.37,3.38,3.37,3.36,3.37,3.36,3.36,3.37,3.36,3.37,3.36,3.37,3.36,3.38,3.38,3.36,3.37,3.36,3.37,3.37,3.36,3.37,3.39,3.38,3.36|3.4,3.4,3.41,3.43,3.41,3.42,3.41,3.42,3.41,3.41,3.41,3.41,3.41,3.4,3.41,3.43,3.42,3.41,3.41,3.41,3.41,3.42,3.41,3.41,3.4,3.41,3.42,3.42,3.43,3.43,3.42,3.42,3.41,3.42,3.41,3.41,3.42,3.42,3.43,3.41,3.43,3.41,3.42,3.42,3.43,3.42,3.42,3.41,3.41,3.41&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle ### pickle_dict ### Min: 3.395274 -> 3.338732: 1.0169x faster Avg: 3.513604 -> 3.359646: 1.0458x faster Significant (t=16.225759) Stddev: 0.06605 -> 0.01182: 5.5896x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,2.33873200417,4.60737299919&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=2.33873200417,4.60737299919&chd=t:3.49,3.56,3.59,3.57,3.51,3.46,3.59,3.55,3.4,3.52,3.49,3.59,3.57,3.43,3.55,3.6,3.5,3.43,3.43,3.56,3.42,3.44,3.52,3.53,3.6,3.6,3.41,3.46,3.4,3.46,3.4,3.54,3.57,3.54,3.55,3.6,3.58,3.48,3.48,3.42,3.6,3.57,3.4,3.61,3.5,3.51,3.45,3.54,3.54,3.57|3.36,3.35,3.34,3.36,3.38,3.37,3.36,3.35,3.35,3.35,3.36,3.36,3.34,3.36,3.37,3.36,3.36,3.35,3.38,3.34,3.36,3.37,3.39,3.38,3.35,3.36,3.35,3.35,3.36,3.36,3.36,3.34,3.37,3.36,3.38,3.37,3.38,3.35,3.35,3.36,3.36,3.38,3.37,3.34,3.35,3.35,3.35,3.36,3.35,3.36&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle_dict ### pickle_list ### Min: 1.720434 -> 1.708855: 1.0068x faster Avg: 1.762757 -> 1.719942: 1.0249x faster Significant (t=11.198322) Stddev: 0.02604 -> 0.00727: 3.5808x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.70885491371,2.81176018715&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.70885491371,2.81176018715&chd=t:1.77,1.77,1.8,1.79,1.76,1.75,1.77,1.8,1.8,1.78,1.74,1.79,1.81,1.77,1.74,1.76,1.8,1.79,1.78,1.73,1.78,1.8,1.77,1.76,1.79,1.78,1.75,1.72,1.72,1.72,1.72,1.76,1.74,1.79,1.8,1.75,1.74,1.72,1.75,1.73,1.73,1.79,1.73,1.74,1.75,1.77,1.74,1.76,1.77,1.78|1.72,1.71,1.71,1.71,1.72,1.73,1.71,1.71,1.71,1.71,1.73,1.72,1.71,1.71,1.71,1.73,1.73,1.73,1.74,1.73,1.72,1.71,1.71,1.72,1.71,1.73,1.72,1.71,1.72,1.72,1.73,1.73,1.71,1.72,1.72,1.73,1.72,1.72,1.73,1.73,1.73,1.73,1.73,1.73,1.71,1.71,1.72,1.72,1.72,1.72&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=pickle_list ### pybench ### Min: 8937 -> 8141: 1.0978x faster Avg: 9069 -> 8266: 1.0971x faster ### regex_compile ### Min: 1.297481 -> 1.230614: 1.0543x faster Avg: 1.303290 -> 1.235283: 1.0551x faster Significant (t=120.657667) 
Stddev: 0.00304 -> 0.00257: 1.1834x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.23061418533,2.31539511681&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.23061418533,2.31539511681&chd=t:1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.31,1.3,1.3,1.3,1.31,1.31,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.3,1.31,1.3,1.3,1.3,1.3,1.3,1.3,1.32|1.23,1.24,1.24,1.23,1.24,1.23,1.24,1.24,1.24,1.23,1.23,1.23,1.24,1.23,1.23,1.23,1.24,1.24,1.24,1.24,1.24,1.24,1.24,1.25,1.24,1.24,1.24,1.24,1.24,1.23,1.23,1.23,1.23,1.23,1.23,1.23,1.24,1.24,1.23,1.23,1.24,1.23,1.24,1.23,1.23,1.23,1.24,1.23,1.23,1.23&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_compile ### regex_effbot ### Min: 0.238711 -> 0.234200: 1.0193x faster Avg: 0.239331 -> 0.236123: 1.0136x faster Significant (t=19.737486) Stddev: 0.00050 -> 0.00104: 2.0828x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.24141407013&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.24141407013&chd=t:0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24|0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.24&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_effbot ### regex_v8 ### Min: 0.229685 -> 0.217755: 1.0548x faster Avg: 0.232979 -> 0.219208: 1.0628x faster Significant (t=36.278688) Stddev: 0.00217 -> 0.00157: 1.3824x smaller Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.23589801788&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.23589801788&chd=t:0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.23,0.23,0.24,0.23,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.23,0.24|0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.23,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22,0.22&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=regex_v8 ### richards ### Min: 0.543314 -> 0.504176: 1.0776x faster Avg: 0.550139 -> 0.542886: 1.0134x faster Significant (t=3.118548) Stddev: 0.00397 -> 0.01596: 4.0203x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.57444500923&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.57444500923&chd=t:0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.56,0.55,0.56,0.55,0.55,0.54,0.54,0.55,0.55,0.55,0.54,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.56,0.56,0.56,0.56,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55|0.53,0.55,0.55,0.55,0.55,0.54,0.53,0.53,0.55,0.54,0.55,0.55,0.56,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.55,0.54,0.56,0.53,0.51,0.56,0.51,0.56,0.52,0.56,0.51,0.54,0.57,0.53,0.57,0.52,0.57,0.54,0.54,0.53,0.53,0.52,0.54,0.5&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=richards ### slowpickle ### Min: 1.453602 -> 1.361336: 1.0678x faster Avg: 1.459776 -> 1.370334: 1.0653x faster Significant (t=102.747004) Stddev: 0.00249 -> 0.00563: 2.2567x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.36133599281,2.46742391586&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.36133599281,2.46742391586&chd=t:1.46,1.46,1.46,1.46,1.46,1.45,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.45,1.47,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46,1.46|1.37,1.38,1.37,1.36,1.38,1.38,1.38,1.37,1.37,1.37,1.37,1.38,1.38,1.38,1.38,1.37,1.37,1.36,1.36,1.36,1.36,1.36,1.36,1.36,1.37,1.37,1.36,1.37,1.37,1.37,1.37,1.37,1.37,1.38,1.38,1.38,1.38,1.37,1.37,1.37,1.37,1.37,1.38,1.38,1.37,1.37,1.37,1.36,1.36,1.36&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowpickle ### slowspitfire ### Min: 1.507587 -> 1.393345: 1.0820x faster Avg: 1.512317 -> 1.405533: 1.0760x faster Significant (t=83.955024) Stddev: 0.00415 -> 0.00798: 1.9254x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.39334487915,2.53158593178&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.39334487915,2.53158593178&chd=t:1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.53,1.52,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.52,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.51,1.53,1.51,1.51,1.51,1.51,1.52,1.51,1.51,1.51,1.51|1.41,1.41,1.42,1.39,1.41,1.41,1.4,1.42,1.4,1.39,1.4,1.39,1.4,1.4,1.42,1.41,1.4,1.4,1.42,1.4,1.4,1.41,1.42,1.4,1.42,1.41,1.41,1.41,1.41,1.41,1.4,1.4,1.4,1.4,1.41,1.41,1.41,1.4,1.41,1.4,1.4,1.4,1.41,1.41,1.42,1.41,1.4,1.41,1.4,1.4&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowspitfire ### slowunpickle ### Min: 0.692674 -> 0.645382: 1.0733x faster Avg: 0.695322 -> 0.648033: 1.0730x faster Significant (t=102.284826) Stddev: 0.00177 -> 0.00275: 1.5551x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.70394492149&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.70394492149&chd=t:0.69,0.69,0.7,0.69,0.69,0.69,0.7,0.69,0.7,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.7,0.7,0.7,0.69,0.7,0.69,0.7,0.69,0.7,0.69,0.69,0.7,0.69,0.7,0.7,0.69,0.69,0.7,0.7,0.7,0.69,0.7,0.7,0.7,0.7,0.69,0.7,0.7,0.69,0.7,0.7,0.7|0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.65,0.66,0.65,0.65,0.65,0.65&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=slowunpickle ### startup_nosite ### Min: 0.247376 -> 0.246369: 1.0041x faster Avg: 0.249051 -> 0.248113: 1.0038x faster Significant (t=6.716428) Stddev: 0.00109 -> 0.00088: 1.2345x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.25523996353&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.25523996353&chd=t:0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.26,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25|0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.
25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25,0.25&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=startup_nosite ### threaded_count ### Min: 0.373155 -> 0.227307: 1.6416x faster Avg: 0.374912 -> 0.234906: 1.5960x faster Significant (t=224.886947) Stddev: 0.00110 -> 0.00426: 3.8673x larger Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.37840795517&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.37840795517&chd=t:0.37,0.37,0.38,0.37,0.37,0.37,0.38,0.38,0.37,0.38,0.37,0.37,0.37,0.38,0.38,0.38,0.37,0.38,0.38,0.38,0.38,0.38,0.37,0.37,0.37,0.38,0.37,0.38,0.37,0.37,0.37,0.37,0.38,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.37,0.38,0.37,0.38,0.37,0.37,0.38,0.38,0.37,0.37|0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.24,0.24,0.23,0.23,0.24,0.24,0.24,0.24,0.24,0.24,0.24,0.23,0.23,0.23,0.23,0.23,0.23,0.23,0.25,0.24,0.25,0.24,0.24,0.23,0.24,0.23&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=threaded_count ### unpack_sequence ### Min: 0.000150 -> 0.000159: 1.0605x slower Avg: 0.000153 -> 0.000161: 1.0550x slower Significant (t=-450.521988) Stddev: 0.00000 -> 0.00000: 1.2070x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0,1.00053215027&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0,1.00053215027&chd=t:0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0|0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0&chxl=0:|1|20|40|60|80|100|2:||Iteration|3:||Time+(secs)&chtt=unpack_sequence ### unpickle ### Min: 2.042838 -> 2.023408: 1.0096x faster Avg: 2.054084 -> 2.037836: 1.0080x faster Significant (t=13.396235) Stddev: 0.00551 -> 0.00657: 1.1931x larger Timeline: 
http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,1.0234079361,3.0667848587&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=1.0234079361,3.0667848587&chd=t:2.06,2.06,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.06,2.06,2.06,2.06,2.05,2.06,2.06,2.06,2.06,2.05,2.04,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.04,2.04,2.05,2.06,2.06,2.06,2.06,2.06,2.06,2.06,2.07,2.06,2.06,2.06,2.05,2.06|2.04,2.04,2.04,2.04,2.04,2.02,2.04,2.03,2.02,2.04,2.04,2.04,2.03,2.03,2.04,2.04,2.04,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.05,2.03,2.03,2.03,2.03,2.03,2.03,2.03,2.04,2.03,2.03,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.04,2.03,2.03,2.03&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=unpickle ### unpickle_list ### Min: 1.542357 -> 1.645569: 1.0669x slower Avg: 1.554601 -> 1.654697: 1.0644x slower Significant (t=-93.061602) Stddev: 0.00647 -> 0.00400: 1.6147x smaller Timeline: http://chart.apis.google.com/chart?cht=lc&chs=700x400&chxt=x,y,x,y&chxr=1,0.54235696793,2.66085600853&chco=FF0000,0000FF&chdl=/btrfs/src/Python-2.6.4/python|/btrfs/src/wpython2-wpython11/python&chds=0.54235696793,2.66085600853&chd=t:1.56,1.55,1.55,1.56,1.55,1.56,1.56,1.56,1.56,1.55,1.55,1.54,1.56,1.55,1.55,1.54,1.55,1.55,1.55,1.54,1.54,1.55,1.55,1.54,1.55,1.54,1.55,1.55,1.56,1.55,1.55,1.55,1.56,1.55,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.56,1.57,1.56,1.56,1.56,1.56,1.57,1.56|1.66,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.65,1.66,1.66,1.65,1.66,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.66,1.65,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.65,1.66,1.66,1.65,1.66,1.66,1.66,1.66,1.66,1.65,1.66&chxl=0:|1|10|20|30|40|50|2:||Iteration|3:||Time+(secs)&chtt=unpickle_list Report on Darwin unknown-00-1e-c2-bc-ea-b3.config 10.3.0 Darwin Kernel Version 10.3.0: Fri Feb 26 11:58:09 PST 2010; root:xnu-1504.3.12~1/RELEASE_I386 i386 i386 Total CPU cores: 2 ### 2to3 ### 25.590659 -> 
23.666681: 1.0813x faster

### bzr_startup ###
Min: 0.102069 -> 0.099751: 1.0232x faster
Avg: 0.102827 -> 0.100411: 1.0241x faster
Significant (t=20.360035)
Stddev: 0.00072 -> 0.00094: 1.3152x larger
Timeline: http://tinyurl.com/y6yjv5w

### call_method ###
Min: 0.606348 -> 0.548343: 1.1058x faster
Avg: 0.609875 -> 0.556685: 1.0955x faster
Significant (t=54.742949)
Stddev: 0.00303 -> 0.01151: 3.7924x larger
Timeline: http://tinyurl.com/y7wkkmp

### call_method_slots ###
Min: 0.641415 -> 0.549939: 1.1663x faster
Avg: 0.648512 -> 0.571999: 1.1338x faster
Significant (t=66.043832)
Stddev: 0.01162 -> 0.00815: 1.4253x smaller
Timeline: http://tinyurl.com/y7mlu86

### call_method_unknown ###
Min: 0.675142 -> 0.613596: 1.1003x faster
Avg: 0.685377 -> 0.616531: 1.1117x faster
Significant (t=35.991776)
Stddev: 0.02328 -> 0.00260: 8.9669x smaller
Timeline: http://tinyurl.com/y6p65wk

### call_simple ###
Min: 0.443526 -> 0.425943: 1.0413x faster
Avg: 0.447255 -> 0.442844: 1.0100x faster
Significant (t=4.469438)
Stddev: 0.00569 -> 0.01066: 1.8738x larger
Timeline: http://tinyurl.com/y8xbq2f

### float ###
Min: 0.102775 -> 0.096776: 1.0620x faster
Avg: 0.110484 -> 0.102809: 1.0747x faster
Significant (t=13.220150)
Stddev: 0.00738 -> 0.00546: 1.3507x smaller
Timeline: http://tinyurl.com/yyhutwh

### hg_startup ###
Min: 0.045108 -> 0.043234: 1.0433x faster
Avg: 0.046845 -> 0.043972: 1.0653x faster
Significant (t=28.354118)
Stddev: 0.00206 -> 0.00095: 2.1622x smaller
Timeline: http://tinyurl.com/y5b9xx5

### html5lib ###
Min: 15.549443 -> 14.847499: 1.0473x faster
Avg: 15.582542 -> 14.859007: 1.0487x faster
Significant (t=64.534012)
Stddev: 0.02167 -> 0.01261: 1.7190x smaller
Timeline: http://tinyurl.com/y3g6t44

### html5lib_warmup ###
Min: 15.770884 -> 15.074864: 1.0462x faster
Avg: 16.133120 -> 15.319287: 1.0531x faster
Significant (t=4.375747)
Stddev: 0.30506 -> 0.28266: 1.0793x smaller
Timeline: http://tinyurl.com/y2xcn3m

### iterative_count ###
Min: 0.147178 -> 0.085756: 1.7162x faster
Avg: 0.151184 -> 0.088620: 1.7060x faster
Significant (t=49.925293)
Stddev: 0.00651 -> 0.00601: 1.0834x smaller
Timeline: http://tinyurl.com/yybv496

### nbody ###
Min: 0.471700 -> 0.463253: 1.0182x faster
Avg: 0.483086 -> 0.475017: 1.0170x faster
Significant (t=3.488633)
Stddev: 0.01129 -> 0.01183: 1.0477x larger
Timeline: http://tinyurl.com/y6lrfst

### normal_startup ###
Min: 0.811946 -> 0.789491: 1.0284x faster
Avg: 0.854893 -> 0.819687: 1.0430x faster
Significant (t=5.095698)
Stddev: 0.03899 -> 0.02943: 1.3249x smaller
Timeline: http://tinyurl.com/yydc2u4

### nqueens ###
Min: 0.597376 -> 0.570333: 1.0474x faster
Avg: 0.606725 -> 0.588271: 1.0314x faster
Significant (t=5.653285)
Stddev: 0.00920 -> 0.02117: 2.3015x larger
Timeline: http://tinyurl.com/y3n2fg3

### pickle ###
Min: 1.651874 -> 1.574163: 1.0494x faster
Avg: 1.680315 -> 1.612453: 1.0421x faster
Significant (t=10.340275)
Stddev: 0.02313 -> 0.04023: 1.7395x larger
Timeline: http://tinyurl.com/y7r55ms

### pickle_dict ###
Min: 1.308464 -> 1.275010: 1.0262x faster
Avg: 1.318127 -> 1.296507: 1.0167x faster
Significant (t=4.484688)
Stddev: 0.00605 -> 0.03355: 5.5471x larger
Timeline: http://tinyurl.com/y4j9v5q

### pickle_list ###
Min: 0.743117 -> 0.803173: 1.0808x slower
Avg: 0.751905 -> 0.810111: 1.0774x slower
Significant (t=-44.249464)
Stddev: 0.00663 -> 0.00652: 1.0172x smaller
Timeline: http://tinyurl.com/y633yb6

### pybench ###
Min: 4763 -> 4342: 1.0970x faster
Avg: 4988 -> 4463: 1.1176x faster

### regex_compile ###
Min: 0.740278 -> 0.661458: 1.1192x faster
Avg: 0.764527 -> 0.685639: 1.1151x faster
Significant (t=15.011621)
Stddev: 0.02380 -> 0.02854: 1.1995x larger
Timeline: http://tinyurl.com/y524doe

### regex_effbot ###
Min: 0.096349 -> 0.096083: 1.0028x faster
Avg: 0.100523 -> 0.099285: 1.0125x faster
Not significant
Stddev: 0.00504 -> 0.00327: 1.5444x smaller
Timeline: http://tinyurl.com/y3e6z2j

### regex_v8 ###
Min: 0.107875 -> 0.104745: 1.0299x faster
Avg: 0.114243 -> 0.109286: 1.0454x faster
Significant (t=2.325803)
Stddev: 0.01377 -> 0.00612: 2.2522x smaller
Timeline: http://tinyurl.com/y4qvh3d

### richards ###
Min: 0.329455 -> 0.286851: 1.1485x faster
Avg: 0.340571 -> 0.298913: 1.1394x faster
Significant (t=13.324069)
Stddev: 0.01252 -> 0.01822: 1.4556x larger
Timeline: http://tinyurl.com/y3d8zxk

### slowpickle ###
Min: 0.717864 -> 0.646023: 1.1112x faster
Avg: 0.748511 -> 0.659941: 1.1342x faster
Significant (t=17.041455)
Stddev: 0.03039 -> 0.02067: 1.4701x smaller
Timeline: http://tinyurl.com/y5ht5y5

### slowspitfire ###
Min: 0.797233 -> 0.762146: 1.0460x faster
Avg: 0.839011 -> 0.812074: 1.0332x faster
Significant (t=4.203713)
Stddev: 0.02803 -> 0.03560: 1.2699x larger
Timeline: http://tinyurl.com/y7owc3g

### slowunpickle ###
Min: 0.320963 -> 0.289625: 1.1082x faster
Avg: 0.325532 -> 0.293422: 1.1094x faster
Significant (t=17.014061)
Stddev: 0.00791 -> 0.01075: 1.3598x larger
Timeline: http://tinyurl.com/y5dcwdj

### startup_nosite ###
Min: 0.210807 -> 0.219255: 1.0401x slower
Avg: 0.222933 -> 0.232971: 1.0450x slower
Significant (t=-4.776980)
Stddev: 0.01592 -> 0.01372: 1.1601x smaller
Timeline: http://tinyurl.com/y2cexr7

### threaded_count ###
Min: 0.195203 -> 0.113455: 1.7205x faster
Avg: 0.225064 -> 0.176248: 1.2770x faster
Significant (t=12.769360)
Stddev: 0.00850 -> 0.02566: 3.0192x larger
Timeline: http://tinyurl.com/y74c4w3

### unpack_sequence ###
Min: 0.000092 -> 0.000083: 1.1095x faster
Avg: 0.000094 -> 0.000085: 1.1058x faster
Significant (t=61.506288)
Stddev: 0.00002 -> 0.00002: 1.1541x smaller
Timeline: http://tinyurl.com/yykzcrg

### unpickle ###
Min: 1.026543 -> 1.018970: 1.0074x faster
Avg: 1.048295 -> 1.042098: 1.0059x faster
Not significant
Stddev: 0.01646 -> 0.03854: 2.3408x larger
Timeline: http://tinyurl.com/y786tft

### unpickle_list ###
Min: 0.908621 -> 0.905129: 1.0039x faster
Avg: 0.926660 -> 0.928462: 1.0019x slower
Not significant
Stddev: 0.01631 -> 0.01509: 1.0806x smaller
Timeline: http://tinyurl.com/y5m6s3u

From
ncoghlan at gmail.com Wed Jun 23 12:58:00 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 23 Jun 2010 20:58:00 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <4C21D15F.8070304@egenix.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <4C20FC54.9000608@egenix.com> <4C21D15F.8070304@egenix.com>
Message-ID:

On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg wrote:
> Note that the point of using a builtin method was to get
> better performance. Such type adaptions are often needed in
> loops, so adding a few extra Python function calls just to
> convert a str object to a bytes object or vice-versa is a
> bit much overhead.

I actually agree with that, I just think we need more real world experience as to what works with the Python 3 text model before we start messing with the APIs for the builtin objects (fair point that "coerce" is a loaded term given the existence of the old coercion protocol. It's the right word for the task though).

One of the key points coming out of this thread (to my mind) is the lack of a Text ABC or other way of making an object that can be passed to functions expecting a str instance with a reasonable expectation of having it work. Are there some core string capabilities that can be identified and then expanded out to a full str-compatible API? (i.e. something along the lines of what collections.MutableMapping now provides for dict-alikes).

However, even if something like that was added, PJE is correct in pointing out that builtin strings still don't play well with others in many cases (usually due to underlying optimisations or other sound reasons, but perhaps sometimes gratuitously).
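The MutableMapping comparison can be made concrete with a minimal sketch. It assumes a current Python 3, where the ABC lives in collections.abc (at the time of this thread it was spelled collections.MutableMapping). The class name CaseInsensitiveDict and the sample data are purely illustrative. The point is only that five core methods are enough for the ABC to supply the rest of the dict API, which is the kind of expansion a hypothetical Text ABC would need to provide for strings.

```python
from collections.abc import MutableMapping


class CaseInsensitiveDict(MutableMapping):
    """Illustrative dict-alike that lower-cases its string keys."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        # update() is not defined here; it is a mixin method from the ABC.
        self.update(*args, **kwargs)

    # The five abstract methods the ABC requires:
    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


headers = CaseInsensitiveDict({"Content-Type": "text/plain"})
# setdefault(), pop(), "in", keys(), items() and friends all come from the ABC.
headers.setdefault("X-Mailer", "demo")
```

Everything beyond the five methods is inherited, so code written against the mapping interface accepts the object without it subclassing dict.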
Most of the string binary operations can be dealt with through their reflected forms, but str.__mod__ will never return NotImplemented, __contains__ has no reflected form and the actual method calls are of course right out (e.g. the arguments to str.join() or str.split() calls have no ability to affect the type of the result). Third party number implementations couldn't provide comparable functionality to builtin int and long objects until the __index__ protocol was added.

Perhaps PJE is right that what this is really crying out for is a way to have third party "real string" implementations so that there can actually be genuine experimentation in the Unicode handling space outside the language core (comparable to the difference between the "you can turn me into an int" __int__ method and the "I am an int equivalent" __index__ method). That may be tapping in a nail with a sledgehammer (and would raise significant moratorium questions if pursued further), but I think it's a valid question to at least ask.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From steve at pearwood.info Wed Jun 23 13:12:40 2010
From: steve at pearwood.info (Steven D'Aprano)
Date: Wed, 23 Jun 2010 21:12:40 +1000
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To:
References:
Message-ID: <201006232112.41047.steve@pearwood.info>

On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
> I've released WPython 1.1, which brings many optimizations and
> refactorings.

For those of us who don't know what WPython is, and are too lazy, too busy, or reading their email off-line, could you give us a short one-paragraph description of what it is?

Actually, since I'm none of the above, I'll answer my own question: WPython is an implementation of Python that uses 16-bit wordcodes instead of byte code, and claims to have various performance benefits from doing so.

It looks like good work, thank you.
--
Steven D'Aprano

From cesare.di.mauro at gmail.com Wed Jun 23 13:28:58 2010
From: cesare.di.mauro at gmail.com (Cesare Di Mauro)
Date: Wed, 23 Jun 2010 13:28:58 +0200
Subject: [Python-Dev] WPython 1.1 was released
In-Reply-To: <201006232112.41047.steve@pearwood.info>
References: <201006232112.41047.steve@pearwood.info>
Message-ID:

2010/6/23 Steven D'Aprano

> On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
> > I've released WPython 1.1, which brings many optimizations and
> > refactorings.
>
> For those of us who don't know what WPython is, and are too lazy, too
> busy, or reading their email off-line, could you give us a one short
> paragraph description of what it is?
>
> Actually, since I'm none of the above, I'll answer my own question:
> WPython is an implementation of Python that uses 16-bit wordcodes
> instead of byte code, and claims to have various performance benefits
> from doing so.
>
> It looks like good work, thank you.
>
> --
> Steven D'Aprano
>

Hi Steven,

sorry, I made a mistake, assuming that the project was known.

WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead of bytecodes. A wordcode is a word (16 bits, two bytes, in this case) used to represent VM opcodes. This new encoding made it possible to simplify the virtual machine's main loop, making it easier to understand, maintain, and extend; less space is required on average, and execution speed is improved too.

Cesare

From steve at holdenweb.com Wed Jun 23 14:17:20 2010
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 23 Jun 2010 08:17:20 -0400
Subject: [Python-Dev] email package status in 3.X
In-Reply-To:
References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com>
Message-ID: <4C21FB50.1080905@holdenweb.com>

Guido van Rossum wrote:
> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote:
>> Any "turdiness" (which I am *not* arguing for) is a natural consequence
>> of the kinds of backward incompatibilities which were *not* ruled out
>> for Python 3, along with the (early, now waning) "build it and they will
>> come" optimism about adoption rates.
>
> FWIW, my optimism is *not* waning. I think it's good that we're
> having this discussion and I expect something useful will come out of
> it; I also expect in general that the (admittedly serious) problem of
> having to port all dependencies will be solved in the next few years.
> Not by magic, but because many people are taking small steps in the
> right direction, and there will be light eventually. In the mean time
> I don't blame anyone for sticking with 2.x or being too busy to help
> port stuff to 3.x. Python 3 has been a long time in the making -- it
> will be a bit longer still, which was expected.
>
+1

The important thing is to avoid bigotry and FUD, and deal with things the way they are. The #python IRC team have just helped us make a major step forward.

This won't be a campaign with a victorious charge over some imaginary finish line.

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video!
http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" - Ian Dury, 1942-2000

From alexander.belopolsky at gmail.com Wed Jun 23 16:06:27 2010
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Wed, 23 Jun 2010 10:06:27 -0400
Subject: [Python-Dev] red buildbots on 2.7
In-Reply-To:
References: <73196.1277143019@parc.com> <75635.1277147585@parc.com> <20100621212904.7bec83f6@pitrou.net> <77297.1277150242@parc.com> <1277150570.3369.1.camel@localhost.localdomain> <4C1FC7E6.5070707@voidspace.org.uk> <4C1FD5D6.7070007@v.loewis.de> <4C1FD84B.3030202@voidspace.org.uk> <4C1FDB65.4020503@v.loewis.de> <4C1FDF1C.2060308@voidspace.org.uk> <4C1FE4AF.80009@v.loewis.de>
Message-ID:

On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
..
> I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the
> compile-time environment in configure would be different than during the compilation of posixmodule. That is, in functions
> that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule
> itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when
> _DARWIN_C_SOURCE active).

I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack. I believe, however, we can undefine _DARWIN_C_SOURCE globally at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of

#if !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE)

According to a comment in configure,

# On Mac OS X 10.4, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
# disables platform specific features beyond repair.
# On Mac OS X 10.3, defining _POSIX_C_SOURCE or _XOPEN_SOURCE
# has no effect, don't bother defining them

_POSIX_C_SOURCE is already undefined in Python headers, so undefining _DARWIN_C_SOURCE will have no effect on the majority of checks. I was able to find very few exceptions: some cases check _XOPEN_SOURCE instead of or in addition to _POSIX_C_SOURCE before ignoring _DARWIN_C_SOURCE:

/usr/include/grp.h:#if !defined(_XOPEN_SOURCE) || defined(_DARWIN_C_SOURCE)
/usr/include/pwd.h:#if (!defined(_POSIX_C_SOURCE) && !defined(_XOPEN_SOURCE)) || defined(_DARWIN_C_SOURCE)
..

Since _XOPEN_SOURCE is similarly undefined in Python headers, these cases are unaffected as well.

This leaves a handful of cases where Apple provides additional macros for fine-grained control:

/usr/include/stdio.h:#if defined(__DARWIN_10_6_AND_LATER) && (defined(_DARWIN_UNLIMITED_STREAMS) || defined(_DARWIN_C_SOURCE))
/usr/include/unistd.h:#if defined(_DARWIN_UNLIMITED_GETGROUPS) || defined(_DARWIN_C_SOURCE)

The second line above is our dear friend and the _DARWIN_C_SOURCE behavior conditioned on the first line can be enabled by defining the _DARWIN_UNLIMITED_STREAMS macro.

I believe _DARWIN_C_SOURCE casts its net too wide and more targeted macros should be used instead.

..
> Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform
> to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3.

I cannot reconcile this with !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers.

From pje at telecommunity.com Wed Jun 23 16:24:18 2010
From: pje at telecommunity.com (P.J.
Eby) Date: Wed, 23 Jun 2010 10:24:18 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <5A4340BB-7B64-4C76-81FF-8A43F179AA7A@twistedmatrix.com> Message-ID: <20100623142422.36F873A404D@sparrow.telecommunity.com> At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote: >I suspect the practical problem here is that there's no CharacterString ABC That, and the absence of a string coercion protocol so that mixing your custom string with standard strings will do the right thing for your intended use. From alexander.belopolsky at gmail.com Wed Jun 23 16:48:24 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 23 Jun 2010 10:48:24 -0400 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 Message-ID: On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote: .. >> >>> * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups >>> >> >> This effectively substitutes getgrouplist called on the current user >> for getgroups. ?In 3.x, I believe the correct action will be to >> provide direct access to getgrouplist which is while not POSIX (yet?), >> is widely available. > > I don't mind adding getgrouplist, but that issue is seperator from this one. BTW. Appearently getgrouplist is posix > (), although this isn't a > requirement for being added to the posix module. > (The link you provided leads to "Linux Standard Base Core Specification," which is different from POSIX, but the distinction is not relevant for our discussion.) 
> > It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch > is more complicated and the library function we use can be considered to be broken. Let me try to formulate what the disagreement is. There are two different group lists that can be associated with a running process: 1) The list of current supplementary group IDs maintained by the system for each process and stored in per-process system tables; and 2) The list of the groups that include the uid under which the process is running as a member. The first list is returned by a system call getgroups and the second can be obtained using system database access functions as follows: pw = getpwuid(getuid()) getgrouplist(pw->pw_name, ..) The first list can be modified by privileged processes using setgroups system call, while the second changes when system databases change. The problem that _DARWIN_C_SOURCE introduces is that it replaces system getgroups with a database query effectively making the true process' list of supplementary group IDs inaccessible to programs. See source code at . The problem is complicated by the fact that OSX true getgroups call appears to truncate the list of groups to NGROUPS_MAX=16. Note, however that it is not clear whether the system call truncates the list or the underlying process tables are limited to 16 entries and additional groups are ignored when the process is created. In my view, getgroups and getgrouplist are two fundamentally different operations and both should be provided by the os module. Redefining os.getgroups to invoke getgrouplist instead of system getgroups on one particular platform to work around that platform's system call limitation is not right. 
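To make the distinction between the two lists concrete, here is a minimal sketch in Python. It assumes a Unix system and a Python version whose os module provides os.getgrouplist, which was not available when this message was written; the function names process_groups and database_groups are made up for illustration.

```python
import os
import pwd


def process_groups():
    """List 1: supplementary group IDs the system keeps for this process.

    This is the getgroups() system call; a privileged process can change
    the result with setgroups().
    """
    return sorted(set(os.getgroups()))


def database_groups():
    """List 2: groups the system databases list for the current user.

    Equivalent to the C sequence pw = getpwuid(getuid());
    getgrouplist(pw->pw_name, ...). Raises KeyError if the uid has no
    entry in the password database.
    """
    entry = pwd.getpwuid(os.getuid())
    return sorted(set(os.getgrouplist(entry.pw_name, entry.pw_gid)))
```

The two functions can return different answers for the same process, for example after a setgroups call or after the group database has been edited, which is why keeping them as separate operations matters.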
From ronaldoussoren at mac.com Wed Jun 23 17:03:39 2010 From: ronaldoussoren at mac.com (ronaldoussoren) Date: Wed, 23 Jun 2010 08:03:39 -0700 (PDT) Subject: [Python-Dev] red buildbots on 2.7 In-Reply-To: Message-ID: <91321b7f-d5a2-6f2f-8ecd-813636aaa3bd@me.com>

On 23 Jun, 2010,at 04:06 PM, Alexander Belopolsky wrote:

> On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote:
> ..
>> I don't agree. The patch itself is pretty simple, but it does make a rather significant change to the build process: the
>> compile-time environment in configure would be different than during the compilation of posixmodule. That is, functions
>> that check for features (the HAVE_FOOBAR macros in pyconfig.h) would use _DARWIN_C_SOURCE while posixmodule
>> itself wouldn't. This may lead to subtle bugs, or even compile errors (because some function definitions change when
>> _DARWIN_C_SOURCE is active).
>
> I agree. Messing with compatibility macros outside of pyconfig.h is not a good idea. Martin's hack, while likely to work in most cases, is still a hack. I believe, however we can undefine _DARWIN_C_SOURCE globally at least on 10.4 and higher. I grepped through the headers on my 10.6 system and I notice that the majority of checks for _DARWIN_C_SOURCE are in the form of

As I wrote, the system will assume _DARWIN_C_SOURCE is set when you don't set _POSIX_C_SOURCE or other feature macros. Working around that is a hack that I don't wish to support.

> ..
>> Defining _POSIX_C_SOURCE or _DARWIN_C_SOURCE causes library and kernel calls to conform
>> to the SUSv3 standards even if doing so would alter the behavior of functions used in 10.3.
>
> I cannot reconcile this with !defined(_POSIX_C_SOURCE) || defined(_DARWIN_C_SOURCE) logic that I see in the headers.

This seems to be arranged in sys/cdefs.h. I honestly don't care how this is done, the documentation clearly says that this happens and that indicates that _DARWIN_C_SOURCE selects the API Apple would like you to use.
Anyway, why is this discusion on python-dev instead of in the issue tracker? BTW. IMHO resolution of this issue can wait until after 2.7.0, there is always 2.7.1 and I don't think we need to rush this (the issue has been dormant for quite a while) Ronald -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Wed Jun 23 17:30:23 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 11:30:23 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Stephen J. Turnbull wrote: > We do need str-based implementations of modules like urllib. Why would that be? URLs aren't text, and never will be. The fact that to the eye they may seem to be text-ish doesn't make them text. This *is* a case where "dont make me think" is a losing propsition: programmers who work with URLs in any non-opaque way as text are eventually going to be bitten by this issue no matter how hard we wave our hands. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiKI4ACgkQ+gerLs4ltQ56/QCbBPdj8jaPbcvPIDPb7ys04oHg fLIAnR+kA2udazsnpzTp2INGz2CoWgzj =Swjw -----END PGP SIGNATURE----- From alexander.belopolsky at gmail.com Wed Jun 23 17:37:12 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 23 Jun 2010 11:37:12 -0400 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: In my previous post, I forgot to include the link to the tracker issue where this problem is being worked on. http://bugs.python.org/issue7900 I'll repost my message there as an issue comment, so that a more detailed technical discussion can continue there. From tseaver at palladion.com Wed Jun 23 17:37:53 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 11:37:53 -0400 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Alexander Belopolsky wrote: > In my view, getgroups and getgrouplist are two fundamentally different > operations and both should be provided by the os module. Redefining > os.getgroups to invoke getgrouplist instead of system getgroups on one > particular platform to work around that platform's system call > limitation is not right. +1. syscall wrappers should err on the side of thinness, even to the point of anorexia. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiKlEACgkQ+gerLs4ltQ4vKwCg3JwpWvivq8Dk7PYy2iPrKq/E 88gAn1lfeEcDJlfGm+F0jEbxsv1BfQJW =JzHS -----END PGP SIGNATURE----- From guido at python.org Wed Jun 23 17:43:46 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Jun 2010 08:43:46 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 23, 2010 at 8:30 AM, Tres Seaver wrote: > Stephen J. Turnbull wrote: > >> We do need str-based implementations of modules like urllib. > > Why would that be? ?URLs aren't text, and never will be. ?The fact that > to the eye they may seem to be text-ish doesn't make them text. ?This > *is* a case where "dont make me think" is a losing propsition: > programmers who work with URLs in any non-opaque way as text are > eventually going to be bitten by this issue no matter how hard we wave > our hands. This has been asserted and contested several times now, and I don't see the two positions getting any closer. So I propose that we drop the discussion "are URLs text or bytes" and try to find something more pragmatic to discuss. 
For example: how we can make the suite of functions used for URL processing more polymorphic, so that each developer can choose for herself how URLs need to be treated in her application. -- --Guido van Rossum (python.org/~guido) From cyounkins at gmail.com Wed Jun 23 17:51:31 2010 From: cyounkins at gmail.com (Craig Younkins) Date: Wed, 23 Jun 2010 11:51:31 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities In-Reply-To: <10286.1277242190@parc.com> References: <10286.1277242190@parc.com> Message-ID: http://bugs.python.org/issue9061 On Tue, Jun 22, 2010 at 5:29 PM, Bill Janssen wrote: > Craig Younkins wrote: > > > cgi.escape never escapes single quote characters, which can easily lead > to a > > Cross-Site Scripting (XSS) vulnerability. This seems to be known by many, > > but a quick search reveals many are using cgi.escape for HTML attribute > > escaping. > > Did you file a bug report? > > Bill > -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jun 23 18:03:27 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 23 Jun 2010 12:03:27 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100623120327.3bd030e9@heresy> On Jun 23, 2010, at 08:43 AM, Guido van Rossum wrote: >So I propose that we drop the discussion "are URLs text or bytes" and >try to find something more pragmatic to discuss. email has exactly the same question, and the answer is "yes". 
>For example: how we can make the suite of functions used for URL >processing more polymorphic, so that each developer can choose for >herself how URLs need to be treated in her application. I think email package hackers should watch this effort closely. RDM has written some stuff up on how we think we're going to handle this, though it's probably pretty email package specific. Maybe there's a better, general, or conventional approach lurking around somewhere. http://wiki.python.org/moin/Email%20SIG -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From janssen at parc.com Wed Jun 23 18:11:05 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 09:11:05 PDT Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <13070.1277309465@parc.com> Tres Seaver wrote: > Stephen J. Turnbull wrote: > > > We do need str-based implementations of modules like urllib. > > Why would that be? URLs aren't text, and never will be. The fact that > to the eye they may seem to be text-ish doesn't make them text. This URLs are exactly text (strings, representable as Unicode strings in Py3K), and were designed as such from the start. The fact that some of the things tunneled or carried in URLs are string representations of non-string data shouldn't obscure that point. They're not "text-ish", they're text. They're not opaque, either; they break down in well-specified ways, mainly into strings. 
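As an illustration of a URL breaking down into strings, here is a minimal sketch using urllib.parse from Python 3. The example URL and the helper name split_then_decode_path are made up for illustration. Splitting leaves %-escapes untouched; turning them back into raw bytes is a separate, explicit step.

```python
from urllib.parse import urlsplit, unquote_to_bytes


def split_then_decode_path(url):
    """Split a URL into str components, then undo %-escapes in the path.

    Returns the SplitResult (all str fields) and the path as raw bytes.
    The bytes need not be valid text in any encoding.
    """
    parts = urlsplit(url)
    raw_path = unquote_to_bytes(parts.path)
    return parts, raw_path


parts, raw_path = split_then_decode_path('http://example.com/a%FFb?x=1#frag')
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
print(raw_path)
```

Here the path component stays the string '/a%FFb' after splitting, and only the explicit unquote step produces a byte (0xFF) that has no meaning as UTF-8 text.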
The trouble comes in when we try to go beyond the spec, or handle things that don't conform to the spec. Sure, a path component of a URI might actually be a %-escaped sequence of arbitrary bytes, even bytes that don't represent a string in any known encoding, but that's only *after* reversing the %-escapes, which should happen in a scheme-specific piece of code, not in generic URL parsing or manipulation. Bill From ianb at colorstudy.com Wed Jun 23 18:30:51 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Wed, 23 Jun 2010 11:30:51 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Wed, Jun 23, 2010 at 10:30 AM, Tres Seaver wrote: > Stephen J. Turnbull wrote: > > > We do need str-based implementations of modules like urllib. > > > Why would that be? URLs aren't text, and never will be. The fact that > to the eye they may seem to be text-ish doesn't make them text. This > *is* a case where "dont make me think" is a losing propsition: > programmers who work with URLs in any non-opaque way as text are > eventually going to be bitten by this issue no matter how hard we wave > our hands. > HTML is text, and URLs are embedded in that text, so it's easy to get a URL that is text. Though, with a little testing, I notice that text alone can't tell you what the right URL really is (at least the intended URL when unsafe characters are embedded in HTML). 
To test I created two pages, one in Latin-1 another in UTF-8, and put in the link:

  ./test.html?param=Réunion

On a Latin-1 page it created a link to test.html?param=R%E9union and on a UTF-8 page it created a link to test.html?param=R%C3%A9union (the second link displays in the URL bar as test.html?param=Réunion but copies with percent encoding). Though if you link to ./Réunion.html then both pages create UTF-8 links. And both pages also link http://Réunion.com to http://xn--runion-bva.com/.

So really neither bytes nor text works completely; query strings receive the encoding of the page, which would be handled transparently if you worked on the page's bytes. Path and domain are consistently encoded with UTF-8 and punycode respectively and so would be handled best when treated as text. And of course if you are a page with a non-ASCII-compatible encoding you really must handle encodings before the URL is sensible.

Another issue here is that there's no "encoding" for turning a URL into bytes if the URL is not already ASCII. A proper way to encode a URL would be:

(Totally as an aside, as I remind myself of new module names I notice it's not easy to google specifically for Python 3 docs, e.g. "python 3 urlsplit" gives me 2.6 docs)

from urllib.parse import urlsplit, urlunsplit
import encodings.idna

def encode_http_url(url, page_encoding='ASCII', errors='strict'):
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = scheme.encode('ASCII', errors)
    auth = port = None
    if '@' in netloc:
        auth, netloc = netloc.split('@', 1)
    if ':' in netloc:
        netloc, port = netloc.split(':', 1)
    netloc = encodings.idna.ToASCII(netloc)
    if port:
        netloc = netloc + b':' + port.encode('ASCII', errors)
    if auth:
        netloc = auth.encode('UTF-8', errors) + b'@' + netloc
    path = path.encode('UTF-8', errors)
    query = query.encode(page_encoding, errors)
    fragment = fragment.encode('UTF-8', errors)
    return urlunsplit_bytes((scheme, netloc, path, query, fragment))

Where urlunsplit_bytes handles bytes (urlunsplit does not). It's helpful for me at least to look at that code specifically:

def urlunsplit(components):
    scheme, netloc, url, query, fragment = components
    if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
        if url and url[:1] != '/': url = '/' + url
        url = '//' + (netloc or '') + url
    if scheme:
        url = scheme + ':' + url
    if query:
        url = url + '?' + query
    if fragment:
        url = url + '#' + fragment
    return url

In this case it really would be best to have Python 2's system where things are coerced to ASCII implicitly. Or, more specifically, if all those string literals in that routine could be implicitly converted to bytes using ASCII. Conceptually I think this is reasonable, as for URLs (at least with HTTP, but in practice I think this applies to all URLs) the ASCII bytes really do have meaning. That is, '/' (*in the context of urlunsplit*) really is \x2f specifically. Or another example, making a GET request really means sending the bytes \x47\x45\x54 and there is no other set of bytes that has that meaning.
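For illustration, here is a minimal sketch of what a bytes-only urlunsplit_bytes might look like once every literal is spelled as an ASCII bytes literal. This is an assumption, not code from the standard library: it drops the uses_netloc check of the real urlunsplit and expects all five components to already be bytes.

```python
def urlunsplit_bytes(components):
    """Simplified, hypothetical bytes counterpart of urlunsplit.

    Every delimiter is an explicit ASCII bytes literal, so b'/' is
    exactly the byte 0x2f and no implicit coercion is needed.
    """
    scheme, netloc, url, query, fragment = components
    if netloc:
        if url and url[:1] != b'/':
            url = b'/' + url
        url = b'//' + netloc + url
    if scheme:
        url = scheme + b':' + url
    if query:
        url = url + b'?' + query
    if fragment:
        url = url + b'#' + fragment
    return url


print(urlunsplit_bytes((b'http', b'example.com', b'/a', b'q=1', b'f')))
```

The cost of this approach is visible in the sketch: the logic is a line-for-line copy of the str version, differing only in the b prefixes, which is the duplication that implicit ASCII coercion would avoid.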
The WebSockets specification for instance defines things like "colon": http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-76#page-5 -- in an earlier version they even used bytes to describe HTTP ( http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-54#page-13), though this annoyed many people. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From janssen at parc.com Wed Jun 23 18:46:48 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 09:46:48 PDT Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <13837.1277311608@parc.com> Guido van Rossum wrote: > So I propose that we drop the discussion "are URLs text or bytes" and > try to find something more pragmatic to discuss. > > For example: how we can make the suite of functions used for URL > processing more polymorphic, so that each developer can choose for > herself how URLs need to be treated in her application. While I agree with "find something more pragmatic to discuss", it also seems to me that introducing polymorphic URL processing might make things more confusing and error-prone. The bigger problem seems to be that we're revisiting the design discussion about urllib.parse from the summer of 2008. See http://bugs.python.org/issue3300 if you want to recall how we hashed this out 2 years ago. I didn't particularly like that design, but I had to go off on vacation :-), and things got settled while I was away. 
I haven't heard much from Matt Giuca since he stopped by and lobbed that patch into the standard library.

But since Guido is the one who settled it, why are we talking about it again?

Bill

From ianb at colorstudy.com Wed Jun 23 18:49:13 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Wed, 23 Jun 2010 11:49:13 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID:

Oops, I forgot some important quoting (important for the algorithm, maybe not actually for the discussion)...

from urllib.parse import urlsplit, urlunsplit
import encodings.idna

# urllib.parse.quote both always returns str, and is not as conservative in quoting as required here...
def quote_unsafe_bytes(b):
    result = []
    for c in b:
        if c < 0x20 or c >= 0x80:
            result.extend(('%%%02X' % c).encode('ASCII'))
        else:
            result.append(c)
    return bytes(result)

def encode_http_url(url, page_encoding='ASCII', errors='strict'):
    scheme, netloc, path, query, fragment = urlsplit(url)
    scheme = scheme.encode('ASCII', errors)
    auth = port = None
    if '@' in netloc:
        auth, netloc = netloc.split('@', 1)
    if ':' in netloc:
        netloc, port = netloc.split(':', 1)
    netloc = encodings.idna.ToASCII(netloc)
    if port:
        netloc = netloc + b':' + port.encode('ASCII', errors)
    if auth:
        netloc = quote_unsafe_bytes(auth.encode('UTF-8', errors)) + b'@' + netloc
    path = quote_unsafe_bytes(path.encode('UTF-8', errors))
    query = quote_unsafe_bytes(query.encode(page_encoding, errors))
    fragment = quote_unsafe_bytes(fragment.encode('UTF-8', errors))
    return urlunsplit_bytes((scheme, netloc, path, query, fragment))

--
Ian Bicking | http://blog.ianbicking.org

From glyph at twistedmatrix.com Wed Jun 23 03:01:17 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Tue, 22 Jun 2010 21:01:17 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <94700B9C-25B4-4A75-BA43-20FEA3FDE772@twistedmatrix.com> Message-ID:

On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:

> bzr has a cache of decoded strings in it precisely because decode is
> slow. We accept slowness encoding to the users locale because thats
> typically much less data to examine than we've examined while
> generating the commit/diff/whatever. We also face memory pressure on a
> regular basis, and that has been, at least partly, due to UCS4 - our
> translation cache helps there because we have less duplicate UCS4
> strings.

Thanks for setting the record straight - apologies if I missed this earlier in the thread. It does seem vaguely familiar.

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From tjreedy at udel.edu Wed Jun 23 19:38:05 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 23 Jun 2010 13:38:05 -0400 Subject: [Python-Dev] WPython 1.1 was released In-Reply-To: References: <201006232112.41047.steve@pearwood.info> Message-ID:

On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
> sorry, I made a mistake, assuming that the project was known.
A common mistake of people who announce their projects ;-) Someone recently made the same mistake on python-list with respect to a 'BDD' package (the Wikipedia suggests about 6 possible expansions of the acronym).

>
> WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead
> of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)

I suggest you specify the base version (2.6.4) on the project page as that would be very relevant to many who visit. One should not have to download and look at the source to discover if they should bother downloading the code. Perhaps also add a sentence as to the choice (why not 3.1?).

--
Terry Jan Reedy

From cesare.di.mauro at gmail.com Wed Jun 23 19:53:46 2010 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Wed, 23 Jun 2010 19:53:46 +0200 Subject: [Python-Dev] WPython 1.1 was released In-Reply-To: References: <201006232112.41047.steve@pearwood.info> Message-ID:

2010/6/23 Terry Reedy

> On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
> WPython is a CPython 2.6.4 implementation that uses "wordcodes" instead
> of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)
>
> I suggest you specify the base version (2.6.4) on the project page as that
> would be very relevant to many who visit. One should not have to download
> and look at the source to discover if they should bother
> downloading the code. Perhaps also add a sentence as to the choice (why not
> 3.1?).
>
> --
> Terry Jan Reedy

Thanks for the suggestions. I've updated the main project accordingly. :)

Cesare

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From tseaver at palladion.com Wed Jun 23 20:23:33 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 14:23:33 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <13837.1277311608@parc.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Bill Janssen wrote: > The bigger problem seems to be that we're revisiting the design > discussion about urllib.parse from the summer of 2008. See > http://bugs.python.org/issue3300 if you want to recall how we hashed > this out 2 years ago. I didn't particularly like that design, but I had > to go off on vacation :-), and things got settled while I was away. I > haven't heard much from Matt Giuca since he stopped by and lobbed that > patch into the standard library. > > But since Guido is the one who settled it, why are we talking about it > again? Perhaps such decisions need revisiting in light of subsequent experience / pain / learning. E.g: - - the repeated inability of the web-sig to converge on appropriate semantics for a Python3-compatible version of the WSGI spec; - - the subsequent quirkiness of the Python3 wsgiref implementation; - - the breakage in cgi.py which prevents handling file uploads in a web application; - - the slow adoption / porting rate of major web frameworks and libraries to Python 3. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiUSAACgkQ+gerLs4ltQ49EwCeLYwrZs6QfairPP5zpeeUlxao qg8An37kRz1CrzGc3kScvSqVx8FPnO1M =lR6R -----END PGP SIGNATURE----- From martin at v.loewis.de Wed Jun 23 20:29:44 2010 From: martin at v.loewis.de (=?windows-1252?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 23 Jun 2010 20:29:44 +0200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: <4C225298.9010701@v.loewis.de> > The problem that _DARWIN_C_SOURCE introduces is that it replaces > system getgroups with a database query effectively making the true > process' list of supplementary group IDs inaccessible to programs. > See source code at > . If that is true (i.e. the file is really the one that is being used), I think this is a severe flaw in OSX's implementation of the POSIX specification. Then, I agree that Python, in turn, should make sure that posix.getgroups is really the POSIX version of getgroups, not the Apple version. This is a general principle: if the system has two competing implementations of some API, the Python posix module should strive to call the POSIX version of the API. If the vendor's version of the API is also useful, it can be exposed under a different name (if, in turn, this is technically possible). Just my 0.02?. 
Regards, Martin From glyph at twistedmatrix.com Wed Jun 23 20:31:41 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Wed, 23 Jun 2010 14:31:41 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <4C21FB50.1080905@holdenweb.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> <4C21FB50.1080905@holdenweb.com> Message-ID: <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> On Jun 23, 2010, at 8:17 AM, Steve Holden wrote: > Guido van Rossum wrote: >> On Tue, Jun 22, 2010 at 9:37 AM, Tres Seaver wrote: >>> Any "turdiness" (which I am *not* arguing for) is a natural consequence >>> of the kinds of backward incompatibilities which were *not* ruled out >>> for Python 3, along with the (early, now waning) "build it and they will >>> come" optimism about adoption rates. >> >> FWIW, my optimisim is *not* waning. I think it's good that we're >> having this discussion and I expect something useful will come out of >> it; I also expect in general that the (admittedly serious) problem of >> having to port all dependencies will be solved in the next few years. >> Not by magic, but because many people are taking small steps in the >> right direction, and there will be light eventually. In the mean time >> I don't blame anyone for sticking with 2.x or being too busy to help >> port stuff to 3.x. Python 3 has been a long time in the making -- it >> will be a bit longer still, which was expected. >> > +1 > > The important thing is to avoid bigotry and FUD, and deal with things > the way they are. The #python IRC team have just helped us make a major > step forward. This won't be a campaign with a victorious charge over > some imaginary finish line. For sure. I don't speak for Tres, but I don't think he wasn't talking about optimism about *adoption*, overall, but optimism about adoption *rates*. And I don't think he was talking about it coming from Guido :). 
There has definitely been some "irrational exuberance" from some quarters. The form it usually takes is someone making a blog post which assumes, because the author could port their smallish library or application without too much hassle, that Python 2.x is already dead and everyone should be off of it in a couple of weeks. I've never heard this position from the core team or any official communication or documentation. Far from it: the realistic attitude that the Python 3 migration is something that will take a while has significantly reduced my own concerns. Even the aforementioned blog posts have been encouraging in some ways, because a lot of people are reporting surprisingly easy transitions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tseaver at palladion.com Wed Jun 23 20:40:47 2010 From: tseaver at palladion.com (Tres Seaver) Date: Wed, 23 Jun 2010 14:40:47 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> References: <20100618204831.A8F2A3A40A5@sparrow.telecommunity.com> <609CF661-AB50-49FC-BAA9-B8898C1E9A19@gmail.com> <4C21FB50.1080905@holdenweb.com> <9A9D719C-0ED5-4061-B314-06450CC965BB@twistedmatrix.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Glyph Lefkowitz wrote: > I don't speak for Tres, but I don't think he wasn't talking about > optimism about *adoption*, overall, but optimism about adoption > *rates*. And I don't think he was talking about it coming from Guido > :). You channel me correctly here. In particular, the phrase "build it and they will come" was meant to address the idea that the only thing needed to drive adoption was the release of the new, shiny Python3. 
That particular bit of optimism is what I meant to describe as waning: the community on the whole seems to be more realistic now than two or three years ago about the kind of extra effort required from both core developers and from existing Python 2 folks to get to Python 3. > There has definitely been some "irrational exuberance" from some > quarters. The form it usually takes is someone making a blog post > which assumes, because the author could port their smallish library > or application without too much hassle, that Python 2.x is already > dead and everyone should be off of it in a couple of weeks. > > I've never heard this position from the core team or any official > communication or documentation. Far from it: the realistic attitude > that the Python 3 migration is something that will take a while has > significantly reduced my own concerns. > > Even the aforementioned blog posts have been encouraging in some > ways, because a lot of people are reporting surprisingly easy > transitions. Indeed. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkwiVS8ACgkQ+gerLs4ltQ4kQgCeJ9nwU8XyiWzOTpHSbWg21bzU 0/IAnjVOj5SlgA9mnAsx4/wMad5lNkqq =HObh -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Jun 23 21:36:45 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Jun 2010 21:36:45 +0200 Subject: [Python-Dev] bytes / unicode References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> Message-ID: <20100623213645.658517d7@pitrou.net> On Wed, 23 Jun 2010 14:23:33 -0400 Tres Seaver wrote: > > Perhaps such decisions need revisiting in light of subsequent experience > / pain / learning. E.g: > > - - the repeated inability of the web-sig to converge on appropriate > semantics for a Python3-compatible version of the WSGI spec; > > - - the subsequent quirkiness of the Python3 wsgiref implementation; The way wsgiref was adapted is admittedly suboptimal. It was totally broken at first, and PJE didn't want to look very deeply into it. We therefore had to settle on a series of small modifications that seemed rather reasonable, but without any in-depth discussion of what WSGI had to look like under Python 3 (since it was not our job and responsibility). Therefore, I don't think wsgiref should be taken as a guide to what a cleaned up, Python 3-specific WSGI must look like. 
> - the slow adoption / porting rate of major web frameworks and libraries > to Python 3. Some of the major web frameworks and libraries have a ton of dependencies, which would explain why they really haven't bothered yet. I don't think you can claim, though, that Python 3 makes things significantly harder for these frameworks. The proof is that many of them already give the user unicode strings in Python 2.x. They must have somehow got the decoding right. Regards Antoine. From ronaldoussoren at mac.com Wed Jun 23 22:31:42 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 23 Jun 2010 22:31:42 +0200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: Message-ID: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> On 23 Jun, 2010, at 16:48, Alexander Belopolsky wrote: > On Wed, Jun 23, 2010 at 2:08 AM, Ronald Oussoren wrote: > .. >>> >>>> * [Ronald's proposal] results in posix.getgroups not reflecting results of posix.setgroups >>>> >>> >>> This effectively substitutes getgrouplist called on the current user >>> for getgroups. In 3.x, I believe the correct action will be to >>> provide direct access to getgrouplist which, while not POSIX (yet?), >>> is widely available. >> >> I don't mind adding getgrouplist, but that issue is separate from this one. BTW, apparently getgrouplist is posix >> (), although this isn't a >> requirement for being added to the posix module. >> > > (The link you provided leads to "Linux Standard Base Core > Specification," which is different from POSIX, but the distinction is > not relevant for our discussion.) I know, but the page claims getgrouplist is in SUS. I've since looked at what claims to be a copy of SUS: http://www.unix.org/single_unix_specification/ and that does not contain getgrouplist.
> >> >> It is still my opinion that the second option is preferable for better compatibility with system tools, even if the patch >> is more complicated and the library function we use can be considered to be broken. > > Let me try to formulate what the disagreement is. There are two > different group lists that can be associated with a running process: > 1) The list of current supplementary group IDs maintained by the > system for each process and stored in per-process system tables; and > 2) The list of the groups that include the uid under which the process > is running as a member. > > The first list is returned by a system call getgroups and the second > can be obtained using system database access functions as follows: > > pw = getpwuid(getuid()) > getgrouplist(pw->pw_name, ..) > > The first list can be modified by privileged processes using setgroups > system call, while the second changes when system databases change. > > The problem that _DARWIN_C_SOURCE introduces is that it replaces > system getgroups with a database query effectively making the true > process' list of supplementary group IDs inaccessible to programs. > See source code at > . > > The problem is complicated by the fact that OSX true getgroups call > appears to truncate the list of groups to NGROUPS_MAX=16. Note, > however that it is not clear whether the system call truncates the > list or the underlying process tables are limited to 16 entries and > additional groups are ignored when the process is created. > > In my view, getgroups and getgrouplist are two fundamentally different > operations and both should be provided by the os module. Redefining > os.getgroups to invoke getgrouplist instead of system getgroups on one > particular platform to work around that platform's system call > limitation is not right. But we don't redefine os.getgroups to call getgrouplist, it is the system library that seems to implement getgroups(3) using getgrouplist(3). 
I agree that that is odd at best, but it is IMHO functioning as designed by Apple (that is, Apple chose the current behavior; they didn't accidentally break this). The previous paragraph is nitpicky, but this is IMO an important distinction. I've done some more experimentation: * compat(5) lies: not setting _DARWIN_C_SOURCE is not the same as setting _DARWIN_C_SOURCE when the deployment target is 10.5. With _DARWIN_C_SOURCE, getgroups is translated to the symbol "_getgroups$DARWIN_EXTSN" in the object file; without it, the symbol is "_getgroups". * the id(1) command uses the version of getgroups that does not reflect setgroups. Given this script: import os os.system("id") os.setgroups([1]) os.system("id") Running it gives an unexpected output: # /usr/bin/python doit.py uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing) uid=0(root) gid=0(wheel) groups=0(wheel),204(_developer),100(_lpoperator),98(_lpadmin),80(admin),61(localaccounts),29(certusers),20(staff),12(everyone),9(procmod),8(procview),5(operator),4(tty),3(sys),2(kmem),1(daemon),401(com.apple.access_screensharing) * when I add a group in the Accounts panel in System Preferences and add my account to it, the id(1) command immediately reflects the change (as expected given the previous result) * adding a non-administrator account to a newly created group does not affect filesystem access for existing processes (that is, I created a file that's only readable for the new group, and the test user couldn't read that file until I logged out and in again), which means the Accounts panel doesn't magically alter kernel state for running processes.
* Setting or unsetting _DARWIN_C_SOURCE doesn't affect the contents of pyconfig.h beyond that setting: $ diff pyconfig.h-DARWIN_C_SOURCE pyconfig.h-NO_DARWIN_SOURCE 1124c1124 < #define _DARWIN_C_SOURCE 1 --- > /* #undef _DARWIN_C_SOURCE */ "pyconfig.h-DARWIN_C_SOURCE" is generated by the current configure script, the other one is generated by a configure script that was patched to not set _DARWIN_C_SOURCE (by removing "AC_DEFINE(_DARWIN_C_SOURCE, 1, [Define on Darwin to activate all library features])" from configure.in and regenerating configure). Both were generated using "configure MACOSX_DEPLOYMENT_TARGET=10.5". * setgroups(3) cannot set more than 16 groups, that is "setgroups(17, gidset)" will always return EINVAL (this is on OSX 10.6.4). I've verified this using a C program that directly calls the right APIs. I'm busy with projects for the rest of the week and won't be able to do anything python-dev related until Sunday. Ronald From a.badger at gmail.com Wed Jun 23 23:30:22 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Wed, 23 Jun 2010 17:30:22 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100623213645.658517d7@pitrou.net> References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> Message-ID: <20100623213022.GB3470@unaka.lan> On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote: > On Wed, 23 Jun 2010 14:23:33 -0400 > Tres Seaver wrote: > > - - the slow adoption / porting rate of major web frameworks and libraries > > to Python 3. > > Some of the major web frameworks and libraries have a ton of > dependencies, which would explain why they really haven't bothered yet.
> > I don't think you can claim, though, that Python 3 makes things > significantly harder for these frameworks. The proof is that many of > them already give the user unicode strings in Python 2.x. They must > have somehow got the decoding right. > Note that this assumption seems optimistic to me. I started talking to Graham Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste do decoding of bytes to unicode at different layers which caused problems for application level code that should otherwise run fine when being served by mod_wsgi or paste httpserver. That was the beginning of Graham starting to talk about what the wsgi spec really should look like under python3 instead of the broken way that the appendix to the current wsgi spec states. -Toshio From solipsis at pitrou.net Wed Jun 23 23:35:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Jun 2010 23:35:12 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100623213022.GB3470@unaka.lan> References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> <20100623213022.GB3470@unaka.lan> Message-ID: <20100623233512.50b5b710@pitrou.net> On Wed, 23 Jun 2010 17:30:22 -0400 Toshio Kuratomi wrote: > Note that this assumption seems optimistic to me. I started talking to Graham > Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste > do decoding of bytes to unicode at different layers which caused problems > for application level code that should otherwise run fine when being served > by mod_wsgi or paste httpserver.
That was the beginning of Graham starting > to talk about what the wsgi spec really should look like under python3 > instead of the broken way that the appendix to the current wsgi spec states. Ok, but the reason would be that the WSGI spec is broken. Not Python 3 itself. Regards Antoine. From henry at precheur.org Wed Jun 23 23:35:38 2010 From: henry at precheur.org (Henry Precheur) Date: Wed, 23 Jun 2010 14:35:38 -0700 Subject: [Python-Dev] [Web-SIG] bytes / unicode In-Reply-To: <20100623213645.658517d7@pitrou.net> References: <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> Message-ID: <20100623213538.GB9501@banane.novuscom.net> On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote: > I don't think you can claim, though, that Python 3 makes things > significantly harder for these frameworks. The proof is that many of > them already give the user unicode strings in Python 2.x. They must > have somehow got the decoding right. Well... Frameworks usually 'simplify' the problem by partly ignoring it. By default they assume the data in the request is in UTF-8. You can specify an alternative encoding in most of them. Django [1], Werkzeug [2], and WebOb [3] do that. The problem with this approach is that you still have to deal with weird requests where one thing is unicode, and another is latin-1. Sometimes you can even have 2 different encodings in a single header like Cookies. There's no general solution to this problem; it has to be solved on a case by case basis. There was a big discussion a while ago on web-sig. I think the consensus was that WSGI for Python 3 should assume that the data is encoded in latin-1 since it's the default encoding according to the RFC.
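As a rough illustration of why latin-1 is a workable default for that consensus: latin-1 maps every byte value to exactly one code point, so decoding never fails and the original bytes can always be recovered. The sketch below uses plain Python 3 str/bytes methods only; the header value is made up for the example and no WSGI server or framework API is involved.

```python
# Hypothetical header value; it stands in for bytes arriving on the wire.
raw = "caf\u00e9=1".encode("utf-8")

# Decoding as latin-1 cannot fail: each byte becomes one code point.
as_text = raw.decode("latin-1")

# Code that knows the real encoding can undo the step without loss.
recovered = as_text.encode("latin-1").decode("utf-8")

print(repr(as_text))
print(repr(recovered))
```

The intermediate string looks like mojibake, which is the cost of this approach: the application still has to know, per header, which encoding to apply in the second step.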
[1] http://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.encoding [2] http://werkzeug.pocoo.org/documentation/dev/unicode.html#request-and-response-objects [3] http://pythonpaste.org/webob/reference.html#unicode-variables -- Henry Precheur From tullarisc256 at gmail.com Wed Jun 23 21:08:52 2010 From: tullarisc256 at gmail.com (tullarisc) Date: Wed, 23 Jun 2010 12:08:52 -0700 (PDT) Subject: [Python-Dev] swig/python and intel's threadedbuildginblocks Message-ID: <28975580.post@talk.nabble.com> Hi, I've compiled Intel's OSS Threading Building Blocks library on OpenBSD and put everything in some swig interfaces. Here you go: http://tullarisc.xtreemhost.com/swig.ttb.tgz Love, tullarisc. From brett at python.org Wed Jun 23 23:53:36 2010 From: brett at python.org (Brett Cannon) Date: Wed, 23 Jun 2010 14:53:36 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? Message-ID: I finally realized why clang has not been silencing its warnings about unused return values: I have -Wno-unused-value set in CFLAGS which comes before OPT (which defines -Wall) as set in PY_CFLAGS in Makefile.pre.in. I could obviously set OPT in my environment, but that would override the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, but the README says that's for stuff that tweaks binary compatibility. So basically what I am asking is what environment variable should I use? If CFLAGS is correct then does anyone have any issues if I change the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes after OPT?
From a.badger at gmail.com Thu Jun 24 00:57:40 2010 From: a.badger at gmail.com (Toshio Kuratomi) Date: Wed, 23 Jun 2010 18:57:40 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100623233512.50b5b710@pitrou.net> References: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <13837.1277311608@parc.com> <20100623213645.658517d7@pitrou.net> <20100623213022.GB3470@unaka.lan> <20100623233512.50b5b710@pitrou.net> Message-ID: <20100623225740.GC3470@unaka.lan> On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote: > On Wed, 23 Jun 2010 17:30:22 -0400 > Toshio Kuratomi wrote: > > Note that this assumption seems optimistic to me. I started talking to Graham > > Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste > > do decoding of bytes to unicode at different layers which caused problems > > for application level code that should otherwise run fine when being served > > by mod_wsgi or paste httpserver. That was the beginning of Graham starting > > to talk about what the wsgi spec really should look like under python3 > > instead of the broken way that the appendix to the current wsgi spec states. > > Ok, but the reason would be that the WSGI spec is broken. Not Python 3 > itself. > Agreed. Neither python2 nor python3 is broken. It's the wsgi spec and the implementation of that spec where things fall down. From your first post, I thought you were claiming that python3 was broken since web frameworks got decoding right on python2 and I just wanted to defend python3 by showing that python2 wasn't all sunshine and roses. -Toshio
From foom at fuhm.net Thu Jun 24 02:26:25 2010 From: foom at fuhm.net (James Y Knight) Date: Wed, 23 Jun 2010 20:26:25 -0400 Subject: [Python-Dev] Use of cgi.escape can lead to XSS vulnerabilities In-Reply-To: References: Message-ID: <09E6BE78-066E-4BCF-AA34-C6286CF8AB98@fuhm.net> On Jun 22, 2010, at 5:14 PM, Craig Younkins wrote: > I suggest rewording the documentation for the method making it more > clear what it should and should not be used for. I would like to see > the method changed to properly escape single-quotes, but if it is > not changed, the documentation should explicitly say this method > does not make input safe for inclusion in HTML. Well, it *does* make the input safe for inclusion in HTML...in a double-quoted attribute. The docs could make it clearer that you should always use double-quotes around your attribute values when using it, though, I agree. From janssen at parc.com Thu Jun 24 03:26:46 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 23 Jun 2010 18:26:46 PDT Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> Message-ID: <1366.1277342806@parc.com> See also http://gimper.net/viewtopic.php?f=18&t=3185. Bill From ronaldoussoren at mac.com Thu Jun 24 08:10:42 2010 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 24 Jun 2010 08:10:42 +0200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: <1366.1277342806@parc.com> References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> <1366.1277342806@parc.com> Message-ID: On 24 Jun, 2010, at 3:26, Bill Janssen wrote: > See also http://gimper.net/viewtopic.php?f=18&t=3185. That's because setgroups(3) is limited to 16 groups (that is, the kernel doesn't support more than 16 groups at all).
Ronald From greg.ewing at canterbury.ac.nz Thu Jun 24 09:20:34 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 24 Jun 2010 19:20:34 +1200 Subject: [Python-Dev] os.getgroups() on MacOS X Was: red buildbots on 2.7 In-Reply-To: References: <02EFB202-505A-405E-AE00-ABC0A2234DDA@mac.com> <1366.1277342806@parc.com> Message-ID: <4C230742.40103@canterbury.ac.nz> Ronald Oussoren wrote: > That's because setgroups(3) is limited to 16 groups > (that is, the kernel doesn't support more than 16 groups at all). So how does an account being a member of 18 groups ever work? -- Greg From stephen at xemacs.org Thu Jun 24 10:12:13 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 24 Jun 2010 17:12:13 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > For example: how we can make the suite of functions used for URL > processing more polymorphic, so that each developer can choose for > herself how URLs need to be treated in her application. While you have come down on the side of polymorphism (as opposed to separate functions), I'm a little nervous about it. Specifically, Philip Eby expressed a desire for earlier type errors, while polymorphism seems to ensure that you'll need to Look Before You Leap to get early error detection.
From regebro at gmail.com Thu Jun 24 11:05:03 2010 From: regebro at gmail.com (Lennart Regebro) Date: Thu, 24 Jun 2010 11:05:03 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote: > Yeah. This is a real issue I have with the direction Python3 went: it pushes > you into decoding everything to unicode early, even when you don't care -- Well, yes, maybe even if *you* don't care. But often the functions you need to call must care, and then you need to decode to unicode, even if you personally don't care. And in those cases, you should decode as early as possible. In the cases where neither you nor the functions you call care, then you don't have to decode, and you can happily pass binary data from one function to another. So this is not really a question of the direction Python 3 went. It's more a case that some methods that *could* do their transformations in a well defined way on bytes don't, and then force you to decode to unicode. But that's not a problem with direction, it's just a missing feature in the stdlib. -- Lennart Regebro: http://regebro.wordpress.com/ Python 3 Porting: http://python3porting.com/ +33 661 58 14 64 From mal at egenix.com Thu Jun 24 12:58:23 2010 From: mal at egenix.com (M.-A.
Lemburg) Date: Thu, 24 Jun 2010 12:58:23 +0200 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> Message-ID: <4C233A4F.2030607@egenix.com> Lennart Regebro wrote: > On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote: >> Yeah. This is a real issue I have with the direction Python3 went: it pushes >> you into decoding everything to unicode early, even when you don't care -- > > Well, yes, maybe even if *you* don't care. But often the functions you > need to call must care, and then you need to decode to unicode, even > if you personally don't care. And in those cases, you should deocde as > early as possible. > > In the cases where neither you nor the functions you call care, then > you don't have to decode, and you can happily pass binary data from > one function to another. > > So this is not really a question of the direction Python 3 went. It's > more a case that some methods that *could* do their transformations in > a well defined way on bytes don't, and then force you to decode to > unicode. But that's not a problem with direction, it's just a missing > feature in the stdlib. The discussion is showing that in at least a few application spaces, the stdlib should be able to work on both bytes and Unicode, preferably using the same interfaces using polymorphism, i.e. some_function(bytes) -> bytes some_function(str) -> str In Python2 this partially works due to the automatic bytes->str conversion (in some cases you get some_function(bytes) -> str), the codec base class implementations being a prime example. 
In Python3, things have to be done explicity and I think we need to add a few helpers to make writing such str/bytes interfaces easier. We've already had some suggestions in that area, but probably need to collect a few more ideas based on real-life porting attempts. I'd like to make this a topic at the upcoming language summit in Birmingham, if Michael agrees. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 24 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 24 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From fuzzyman at voidspace.org.uk Thu Jun 24 13:00:12 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Thu, 24 Jun 2010 12:00:12 +0100 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C233A4F.2030607@egenix.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <4C233A4F.2030607@egenix.com> Message-ID: <4C233ABC.40702@voidspace.org.uk> On 24/06/2010 11:58, M.-A. Lemburg wrote: > Lennart Regebro wrote: > >> On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote: >> >>> Yeah. 
This is a real issue I have with the direction Python3 went: it pushes >>> you into decoding everything to unicode early, even when you don't care -- >>> >> Well, yes, maybe even if *you* don't care. But often the functions you >> need to call must care, and then you need to decode to unicode, even >> if you personally don't care. And in those cases, you should deocde as >> early as possible. >> >> In the cases where neither you nor the functions you call care, then >> you don't have to decode, and you can happily pass binary data from >> one function to another. >> >> So this is not really a question of the direction Python 3 went. It's >> more a case that some methods that *could* do their transformations in >> a well defined way on bytes don't, and then force you to decode to >> unicode. But that's not a problem with direction, it's just a missing >> feature in the stdlib. >> > The discussion is showing that in at least a few application spaces, > the stdlib should be able to work on both bytes and Unicode, preferably > using the same interfaces using polymorphism, i.e. > > some_function(bytes) -> bytes > some_function(str) -> str > > In Python2 this partially works due to the automatic bytes->str > conversion (in some cases you get some_function(bytes) -> str), > the codec base class implementations being a prime example. > > In Python3, things have to be done explicity and I think we need > to add a few helpers to make writing such str/bytes interfaces > easier. > > We've already had some suggestions in that area, but probably need > to collect a few more ideas based on real-life porting attempts. > > I'd like to make this a topic at the upcoming language summit > in Birmingham, if Michael agrees. > > Yep, it sounds like a great topic for the language summit. 
Michael -- http://www.ironpythoninaction.com/ From guido at python.org Thu Jun 24 16:33:42 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 07:33:42 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. Turnbull wrote: > Guido van Rossum writes: > > > For example: how we can make the suite of functions used for URL > > processing more polymorphic, so that each developer can choose for > > herself how URLs need to be treated in her application. > > While you have come down on the side of polymorphism (as opposed to > separate functions), I'm a little nervous about it. Specifically, > Philip Eby expressed a desire for earlier type errors, while > polymorphism seems to ensure that you'll need to Look Before You Leap > to get early error detection. Understood, but both the majority of str/bytes methods and several existing APIs (e.g. many in the os module, like os.listdir()) do it this way. Also, IMO a polymorphic function should *not* accept *mixed* bytes/text input -- join('x', b'y') should be rejected. But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me. So, actually, I *don't* understand what you mean by needing LBYL.
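A minimal sketch of the kind of polymorphic function described above, assuming Python 3 str/bytes semantics. The name join and its two-argument signature are hypothetical, chosen only to mirror the join('x', 'y') examples in the message; this is not os.path.join or any stdlib function.

```python
def join(a, b):
    """Join two segments with '/'; both must be str or both must be bytes."""
    if isinstance(a, str) and isinstance(b, str):
        sep = "/"
    elif isinstance(a, bytes) and isinstance(b, bytes):
        sep = b"/"
    else:
        # Mixed input is rejected up front instead of being promoted.
        raise TypeError("cannot mix str and bytes arguments")
    return a + sep + b


print(join("x", "y"))
print(join(b"x", b"y"))
```

Because the check happens at the call boundary, a mixed call such as join('x', b'y') fails immediately with TypeError, and the caller does not need to inspect argument types beforehand.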
-- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Jun 24 17:25:18 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 01:25:18 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote: > Also, IMO a polymorphic function should *not* accept *mixed* > bytes/text input -- join('x', b'y') should be rejected. But join('x', > 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me. A policy of allowing arguments to be either str or bytes, but not a mixture, actually avoids one of the more painful aspects of the 2.x "promote mixed operations to unicode" approach. Specifically, you either had to scan all the arguments up front to check for unicode, or else you had to stop what you were doing and start again with the unicode version if you encountered unicode partway through. Neither was particularly nice to implement. As you noted elsewhere, literals and string methods are still likely to be a major sticking point with that approach - common operations like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions that use them won't be polymorphic either. (It's only the str->unicode promotion behaviour in 2.x that works around this problem there). Would it be heretical to suggest that sum() be allowed to work on strings to at least eliminate ''.join() as something that breaks bytes processing? 
It already works for bytes, although it then fails with a confusing message for bytearray: >>> sum(b"a b c".split(), b'') b'abc' >>> sum(bytearray(b"a b c").split(), bytearray(b'')) Traceback (most recent call last): File "", line 1, in TypeError: sum() can't sum bytes [use b''.join(seq) instead] >>> sum("a b c".split(), '') Traceback (most recent call last): File "", line 1, in TypeError: sum() can't sum strings [use ''.join(seq) instead] Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Thu Jun 24 17:41:14 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 08:41:14 -0700 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Jun 24, 2010 at 8:25 AM, Nick Coghlan wrote: > On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote: >> Also, IMO a polymorphic function should *not* accept *mixed* >> bytes/text input -- join('x', b'y') should be rejected. But join('x', >> 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me. > > A policy of allowing arguments to be either str or bytes, but not a > mixture, actually avoids one of the more painful aspects of the 2.x > "promote mixed operations to unicode" approach. Specifically, you > either had to scan all the arguments up front to check for unicode, or > else you had to stop what you were doing and start again with the > unicode version if you encountered unicode partway through. Neither > was particularly nice to implement. Right. 
Polymorphic functions should *not* allow mixing text and bytes. It's all text or all bytes. > As you noted elsewhere, literals and string methods are still likely > to be a major sticking point with that approach - common operations > like ''.join(seq) and b''.join(seq) aren't polymorphic, so functions > that use them won't be polymorphic either. (It's only the str->unicode > promotion behaviour in 2.x that works around this problem there). > > Would it be heretical to suggest that sum() be allowed to work on > strings to at least eliminate ''.join() as something that breaks bytes > processing? It already works for bytes, although it then fails with a > confusing message for bytearray: > >>>> sum(b"a b c".split(), b'') > b'abc' > >>>> sum(bytearray(b"a b c").split(), bytearray(b'')) > Traceback (most recent call last): > ?File "", line 1, in > TypeError: sum() can't sum bytes [use b''.join(seq) instead] > >>>> sum("a b c".split(), '') > Traceback (most recent call last): > ?File "", line 1, in > TypeError: sum() can't sum strings [use ''.join(seq) instead] I don't think we should abuse sum for this. A simple idiom to get the *empty* string of a particular type is x[:0] so you could write something like this to concatenate a list or strings or bytes: xs[:0].join(xs). Note that if xs is empty we wouldn't know what to do anyway so this should be disallowed. -- --Guido van Rossum (python.org/~guido) From barry at python.org Thu Jun 24 17:50:48 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 11:50:48 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 Message-ID: <20100624115048.4fd152e3@heresy> This is a follow up to PEP 3147. That PEP, already implemented in Python 3.2, allows for Python source files from different Python versions to live together in the same directory. It does this by putting a magic tag in the .pyc file name and placing the .pyc file in a __pycache__ directory. 
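[Editorial illustration] For readers who want to see the PEP 3147 naming in practice, the sketch below asks the running interpreter where it would cache the bytecode for a source file. It assumes a reasonably recent CPython: `importlib.util.cache_from_source` is the current home of this function (in Python 3.2 it was `imp.cache_from_source`), and the file name `foo.py` is just a placeholder.

```python
import importlib.util
import os
import sys

# The per-interpreter "magic tag" from PEP 3147, e.g. "cpython-32"
# for CPython 3.2.
tag = sys.implementation.cache_tag

# Where this interpreter would cache the bytecode for foo.py.
# The file does not need to exist; only the path is computed.
cached = importlib.util.cache_from_source("foo.py")

print(tag)
print(os.path.dirname(cached))
print(os.path.basename(cached))
```

On a default configuration the directory printed is `__pycache__` and the file name embeds the tag, which is what lets several Python versions share one source directory.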
Distros such as Debian and Ubuntu will use this to greatly simplifying deploying Python, and Python applications and libraries. Debian and Ubuntu usually ship more than one version of Python, and currently have to play complex games with symlinks to make this work. PEP 3147 will go a long way to eliminating the need for extra directories and symlinks. One more thing I've found we need though, is a way to handled shared libraries for extension modules. Just as we can get name collisions on foo.pyc, we can get collisions on foo.so. We obviously cannot install foo.so built for Python 3.2 and foo.so built for Python 3.3 in the same location. So symlink nightmare's mini-me is back. I have a fairly simple fix for this. I'd actually be surprised if this hasn't been discussed before, but teh Googles hasn't turned up anything. The idea is to put the Python version number in the shared library file name, and extend .so lookup to find these extended file names. So for example, we'd see foo.3.2.so instead, and Python would know how to dynload both that and the traditional foo.so file too (for backward compatibility). (On file naming: the original patch used foo.so.3.2 and that works just as well, but I thought there might be tools that expect exactly a '.so' suffix, so I changed it to put the Major.Minor version number to the left of the extension. The exact naming scheme is of course open to debate.) This is a much simpler patch than PEP 3147, though I'm not 100% sure it's the right approach. The way this works is by modifying the configure and Makefile.pre.in to put the version number in the $SO make variable. Python parses its (generated) Makefile to find $SO and it uses this deep in the bowels of distutils to decide what suffix to use when writing shared libraries built by 'python setup.py build_ext'. This means the patched Python only writes versioned .so files by default. 
I personally don't see that as a problem, and it does not affect the test suite, with the exception of one easily tweaked test. I don't know if third party tools will care. The fact that traditional foo.so shared libraries will still satisfy the import should be enough, I think.

The patch is currently Linux only, since I need this for Debian and Ubuntu and wanted to keep the change narrow.

Other possible approaches:

* Extend the distutils API so that the .so file extension can be passed in, instead of being essentially hardcoded to what Python's Makefile contains.
* Keep the dynload_shlib.c change, but modify the Debian/Ubuntu build environment to pass in $SO to make (though the configure.in warning and sleep is a little annoying).
* Add a ./configure option to enable this, which Debuntu's build would use.

The patch is available here:

http://pastebin.ubuntu.com/454512/

and my working branch is here:

https://code.edge.launchpad.net/~barry/python/sovers

Please let me know what you think. I'm happy to just commit this to the py3k branch if there are no objections. I don't think a new PEP is in order, but an update to PEP 3147 might make sense.

Cheers,
-Barry

From benjamin at python.org Thu Jun 24 17:58:09 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 24 Jun 2010 10:58:09 -0500
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID:

2010/6/24 Barry Warsaw:
> Please let me know what you think. I'm happy to just commit this to the py3k
> branch if there are no objections. I don't think a new PEP is in
> order, but an update to PEP 3147 might make sense.

How will this interact with PEP 384 if that is implemented?
--
Regards,
Benjamin

From daniel at stutzbachenterprises.com Thu Jun 24 18:05:29 2010
From: daniel at stutzbachenterprises.com (Daniel Stutzbach)
Date: Thu, 24 Jun 2010 11:05:29 -0500
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID:

On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw wrote:
> The idea is to put the Python version number in the shared library file name,
> and extend .so lookup to find these extended file names. So for example, we'd
> see foo.3.2.so instead, and Python would know how to dynload both that and the
> traditional foo.so file too (for backward compatibility).

What use case does this address?

PEP 3147 addresses the fact that the user may have different versions of Python installed and each wants to write a .pyc file when loading a module. .so files are not generated simply by running the Python interpreter, ergo .so files are not an issue for that use case.

If you want to make it so a system can install a package in just one location to be used by multiple Python installations, then the version number isn't enough. You also need to distinguish debug builds, profiling builds, Unicode width (see issue8654), and probably several other ./configure options.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC
From baptiste13z at free.fr Thu Jun 24 18:58:59 2010
From: baptiste13z at free.fr (Baptiste Carvello)
Date: Thu, 24 Jun 2010 18:58:59 +0200
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100621181750.267933A404D@sparrow.telecommunity.com>
References: <87sk4jcejy.fsf@uwakimon.sk.tsukuba.ac.jp> <201006201204.30795.steve@pearwood.info> <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <20100621023005.EE17E3A4099@sparrow.telecommunity.com> <20100621164650.16A093A414B@sparrow.telecommunity.com> <20100621181750.267933A404D@sparrow.telecommunity.com>
Message-ID:

P.J. Eby wrote:
> [...] stdlib constants are almost always ASCII,
> and the main use cases for ebytes would involve ascii-extended encodings.)

Then, how about a new "ascii string" literal? This would produce a special kind of string that would coerce to a normal string when mixed with a str, and to a bytes using ascii codec when mixed with a bytes.

Then you could write

>>> a"/".join([base, path])

and not worry if base and path are both str, or both bytes (mixed being of course forbidden).

B.

From pje at telecommunity.com Thu Jun 24 19:07:01 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Thu, 24 Jun 2010 13:07:01 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20100624170856.0853D3A4099@sparrow.telecommunity.com>

At 05:12 PM 6/24/2010 +0900, Stephen J.
Turnbull wrote: >Guido van Rossum writes: > > > For example: how we can make the suite of functions used for URL > > processing more polymorphic, so that each developer can choose for > > herself how URLs need to be treated in her application. > >While you have come down on the side of polymorphism (as opposed to >separate functions), I'm a little nervous about it. Specifically, >Philip Eby expressed a desire for earlier type errors, while >polymorphism seems to ensure that you'll need to Look Before You Leap >to get early error detection. This doesn't have to be in the functions; it can be in the *types*. Mixed-type string operations have to do type checking and upcasting already, but if the protocol were open, you could make an encoded-bytes type that would handle the error checking. (Btw, in some earlier emails, Stephen, you implied that this could be fixed with codecs -- but it can't, because the problem isn't with the bytes containing invalid Unicode, it's with the Unicode containing invalid bytes -- i.e., characters that can't be encoded to the ultimate codec target.) From janssen at parc.com Thu Jun 24 19:38:19 2010 From: janssen at parc.com (Bill Janssen) Date: Thu, 24 Jun 2010 10:38:19 PDT Subject: [Python-Dev] thoughts on the bytes/string discussion Message-ID: <11597.1277401099@parc.com> Here are a couple of ideas I'm taking away from the bytes/string discussion. First, it would probably be a good idea to have a String ABC. Secondly, maybe the string situation in 2.x wasn't as broken as we thought it was. In particular, those who deal with lots of encoded strings seemed to find it handy, and miss it in 3.x. Perhaps strings are more like numbers than we think. We have separate types for int, float, Decimal, etc. But they're all numbers, and they all cross-operate. 
In 2.x, it seems there were two missing features: no encoding attribute on str, which should have been there and should have been required, and the default encoding being "ASCII" (I can't tell you how many times I've had to fix that issue when a non-ASCII encoded str was passed to some output function). So maybe having a second string type in 3.x that consists of an encoded sequence of bytes plus the encoding, call it "estr", wouldn't have been a bad idea. It would probably have made sense to have estr cooperate with the str type, in the same way that two different kinds of numbers cooperate, "promoting" the result of an operation only when necessary. This would automatically achieve the kind of polymorphic functionality that Guido is suggesting, but without losing the ability to do x = e(ASCII)"bar" a = ''.join("foo", x) (or whatever the syntax for such an encoded string literal would be -- I'm not claiming this is a good one) which presume would bind "a" to a Unicode string "foobar" -- have to work out what gets promoted to what. The language moratorium kind of makes this all theoretical, but building a String ABC still would be a good start, and presumably isn't forbidden by the moratorium. Bill From brett at python.org Thu Jun 24 19:48:56 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 10:48:56 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624115048.4fd152e3@heresy> References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote: > This is a follow up to PEP 3147. ?That PEP, already implemented in Python 3.2, > allows for Python source files from different Python versions to live together > in the same directory. ?It does this by putting a magic tag in the .pyc file > name and placing the .pyc file in a __pycache__ directory. > > Distros such as Debian and Ubuntu will use this to greatly simplifying > deploying Python, and Python applications and libraries. 
?Debian and Ubuntu > usually ship more than one version of Python, and currently have to play > complex games with symlinks to make this work. ?PEP 3147 will go a long way to > eliminating the need for extra directories and symlinks. > > One more thing I've found we need though, is a way to handled shared libraries > for extension modules. ?Just as we can get name collisions on foo.pyc, we can > get collisions on foo.so. ?We obviously cannot install foo.so built for Python > 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink > nightmare's mini-me is back. > > I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't > been discussed before, but teh Googles hasn't turned up anything. > > The idea is to put the Python version number in the shared library file name, > and extend .so lookup to find these extended file names. ?So for example, we'd > see foo.3.2.so instead, and Python would know how to dynload both that and the > traditional foo.so file too (for backward compatibility). > > (On file naming: the original patch used foo.so.3.2 and that works just as > well, but I thought there might be tools that expect exactly a '.so' suffix, > so I changed it to put the Major.Minor version number to the left of the > extension. ?The exact naming scheme is of course open to debate.) > While the idea is fine with me since I won't have any of my directories cluttered with multiple .so files, I would still want to add some moniker showing that the version number represents the interpreter and not the .so file. If I read "foo.3.2.so", that naively seems to mean to mean the foo module's 3.2 release is what is in installed, not that it's built for CPython 3.2. So even though it might be redundant, I would still want the VM name added. Adding the VM name also doesn't make extension modules the exclusive domain of CPython either. 
If some other VM decides to make their own .so files that are not binary compatible then we should not preclude that as this solution it is nothing more than it makes a string comparison have to look at 7 more characters. -Brett P.S.: I wish we could drop use of the 'module.so' variant at the same time, for consistency sake and to cut out a stat call, but I know that is asking too much. From barry at python.org Thu Jun 24 19:51:19 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 13:51:19 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: <20100624135119.00b9ac5c@heresy> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >2010/6/24 Barry Warsaw : >> Please let me know what you think. ?I'm happy to just commit this to the >> py3k branch if there are no objections . ?I don't think a new PEP is >> in order, but an update to PEP 3147 might make sense. > >How will this interact with PEP 384 if that is implemented? Good question, I'd forgotten to mention that PEP. I think the PEP is a good idea, and worth working on, but it is a longer term solution to the problem of extension source code compatibility. It's longer term because extensions will have to be rewritten to use the new API defined in PEP 384. It will take a long time to get this into practice, and supporting it will be a case-by-case basis. I'm trying to come up with something that will work immediately while PEP 384 is being adopted. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
From benjamin at python.org Thu Jun 24 20:00:54 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Thu, 24 Jun 2010 13:00:54 -0500
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624135119.00b9ac5c@heresy>
References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy>
Message-ID:

2010/6/24 Barry Warsaw:
> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:
>
>>2010/6/24 Barry Warsaw:
>>> Please let me know what you think. I'm happy to just commit this to the
>>> py3k branch if there are no objections. I don't think a new PEP is
>>> in order, but an update to PEP 3147 might make sense.
>>
>>How will this interact with PEP 384 if that is implemented?
> I'm trying to come up with something that will work immediately while PEP 384
> is being adopted.

But how will modules specify that they support multiple ABIs then?

--
Regards,
Benjamin

From brett at python.org Thu Jun 24 20:11:07 2010
From: brett at python.org (Brett Cannon)
Date: Thu, 24 Jun 2010 11:11:07 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <11597.1277401099@parc.com>
References: <11597.1277401099@parc.com>
Message-ID:

On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
[SNIP]
> The language moratorium kind of makes this all theoretical, but building
> a String ABC still would be a good start, and presumably isn't forbidden
> by the moratorium.

Because a new ABC would go into the stdlib (I assume in collections or string) the moratorium does not apply.
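[Editorial illustration] To make the "String ABC" idea concrete, here is a minimal sketch of what a library-level ABC could look like. Everything named here is an assumption for illustration only: `String` is a hypothetical marker class (no such ABC exists in the stdlib), the choice to register `bytes` and `bytearray` alongside `str` is one possible design, and `EStr` is a made-up "bytes plus encoding" type loosely modelled on the `estr` idea from this thread.

```python
from abc import ABC


class String(ABC):
    """Hypothetical marker ABC for string-like types (not in the stdlib)."""


# Builtin types need no source changes: they are registered as
# virtual subclasses of the ABC.
String.register(str)
String.register(bytes)
String.register(bytearray)


class EStr:
    """Hypothetical encoded string: raw bytes plus the encoding they use."""

    def __init__(self, data, encoding):
        self.data = data
        self.encoding = encoding

    def __str__(self):
        return self.data.decode(self.encoding)


# A third-party type can opt in the same way.
String.register(EStr)

for value in ("text", b"raw", EStr(b"bar", "ascii"), 42):
    print(type(value).__name__, isinstance(value, String))
```

Since registration happens in library code, a sketch like this needs no new syntax and no change to the builtin types themselves.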
From guido at python.org Thu Jun 24 20:27:37 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 11:27:37 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote: > On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote: >> This is a follow up to PEP 3147. ?That PEP, already implemented in Python 3.2, >> allows for Python source files from different Python versions to live together >> in the same directory. ?It does this by putting a magic tag in the .pyc file >> name and placing the .pyc file in a __pycache__ directory. >> >> Distros such as Debian and Ubuntu will use this to greatly simplifying >> deploying Python, and Python applications and libraries. ?Debian and Ubuntu >> usually ship more than one version of Python, and currently have to play >> complex games with symlinks to make this work. ?PEP 3147 will go a long way to >> eliminating the need for extra directories and symlinks. >> >> One more thing I've found we need though, is a way to handled shared libraries >> for extension modules. ?Just as we can get name collisions on foo.pyc, we can >> get collisions on foo.so. ?We obviously cannot install foo.so built for Python >> 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink >> nightmare's mini-me is back. >> >> I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't >> been discussed before, but teh Googles hasn't turned up anything. >> >> The idea is to put the Python version number in the shared library file name, >> and extend .so lookup to find these extended file names. ?So for example, we'd >> see foo.3.2.so instead, and Python would know how to dynload both that and the >> traditional foo.so file too (for backward compatibility). 
>> >> (On file naming: the original patch used foo.so.3.2 and that works just as >> well, but I thought there might be tools that expect exactly a '.so' suffix, >> so I changed it to put the Major.Minor version number to the left of the >> extension. ?The exact naming scheme is of course open to debate.) >> > > While the idea is fine with me since I won't have any of my > directories cluttered with multiple .so files, I would still want to > add some moniker showing that the version number represents the > interpreter and not the .so file. If I read "foo.3.2.so", that naively > seems to mean to mean the foo module's 3.2 release is what is in > installed, not that it's built for CPython 3.2. So even though it > might be redundant, I would still want the VM name added. Well, for versions of the .so itself, traditionally version numbers are appended *after* the .so suffix (check your /lib directory :-). > Adding the VM name also doesn't make extension modules the exclusive > domain of CPython either. If some other VM decides to make their own > .so files that are not binary compatible then we should not preclude > that as this solution it is nothing more than it makes a string > comparison have to look at 7 more characters. > > -Brett > > P.S.: I wish we could drop use of the 'module.so' variant at the same > time, for consistency sake and to cut out a stat call, but I know that > is asking too much. I wish so too. IIRC there used to be some modules that on Windows were wrappers around 3rd party DLLs and you can't have foo.dll as the module wrapping foo.dll the 3rd party DLL. (On Unix this problem doesn't exist because the 3rd party .so would be named libfoo.so, not foo.so.) 
-- --Guido van Rossum (python.org/~guido) From barry at python.org Thu Jun 24 20:28:30 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 14:28:30 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> Message-ID: <20100624142830.4c859faf@limelight.wooz.org> On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: >2010/6/24 Barry Warsaw : >> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >> >>>2010/6/24 Barry Warsaw : >>>> Please let me know what you think. ?I'm happy to just commit this to the >>>> py3k branch if there are no objections . ?I don't think a new PEP is >>>> in order, but an update to PEP 3147 might make sense. >>> >>>How will this interact with PEP 384 if that is implemented? >> I'm trying to come up with something that will work immediately while PEP 384 >> is being adopted. > >But how will modules specify that they support multiple ABIs then? I didn't understand, so asked Benjamin for clarification in IRC. barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports the stable abi, will it load it? [14:25] gutworth: thanks, now i get it :) [14:26] gutworth: i think it should, but it wouldn't under my scheme. let me think about it -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From brett at python.org Thu Jun 24 20:47:14 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 11:47:14 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: On Thu, Jun 24, 2010 at 11:27, Guido van Rossum wrote: > On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote: >> On Thu, Jun 24, 2010 at 08:50, Barry Warsaw wrote: >>> This is a follow up to PEP 3147. 
?That PEP, already implemented in Python 3.2, >>> allows for Python source files from different Python versions to live together >>> in the same directory. ?It does this by putting a magic tag in the .pyc file >>> name and placing the .pyc file in a __pycache__ directory. >>> >>> Distros such as Debian and Ubuntu will use this to greatly simplifying >>> deploying Python, and Python applications and libraries. ?Debian and Ubuntu >>> usually ship more than one version of Python, and currently have to play >>> complex games with symlinks to make this work. ?PEP 3147 will go a long way to >>> eliminating the need for extra directories and symlinks. >>> >>> One more thing I've found we need though, is a way to handled shared libraries >>> for extension modules. ?Just as we can get name collisions on foo.pyc, we can >>> get collisions on foo.so. ?We obviously cannot install foo.so built for Python >>> 3.2 and foo.so built for Python 3.3 in the same location. ?So symlink >>> nightmare's mini-me is back. >>> >>> I have a fairly simple fix for this. ?I'd actually be surprised if this hasn't >>> been discussed before, but teh Googles hasn't turned up anything. >>> >>> The idea is to put the Python version number in the shared library file name, >>> and extend .so lookup to find these extended file names. ?So for example, we'd >>> see foo.3.2.so instead, and Python would know how to dynload both that and the >>> traditional foo.so file too (for backward compatibility). >>> >>> (On file naming: the original patch used foo.so.3.2 and that works just as >>> well, but I thought there might be tools that expect exactly a '.so' suffix, >>> so I changed it to put the Major.Minor version number to the left of the >>> extension. ?The exact naming scheme is of course open to debate.) 
>>>
>>
>> While the idea is fine with me since I won't have any of my
>> directories cluttered with multiple .so files, I would still want to
>> add some moniker showing that the version number represents the
>> interpreter and not the .so file. If I read "foo.3.2.so", that naively
>> seems to mean the foo module's 3.2 release is what is
>> installed, not that it's built for CPython 3.2. So even though it
>> might be redundant, I would still want the VM name added.
>
> Well, for versions of the .so itself, traditionally version numbers
> are appended *after* the .so suffix (check your /lib directory :-).
>

Second thing you taught me today (first was the x[:0] trick)! I've also been on OS X too long; /usr/lib is just .dylib and that puts the version number before the extension.

>> Adding the VM name also doesn't make extension modules the exclusive
>> domain of CPython either. If some other VM decides to make their own
>> .so files that are not binary compatible then we should not preclude
>> that, as this solution does nothing more than make a string
>> comparison look at 7 more characters.
>>
>> -Brett
>>
>> P.S.: I wish we could drop use of the 'module.so' variant at the same
>> time, for consistency sake and to cut out a stat call, but I know that
>> is asking too much.
>
> I wish so too. IIRC there used to be some modules that on Windows were
> wrappers around 3rd party DLLs and you can't have foo.dll as the
> module wrapping foo.dll the 3rd party DLL. (On Unix this problem
> doesn't exist because the 3rd party .so would be named libfoo.so, not
> foo.so.)

Wouldn't Barry's proposed solution actually fill this need since it will give the file a custom Python suffix that more-or-less guarantees no name clash with a third-party DLL?
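[Editorial illustration] The lookup being discussed in this thread (try a version-tagged file name, then fall back to the traditional `foo.so`) can be sketched in Python as below. This is not the patch itself, which changes dynload_shlib.c in C; `candidate_filenames` and `find_extension` are hypothetical helpers, the `foo.3.2.so` naming follows the proposal as quoted, and trying the versioned name first is an assumption about the search order.

```python
import os


def candidate_filenames(modname, version=(3, 2)):
    """Return file names to try for an extension module, most specific first."""
    major, minor = version
    versioned = "%s.%d.%d.so" % (modname, major, minor)
    traditional = modname + ".so"
    return [versioned, traditional]


def find_extension(modname, directory, version=(3, 2)):
    """Return the path of the first candidate that exists, or None."""
    for name in candidate_filenames(modname, version):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None


print(candidate_filenames("foo"))
print(candidate_filenames("foo", version=(3, 3)))
```

With this ordering, a directory containing both `foo.3.2.so` and `foo.3.3.so` serves each interpreter its own build, while a lone `foo.so` still satisfies either one.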
From merwok at netwok.org Thu Jun 24 20:50:41 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 24 Jun 2010 20:50:41 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID: <4C23A901.7060100@netwok.org>

On 24/06/2010 17:50, Barry Warsaw (FLUFL) wrote:
> Other possible approaches:
> * Extend the distutils API so that the .so file extension can be passed in,
> instead of being essentially hardcoded to what Python's Makefile contains.

Third-party code relies on Distutils internal quirks, so it's frozen. Feel free to open a bug against Distutils2 on the Python tracker if that would be generally useful.

Regards

From merwok at netwok.org Thu Jun 24 20:53:02 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 24 Jun 2010 20:53:02 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy>
Message-ID: <4C23A98E.4080303@netwok.org>

On 24/06/2010 19:48, Brett Cannon wrote:
> P.S.: I wish we could drop use of the 'module.so' variant at the same
> time, for consistency sake and to cut out a stat call, but I know that
> is asking too much.

At least, looking for spam/__init__module.so could be avoided. It seems to me that the package definition does not allow that. The tradeoff would be code complication for one less stat call. Worth a bug report?
Regards

From fuzzyman at voidspace.org.uk Thu Jun 24 21:07:41 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Thu, 24 Jun 2010 20:07:41 +0100
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID: <4C23ACFD.6040506@voidspace.org.uk>

On 24/06/2010 19:11, Brett Cannon wrote:
> On Thu, Jun 24, 2010 at 10:38, Bill Janssen wrote:
> [SNIP]
>
>> The language moratorium kind of makes this all theoretical, but building
>> a String ABC still would be a good start, and presumably isn't forbidden
>> by the moratorium.
>>
> Because a new ABC would go into the stdlib (I assume in collections or
> string) the moratorium does not apply.
>

Although it would require changes for builtin types like file to work with a new string ABC, right?

Michael

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From brett at python.org Thu Jun 24 21:10:38 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 12:10:38 -0700 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C23ACFD.6040506@voidspace.org.uk> References: <11597.1277401099@parc.com> <4C23ACFD.6040506@voidspace.org.uk> Message-ID: On Thu, Jun 24, 2010 at 12:07, Michael Foord wrote: > On 24/06/2010 19:11, Brett Cannon wrote: >> >> On Thu, Jun 24, 2010 at 10:38, Bill Janssen ?wrote: >> [SNIP] >> >>> >>> The language moratorium kind of makes this all theoretical, but building >>> a String ABC still would be a good start, and presumably isn't forbidden >>> by the moratorium. >>> >> >> Because a new ABC would go into the stdlib (I assume in collections or >> string) the moratorium does not apply. >> > > Although it would require changes for builtin types like file to work with a > new string ABC, right? Only if they wanted to rely on some concrete implementation of a method contained within the ABC. Otherwise that's what abc.register exists for. From ianb at colorstudy.com Thu Jun 24 21:49:33 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Thu, 24 Jun 2010 14:49:33 -0500 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <11597.1277401099@parc.com> References: <11597.1277401099@parc.com> Message-ID: On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote: > Here are a couple of ideas I'm taking away from the bytes/string > discussion. > > First, it would probably be a good idea to have a String ABC. > > Secondly, maybe the string situation in 2.x wasn't as broken as we > thought it was. In particular, those who deal with lots of encoded > strings seemed to find it handy, and miss it in 3.x. Perhaps strings > are more like numbers than we think. We have separate types for int, > float, Decimal, etc. But they're all numbers, and they all > cross-operate. 
In 2.x, it seems there were two missing features: no > encoding attribute on str, which should have been there and should have > been required, and the default encoding being "ASCII" (I can't tell you > how many times I've had to fix that issue when a non-ASCII encoded str > was passed to some output function). > I've started to form a conceptual notion that I think fits these cases. We've setup a system where we think of text as natively unicode, with encodings to put that unicode into a byte form. This is certainly appropriate in a lot of cases. But there's a significant class of problems where bytes are the native structure. Network protocols are what we've been discussing, and are a notable case of that. That is, b'/' is the most native sense of a path separator in a URL, or b':' is the most native sense of what separates a header name from a header value in HTTP. To disallow unicode URLs or unicode HTTP headers would be rather anti-social, especially because unicode is now the "native" string type in Python 3 (as an aside for the WSGI spec we've been talking about using "native" strings in some positions like dictionary keys, meaning Python 2 str and Python 3 str, while being more exacting in other areas such as a response body which would always be bytes). The HTTP spec and other network protocols seems a little fuzzy on this, because it was written before unicode even existed, and even later activity happened at a point when "unicode" and "text" weren't widely considered the same thing like they are now. But I think the original intention is revealed in a more modern specification like WebSockets, where they are very explicit that ':' is just shorthand for a particular byte, it is not "text" in our new modern notion of the term. So with this idea in mind it makes more sense to me that *specific pieces of text* can be reasonably treated as both bytes and text. All the string literals in urllib.parse.urlunspit() for example. 
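A minimal sketch of a literal that can be treated as both bytes and text, assuming a hypothetical wrapper class named `special` (nothing like it exists in the stdlib). It holds ASCII-only text and adopts the type of whatever it is concatenated with, so the result is always plain bytes or plain str and never another `special`:

```python
class special:
    """Hypothetical ASCII-only literal that adapts to bytes or str operands."""

    def __init__(self, text):
        # Encoding up front rejects non-ASCII literals immediately.
        self._bytes = text.encode("ascii")
        self._text = text

    def _matching(self, other):
        # Pick the representation that matches the other operand's type.
        if isinstance(other, bytes):
            return self._bytes
        if isinstance(other, str):
            return self._text
        return None

    def __add__(self, other):
        mine = self._matching(other)
        if mine is None:
            return NotImplemented
        return mine + other

    def __radd__(self, other):
        mine = self._matching(other)
        if mine is None:
            return NotImplemented
        return other + mine


slash = special("/")
as_bytes = slash + b"x"   # plain bytes: b'/x'
as_text = slash + "x"     # plain str: '/x'
```

Because each result is an ordinary bytes or str object, a later attempt to combine a bytes result with a str still raises TypeError, as it normally does in Python 3.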
The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not become special('/x')) and special('/')+x=='/x' (again it becomes str). This avoids some of the cases of unicode or str infecting a system as they did in Python 2 (where you might pass in unicode and everything works fine until some non-ASCII is introduced). The one place where this might be tricky is if you have an encoding that is not ASCII compatible. But we can't guard against every possibility. So it would be entirely wrong to take a string encoded with UTF-16 and start to use b'/' with it. But there are other nonsensical combinations already possible, especially with polymorphic functions, we can't guard against all of them. Also I'm unsure if something like UTF-16 is in any way compatible with the kind of legacy systems that use bytes. Can you encode your filesystem with UTF-16? I don't think you could encode a cookie with it. So maybe having a second string type in 3.x that consists of an encoded > sequence of bytes plus the encoding, call it "estr", wouldn't have been > a bad idea. It would probably have made sense to have estr cooperate > with the str type, in the same way that two different kinds of numbers > cooperate, "promoting" the result of an operation only when necessary. > This would automatically achieve the kind of polymorphic functionality > that Guido is suggesting, but without losing the ability to do > > x = e(ASCII)"bar" > a = ''.join("foo", x) > > (or whatever the syntax for such an encoded string literal would be -- > I'm not claiming this is a good one) which presume would bind "a" to a > Unicode string "foobar" -- have to work out what gets promoted to what. > I would be entirely happy without a literal syntax. But as Phillip has noted, this can't be implemented *entirely* in a library as there are some constraints with the current str/bytes implementations. Reading PEP 3003 I'm not clear if such changes are part of the moratorium? 
They seem like they would be (sadly), but it doesn't seem clearly noted. I think there's a *different* use case for things like bytes-in-a-utf8-encoding (e.g., to allow XML data to be decoded lazily), but that could be yet another class, and maybe shouldn't be polymorphicly usable as bytes (i.e., treat it as an optimized str representation that is otherwise semantically equivalent). A String ABC would formalize these things. -- Ian Bicking | http://blog.ianbicking.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Thu Jun 24 22:46:37 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 16:46:37 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624142830.4c859faf@limelight.wooz.org> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> Message-ID: <20100624164637.22fd9160@heresy> On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote: >On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: > >>2010/6/24 Barry Warsaw : >>> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >>> >>>>2010/6/24 Barry Warsaw : >>>>> Please let me know what you think. ?I'm happy to just commit this to the >>>>> py3k branch if there are no objections . ?I don't think a new PEP is >>>>> in order, but an update to PEP 3147 might make sense. >>>> >>>>How will this interact with PEP 384 if that is implemented? >>> I'm trying to come up with something that will work immediately while PEP 384 >>> is being adopted. >> >>But how will modules specify that they support multiple ABIs then? > >I didn't understand, so asked Benjamin for clarification in IRC. > > barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports > the stable abi, will it load it? [14:25] > gutworth: thanks, now i get it :) [14:26] > gutworth: i think it should, but it wouldn't under my scheme. 
let me > think about it So, we could say that PEP 384 compliant extension modules would get written without a version specifier. IOW, we'd treat foo.so as using the ABI. It would then be up to the Python runtime to throw ImportErrors if in fact we were loading a legacy, non-PEP 384 compliant extension. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Thu Jun 24 22:53:36 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 16:53:36 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: <20100624165336.27fc7cc9@heresy> On Jun 24, 2010, at 10:48 AM, Brett Cannon wrote: >While the idea is fine with me since I won't have any of my >directories cluttered with multiple .so files, I would still want to >add some moniker showing that the version number represents the >interpreter and not the .so file. If I read "foo.3.2.so", that naively >seems to mean to mean the foo module's 3.2 release is what is in >installed, not that it's built for CPython 3.2. So even though it >might be redundant, I would still want the VM name added. I have a new version of my patch that steals the "magic tag" idea from PEP 3147. Note that it does not use the *actual* same piece of information to compose the file name, but for now it does match the pyc tag string. E.g. % find . -name \*.so ./build/lib.linux-x86_64-3.2/math.cpython-32.so ./build/lib.linux-x86_64-3.2/select.cpython-32.so ./build/lib.linux-x86_64-3.2/_struct.cpython-32.so ... Further, by default, ./configure doesn't add this tag so that you would have to build Python with: % SOABI=cpython-32 ./configure to get anything between the module name and the extension. I could of course make this a configure switch instead, and could default it to some other magic string instead of the empty string. 
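A rough sketch of the tagged naming, for illustration only. The function name `candidate_filenames` and the lookup order (tagged name first, then the traditional untagged name) are assumptions, not taken from the patch; the older `foomodule.so` variant is ignored here.

```python
def candidate_filenames(module_name, soabi=""):
    """Return extension file names to try, most specific first.

    soabi is the build-time tag (e.g. "cpython-32"); an empty string
    means the interpreter was configured without a tag.
    """
    names = []
    if soabi:
        # Tagged name, e.g. math.cpython-32.so
        names.append("{}.{}.so".format(module_name, soabi))
    # Traditional untagged name, kept for backward compatibility.
    names.append("{}.so".format(module_name))
    return names


tagged = candidate_filenames("math", "cpython-32")
untagged = candidate_filenames("math")
```

With a tag of "cpython-32" the first candidate is math.cpython-32.so, matching the `find` output above; without a tag only math.so is tried.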
>Adding the VM name also doesn't make extension modules the exclusive >domain of CPython either. If some other VM decides to make their own >.so files that are not binary compatible then we should not preclude >that as this solution it is nothing more than it makes a string >comparison have to look at 7 more characters. > >-Brett > >P.S.: I wish we could drop use of the 'module.so' variant at the same >time, for consistency sake and to cut out a stat call, but I know that >is asking too much. I think you're right that with the $SOABI trick above, you wouldn't get the name collisions Guido recalls, and you could get rid of module.so. OTOH, as I am currently only targeting Linux, it seems like the module.so stat is wasted anyway on that platform. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Thu Jun 24 22:55:33 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 16:55:33 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: <20100624165533.46a5fb8e@heresy> On Jun 24, 2010, at 11:27 AM, Guido van Rossum wrote: >On Thu, Jun 24, 2010 at 10:48 AM, Brett Cannon wrote: >> While the idea is fine with me since I won't have any of my >> directories cluttered with multiple .so files, I would still want to >> add some moniker showing that the version number represents the >> interpreter and not the .so file. If I read "foo.3.2.so", that naively >> seems to mean to mean the foo module's 3.2 release is what is in >> installed, not that it's built for CPython 3.2. So even though it >> might be redundant, I would still want the VM name added. > >Well, for versions of the .so itself, traditionally version numbers >are appended *after* the .so suffix (check your /lib directory :-). 
Which is probably another reason not to use foo.so.X.Y for Python extension modules. I think it would be confusing, and foo..so looks nice and is consistent with foo..pyc. (Ref to updated patch coming...) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Thu Jun 24 22:59:09 2010 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jun 2010 13:59:09 -0700 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: I see it a little differently (though there is probably a common concept lurking in here). The protocols you mention are intentionally designed to be encoding-neutral as long as the encoding is an ASCII superset. This covers ASCII itself, Latin-1, Latin-N for other values of N, MacRoman, Microsoft's code pages (most of them anyways), UTF-8, presumably at least some of the Japanese encodings, and probably a host of others. But it does not cover UTF-16, EBCDIC, and others. (Encodings that have "shift bytes" that change the meaning of some or all ordinary ASCII characters also aren't covered, unless such an encoding happens to exclude the special characters that the protocol spec cares about). The protocol specs typically go out of their way to specify what byte values they use for syntactically significant positions (e.g. ':' in headers, or '/' in URLs), while hand-waving about the meaning of "what goes in between" since it is all typically treated as "not of syntactic significance". So you can write a parser that looks at bytes exclusively, and looks for a bunch of ASCII punctuation characters (e.g. '<', '>', '/', '&'), and doesn't know or care whether the stuff in between is encoded in Latin-15, MacRoman or UTF-8 -- it never looks "inside" stretches of characters between the special characters and just copies them. 
(Sometimes there may be *some* sections that are required to be ASCII and there equivalence of a-z and A-Z is well defined.) But I wouldn't go so far as to claim that interpreting the protocols as text is wrong. After all we're talking exclusively about protocols that are designed intentionally to be directly "human readable" (albeit as a fall-back option) -- the only tool you need to debug the traffic on the wire or socket is something that knows which subset of ASCII is considered "printable" and which renders everything else safely as a hex escape or even a special "unknown" character (like Unicode's "?" inside a black diamond). Depending on the requirements of a specific app (or framework) it may be entirely reasonable to convert everything to Unicode and process the resulting text; in other contexts it makes more sense to keep everything as bytes. It also makes sense to have an interface library to deal with a specific protocol that treats the protocol side as bytes but interacts with the application using text, since that is often how the application programmer wants to treat it anyway. Of course, some protocols require the application programmer to be aware of bytes as well in *some* cases -- examples are email and HTTP which can be used to transfer text as well as binary data (e.g. images). There is also the bootstrap problem where the wire data must be partially parsed in order to find out the encoding to be used to convert it to text. But that doesn't mean it's invalid to think about it as text in many application contexts. Regarding the proposal of a String ABC, I hope this isn't going to become a backdoor to reintroduce the Python 2 madness of allowing equivalency between text and bytes for *some* strings of bytes and not others. Finally, I do think that we should not introduce changes to the fundamental behavior of text and bytes while the moratorium is in place. Changes to specific stdlib APIs are fine however. 
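The bytes-only parsing described above can be sketched roughly as follows. The function names `split_header` and `header_as_text` are made up for illustration, and the Latin-1 default for header values is an assumption rather than something any protocol spec mandates. The first function never looks inside the value; the second shows an interface layer that hands text to the application.

```python
def split_header(line):
    """Split a raw header line at the first b':' without decoding it."""
    name, sep, value = line.partition(b":")
    if not sep:
        raise ValueError("no ':' separator in %r" % (line,))
    # strip() on bytes removes ASCII whitespace only; other bytes are untouched.
    return name.strip(), value.strip()


def header_as_text(line, encoding="latin-1"):
    """Protocol side stays bytes; the application side receives str."""
    name, value = split_header(line)
    # Header names are required to be ASCII in this sketch; the value
    # encoding is chosen by the caller.
    return name.decode("ascii"), value.decode(encoding)


raw = b"Subject: caf\xc3\xa9\r\n"
name, value = split_header(raw)              # value is still undecoded bytes
text_pair = header_as_text(raw, "utf-8")     # decoded only at the boundary
```

The parser works the same whether the value is UTF-8, Latin-1 or MacRoman, because only the ASCII separator is ever inspected.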
--Guido On Thu, Jun 24, 2010 at 12:49 PM, Ian Bicking wrote: > On Thu, Jun 24, 2010 at 12:38 PM, Bill Janssen wrote: >> >> Here are a couple of ideas I'm taking away from the bytes/string >> discussion. >> >> First, it would probably be a good idea to have a String ABC. >> >> Secondly, maybe the string situation in 2.x wasn't as broken as we >> thought it was. ?In particular, those who deal with lots of encoded >> strings seemed to find it handy, and miss it in 3.x. ?Perhaps strings >> are more like numbers than we think. ?We have separate types for int, >> float, Decimal, etc. ?But they're all numbers, and they all >> cross-operate. ?In 2.x, it seems there were two missing features: no >> encoding attribute on str, which should have been there and should have >> been required, and the default encoding being "ASCII" (I can't tell you >> how many times I've had to fix that issue when a non-ASCII encoded str >> was passed to some output function). > > I've started to form a conceptual notion that I think fits these cases. > > We've setup a system where we think of text as natively unicode, with > encodings to put that unicode into a byte form.? This is certainly > appropriate in a lot of cases.? But there's a significant class of problems > where bytes are the native structure.? Network protocols are what we've been > discussing, and are a notable case of that.? That is, b'/' is the most > native sense of a path separator in a URL, or b':' is the most native sense > of what separates a header name from a header value in HTTP.? To disallow > unicode URLs or unicode HTTP headers would be rather anti-social, especially > because unicode is now the "native" string type in Python 3 (as an aside for > the WSGI spec we've been talking about using "native" strings in some > positions like dictionary keys, meaning Python 2 str and Python 3 str, while > being more exacting in other areas such as a response body which would > always be bytes). 
> > The HTTP spec and other network protocols seems a little fuzzy on this, > because it was written before unicode even existed, and even later activity > happened at a point when "unicode" and "text" weren't widely considered the > same thing like they are now.? But I think the original intention is > revealed in a more modern specification like WebSockets, where they are very > explicit that ':' is just shorthand for a particular byte, it is not "text" > in our new modern notion of the term. > > So with this idea in mind it makes more sense to me that *specific pieces of > text* can be reasonably treated as both bytes and text.? All the string > literals in urllib.parse.urlunspit() for example. > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not > become special('/x')) and special('/')+x=='/x' (again it becomes str).? This > avoids some of the cases of unicode or str infecting a system as they did in > Python 2 (where you might pass in unicode and everything works fine until > some non-ASCII is introduced). > > The one place where this might be tricky is if you have an encoding that is > not ASCII compatible.? But we can't guard against every possibility.? So it > would be entirely wrong to take a string encoded with UTF-16 and start to > use b'/' with it.? But there are other nonsensical combinations already > possible, especially with polymorphic functions, we can't guard against all > of them.? Also I'm unsure if something like UTF-16 is in any way compatible > with the kind of legacy systems that use bytes.? Can you encode your > filesystem with UTF-16?? I don't think you could encode a cookie with it. > >> So maybe having a second string type in 3.x that consists of an encoded >> sequence of bytes plus the encoding, call it "estr", wouldn't have been >> a bad idea. 
?It would probably have made sense to have estr cooperate >> with the str type, in the same way that two different kinds of numbers >> cooperate, "promoting" the result of an operation only when necessary. >> This would automatically achieve the kind of polymorphic functionality >> that Guido is suggesting, but without losing the ability to do >> >> ?x = e(ASCII)"bar" >> ?a = ''.join("foo", x) >> >> (or whatever the syntax for such an encoded string literal would be -- >> I'm not claiming this is a good one) which presume would bind "a" to a >> Unicode string "foobar" -- have to work out what gets promoted to what. > > I would be entirely happy without a literal syntax.? But as Phillip has > noted, this can't be implemented *entirely* in a library as there are some > constraints with the current str/bytes implementations.? Reading PEP 3003 > I'm not clear if such changes are part of the moratorium?? They seem like > they would be (sadly), but it doesn't seem clearly noted. > > I think there's a *different* use case for things like > bytes-in-a-utf8-encoding (e.g., to allow XML data to be decoded lazily), but > that could be yet another class, and maybe shouldn't be polymorphicly usable > as bytes (i.e., treat it as an optimized str representation that is > otherwise semantically equivalent).? A String ABC would formalize these > things. 
> > -- > Ian Bicking ?| ?http://blog.ianbicking.org > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > > -- --Guido van Rossum (python.org/~guido) From brett at python.org Thu Jun 24 23:08:14 2010 From: brett at python.org (Brett Cannon) Date: Thu, 24 Jun 2010 14:08:14 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23A98E.4080303@netwok.org> References: <20100624115048.4fd152e3@heresy> <4C23A98E.4080303@netwok.org> Message-ID: On Thu, Jun 24, 2010 at 11:53, ?ric Araujo wrote: > Le 24/06/2010 19:48, Brett Cannon a ?crit : >> P.S.: I wish we could drop use of the 'module.so' variant at the same >> time, for consistency sake and to cut out a stat call, but I know that >> is asking too much. > > At least, looking for spam/__init__module.so could be avoided. It seems > to me that the package definition does not allow that. I thought no one had bothered to change import.c to allow for extension modules to act as a package's __init__? As for not being allowed, I don't agree with that assessment. If you treat a package's __init__ module as simply that, a module that would be named __init__ when imported, then __init__module.c would be valid (and that's what importlib does). > The tradeoff > would be code complication for one less stat call. Worth a bug report? Nah. 
From barry at python.org Thu Jun 24 23:09:44 2010 From: barry at python.org (Barry Warsaw) Date: Thu, 24 Jun 2010 17:09:44 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> Message-ID: <20100624170944.7e68ad21@heresy> On Jun 24, 2010, at 11:05 AM, Daniel Stutzbach wrote: >On Thu, Jun 24, 2010 at 10:50 AM, Barry Warsaw wrote: > >> The idea is to put the Python version number in the shared library file >> name, >> and extend .so lookup to find these extended file names. So for example, >> we'd >> see foo.3.2.so instead, and Python would know how to dynload both that and >> the >> traditional foo.so file too (for backward compatibility). >> > >What use case does this address? Specifically, it's the use case where we (Debian/Ubuntu) plan on installing all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, we can do that without collisions on the pyc files, but would still have to symlink for extension module .so files, because they are always named foo.so and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's foo.so. So using the same trick as in PEP 3147, if we can name Python 3.2's foo extension differently than the incompatible Python 3.3's foo extension, we can have them live in the same directory without symlink tricks. >PEP 3147 addresses the fact that the user may have different versions of >Python installed and each wants to write a .pyc file when loading a module. > .so files are not generated simply by running the Python interpreter, ergo >.so files are not an issue for that use case. See above. It doesn't matter whether the pyc or so is created at run time by the user or by the distro build system. If the files for different Python versions end up in the same directory, they must be named differently too. 
>If you want to make it so a system can install a package in just one
>location to be used by multiple Python installations, then the version
>number isn't enough. You also need to distinguish debug builds, profiling
>builds, Unicode width (see issue8654), and probably several other
>./configure options.

This is a good point, but more easily addressed. Let's say a distro makes three Python 3.2 variants available, one "normal" build, a debug build, and UCS2 and UCS4 versions of the above. All we need to do is choose a different .so ABI tag (see my previous follow-up) for each of those builds. My updated patch (coming soon) allows you to pass that tag to configure. So e.g.

    Normal build UCSX: SOABI=cpython-32 ./configure
    Debug build UCSX:  SOABI=cpython-32-d ./configure
    Normal build UCSY: SOABI=cpython-32-w ./configure
    Debug build UCSY:  SOABI=cpython-32-dw ./configure

Mix and match for any other build options you care about. Because the distro controls how Python is configured, this should be fairly easy to achieve.

-Barry

From fdrake at acm.org Thu Jun 24 23:12:21 2010
From: fdrake at acm.org (Fred Drake)
Date: Thu, 24 Jun 2010 17:12:21 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624165533.46a5fb8e@heresy>
References: <20100624115048.4fd152e3@heresy> <20100624165533.46a5fb8e@heresy>
Message-ID:

On Thu, Jun 24, 2010 at 4:55 PM, Barry Warsaw wrote:
> Which is probably another reason not to use foo.so.X.Y for Python extension
> modules.

Clearly, foo.so.3.2 is the man page for the foo.so.3 system call. The ABI ident definitely has to be elsewhere.

-Fred

--
Fred L. Drake, Jr.
"A storm broke loose in my mind."
--Albert Einstein

From barry at python.org Thu Jun 24 23:23:02 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 17:23:02 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23A901.7060100@netwok.org>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org>
Message-ID: <20100624172302.024687ef@heresy>

On Jun 24, 2010, at 08:50 PM, Éric Araujo wrote:

>On 24/06/2010 17:50, Barry Warsaw (FLUFL) wrote:
>> Other possible approaches:
>> * Extend the distutils API so that the .so file extension can be passed in,
>>   instead of being essentially hardcoded to what Python's Makefile contains.
>
>Third-party code relies on Distutils internal quirks, so it's frozen. Feel
>free to open a bug against Distutils2 on the Python tracker if that
>would be generally useful.

Depending on how strict this constraint is, it could make things more difficult. I can control what shared library file names Python will load statically, but in order to support PEP 384 I think I need to be able to control what file extensions build_ext writes.

My updated patch does this in a backward compatible way. Of course, distutils hacks have their tentacles all up in the distutils internals, so maybe my patch will break something after all. I can think of a few even hackier ways to work around that if necessary.

My updated patch:

* Adds an optional argument to build_ext.get_ext_fullpath() and
  build_ext.get_ext_filename(). This extra argument is the Extension instance
  being built. (Boy, just in case anyone's already playing with the time
  machine, it sure would have been nice if these methods had originally just
  taken the Extension instance and dug out ext.name instead of passing the
  string in.)

* Adds an optional new keyword argument to the Extension class, called
  so_abi_tag. If given, this overrides the Makefile $SO variable extension.
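As a rough illustration of the second bullet, the override could behave like the sketch below. The helper `ext_filename` and its parameters are hypothetical and are not the actual build_ext code; only the `so_abi_tag` name comes from the patch description. Here `None` means "use whatever tag Python was configured with" and the empty string means "write an untagged file name".

```python
def ext_filename(ext_name, default_tag, so_abi_tag=None):
    """Compose the output file name for an extension module.

    default_tag stands in for the tag Python was configured with
    (e.g. "cpython-32", or "" if none); so_abi_tag is the optional
    per-extension override.
    """
    tag = default_tag if so_abi_tag is None else so_abi_tag
    suffix = ".{}.so".format(tag) if tag else ".so"
    return ext_name + suffix


built_default = ext_filename("_stupid", "cpython-32")
built_untagged = ext_filename("_stupid", "cpython-32", so_abi_tag="")
```

Under these assumptions an extension that says nothing gets the build's tag, while one that passes an empty tag gets the plain `.so` name.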
What this means is that with no changes, a non-PEP 384 compliant extension module wouldn't have to change anything:

    setup(
        name='stupid',
        version='0.0',
        packages=['stupid', 'stupid.tests'],
        ext_modules=[Extension('_stupid',
                               ['src/stupid.c'],
                               )],
        test_suite='stupid.tests',
        )

With a Python built like so:

    % SOABI=cpython-32 ./configure

you'd end up with a _stupid.cpython-32.so module.

However, if you knew your extension module was PEP 384 compliant, and could be shared on >=Python 3.2, you would do:

    setup(
        name='stupid',
        version='0.0',
        packages=['stupid', 'stupid.tests'],
        ext_modules=[Extension('_stupid',
                               ['src/stupid.c'],
                               so_abi_tag='',
                               )],
        test_suite='stupid.tests',
        )

and now you'd end up with _stupid.so, which I propose to mean it's PEP 384 ABI compliant. (There may not be any other use case than so_abi_tag='' or so_abi_tag=None, in which case, the Extension keyword *might* be better off as a boolean.)

Now of course PEP 384 isn't implemented, so it's a bit of a moot point. But if some form of versioned .so file naming is accepted for Python 3.2, I'll update PEP 384 with possible solutions.

-Barry

From barry at python.org Thu Jun 24 23:27:00 2010
From: barry at python.org (Barry Warsaw)
Date: Thu, 24 Jun 2010 17:27:00 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID: <20100624172700.0b837222@heresy>

On Jun 24, 2010, at 11:50 AM, Barry Warsaw wrote:

>Please let me know what you think. I'm happy to just commit this to the py3k
>branch if there are no objections . I don't think a new PEP is in
>order, but an update to PEP 3147 might make sense.

Thanks for all the quick feedback. I've made some changes based on the comments so far.
The bzr branch is updated, and a new patch is available here:

http://pastebin.ubuntu.com/454688/

If reception continues to be mildly approving, I'll open an issue on bugs.python.org and attach the patch to that.

-Barry

From merwok at netwok.org Thu Jun 24 23:37:10 2010
From: merwok at netwok.org (Éric Araujo)
Date: Thu, 24 Jun 2010 23:37:10 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624172302.024687ef@heresy>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org> <20100624172302.024687ef@heresy>
Message-ID: <4C23D006.6080800@netwok.org>

Your plan seems good. Adding keyword arguments should not create compatibility issues, and I suspect the impact on the code of build_ext may actually be quite small. I'll try to review your patch even though I don't know C or compiler oddities, but Tarek will have the best insight and the final word.

In case the time machine's not available, your suggestion about getting the filename from the Extension instance instead of passing in a string can most certainly land in distutils2.

Regards

From ianb at colorstudy.com Thu Jun 24 23:44:12 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Thu, 24 Jun 2010 16:44:12 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID:

On Thu, Jun 24, 2010 at 3:59 PM, Guido van Rossum wrote:
> The protocol specs typically go out of their way to specify what byte
> values they use for syntactically significant positions (e.g. ':' in
> headers, or '/' in URLs), while hand-waving about the meaning of "what
> goes in between" since it is all typically treated as "not of
> syntactic significance".
So you can write a parser that looks at bytes > exclusively, and looks for a bunch of ASCII punctuation characters > (e.g. '<', '>', '/', '&'), and doesn't know or care whether the stuff > in between is encoded in Latin-15, MacRoman or UTF-8 -- it never looks > "inside" stretches of characters between the special characters and > just copies them. (Sometimes there may be *some* sections that are > required to be ASCII and there equivalence of a-z and A-Z is well > defined.) > Yes, these are the specific characters that I think we can handle specially. For instance, the list of all string literals used by urlsplit and urlunsplit: '//' '/' ':' '?' '#' '' 'http' A list of all valid scheme characters (a-z etc) Some lists for scheme-specific parsing (which all contain valid scheme characters) All of these are constrained to ASCII, and must be constrained to ASCII, and everything else in a URL is treated as basically opaque. So if we turned these characters into byte-or-str objects I think we'd basically be true to the intent of the specs, and in a practical sense we'd be able to make these functions polymorphic. I suspect this same pattern will be present most places where people want polymorphic behavior. For now we could do something incomplete and just avoid using operators we can't overload (is it possible to at least make them produce a readable exception?) I think we'll avoid a lot of the confusion that was present with Python 2 by not making the coercions transitive. For instance, here's something that would work in Python 2: urlunsplit(('http', 'example.com', '/foo', u'bar=baz', '')) And you'd get out a unicode string, except that would break the first time that query string (u'bar=baz') was not ASCII (but not until then!) 
Here's the urlunsplit code:

    def urlunsplit(components):
        scheme, netloc, url, query, fragment = components
        if netloc or (scheme and scheme in uses_netloc and url[:2] != '//'):
            if url and url[:1] != '/': url = '/' + url
            url = '//' + (netloc or '') + url
        if scheme:
            url = scheme + ':' + url
        if query:
            url = url + '?' + query
        if fragment:
            url = url + '#' + fragment
        return url

If all those literals were this new special kind of string, if you call:

    urlunsplit((b'http', b'example.com', b'/foo', 'bar=baz', b''))

You'd end up constructing the URL b'http://example.com/foo' and then running:

    url = url + special('?') + query

And that would fail because b'http://example.com/foo' + special('?') would be b'http://example.com/foo?' and you cannot add that to the str 'bar=baz'. So we'd be avoiding the Python 2 craziness.

-- Ian Bicking | http://blog.ianbicking.org

From solipsis at pitrou.net Thu Jun 24 23:50:56 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 24 Jun 2010 23:50:56 +0200
Subject: [Python-Dev] thoughts on the bytes/string discussion
References: <11597.1277401099@parc.com> <4C23ACFD.6040506@voidspace.org.uk>
Message-ID: <20100624235056.5a9930e6@pitrou.net>

On Thu, 24 Jun 2010 20:07:41 +0100 Michael Foord wrote: > > Although it would require changes for builtin types like file to work > with a new string ABC, right? There is no builtin file type in 3.x. Besides, it is not an ABC-level problem; the IO layer is written in C (although there's still the Python implementation to play with), which would mandate an abstract C API to access unicode-like objects (similarly as there's already the buffer API to access bytes-like objects). Regards Antoine.
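The special('?') idea from Ian Bicking's message in this thread can be sketched in a few lines of Python 3. The Special class below is hypothetical (nothing in the standard library provides it, and the thread does not fix an API); it only illustrates an ASCII literal that adopts the type of whatever it is concatenated with, so that a later bytes/str mix still fails loudly instead of being coerced.

```python
class Special:
    """Hypothetical ASCII-only literal that takes the type of its neighbour."""

    def __init__(self, text):
        text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
        self.text = text

    def _as_type_of(self, other):
        # Pick the bytes or str form of the literal to match the other operand.
        if isinstance(other, bytes):
            return self.text.encode("ascii")
        if isinstance(other, str):
            return self.text
        return NotImplemented

    def __add__(self, other):
        literal = self._as_type_of(other)
        if literal is NotImplemented:
            return NotImplemented
        return literal + other

    def __radd__(self, other):
        literal = self._as_type_of(other)
        if literal is NotImplemented:
            return NotImplemented
        return other + literal


# bytes on the left: the literal becomes bytes, the result stays bytes.
url = b"http://example.com/foo" + Special("?")

# Adding a str query to that bytes URL is still an error, as in the message.
try:
    url + "bar=baz"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

The coercion is deliberately one step deep: the literal adapts, but the two real operands never adapt to each other, which is the non-transitive behaviour described in the message.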
From scott+python-dev at scottdial.com Thu Jun 24 23:53:06 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Thu, 24 Jun 2010 17:53:06 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624170944.7e68ad21@heresy> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> Message-ID: <4C23D3C2.1060500@scottdial.com> On 6/24/2010 5:09 PM, Barry Warsaw wrote: >> What use case does this address? > > Specifically, it's the use case where we (Debian/Ubuntu) plan on installing > all Python 3.x packages into /usr/lib/python3/dist-packages. As of PEP 3147, > we can do that without collisions on the pyc files, but would still have to > symlink for extension module .so files, because they are always named foo.so > and Python 3.2's foo.so won't (modulo PEP 384) be compatible with Python 3.3's > foo.so. If the package has .so files that aren't compatible with other version of python, then what is the motivation for placing that in a shared location (since it can't actually be shared)? > So using the same trick as in PEP 3147, if we can name Python 3.2's foo > extension differently than the incompatible Python 3.3's foo extension, we can > have them live in the same directory without symlink tricks. Why would a symlink trick even be necessary if there is a version-unspecific directory and a version-specific directory on the search path? >> PEP 3147 addresses the fact that the user may have different versions of >> Python installed and each wants to write a .pyc file when loading a module. >> .so files are not generated simply by running the Python interpreter, ergo >> .so files are not an issue for that use case. > > See above. It doesn't matter whether the pyc or so is created at run time by > the user or by the distro build system. If the files for different Python > versions end up in the same directory, they must be named differently too. 
But the only motivation for doing this with .pyc files is that the .py files are able to be shared, since the .pyc is an on-demand-generated, version-specific artifact (and not the source). The .so file is created offline by another toolchain, is version-specific, and presumably you are not suggesting that Python generate it on-demand. > >> If you want to make it so a system can install a package in just one >> location to be used by multiple Python installations, then the version >> number isn't enough. You also need to distinguish debug builds, profiling >> builds, Unicode width (see issue8654), and probably several other >> ./configure options. > > This is a good point, but more easily addressed. Let's say a distro makes > three Python 3.2 variants available, one "normal" build, a debug build, and > UCS2 and USC4 versions of the above. All we need to do is choose a different > .so ABI tag (see previous follow) for each of those builds. My updated patch > (coming soon) allows you to define that tag to configure. So e.g. Why is this use case not already addressed by having independent directories? And why is there an incentive to co-mingle these version-punned files with version-agnostic ones? > Mix and match for any other build options you care about. Because the distro > controls how Python is configured, this should be fairly easy to achieve. For packages that have .so files, won't the distro already have to build multiple copies of that package for all version of Python? So, why can't it place them in separate directories that are version-specific at that time? This is not the same as placing .py files that are version-agnostic into a version-agnostic location. 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu

From tjreedy at udel.edu Fri Jun 25 00:00:30 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jun 2010 18:00:30 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <11597.1277401099@parc.com>
References: <11597.1277401099@parc.com>
Message-ID:

On 6/24/2010 1:38 PM, Bill Janssen wrote: > > Secondly, maybe the string situation in 2.x wasn't as broken as we > thought it was. In particular, those who deal with lots of encoded > strings seemed to find it handy, and miss it in 3.x. Perhaps strings > are more like numbers than we think. We have separate types for int, > float, Decimal, etc. But they're all numbers, and they all > cross-operate. No they do not. Decimal only mixes properly with ints, but not with anything else, sometimes in surprising and havoc-creating ways:

>>> Decimal(0) == float(0)
False

I believe that and other comparisons may be fixed in 3.2, but I know there was lots of discussion of whether float + decimal should return a float or decimal, with good arguments both ways. To put it another way, there are potential problems with either choice. Automatic mixed-mode arithmetic is not always a slam-dunk, no-problem choice. That aside, there are a couple of places where I think the comparison breaks down. If one adds a thousand ints and then a float, there is only the final number to convert. If one adds a thousand bytes and then a unicode, there is the concatenation of the thousand bytes to convert. Or should the result be the concatenation of a thousand unicode conversions? This brings up the distributivity (or not) of conversion over summation. In general, float(i) + float(j) = float(i+j), for i,j ints. I am not sure the same is true if i,j are bytes with some encoding and the conversion is unicode. Does it depend on the encoding?
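The closing question about whether decoding distributes over concatenation can be checked directly. The following is a minimal Python 3 sketch; decodes_chunkwise is a name made up for this illustration. It compares decoding each chunk separately against decoding the joined bytes, and shows that the answer does depend on the encoding and on where the chunk boundaries fall.

```python
def decodes_chunkwise(chunks, encoding):
    """Return True if decoding chunk by chunk equals decoding the joined bytes."""
    whole = b"".join(chunks).decode(encoding)
    try:
        parts = "".join(chunk.decode(encoding) for chunk in chunks)
    except UnicodeDecodeError:
        # A chunk ended in the middle of a multi-byte sequence.
        return False
    return parts == whole


# U+00E9 is the two bytes C3 A9 in UTF-8; this split falls between them.
split_inside = [b"caf\xc3", b"\xa9"]
split_between = [b"caf", b"\xc3\xa9"]

utf8_inside = decodes_chunkwise(split_inside, "utf-8")      # False
utf8_between = decodes_chunkwise(split_between, "utf-8")    # True
latin1_inside = decodes_chunkwise(split_inside, "latin-1")  # True
```

For a single-byte encoding such as Latin-1 every split point is safe, so conversion distributes over concatenation. For a multi-byte encoding such as UTF-8 it only does so when no chunk boundary cuts through a character.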
-- Terry Jan Reedy

From ncoghlan at gmail.com Fri Jun 25 00:01:38 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jun 2010 08:01:38 +1000
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <20100624170856.0853D3A4099@sparrow.telecommunity.com>
References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <20100624170856.0853D3A4099@sparrow.telecommunity.com>
Message-ID:

On Fri, Jun 25, 2010 at 3:07 AM, P.J. Eby wrote: > (Btw, in some earlier emails, Stephen, you implied that this could be fixed > with codecs -- but it can't, because the problem isn't with the bytes > containing invalid Unicode, it's with the Unicode containing invalid bytes > -- i.e., characters that can't be encoded to the ultimate codec target.) That's what the surrogateescape error handler is for though - it will happily accept mojibake on input (putting invalid bytes into the PUA), and happily generate mojibake on output (recreating the invalid bytes from the PUA) as well. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Fri Jun 25 00:01:46 2010
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jun 2010 15:01:46 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID:

On Thu, Jun 24, 2010 at 2:44 PM, Ian Bicking wrote: > I think we'll avoid a lot of the confusion that was present with Python 2 by > not making the coercions transitive. For instance, here's something that > would work in Python 2: > >
urlunsplit(('http', 'example.com', '/foo', u'bar=baz', '')) > > And you'd get out a unicode string, except that would break the first time > that query string (u'bar=baz') was not ASCII (but not until then!) Actually, that wouldn't be a problem. The problem would be this: urlunsplit(('http', 'example.com', u'/foo', 'bar=baz', '')) (I moved the "u" prefix from bar=baz to /foo.) And this would break when instead of baz there was some non-ASCII UTF-8, e.g. urlunsplit(('http', 'example.com', u'/foo', 'bar=\xe1\x88\xb4', '')) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Fri Jun 25 00:15:02 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 08:15:02 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 1:41 AM, Guido van Rossum wrote: > I don't think we should abuse sum for this. A simple idiom to get the > *empty* string of a particular type is x[:0] so you could write > something like this to concatenate a list or strings or bytes: > xs[:0].join(xs). Note that if xs is empty we wouldn't know what to do > anyway so this should be disallowed. That's a good trick, although there's a "[0]" missing from your join example ("type(xs[0])()" is another way to spell the same idea, but the subscripting version would likely be faster since it skips the builtin lookup). 
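The corrected form of the idiom, with the "[0]" that Nick points out was missing, can be written as a small helper. This is a minimal sketch; the function name concat is chosen here for illustration, and it assumes xs is an indexable, non-empty sequence whose items all share one type.

```python
def concat(xs):
    """Join a non-empty list of str or bytes without naming the type."""
    if not xs:
        # With no elements there is nothing to tell us which empty value to use.
        raise ValueError("cannot infer str or bytes from an empty sequence")
    empty = xs[0][:0]  # '' for str, b'' for bytes, bytearray() for bytearray
    return empty.join(xs)


text_result = concat(["ab", "cd"])
bytes_result = concat([b"ab", b"cd"])
```

The slice keeps the code free of '' and b'' literals, which is what lets the same function serve both types.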
Promoting that over explicit use of empty str and bytes literals is probably step 1 in eliminating gratuitous breakage of bytes/str polymorphism (this trick also has the benefit of working with non-builtin character sequence types). Use of non-empty bytes/str literals is going to be harder to handle - actually trying to apply a polymorphic philosophy to the Python 3 URL parsing libraries may be a good way to learn more on that front. Cheers, Nick. P.S. I'm off to Sydney for PyconAU this evening, so I'm not sure how much time I'll get to follow python-dev until next week. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From tjreedy at udel.edu Fri Jun 25 00:20:52 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 24 Jun 2010 18:20:52 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com>
Message-ID:

On 6/24/2010 4:59 PM, Guido van Rossum wrote: > But I wouldn't go so far as to claim that interpreting the protocols > as text is wrong. After all we're talking exclusively about protocols > that are designed intentionally to be directly "human readable" I agree that the claim "':' is just a byte" is a bit shortsighted. If the designers of the protocols had intended to use uninterpreted bytes as protocol markers, they could and I suspect would have used unused control codes, of which there are several. Then there would have been no need for escape mechanisms to put things like :<> into content text. I am very sure that the reason for specifying *ascii* byte values was to be crystal clear as to what *character* was meant and to *exclude* use on the internet of the main incompatible competitor encoding -- IBM's EBCDIC -- which IBM used in all of *its* networks. Until the IBM PC came out in the early 1980s (and IBM originally saw that as a minor sideline and something of a toy), there was a battle over byte encodings between IBM and everyone else.
-- Terry Jan Reedy From mal at egenix.com Fri Jun 25 00:35:05 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Fri, 25 Jun 2010 00:35:05 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C23DD99.9050604@egenix.com> Scott Dial wrote: > On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>> What use case does this address? >> >>> If you want to make it so a system can install a package in just one >>> location to be used by multiple Python installations, then the version >>> number isn't enough. You also need to distinguish debug builds, profiling >>> builds, Unicode width (see issue8654), and probably several other >>> ./configure options. >> >> This is a good point, but more easily addressed. Let's say a distro makes >> three Python 3.2 variants available, one "normal" build, a debug build, and >> UCS2 and USC4 versions of the above. All we need to do is choose a different >> .so ABI tag (see previous follow) for each of those builds. My updated patch >> (coming soon) allows you to define that tag to configure. So e.g. > > Why is this use case not already addressed by having independent > directories? And why is there an incentive to co-mingle these > version-punned files with version-agnostic ones? I don't think this is a good idea. After a while your Python lib directories would need some serious dusting off to make them maintainable again. Disk space is cheap so setting up dedicated directories for each variant will result in a much easier to manage installation. If you want a really clever setup, use hard links between those directory (you can also use symlinks if you like). Then a change in one Python file will automatically propagate to all other variant dirs without any maintenance effort. Together with PYTHONHOME this makes a really nice virtualenv-like environment. 
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 23 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

From ncoghlan at gmail.com Fri Jun 25 00:35:07 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jun 2010 08:35:07 +1000
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100624115048.4fd152e3@heresy>
References: <20100624115048.4fd152e3@heresy>
Message-ID:

On Fri, Jun 25, 2010 at 1:50 AM, Barry Warsaw wrote: > Please let me know what you think. I'm happy to just commit this to the py3k > branch if there are no objections. I don't think a new PEP is in > order, but an update to PEP 3147 might make sense. I like the idea, but I think summarising the rest of this discussion in its own (relatively short) PEP would be good (there are a few things that are tricky - exact versioning scheme, PEP 384 forward compatibility, impact on distutils, articulating the benefits for distro packaging, etc). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at thorne.id.au Fri Jun 25 01:28:21 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Fri, 25 Jun 2010 09:28:21 +1000 Subject: [Python-Dev] "2 or 3" link on python.org Message-ID: <20100624232821.GB10805@thorne.id.au> Steve Holden Wrote: > Given the amount of interest this thread has generated I can't help > wondering why it isn't more prominent in python.org content. Is the > developer community completely disjoint with the web content editor > community? > > If there is such a disconnect we should think about remedying it: a > large "Python 2 or 3?" button could link to a reasoned discussion of the > pros and cons as evinced in this thread. That way people will end up > with the right version more often (and be writing Python 2 that will > more easily migrate to Python 3, if they cannot yet use 3). > > There seems to be a perception that the PSF can help fund developments, > and indeed Jesse Noller has made a small start with his sprint funding > proposal (which now has some funding behind it). I think if it is to do > so the Foundation will have to look for substantial new funding. I do > not currently understand where this funding would come from, and would > like to tap your developer creativity in helping to define how the > Foundation can effectively commit more developer time to Python. > > GSoC and GHOP are great examples, but there is plenty of room for all > sorts of initiatives that result in development opportunities. I'd like > to help. I am extremely keen for this to happen. Does anyone have ownership of this project? There was some discussion of it up-list but the discussion fizzled. 
-- Regards, Stephen Thorne Development Engineer Netbox Blue

From martin at v.loewis.de Fri Jun 25 02:00:45 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 02:00:45 +0200
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <20100624232821.GB10805@thorne.id.au>
References: <20100624232821.GB10805@thorne.id.au>
Message-ID: <4C23F1AD.9040809@v.loewis.de>

On 25.06.2010 01:28, Stephen Thorne wrote: > Steve Holden Wrote: >> Given the amount of interest this thread has generated I can't help >> wondering why it isn't more prominent in python.org content. Is the >> developer community completely disjoint with the web content editor >> community? >> >> If there is such a disconnect we should think about remedying it: a >> large "Python 2 or 3?" button could link to a reasoned discussion of the >> pros and cons as evinced in this thread. That way people will end up >> with the right version more often (and be writing Python 2 that will >> more easily migrate to Python 3, if they cannot yet use 3). >> >> There seems to be a perception that the PSF can help fund developments, >> and indeed Jesse Noller has made a small start with his sprint funding >> proposal (which now has some funding behind it). I think if it is to do >> so the Foundation will have to look for substantial new funding. I do >> not currently understand where this funding would come from, and would >> like to tap your developer creativity in helping to define how the >> Foundation can effectively commit more developer time to Python. >> >> GSoC and GHOP are great examples, but there is plenty of room for all >> sorts of initiatives that result in development opportunities. I'd like >> to help. > > I am extremely keen for this to happen. Does anyone have ownership of this > project? There was some discussion of it up-list but the discussion fizzled. Can you please explain what "this project" is, in the context of your message? GSoC? GHOP?
Regards, Martin

From foom at fuhm.net Fri Jun 25 02:23:51 2010
From: foom at fuhm.net (James Y Knight)
Date: Thu, 24 Jun 2010 20:23:51 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23D3C2.1060500@scottdial.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com>
Message-ID:

On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: > On 6/24/2010 5:09 PM, Barry Warsaw wrote: >>> What use case does this address? >> >> Specifically, it's the use case where we (Debian/Ubuntu) plan on >> installing >> all Python 3.x packages into /usr/lib/python3/dist-packages. As of >> PEP 3147, >> we can do that without collisions on the pyc files, but would still >> have to >> symlink for extension module .so files, because they are always >> named foo.so >> and Python 3.2's foo.so won't (modulo PEP 384) be compatible with >> Python 3.3's >> foo.so. > > If the package has .so files that aren't compatible with other version > of python, then what is the motivation for placing that in a shared > location (since it can't actually be shared) Because python looks for .so files in the same place it looks for the .py files of the same package. E.g., given a module like lxml, it contains the following files (among others):

    lxml/
    lxml/__init__.py
    lxml/__init__.pyc
    lxml/builder.py
    lxml/builder.pyc
    lxml/etree.so

And you can only put it in one place. Really, python should store the .py files in /usr/share/python/, the .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc files in /var/lib/python2.5-debug. But python doesn't work like that.
James

From stephen at thorne.id.au Fri Jun 25 02:31:49 2010
From: stephen at thorne.id.au (Stephen Thorne)
Date: Fri, 25 Jun 2010 10:31:49 +1000
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C23F1AD.9040809@v.loewis.de>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de>
Message-ID: <20100625003149.GA16084@thorne.id.au>

On 2010-06-25, "Martin v. Löwis" wrote: > On 25.06.2010 01:28, Stephen Thorne wrote: > > Steve Holden Wrote: > >> Given the amount of interest this thread has generated I can't help > >> wondering why it isn't more prominent in python.org content. Is the > >> developer community completely disjoint with the web content editor > >> community? > >> > >> If there is such a disconnect we should think about remedying it: a > >> large "Python 2 or 3?" button could link to a reasoned discussion of the > >> pros and cons as evinced in this thread. That way people will end up > >> with the right version more often (and be writing Python 2 that will > >> more easily migrate to Python 3, if they cannot yet use 3). > >> > >> There seems to be a perception that the PSF can help fund developments, > >> and indeed Jesse Noller has made a small start with his sprint funding > >> proposal (which now has some funding behind it). I think if it is to do > >> so the Foundation will have to look for substantial new funding. I do > >> not currently understand where this funding would come from, and would > >> like to tap your developer creativity in helping to define how the > >> Foundation can effectively commit more developer time to Python. > >> > >> GSoC and GHOP are great examples, but there is plenty of room for all > >> sorts of initiatives that result in development opportunities. I'd like > >> to help. > > > > I am extremely keen for this to happen. Does anyone have ownership of this > > project? There was some discussion of it up-list but the discussion fizzled.
> > Can you please explain what "this project" is, in the context of your > message? GSoC? GHOP? Oh, I thought this was quite clear. I was specifically meaning the large "Python 2 or 3" button on python.org. It would help users who want to know what version of python to use if they had a clear guide as to what version to download. It doesn't help if someone goes to do greenfield development in python if a library they depend upon has yet to be ported, and they're trying to use python 3. (As an addendum add pygtk to the list of libs that python 3 users on #python are alarmed to find haven't been ported yet) -- Regards, Stephen Thorne Development Engineer Netbox Blue

From healey.rich at gmail.com Fri Jun 25 02:51:18 2010
From: healey.rich at gmail.com (Rich Healey)
Date: Fri, 25 Jun 2010 10:51:18 +1000
Subject: [Python-Dev] docs - Copy
Message-ID:

http://docs.python.org/library/copy.html Just near the bottom it reads: """Shallow copies of dictionaries can be made using dict.copy(), and of lists by assigning a slice of the entire list, for example, copied_list = original_list[:].""" Surely this is a typo? To my understanding, copied_list = original_list[:] gives you a clean copy (slicing returns a new object....) Can this be updated? Or someone explain to me why it's correct?
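The distinction the quoted documentation sentence draws only becomes visible with nested objects. A minimal sketch, using a made-up nested list: slicing builds a new outer list, but its elements are the very same inner objects, whereas copy.deepcopy also copies the inner lists.

```python
import copy

original = [[1, 2], [3, 4]]
shallow = original[:]           # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

original[0].append("rawr")      # mutates an inner list shared with `shallow`
original.append([5, 6])         # changes only the original outer list

print(shallow)  # [[1, 2, 'rawr'], [3, 4]]
print(deep)     # [[1, 2], [3, 4]]
```

With a flat list of immutable items such as [1, 2, 3] the two kinds of copy behave identically, which is why a shallow copy can look like a fully independent one.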
Cheers

Example:

>>> t = [1, 2, 3]
>>> y = t
>>> u = t[:]
>>> y[1] = "rawr"
>>> t
[1, 'rawr', 3]
>>> u
[1, 2, 3]
>>>

From ben+python at benfinney.id.au Fri Jun 25 02:54:30 2010
From: ben+python at benfinney.id.au (Ben Finney)
Date: Fri, 25 Jun 2010 10:54:30 +1000
Subject: [Python-Dev] FHS compliance of Python installation (was: versioned .so files for Python 3.2)
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com>
Message-ID: <876318lynt.fsf_-_@benfinney.id.au>

James Y Knight writes: > Really, python should store the .py files in /usr/share/python/, the > .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc > files in /var/lib/python2.5-debug. But python doesn't work like that. +1 So who's going to draft the “Filesystem Hierarchy Standard compliance” PEP? :-)

--
 \     “Having sex with Rachel is like going to a concert. She yells a |
  `\    lot, and throws frisbees around the room; and when she wants |
_o__)   more, she lights a match.” -- Steven Wright |
Ben Finney

From steve at holdenweb.com Fri Jun 25 02:58:41 2010
From: steve at holdenweb.com (Steve Holden)
Date: Thu, 24 Jun 2010 20:58:41 -0400
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <20100625003149.GA16084@thorne.id.au>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au>
Message-ID: <4C23FF41.5020006@holdenweb.com>

Stephen Thorne wrote: > On 2010-06-25, "Martin v. Löwis" wrote: >> On 25.06.2010 01:28, Stephen Thorne wrote: >>> Steve Holden Wrote: >>>> Given the amount of interest this thread has generated I can't help >>>> wondering why it isn't more prominent in python.org content. Is the >>>> developer community completely disjoint with the web content editor >>>> community? >>>> >>>> If there is such a disconnect we should think about remedying it: a >>>> large "Python 2 or 3?"
button could link to a reasoned discussion of the >>>> pros and cons as evinced in this thread. That way people will end up >>>> with the right version more often (and be writing Python 2 that will >>>> more easily migrate to Python 3, if they cannot yet use 3). >>>> >>>> There seems to be a perception that the PSF can help fund developments, >>>> and indeed Jesse Noller has made a small start with his sprint funding >>>> proposal (which now has some funding behind it). I think if it is to do >>>> so the Foundation will have to look for substantial new funding. I do >>>> not currently understand where this funding would come from, and would >>>> like to tap your developer creativity in helping to define how the >>>> Foundation can effectively commit more developer time to Python. >>>> >>>> GSoC and GHOP are great examples, but there is plenty of room for all >>>> sorts of initiatives that result in development opportunities. I'd like >>>> to help. >>> I am extremely keen for this to happen. Does anyone have ownership of this >>> project? There was some discussion of it up-list but the discussion fizzled. >> Can you please explain what "this project" is, in the context of your >> message? GSoC? GHOP? > > Oh, I thought this was quite clear. I was specifically meaning the large > "Python 2 or 3" button on python.org. It would help users who want to know > what version of python to use if they had a clear guide as to what version > to download. > > It doesn't help if someone goes to do greenfield development in python > if a library they depend upon has yet to be ported, and they're trying to > use python 3. > > (As an addendum add pygtk to the list of libs that python 3 users on #python > are alarmed to find haven't been ported yet) > This topic really needs to go to the pydotorg list, as the guys there maintain the site content. I know that Michael Foord is on both lists, so he may be a good candidate for leading the charge, so to speak. 
This topic is likely to assume increasing importance. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Fri Jun 25 03:04:03 2010 From: steve at holdenweb.com (Steve Holden) Date: Thu, 24 Jun 2010 21:04:03 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: Rich Healey wrote: > http://docs.python.org/library/copy.html > > Just near the bottom it reads: > > """Shallow copies of dictionaries can be made using dict.copy(), and > of lists by assigning a slice of the entire list, for example, > copied_list = original_list[:].""" > > > Surely this is a typo? To my understanding, copied_list = > original_list[:] gives you a clean copy (slicing returns a new > object....) > Yes, but it's a shallow copy: the new object references exactly the same objects as the original list (not copies of those objects). A deep copy would need to copy any referenced lists, and so on. > Can this be updated? Or someone explain to me why it's correct? > It sounds correct to me. regards Steve > Cheers > > Example: > > >>>> t = [1, 2, 3] >>>> y = t >>>> u = t[:] >>>> y[1] = "rawr" >>>> t > [1, 'rawr', 3] >>>> u > [1, 2, 3] -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From alexander.belopolsky at gmail.com Fri Jun 25 03:05:09 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 24 Jun 2010 21:05:09 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: On Thu, Jun 24, 2010 at 8:51 PM, Rich Healey wrote: > http://docs.python.org/library/copy.html > > Just near the bottom it reads: > > """Shallow copies of dictionaries can be made using dict.copy(), and > of lists by assigning a slice of the entire list, for example, > copied_list = original_list[:].""" > > > Surely this is a typo? To my understanding, copied_list = > original_list[:] gives you a clean copy (slicing returns a new > object....) > If you read the doc excerpt carefully, you will realize that it says the same thing. I agree that the language can be improved, though. There is no need to bring in assignment to explain that a[:] makes a copy of list a. Please create a documentation issue at http://bugs.python.org . If you can suggest a better formulation, it is likely to be accepted. From greg.ewing at canterbury.ac.nz Fri Jun 25 03:18:18 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Jun 2010 13:18:18 +1200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C2403DA.5000907@canterbury.ac.nz> Scott Dial wrote: > But the only motivation for doing this with .pyc files is that the .py > files are able to be shared, In an application made up of a mixture of pure Python and extension modules, the .py files are able to be shared too. Seems to me that a similar motivation exists here as well. Not exactly the same, but closely related. 
-- Greg From healey.rich at gmail.com Fri Jun 25 03:14:39 2010 From: healey.rich at gmail.com (Rich Healey) Date: Fri, 25 Jun 2010 11:14:39 +1000 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: On Fri, Jun 25, 2010 at 11:04 AM, Steve Holden wrote: > Rich Healey wrote: >> http://docs.python.org/library/copy.html >> >> Just near the bottom it reads: >> >> """Shallow copies of dictionaries can be made using dict.copy(), and >> of lists by assigning a slice of the entire list, for example, >> copied_list = original_list[:].""" >> >> >> Surely this is a typo? To my understanding, copied_list = >> original_list[:] gives you a clean copy (slicing returns a new >> object....) >> > Yes, but it's a shallow copy: the new object references exactly the same > objects as the original list (not copies of those objects). A deep copy > would need to copy any referenced lists, and so on. > My apologies guys, I see now. I will see if I can think of a less ambiguous way to word this and submit a bug. Thank you! From tjreedy at udel.edu Fri Jun 25 03:18:13 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 24 Jun 2010 21:18:13 -0400 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100625003149.GA16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On 6/24/2010 8:31 PM, Stephen Thorne wrote: > Oh, I thought this was quite clear. I was specifically meaning the large > "Python 2 or 3" button on python.org. It would help users who want to know > what version of python to use if they had a clear guide as to what version > to download. I think everyone on pydev agrees that that would be good, but I do not believe anyone has taken ownership of the issue as yet. I am not sure who currently maintains the site and whether they are aware of the proposal. 
I believe there is material on the wiki as well as the two existing pages on other sites that were discussed here. So a new page on python.org could consist of a few links. Someone just has to write it. > > It doesn't help if someone goes to do greenfield development in python > if a library they depend upon has yet to be ported, and they're trying to > use python 3. > > (As an addendum add pygtk to the list of libs that python 3 users on #python > are alarmed to find haven't been ported yet) The list, if it exists, should be on the wiki, where any registered user can edit it, rather than on the .org page. I suspect that the feedback about Python on #python is somewhat different from that on python-list. I also suspect that some of it could be used to improve python, the docs, and the site. Is that happening much? I know I regularly open tracker issues (such as 6507, 8824, and 8945) based on python-list discussions , and I know others have made wiki edits. -- Terry Jan Reedy From greg.ewing at canterbury.ac.nz Fri Jun 25 03:28:14 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 25 Jun 2010 13:28:14 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: <4C24062E.7040105@canterbury.ac.nz> Terry Reedy wrote: > On 6/24/2010 1:38 PM, Bill Janssen wrote: > >> We have separate types for int, >> float, Decimal, etc. But they're all numbers, and they all >> cross-operate. > > No they do not. Decimal only mixes properly with ints, but not with > anything else I think there are also some important differences between numbers and strings concerning how they interact with C code. In C there are really only two choices for representing a Python number in a way that C code can directly operate on -- long or double -- and there is a set of functions for coercing a Python object into one of these that C code almost universally uses. 
So a new number type only has to implement the appropriate conversion methods to be usable by all of that C code. On the other hand, the existing C code that operates on Python strings often assumes that it has a particular internal representation. A new abstract string-access API would have to be devised, and all existing C code updated to use it. Also, this new API would not be as easy to use as the number API, because it would involve asking for the data in some specified encoding, which would require memory allocation and management. -- Greg From ncoghlan at gmail.com Fri Jun 25 05:34:33 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jun 2010 13:34:33 +1000 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On Fri, Jun 25, 2010 at 11:18 AM, Terry Reedy wrote: > I believe there is material on the wiki as well as the two existing pages on > other sites that were discussed here. So a new page on python.org could > consist of a few links. Someone just has to write it. There's material on the wiki *now* (the Python2orPython3 page), but there wasn't before the recent discussion started. The whole Beginner's Guide on the wiki could actually use some TLC to bring it up to speed with the existence of Python 3.x. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From orsenthil at gmail.com Fri Jun 25 06:54:07 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Fri, 25 Jun 2010 10:24:07 +0530 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: <20100625045407.GA3191@remy> On Thu, Jun 24, 2010 at 09:05:09PM -0400, Alexander Belopolsky wrote: > On Thu, Jun 24, 2010 at 8:51 PM, Rich Healey wrote: > > http://docs.python.org/library/copy.html > > > > Just near the bottom it reads: > > > > """Shallow copies of dictionaries can be made using dict.copy(), and > > of lists by assigning a slice of the entire list, for example, > > copied_list = original_list[:].""" > > > > > > Surely this is a typo? To my understanding, copied_list = > > original_list[:] gives you a clean copy (slicing returns a new > > object....) > > > > the same thing. I agree that the language can be improved, though. > There is no need to bring in assignment to explain that a[:] makes a > copy of list a. Please create a documentation issue at Better still, add your doc change suggestion (possible explanation) to this issue: http://bugs.python.org/issue9021 -- Senthil From stephen at xemacs.org Fri Jun 25 09:05:43 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 16:05:43 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. 
Turnbull wrote: > Understood, but both the majority of str/bytes methods and several > existing APIs (e.g. many in the os module, like os.listdir()) do it > this way. Understood. > Also, IMO a polymorphic function should *not* accept *mixed* > bytes/text input -- join('x', b'y') should be rejected. Agreed. > But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make > sense to me. > > So, actually, I *don't* understand what you mean by needing LBYL. Consider docutils. Some folks assert that URIs *are* bytes and should be manipulated as such. So base URIs should be bytes. But there are various ways to refer to a base URI and combine it with relative URI taken from literal text in reST. That literal text will be represented as str. So you want to use urljoin, but this usage isn't polymorphic. If you forget to do a conversion here, urljoin will raise, of course. But late conversion may not be appropriate. AIUI Philip at least wants ways to raise exceptions earlier than that on some code paths. That's LBYL, no? From stephen at xemacs.org Fri Jun 25 09:49:16 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 16:49:16 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100624170856.0853D3A4099@sparrow.telecommunity.com> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <20100624170856.0853D3A4099@sparrow.telecommunity.com> Message-ID: <877hlno8lf.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > This doesn't have to be in the functions; it can be in the > *types*. 
Mixed-type string operations have to do type checking and > upcasting already, but if the protocol were open, you could make an > encoded-bytes type that would handle the error checking. Don't you realize that "encoded-bytes" is equivalent to use of a very limited profile of ISO 2022 coding extensions? Such as Emacs/MULE internal encoding or TRON code? It has been tried. It does not work. I understand how types can do such checking; my point is that the encoded-bytes type doesn't have enough information to do it in the cases where you think it is better than converting to str. There are *no useful operations* that can be done on two encoded-bytes with different encodings unless you know the ultimate target codec. The only sensible way to define the concatenation of ('ascii', 'English') with ('euc-jp','??????') is something like ('ascii', 'English', 'euc-jp','??????'), and *not* ('euc-jp','English??????'), because you don't know that the ultimate target codec is 'euc-jp'-compatible. Worse, you need to build in all the information about which codecs are mutually compatible into the encoded-bytes type. For example, if the ultimate target is known to be 'shift_jis', it's trivially compatible with 'ascii' and 'euc-jp' requires a conversion, but latin-9 you can't have. > (Btw, in some earlier emails, Stephen, you implied that this could be > fixed with codecs -- but it can't, because the problem isn't with the > bytes containing invalid Unicode, it's with the Unicode containing > invalid bytes -- i.e., characters that can't be encoded to the > ultimate codec target.) No, the problem is not with the Unicode, it is with the code that allows characters not encodable with the target codec. If you don't have a target codec, there are ascii-safe source codecs, such as 'latin-1' or 'ascii' with surrogateescape, that will work any time that bytes-oriented processing can work. 
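A minimal sketch of the round-trip property described in the last paragraph above, assuming Python 3.1 or later (where the 'surrogateescape' error handler is available). The sample bytes and variable names are made up for illustration; they are not from the thread.

```python
# Arbitrary bytes, including values that are not valid ASCII or UTF-8.
raw = bytes(range(256))

# 'latin-1' maps every byte to the code point with the same number,
# so decoding can never fail and encoding restores the original bytes.
as_latin1 = raw.decode("latin-1")
assert as_latin1.encode("latin-1") == raw

# 'ascii' with surrogateescape keeps ASCII bytes as they are and carries
# each non-ASCII byte through as a lone surrogate (U+DC80..U+DCFF).
as_escaped = raw.decode("ascii", "surrogateescape")
assert as_escaped.encode("ascii", "surrogateescape") == raw

# ASCII-only delimiters can then be handled as text, and the result
# converted back to the original bytes afterwards.
path = b"/caf\xe9/menu".decode("ascii", "surrogateescape")
parts = path.split("/")
round_tripped = "/".join(parts).encode("ascii", "surrogateescape")
```

Neither decode step needs to know the real encoding of the non-ASCII bytes, which is why this only helps where bytes-oriented processing would have worked anyway.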
From scott+python-dev at scottdial.com Fri Jun 25 10:53:21 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 25 Jun 2010 04:53:21 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <4C246E81.3020302@scottdial.com> On 6/24/2010 8:23 PM, James Y Knight wrote: > On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >> If the package has .so files that aren't compatible with other versions >> of python, then what is the motivation for placing that in a shared >> location (since it can't actually be shared) > > Because python looks for .so files in the same place it looks for the > .py files of the same package. My suggestion was that a package that contains .so files should not be shared (e.g., the entire lxml package should be placed in a version-specific path). The motivation for this PEP was to simplify the installation of Python packages for distros; it was not to reduce the number of .py files on the disk. Placing .so files together does not simplify that install process in any way. You will still have to handle such packages in a special way. You must still compile the package multiple times for each relevant version of python (with special tagging that I imagine distutils can take care of) and, worse yet, you have created a trickier install than merely having multiple search paths (e.g., installing/uninstalling lxml for *one* version of python is actually more difficult in this scheme). Either the motivation for this PEP is inaccurate or I am failing to understand how this is *simpler*. In the case of pure-python, this PEP is clearly a win, but I have not seen an argument that it is a win for .so files. Moreover, the PEP itself is titled "PYC Repository Directories" (not "shared site-packages") and makes no mention of .so files at all. 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From scott+python-dev at scottdial.com Fri Jun 25 11:02:24 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 25 Jun 2010 05:02:24 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C2403DA.5000907@canterbury.ac.nz> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C2403DA.5000907@canterbury.ac.nz> Message-ID: <4C2470A0.4000802@scottdial.com> On 6/24/2010 9:18 PM, Greg Ewing wrote: > Scott Dial wrote: > >> But the only motivation for doing this with .pyc files is that the .py >> files are able to be shared, > > In an application made up of a mixture of pure Python and > extension modules, the .py files are able to be shared too. > Seems to me that a similar motivation exists here as well. > Not exactly the same, but closely related. > If I recall Barry's motivation correctly, the PEP was intended to simplify the installation of packages for multiple versions of Python, although the PEP states that in a less direct way. In the case of pure-python packages, this is merely about avoiding .pyc collisions. But, in the case of packages with .so files, I fail to see how this is simpler (in fact, I believe it to be more complicated). So, I am not sure the PEP supports this feature being proposed (since it makes no mention of .so files), and more importantly, I am not sure it actually makes anything better for anyone (still requires multiple compilations and un/install gymnastics). 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From lvh at laurensvh.be Fri Jun 25 11:18:18 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Fri, 25 Jun 2010 11:18:18 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: On Fri, Jun 25, 2010 at 5:34 AM, Nick Coghlan wrote: > On Fri, Jun 25, 2010 at 11:18 AM, Terry Reedy wrote: >> I believe there is material on the wiki as well as the two existing pages on >> other sites that were discussed here. So a new page on python.org could >> consist of a few links. Someone just has to write it. > > There's material on the wiki *now* (the Python2orPython3 page), but > there wasn't before the recent discussion started. The whole > Beginner's Guide on the wiki could actually use some TLC to bring it > up to speed with the existence of Python 3.x. > > Cheers, > Nick. > +1, this definitely sounds like a good idea to me. cheers, Laurens From stephen at xemacs.org Fri Jun 25 12:06:33 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jun 2010 19:06:33 +0900 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > We've setup a system where we think of text as natively unicode, with > encodings to put that unicode into a byte form. This is certainly > appropriate in a lot of cases. But there's a significant class of problems > where bytes are the native structure. Network protocols are what we've been > discussing, and are a notable case of that. That is, b'/' is the most > native sense of a path separator in a URL, or b':' is the most native sense > of what separates a header name from a header value in HTTP. IMHO, URIs don't have a native language in this sense. Network programmers do, however, and it is bytes. 
Text-handling programmers also do, and it is str. > So with this idea in mind it makes more sense to me that *specific pieces of > text* can be reasonably treated as both bytes and text. All the string > literals in urllib.parse.urlunsplit() for example. > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does not > become special('/x')) and special('/')+x=='/x' (again it becomes str). This > avoids some of the cases of unicode or str infecting a system as they did in > Python 2 (where you might pass in unicode and everything works fine until > some non-ASCII is introduced). I think you need to give explicit examples where this actually helps in terms of "type contagion". I expect that it doesn't help at all, especially not for the people whose native language for URIs is bytes. These specials are still going to flip to unicode as soon as it comes in, and that will be incompatible with the bytes they'll need later. So they're still going to need to filter out unicode on input. It looks like it would be useful for programmers of polymorphic functions, though. From pje at telecommunity.com Fri Jun 25 15:07:46 2010 From: pje at telecommunity.com (P.J. Eby) Date: Fri, 25 Jun 2010 09:07:46 -0400 Subject: [Python-Dev] bytes / unicode Message-ID: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> At 04:49 PM 6/25/2010 +0900, Stephen J. Turnbull wrote: >P.J. Eby writes: > > > This doesn't have to be in the functions; it can be in the > > *types*. Mixed-type string operations have to do type checking and > > upcasting already, but if the protocol were open, you could make an > > encoded-bytes type that would handle the error checking. > >Don't you realize that "encoded-bytes" is equivalent to use of a very >limited profile of ISO 2022 coding extensions? Such as Emacs/MULE >internal encoding or TRON code? It has been tried. It does not work. 
> >I understand how types can do such checking; my point is that the >encoded-bytes type doesn't have enough information to do it in the >cases where you think it is better than converting to str. There are >*no useful operations* that can be done on two encoded-bytes with >different encodings unless you know the ultimate target codec. I do know the ultimate target codec -- that's the point. IOW, I want to be able to do all my operations by passing target-encoded strings to polymorphic functions. Then, the moment something creeps in that won't go to the target codec, I'll be able to track down the hole in the legacy code that's letting bad data creep in. > The >only sensible way to define the concatenation of ('ascii', 'English') >with ('euc-jp','??????') is something like ('ascii', 'English', >'euc-jp','??????'), and *not* ('euc-jp','English??????'), because you >don't know that the ultimate target codec is 'euc-jp'-compatible. >Worse, you need to build in all the information about which codecs are >mutually compatible into the encoded-bytes type. For example, if the >ultimate target is known to be 'shift_jis', it's trivially compatible >with 'ascii' and 'euc-jp' requires a conversion, but latin-9 you can't >have. The interaction won't be with other encoded bytes, it'll be with other *unicode* strings. Ones coming from other code, and literals embedded in the stdlib. >No, the problem is not with the Unicode, it is with the code that >allows characters not encodable with the target codec. And which code that is, precisely, is the thing that may be very difficult to find, unless I can identify it at the first point it enters (and corrupts) my output data. When dealing with a large code base, this may be a nontrivial problem. 
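One way to read the fail-early idea above, sketched as a str subclass rather than the encoded-bytes type under discussion. The class name TargetText and its codec attribute are hypothetical, 'ascii' is an assumed target codec chosen only for the example, and only the + operator is covered.

```python
class TargetText(str):
    """Hypothetical text type that must stay encodable in one target codec."""

    codec = "ascii"  # assumed target codec, chosen only for this example

    def __new__(cls, value=""):
        text = super().__new__(cls, value)
        # Fail at construction time, i.e. at the point the bad data enters.
        text.encode(cls.codec)
        return text

    def __add__(self, other):
        # TargetText + str: re-validate the combined result.
        return type(self)(str.__add__(self, other))

    def __radd__(self, other):
        # str + TargetText: the plain str on the left is checked as well.
        return type(self)(str.__add__(other, self))


header = TargetText("Content-Type: ") + "text/plain"

try:
    header + "caf\u00e9"      # not encodable as ASCII
    leaked = True
except UnicodeEncodeError:
    leaked = False            # the offending operation is reported right here
```

The traceback from the failed concatenation points at the line that introduced the unencodable text, not at a later encode() call on the finished output. A real version would also need to cover join(), formatting, slicing and the other str methods, which return plain str here.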
From ianb at colorstudy.com Fri Jun 25 17:35:44 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 Jun 2010 10:35:44 -0500 Subject: [Python-Dev] bytes / unicode In-Reply-To: <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 2:05 AM, Stephen J. Turnbull wrote: > > But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make > > sense to me. > > > > So, actually, I *don't* understand what you mean by needing LBYL. > > Consider docutils. Some folks assert that URIs *are* bytes and should > be manipulated as such. So base URIs should be bytes. I don't get what you are arguing against. Are you worried that if we make URL code polymorphic that this will mean some code will treat URLs as bytes, and that code will be incompatible with URLs as text? No one is arguing we remove text support from any of these functions, only that we allow bytes. -- Ian Bicking | http://blog.ianbicking.org From ianb at colorstudy.com Fri Jun 25 17:40:56 2010 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 25 Jun 2010 10:40:56 -0500 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> References: <11597.1277401099@parc.com> <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Fri, Jun 25, 2010 at 5:06 AM, Stephen J. 
Turnbull wrote: > > So with this idea in mind it makes more sense to me that *specific > pieces of > > text* can be reasonably treated as both bytes and text. All the string > > literals in urllib.parse.urlunsplit() for example. > > > > The semantics I imagine are that special('/')+b'x'==b'/x' (i.e., it does > not > > become special('/x')) and special('/')+x=='/x' (again it becomes str). > This > > avoids some of the cases of unicode or str infecting a system as they > did in > > Python 2 (where you might pass in unicode and everything works fine > until > > some non-ASCII is introduced). > > I think you need to give explicit examples where this actually helps > in terms of "type contagion". I expect that it doesn't help at all, > especially not for the people whose native language for URIs is bytes. > These specials are still going to flip to unicode as soon as it comes > in, and that will be incompatible with the bytes they'll need later. > So they're still going to need to filter out unicode on input. > > It looks like it would be useful for programmers of polymorphic > functions, though. > I'm proposing these specials would be used in polymorphic functions, like the functions in urllib.parse. I would not personally use them in my own code (unless of course I was writing my own polymorphic functions). This also makes it less important that the objects be a full stand-in for text, as their use should be isolated to specific functions, they aren't objects that should be passed around much. So you can easily identify and quickly detect if you use unsupported operations on those text-like objects. (This is all a very different use case from bytes+encoding, I think) -- Ian Bicking | http://blog.ianbicking.org 
From status at bugs.python.org Fri Jun 25 18:08:26 2010 From: status at bugs.python.org (Python tracker) Date: Fri, 25 Jun 2010 18:08:26 +0200 (CEST) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20100625160826.0C34078182@psf.upfronthosting.co.za> ACTIVITY SUMMARY (2010-06-18 - 2010-06-25) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2795 open (+38) / 18104 closed (+14) / 20899 total (+52) Open issues with patches: 1130 Average duration of open issues: 712 days. Median duration of open issues: 503 days. Open Issues Breakdown open 2765 (+38) languishing 13 ( +0) pending 16 ( +0) Issues Created Or Reopened (55) _______________________________ os.path.normcase documentation/behaviour unclear on Mac OS X 2010-06-25 http://bugs.python.org/issue3485 reopened ezio.melotti patch uuid.uuid4() generates non-unique values on OSX 2010-06-21 http://bugs.python.org/issue8621 reopened skrah patch test_support.run_unittest cmdline options and arguments 2010-06-20 http://bugs.python.org/issue9028 reopened techtonik errors='replace' works in IDLE, fails at Windows command line. 
2010-06-18 http://bugs.python.org/issue9029 created jvanpraag ctypes variable limits 2010-06-18 http://bugs.python.org/issue9030 created kumma distutils uses invalid "-Wstrict-prototypes" flag when compili 2010-06-18 http://bugs.python.org/issue9031 created matteo.vescovi xmlrpc: Transport.request() should also catch socket.error(EPI 2010-06-18 http://bugs.python.org/issue9032 created haypo patch cmd module tab misbehavior 2010-06-19 http://bugs.python.org/issue9033 created slcott datetime module should use int32_t for date/time components 2010-06-20 http://bugs.python.org/issue9034 created belopolsky os.path.ismount on windows doesn't support windows mount point 2010-06-20 http://bugs.python.org/issue9035 created Oren_Held Simplify Py_CHARMASK 2010-06-20 http://bugs.python.org/issue9036 created skrah patch, needs review Add explanation as to how to raise a custom exception in the e 2010-06-20 http://bugs.python.org/issue9037 created jonathan.underwood patch test_distutils failure 2010-06-20 http://bugs.python.org/issue9038 created pitrou IDLE and module Doc 2010-06-20 http://bugs.python.org/issue9039 created Yoda_Uchiha using MIMEApplication to attach a PDF raises a TypeError excep 2010-06-21 http://bugs.python.org/issue9040 created Enrico.Sartori raised exception is misleading 2010-06-21 http://bugs.python.org/issue9041 created kumma Gettext cache and classes 2010-06-21 http://bugs.python.org/issue9042 created v_peter patch 2to3 doesn't handle byte comparison well 2010-06-21 CLOSED http://bugs.python.org/issue9043 created vdupras [optparse] confusion over an option and its value without any 2010-06-21 http://bugs.python.org/issue9044 created kszawala 2.7rc1: 64-bit OSX installer is not built with 64-bit tkinter 2010-06-21 http://bugs.python.org/issue9045 created srid Python 2.7rc2 doesn't build on Mac OS X 10.4 2010-06-21 http://bugs.python.org/issue9046 created lemburg Python 2.7rc2 includes -isysroot twice on each gcc command lin 2010-06-21 
http://bugs.python.org/issue9047 created lemburg no OS X buildbots in the stable list 2010-06-21 http://bugs.python.org/issue9048 created janssen buildbot UnboundLocalError in nested function 2010-06-21 CLOSED http://bugs.python.org/issue9049 created Andreas Hofmeister UnboundLocalError in nested function 2010-06-21 CLOSED http://bugs.python.org/issue9050 created Andreas Hofmeister Improve pickle format for aware datetime instances 2010-06-21 http://bugs.python.org/issue9051 created belopolsky 2.7rc2 fails test_urllib_localnet tests on OS X 2010-06-21 CLOSED http://bugs.python.org/issue9052 created janssen distutils compiles extensions so that Python.h cannot be found 2010-06-21 http://bugs.python.org/issue9053 created exarkun pyexpat configured with "--with-system-expat" is incompatible 2010-06-21 http://bugs.python.org/issue9054 created dmalcolm patch test_issue_8959_b fails when run from a service 2010-06-21 http://bugs.python.org/issue9055 created pmoore buildbot Adding additional level of bookmarks and section numbers in py 2010-06-22 http://bugs.python.org/issue9056 created pengyu.ut Distutils2 needs a home page 2010-06-22 http://bugs.python.org/issue9057 created dabrahams PyUnicodeDecodeError_Create asserts that various arguments are 2010-06-22 CLOSED http://bugs.python.org/issue9058 created dmalcolm patch Backwards compatibility 2010-06-23 CLOSED http://bugs.python.org/issue9059 created Raven Python/dup2.c doesn't compile on (at least) newlib 2010-06-23 http://bugs.python.org/issue9060 created torne patch cgi.escape Can Lead To XSS Vulnerabilities 2010-06-23 http://bugs.python.org/issue9061 created Craig.Younkins urllib.urlopen crashes when launched from a thread 2010-06-23 CLOSED http://bugs.python.org/issue9062 created olivier-berten TZ examples in datetime.rst are incorrect 2010-06-23 http://bugs.python.org/issue9063 created belopolsky pdb enhancement up/down traversals 2010-06-23 http://bugs.python.org/issue9064 created vandyswa patch tarfile: default 
root:root ownership is incorrect. 2010-06-23 http://bugs.python.org/issue9065 created jsbronder patch Standard type codes for array.array, same as struct 2010-06-24 http://bugs.python.org/issue9066 created cmcqueen1975 Use macros from pyctype.h 2010-06-24 http://bugs.python.org/issue9067 created skrah "from . import *" 2010-06-24 CLOSED http://bugs.python.org/issue9068 created bhy test_float failure on Solaris 2010-06-24 http://bugs.python.org/issue9069 created mark.dickinson Timestamps are rounded differently in py3k and trunk 2010-06-24 CLOSED http://bugs.python.org/issue9070 created belopolsky TarFile doesn't support member files with a leading "./" 2010-06-24 CLOSED http://bugs.python.org/issue9071 created free.ekanayaka Unloading modules - memleaks? 2010-06-24 CLOSED http://bugs.python.org/issue9072 created yappie Tkinter module missing from install on OS X 10.6.4 2010-06-24 http://bugs.python.org/issue9073 created RolandJ [includes patch] subprocess module closes standard file descri 2010-06-24 http://bugs.python.org/issue9074 created kr patch ssl module sets "debug" flag on SSL struct 2010-06-24 CLOSED http://bugs.python.org/issue9075 created pitrou Add C-API documentation for PyUnicode_AsDecodedObject/Unicode 2010-06-24 http://bugs.python.org/issue9076 created haypo patch argparse does not handle arguments correctly after -- 2010-06-24 CLOSED http://bugs.python.org/issue9077 created iElectric Fix C API documentation of unicode 2010-06-24 http://bugs.python.org/issue9078 created haypo patch Make gettimeofday available in time module 2010-06-25 http://bugs.python.org/issue9079 created belopolsky patch, needs review Provide list prepend method (even though it's not efficient) 2010-06-25 CLOSED http://bugs.python.org/issue9080 created andybuckley Issues Now Closed (43) ______________________ MultiMethods with type annotations in 3000 1035 days http://bugs.python.org/issue1004 benjamin.peterson patch subprocess.list2cmdline doesn't do pipe symbols 975 days 
http://bugs.python.org/issue1300 chops at demiurgestudios.com easy Popen.poll always returns None 816 days http://bugs.python.org/issue2475 tjreedy Python interpreter uses Unicode surrogate pairs only before th 713 days http://bugs.python.org/issue3297 haypo patch py3k shouldn't use -fno-strict-aliasing anymore 712 days http://bugs.python.org/issue3326 benjamin.peterson patch create a numbits() method for int and long types 699 days http://bugs.python.org/issue3439 mark.dickinson patch, needs review os.path.realpath() get the wrong result 554 days http://bugs.python.org/issue4654 r.david.murray Compiling python 2.5.2 under Wine on linux. 527 days http://bugs.python.org/issue4883 BreamoreBoy 3.0 sqlite doc: most examples refer to pysqlite2, use 2.x synt 516 days http://bugs.python.org/issue5005 tjreedy Implement a way to change the python process name 448 days http://bugs.python.org/issue5672 piro patch setup build with Platform SDK, finding vcvarsall.bat 407 days http://bugs.python.org/issue5969 georg.brandl Failing test_signal.py on Redhat 4.1.2-44 407 days http://bugs.python.org/issue5972 georg.brandl datetime.strptime doesn't support %z format ? 
1 days http://bugs.python.org/issue6641 merwok patch webbrowser.get("firefox") does not work on Mac with installed 243 days http://bugs.python.org/issue7192 ronaldoussoren patch Backport 3.x nonlocal keyword to 2.7 117 days http://bugs.python.org/issue8018 mark.dickinson test_heapq interfering with test_import on py3k 65 days http://bugs.python.org/issue8440 tim.golden enumerate() test cases do not cover optional start argument 46 days http://bugs.python.org/issue8636 merwok patch _ssl.c uses PyWeakref_GetObject but doesn't incref result 45 days http://bugs.python.org/issue8682 pitrou patch Remove "w" format of PyParse_ParseTuple() 27 days http://bugs.python.org/issue8850 haypo patch msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on 23 days http://bugs.python.org/issue8854 lemburg patch, 64bit execfile does not work with UNC paths 21 days http://bugs.python.org/issue8869 tim.golden getargs.c: release the buffer on error 18 days http://bugs.python.org/issue8926 haypo patch PyArg_Parse*(): "z" should not accept bytes 16 days http://bugs.python.org/issue8949 haypo patch PyArg_Parse*(): factorize code of 's' and 'z' formats, and 'u' 16 days http://bugs.python.org/issue8951 haypo patch WINFUNCTYPE wrapped ctypes callbacks not functioning correctly 12 days http://bugs.python.org/issue8959 theller Year range in timetuple 5 days http://bugs.python.org/issue9005 belopolsky patch os.path.normcase(None) does not raise an error on linux and sh 8 days http://bugs.python.org/issue9018 ezio.melotti patch, easy 2to3 doesn't handle byte comparison well 0 days http://bugs.python.org/issue9043 merwok UnboundLocalError in nested function 1 days http://bugs.python.org/issue9049 mark.dickinson UnboundLocalError in nested function 0 days http://bugs.python.org/issue9050 merwok 2.7rc2 fails test_urllib_localnet tests on OS X 0 days http://bugs.python.org/issue9052 belopolsky PyUnicodeDecodeError_Create asserts that various arguments are 0 days http://bugs.python.org/issue9058 
benjamin.peterson patch Backwards compatibility 0 days http://bugs.python.org/issue9059 ezio.melotti urllib.urlopen crashes when launched from a thread 0 days http://bugs.python.org/issue9062 orsenthil "from . import *" 0 days http://bugs.python.org/issue9068 brett.cannon Timestamps are rounded differently in py3k and trunk 0 days http://bugs.python.org/issue9070 belopolsky TarFile doesn't support member files with a leading "./" 1 days http://bugs.python.org/issue9071 free.ekanayaka Unloading modules - memleaks? 0 days http://bugs.python.org/issue9072 yappie ssl module sets "debug" flag on SSL struct 0 days http://bugs.python.org/issue9075 pitrou argparse does not handle arguments correctly after -- 1 days http://bugs.python.org/issue9077 iElectric Provide list prepend method (even though it's not efficient) 0 days http://bugs.python.org/issue9080 andybuckley webbrowser.open_new() opens in an existing browser window 2463 days http://bugs.python.org/issue812089 r.david.murray mbcs encoding ignores errors 2394 days http://bugs.python.org/issue850997 haypo patch Top Issues Most Discussed (10) ______________________________ 19 Non-uniformity in randrange for large arguments. 
7 days open http://bugs.python.org/issue9025

 19 2.7: eval hangs on AIX
    8 days open http://bugs.python.org/issue9020

 17 Python 2.7rc2 doesn't build on Mac OS X 10.4
    4 days open http://bugs.python.org/issue9046

 14 test_float failure on Solaris
    1 days open http://bugs.python.org/issue9069

 13 msvc9compiler.py: find_vcvarsall() doesn't work with VS2008 on
    23 days closed http://bugs.python.org/issue8854

 10 no OS X buildbots in the stable list
    4 days open http://bugs.python.org/issue9048

 10 os.path.normcase(None) does not raise an error on linux and sho
    8 days closed http://bugs.python.org/issue9018

  8 Provide list prepend method (even though it's not efficient)
    0 days closed http://bugs.python.org/issue9080

  8 Improve quality of Python/dtoa.c
    9 days open http://bugs.python.org/issue9009

  8 Add Mercurial support to patchcheck
    10 days open http://bugs.python.org/issue8999

From barry at python.org Fri Jun 25 18:18:47 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 12:18:47 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
Message-ID: <20100625121847.60331d9e@heresy>

Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's
time for me to work out the release schedule for Python 2.6.6 - likely the
last maintenance release for Python 2.6.

Because summer schedules are crazy, and I want to leave two weeks between
2.6.6 rc1 and 2.6.6 final, my current schedule looks like:

* Python 2.6.6 rc 1 on Monday 2010-08-02
* Python 2.6.6 final on Monday 2010-08-16

This should give folks plenty of time to relax after 2.7 final, and still be
able to get those last minute fixes into the 2.6 tree. Let me know if these
dates don't work for you.

-Barry

From stephen at xemacs.org Fri Jun 25 18:18:33 2010
From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 26 Jun 2010 01:18:33 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> Message-ID: <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > I do know the ultimate target codec -- that's the point. > > IOW, I want to be able to do to all my operations by passing > target-encoded strings to polymorphic functions. IOW, you *do* have text and (ignoring efficiency issues) could just as well use str. But That Other Code is unreliable, so you need a marker for your own internal strings indicating that they are validated, while other strings are not. This has nothing to do with bytes vs. str as string types, then; it's all about validated (which your architecture indicates by using the bytes type) vs. unvalidated (which your architecture indicates with unicode). Eg, in the case of your USPS vs. ecommerce example, you can't even handle all bytes, so not all possible bytes objects are valid. And other applications might not be able to handle all Japanese, but only a subset, so having valid EUC-JP wouldn't be enough, you'd have to check repertoire -- might as well use str. It seems to me what is wanted here is something like Perl's taint mechanism, for *both* kinds of strings. Am I missing something? But with your architecture, it seems to me that you actually don't want polymorphic functions in the stdlib. You want the stdlib functions to be bytes-oriented if and only if they are reliable. (This is what I was saying to Guido elsewhere.) BTW, this was a little unclear to me: > [Collisions will] be with other *unicode* strings. Ones coming > from other code, and literals embedded in the stdlib. What about the literals in the stdlib? Are you saying they contain invalid code points for your known output encoding? 
Or are you saying that with non-polymorphic unicode stdlib, you get lots of false positives when combining with your validated bytes? From barry at python.org Fri Jun 25 18:28:29 2010 From: barry at python.org (Barry Warsaw) Date: Fri, 25 Jun 2010 12:28:29 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <20100625121847.60331d9e@heresy> References: <20100625121847.60331d9e@heresy> Message-ID: <20100625122829.30b20e67@heresy> On Jun 25, 2010, at 12:18 PM, Barry Warsaw wrote: >* Python 2.6.6 rc 1 on Monday 2010-08-02 >* Python 2.6.6 final on Monday 2010-08-16 I've also updated the Google calendar of Python releases: http://www.google.com/calendar/ical/b6v58qvojllt0i6ql654r1vh00%40group.calendar.google.com/public/basic.ics -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From stephen at xemacs.org Fri Jun 25 18:30:08 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 Jun 2010 01:30:08 +0900 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > I'm proposing these specials would be used in polymorphic functions, like > the functions in urllib.parse. I would not personally use them in my own > code (unless of course I was writing my own polymorphic functions). > > This also makes it less important that the objects be a full stand-in for > text, as their use should be isolated to specific functions, they aren't > objects that should be passed around much. So you can easily identify and > quickly detect if you use unsupported operations on those text-like > objects. OK. That sounds reasonable to me, but I don't see any need for a builtin type for it. 
Inclusion in the stdlib is not quite a no-brainer, but given Guido's endorsement of polymorphism, I can't bring myself to go lower than +0.9 . > (This is all a very different use case from bytes+encoding, I think) Very much so. From stephen at xemacs.org Fri Jun 25 18:37:58 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 Jun 2010 01:37:58 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100620184120.10EFB3A4099@sparrow.telecommunity.com> <20100620234723.600ad4a8@pitrou.net> <87wrtsd44p.fsf@uwakimon.sk.tsukuba.ac.jp> <87631c4bca.fsf@uwakimon.sk.tsukuba.ac.jp> <20100621165611.GW5787@unaka.lan> <87r5jz3h8u.fsf@uwakimon.sk.tsukuba.ac.jp> <20100622055040.GE5787@unaka.lan> <87d3vj2tj2.fsf@uwakimon.sk.tsukuba.ac.jp> <0D1D2134-2CF9-4F93-BE82-912C5297D36F@fuhm.net> <87zkymns55.fsf@uwakimon.sk.tsukuba.ac.jp> <87mxukonmq.fsf@uwakimon.sk.tsukuba.ac.jp> <878w63oam0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87zkyjm5jt.fsf@uwakimon.sk.tsukuba.ac.jp> Ian Bicking writes: > I don't get what you are arguing against. Are you worried that if > we make URL code polymorphic that this will mean some code will > treat URLs as bytes, and that code will be incompatible with URLs > as text? No one is arguing we remove text support from any of > these functions, only that we allow bytes. No, I understand what Guido means by "polymorphic". I'm arguing that as I understand one of Philip Eby's use cases, "bytes" is a misspelling of "validated" and "unicode" is a misspelling of "unvalidated". In case of some kind of bug, polymorphic stdlib functions would allow propagation of unvalidated/unicode within the validated zone, aka "errors passing silently". Now that I understand that that use case doesn't actually care about bytes vs. unicode *string* semantics at all, the argument becomes moot, I guess. 
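
To make the "validated vs. unvalidated" point above concrete, here is a minimal sketch, assuming Python 3 semantics where str and bytes never mix implicitly. The names join_url and in_validated_zone are hypothetical and are not stdlib functions; the convention that "bytes means validated" is the one attributed to Philip Eby's use case in the thread. The sketch shows what "polymorphic" means here (same type in, same type out) and why an all-str call made from a bytes-only zone would pass without any error at the call site.

```python
def join_url(base, part):
    """Hypothetical polymorphic helper: str in -> str out, bytes in -> bytes out."""
    # Pick a separator of the same type as the first argument.
    sep = "/" if isinstance(base, str) else b"/"
    # Concatenating str with bytes raises TypeError in Python 3.
    return base + sep + part


def in_validated_zone(value):
    """Assumed convention from the thread: bytes means 'already validated'."""
    return isinstance(value, bytes)


validated = join_url(b"http://example.com", b"index.html")
unvalidated = join_url("http://example.com", "index.html")

print(in_validated_zone(validated))    # bytes in, bytes out
print(in_validated_zone(unvalidated))  # str in, str out: no error is raised here

try:
    join_url(b"http://example.com", "index.html")
except TypeError as exc:
    print("mixing types fails loudly:", exc)
```

The second call is the case described above: nothing fails inside the polymorphic function, so only an explicit check such as in_validated_zone would notice that a str slipped into the bytes-only zone.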
From ianb at colorstudy.com Fri Jun 25 18:54:05 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 25 Jun 2010 11:54:05 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <11597.1277401099@parc.com> <876317o28m.fsf@uwakimon.sk.tsukuba.ac.jp> <871vbvnkhb.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID:

On Fri, Jun 25, 2010 at 11:30 AM, Stephen J. Turnbull wrote:

> Ian Bicking writes:
>
> > I'm proposing these specials would be used in polymorphic functions, like
> > the functions in urllib.parse. I would not personally use them in my own
> > code (unless of course I was writing my own polymorphic functions).
> >
> > This also makes it less important that the objects be a full stand-in for
> > text, as their use should be isolated to specific functions, they aren't
> > objects that should be passed around much. So you can easily identify and
> > quickly detect if you use unsupported operations on those text-like
> > objects.
>
> OK. That sounds reasonable to me, but I don't see any need for
> a builtin type for it. Inclusion in the stdlib is not quite a
> no-brainer, but given Guido's endorsement of polymorphism, I can't
> bring myself to go lower than +0.9 .
>

Agreed on a builtin; I think it would be fine to put something in the
strings module, and then in these examples code that used '/' would instead
use strings.ascii('/') (not so sure of what the name should be though).

--
Ian Bicking | http://blog.ianbicking.org

From tjreedy at udel.edu Fri Jun 25 18:57:50 2010
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 25 Jun 2010 12:57:50 -0400
Subject: [Python-Dev] docs - Copy
In-Reply-To:
References:
Message-ID:

On 6/24/2010 8:51 PM, Rich Healey wrote:
> http://docs.python.org/library/copy.html

Discussion of the wording of current docs should go to python-list.
Py-dev is for development of future Python.

--
Terry Jan Reedy

From fuzzyman at voidspace.org.uk Fri Jun 25 20:35:35 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 19:35:35 +0100
Subject: [Python-Dev] Creating APIs that work as both decorators and context managers
Message-ID: <4C24F6F7.4040200@voidspace.org.uk>

Hello all,

I've put a recipe up on the Python cookbook for creating APIs that work as
both decorators and context managers and wonder if it would be considered a
useful addition to the functools module.

http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/

I wrote this after writing almost identical code the second time for "patch"
in the mock module. (The patch decorator can be used as a decorator or as a
context manager and I was writing a new variant.) Both py.test and django
have similar code in places, so it is not an uncommon pattern.

It is only 40 odd lines (ignore the ugly Python 2 & 3 compatibility hack), so
I'm fine with it living on the cookbook - but it is at least slightly fiddly
to write and has the added niceness of providing the optional exception
handling semantics of __exit__ for decorators as well.

Example use (really hope email doesn't swallow the whitespace - my apologies
in advance if it does):

    from context import Context

    class mycontext(Context):
        def __init__(self, *args):
            """Normal initialiser"""

        def start(self):
            """
            Called on entering the with block or starting the decorated
            function.

            If used in a with statement whatever this method returns will
            be the context manager.
            """

        def finish(self, *exc):
            """
            Called on exit. Arguments and return value of this method have
            the same meaning as the __exit__ method of a normal context
            manager.
            """

    @mycontext('some', 'args')
    def function():
        pass

    with mycontext('some', 'args') as something:
        pass

I'm not entirely happy with the name of the class or the start and finish
methods, so open to suggestions there.
start and finish *could* be __enter__ and __exit__ - but that would make the class you implement *look* like a normal context manager and I thought it was better to distinguish them. Perhaps before and after? All the best, Michael Foord -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From brett at python.org Fri Jun 25 20:58:42 2010 From: brett at python.org (Brett Cannon) Date: Fri, 25 Jun 2010 11:58:42 -0700 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C246E81.3020302@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Fri, Jun 25, 2010 at 01:53, Scott Dial wrote: > On 6/24/2010 8:23 PM, James Y Knight wrote: >> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>> If the package has .so files that aren't compatible with other version >>> of python, then what is the motivation for placing that in a shared >>> location (since it can't actually be shared) >> >> Because python looks for .so files in the same place it looks for the >> .py files of the same package. 
> > My suggestion was that a package that contains .so files should not be > shared (e.g., the entire lxml package should be placed in a > version-specific path). The motivation for this PEP was to simplify the > installation python packages for distros; it was not to reduce the > number of .py files on the disk. I assume you are talking about PEP 3147. You're right that the PEP was for pyc files and that's it. No one is talking about rewriting the PEP. The motivation Barry is using is an overarching one of distros wanting to use a single directory install location for all installed Python versions. That led to PEP 3147 and now this work. > > Placing .so files together does not simplify that install process in any > way. You will still have to handle such packages in a special way. You > must still compile the package multiple times for each relevant version > of python (with special tagging that I imagine distutils can take care > of) and, worse yet, you have created a more trick install than merely > having multiple search paths (e.g., installing/uninstalling lxml for > *one* version of python is actually more difficult in this scheme). This is meant to be used by distros in a programmatic fashion, so my response is "so what?" Their package management system is going to maintain the directory, not a person. You and I are not going to be using this for anything. This is purely meant for Linux OS vendors (maybe OS X) to manage their installs through their package software. I honestly do not expect human beings to be mucking around with these installs (and I suspect Barry doesn't either). > > Either the motivation for this PEP is inaccurate or I am failing to > understand how this is *simpler*. In the case of pure-python, this PEP > is clearly a win, but I have not seen an argument that it is a win for > .so files. Moreover, the PEP itself is titled "PYC Repository > Directories" (not "shared site-packages") and makes no mention of .so > files at all. 
You're conflating what is being discussed with PEP 3147. That PEP is
independent of this. PEP 3147 just empowered this work to be relevant.

-Brett

>
> --
> Scott Dial
> scott at scottdial.com
> scodial at cs.indiana.edu

From scott+python-dev at scottdial.com Fri Jun 25 21:42:38 2010
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Fri, 25 Jun 2010 15:42:38 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com>
Message-ID: <4C2506AE.3060002@scottdial.com>

On 6/25/2010 2:58 PM, Brett Cannon wrote:
> I assume you are talking about PEP 3147. You're right that the PEP was
> for pyc files and that's it. No one is talking about rewriting the
> PEP.

Yes, I am making reference to PEP 3147. I make reference to that PEP because
this change is of the same order of magnitude as the .pyc change, and we
asked for a PEP for that, and if this .so stuff is an extension of that
thought process, then it should either be reflected by that PEP or a new PEP.

> The motivation Barry is using is an overarching one of distros
> wanting to use a single directory install location for all installed
> Python versions. That led to PEP 3147 and now this work.

It's unclear to me that that is the correct motivation, which you are
divining. As I understand it, the motivation is to *simplify installation*
for distros, which may or may not be achieved by using a single directory. In
the case of pure-python packages, a single directory is an obvious win. In
the case of mixed-python packages, I remain to be persuaded there is any
improvement achieved.
> This is meant to be used by distros in a programmatic fashion, so my > response is "so what?" Their package management system is going to > maintain the directory, not a person. Then why is the status quo unacceptable? I have already explained how this will still require programmatic steps of at least the same difficulty as the status quo requires, so why should we change anything? I am skeptical that this is a simple programmatic problem either: take any random package on PyPI and tell me whether or not it has a .so file that must be compiled. If such a .so file exists, then this package must be special-cased and compiled for each version of Python on the system (or will ever be on the system?). Such a package yields an arbitrary number of .so files due to the number of version of Python on the machine, and I can't imagine how it is simpler to manage all of those files than it is to manage multiple site-packages. > You're conflating what is being discussed with PEP 3147. That PEP is > independent of this. PEP 3147 just empowered this work to be relevant. Without a PEP (be it PEP 3147 or some other), what is the justification for doing this? The burden should be on "you" to explain why this is a good idea and not just a clever idea. -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From dickinsm at gmail.com Fri Jun 25 22:02:36 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 25 Jun 2010 21:02:36 +0100 Subject: [Python-Dev] Creating APIs that work as both decorators and context managers In-Reply-To: <4C24F6F7.4040200@voidspace.org.uk> References: <4C24F6F7.4040200@voidspace.org.uk> Message-ID: On Fri, Jun 25, 2010 at 7:35 PM, Michael Foord wrote: > Hello all, > > I've put a recipe up on the Python cookbook for creating APIs that work as > both decorators and context managers and wonder if it would be considered a > useful addition to the functools module. 
> http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/ It's an interesting idea. I wanted almost exactly this a little while ago, while doing some experiments to add an IEEE 754-compliance wrapper to the decimal module (for my own use). It seems quite natural that one might want to wrap both functions and blocks in the same way. [1] In case anyone wants the details, this was for a 'delay-exceptions' operation, that allows you to execute some number of arithmetic operations, keeping track of the floating-point signals that they produce but not raising the corresponding exceptions until the end of the block; obviously this idea applies equally well to functions as to blocks. It's one of the recommended exception handling modes from section 8 of IEEE 754-2008. Mark From foom at fuhm.net Fri Jun 25 22:12:34 2010 From: foom at fuhm.net (James Y Knight) Date: Fri, 25 Jun 2010 16:12:34 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C246E81.3020302@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: > On 6/24/2010 8:23 PM, James Y Knight wrote: >> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>> If the package has .so files that aren't compatible with other >>> version >>> of python, then what is the motivation for placing that in a shared >>> location (since it can't actually be shared) >> >> Because python looks for .so files in the same place it looks for the >> .py files of the same package. > > My suggestion was that a package that contains .so files should not be > shared (e.g., the entire lxml package should be placed in a > version-specific path). The motivation for this PEP was to simplify > the > installation python packages for distros; it was not to reduce the > number of .py files on the disk. 
> > Placing .so files together does not simplify that install process in > any > way. You will still have to handle such packages in a special way. This is a good point, but I think still falls short of a solution. For a package like lxml, indeed you are correct. Since debian needs to build it once per version, it could just put the entire package (.py files and .so files) into a different per-python-version directory. However, then you have to also consider python packages made up of multiple distro packages -- like twisted or zope. Twisted includes some C extensions in the core package. But then there are other twisted modules (installed under a "twisted.foo" name) which do not include C extensions. If the base twisted package is installed under a version-specific directory, then all of the submodule packages need to also be installed under the same version-specific directory (and thus built for all versions). In the past, it has proven somewhat tricky to coordinate which directory the modules for package "foo" should be installed in, because you need to know whether *any* of the related packages includes a native ".so" file, not just the current package. The converse situation, where a base package did *not* get installed into a version-specific directory because it includes no native code, but a submodule *does* include a ".so" file, is even trickier. James From martin at v.loewis.de Fri Jun 25 22:27:31 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 22:27:31 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100625003149.GA16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> Message-ID: <4C251133.2090505@v.loewis.de> >>> I am extremely keen for this to happen. Does anyone have ownership of this >>> project? There was some discussion of it up-list but the discussion fizzled. 
>> >> Can you please explain what "this project" is, in the context of your >> message? GSoC? GHOP? > > Oh, I thought this was quite clear. I was specifically meaning the large > "Python 2 or 3" button on python.org. It would help users who want to know > what version of python to use if they had a clear guide as to what version > to download. Ah, ok. No, nobody has taken ownership of that project, and likely, nobody actually will - unless you volunteer. Regards, Martin From martin at v.loewis.de Fri Jun 25 22:30:34 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 22:30:34 +0200 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: <4C2511EA.3000200@v.loewis.de> Am 25.06.2010 18:57, schrieb Terry Reedy: > On 6/24/2010 8:51 PM, Rich Healey wrote: >> http://docs.python.org/library/copy.html > > Discussion of the wording of current docs should go to python-list. > Py-dev is for development of future Python. No no no. Mis-worded documentation is a bug, just like any other bug, and deserves being discussed here. Furthermore, a sufficient condition for mis-wording is if a user read it in full, and still managed to misunderstand (as happened here). Regards, Martin From martin at v.loewis.de Fri Jun 25 22:31:28 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 22:31:28 +0200 Subject: [Python-Dev] docs - Copy In-Reply-To: References: Message-ID: <4C251220.3050106@v.loewis.de> > My apologies guys, I see now. > > I will see if I can think of a less ambiguous way to word this and submit a bug. Please don't take out or rephrase the word "shallow", though. This has a long CS tradition of meaning exactly what is meant here. 
Regards, Martin From martin at v.loewis.de Fri Jun 25 22:33:38 2010 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 25 Jun 2010 22:33:38 +0200 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <20100625121847.60331d9e@heresy> References: <20100625121847.60331d9e@heresy> Message-ID: <4C2512A2.1040404@v.loewis.de> Am 25.06.2010 18:18, schrieb Barry Warsaw: > Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's > time for me to work out the release schedule for Python 2.6.6 - likely the > last maintenance release for Python 2.6. > > Because summer schedules are crazy, and I want to leave two weeks between > 2.6.6 rc1 and 2.6.6 final, my current schedule looks like: > > * Python 2.6.6 rc 1 on Monday 2010-08-02 > * Python 2.6.6 final on Monday 2010-08-16 That would barely work for me. If schedule slips in any way, we'll have to move the release into end-of-September (but the days as proposed are fine). Regards, Martin From glyph at twistedmatrix.com Fri Jun 25 22:43:55 2010 From: glyph at twistedmatrix.com (Glyph Lefkowitz) Date: Fri, 25 Jun 2010 16:43:55 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> Message-ID: <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote: > Regarding the proposal of a String ABC, I hope this isn't going to > become a backdoor to reintroduce the Python 2 madness of allowing > equivalency between text and bytes for *some* strings of bytes and not > others. For my part, what I want out of a string ABC is simply the ability to do application-specific optimizations. There are many applications where all input and output is text, but _must_ be UTF-8. Even GTK uses UTF-8 as its native text representation, so "output" could just be display. 
Right now, in Python 3, the only way to be "correct" about this is to copy
every byte of input into 4 bytes of output, then copy each code point *back*
into a single byte of output. If all your application does is rewrite the
occasional XML attribute, for example, this cost can be significant, if not
overwhelming.

I'd like a version of 'decode' which would give me a type that was, in every
respect, unicode, and responded to all protocols exactly as other unicode
objects (or "str objects", if you prefer py3 nomenclature ;-)) do, but
wouldn't actually copy any of that memory unless it really needed to (for
example, to pass to a C API that expected native wide characters), and that
would hold on to the original bytes so that it could produce them on demand
if encoded to the same encoding again. So, as others in this thread have
mentioned, the 'ABC' really implies some stuff about C APIs as well.

I'm not sure about the exact performance impact of such a class, which is
why I'd like the ability to implement it *outside* of the stdlib and see how
it works on a project, and return with a proposal along with some data.
There are also different ways to implement this, and other optimizations
(like ropes) which might be better.

You can almost do this today, but the lack of things like the hypothetical
"__rcontains__" does make it impossible to be totally transparent about it.

From barry at python.org Fri Jun 25 22:59:44 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 16:59:44 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23D006.6080800@netwok.org>
References: <20100624115048.4fd152e3@heresy> <4C23A901.7060100@netwok.org> <20100624172302.024687ef@heresy> <4C23D006.6080800@netwok.org>
Message-ID: <20100625165944.2cac0053@heresy>

On Jun 24, 2010, at 11:37 PM, Éric Araujo wrote:

>Your plan seems good.
>Adding keyword arguments should not create
>compatibility issues, and I suspect the impact on the code of build_ext
>may be actually quite small. I'll try to review your patch even though I
>don't know C or compiler oddities, but Tarek will have the best insight
>and the final word.

The C and configure/Makefile bits are pretty trivial. It basically extends the
list of shared library extensions searched for on *nix machines, and allows
that to be set on the ./configure command.

As for the impact on distutils, with updated tests, it's less than 100 lines
of diff. Again there it essentially allows us to pass the extension that
build_ext writes to from the setup.py, via the Extension class. Because
distutils' default is to use the $SO variable from the system-installed
Makefile, with the change to dynload_shlib.c, configure.in, and
Makefile.pre.in, we would get distutils writing the versioned .so files for
free.

I'll note further that if you *don't* specify this to ./configure, nothing
much changes[1]. The distutils part of the patch is only there to disable or
override the default, and *that's* only there to support proposed semantics
that foo.so be used for PEP 384-compliant ABI extension modules. IOW, until
PEP 384 is actually implemented, the distutils part of the patch is
unnecessary.

However, if the other changes are accepted, then I will add a discussion of
this issue to PEP 384, and we can figure out the best semantics and
implementation at that point. I honestly don't know if I am going to get to
work on PEP 384 before 3.2 beta.

>In case the time machine's not available, your suggestion about getting
>the filename from the Extension instance instead of passing in a string
>can most certainly land in distutils2.

Cool.
-Barry

[1] Well, I now realize you'll get an extra useless stat call, but I will fix
that.
From guido at python.org Fri Jun 25 23:02:05 2010
From: guido at python.org (Guido van Rossum)
Date: Fri, 25 Jun 2010 14:02:05 -0700
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To: <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz wrote:
>
> On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
>
> Regarding the proposal of a String ABC, I hope this isn't going to
> become a backdoor to reintroduce the Python 2 madness of allowing
> equivalency between text and bytes for *some* strings of bytes and not
> others.
>
> For my part, what I want out of a string ABC is simply the ability to do
> application-specific optimizations.
> There are many applications where all input and output is text, but _must_
> be UTF-8. Even GTK uses UTF-8 as its native text representation, so
> "output" could just be display.
> Right now, in Python 3, the only way to be "correct" about this is to copy
> every byte of input into 4 bytes of output, then copy each code point *back*
> into a single byte of output. If all your application does is rewrite the
> occasional XML attribute, for example, this cost can be significant, if not
> overwhelming.
> I'd like a version of 'decode' which would give me a type that was, in every
> respect, unicode, and responded to all protocols exactly as other
> unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
> but wouldn't actually copy any of that memory unless it really needed to
> (for example, to pass to a C API that expected native wide characters), and
> that would hold on to the original bytes so that it could produce them on
> demand if encoded to the same encoding again. So, as others in this thread
> have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> I'm not sure about the exact performance impact of such a class, which is
> why I'd like the ability to implement it *outside* of the stdlib and see how
> it works on a project, and return with a proposal along with some data.
> There are also different ways to implement this, and other optimizations
> (like ropes) which might be better.
> You can almost do this today, but the lack of things like the hypothetical
> "__rcontains__" does make it impossible to be totally transparent about it.

But you'd still have to validate it, right? You wouldn't want to go on
using what you thought was wrapped UTF-8 if it wasn't actually valid
UTF-8 (or you'd be worse off than in Python 2). So you're really just
worried about space consumption. I'd like to see a lot of hard memory
profiling data before I got overly worried about that.

--
--Guido van Rossum (python.org/~guido)

From barry at python.org Fri Jun 25 23:03:22 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 17:03:22 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <4C2512A2.1040404@v.loewis.de>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de>
Message-ID: <20100625170322.5ece724f@heresy>

On Jun 25, 2010, at 10:33 PM, Martin v.
Löwis wrote:

>On 25.06.2010 18:18, Barry Warsaw wrote:
>> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's
>> time for me to work out the release schedule for Python 2.6.6 - likely the
>> last maintenance release for Python 2.6.
>>
>> Because summer schedules are crazy, and I want to leave two weeks between
>> 2.6.6 rc1 and 2.6.6 final, my current schedule looks like:
>>
>> * Python 2.6.6 rc 1 on Monday 2010-08-02
>> * Python 2.6.6 final on Monday 2010-08-16
>
>That would barely work for me. If schedule slips in any way, we'll have
>to move the release into end-of-September (but the days as proposed are
>fine).

Would that be bad or good (slipping into September)? I'd like to get a
release out as soon after 2.7 final as possible, but it's an entirely
self-imposed deadline. There's no reason why we can't push the whole 2.6.6
thing later if that works better for you. OTOH, I can't go much earlier so if
September is bad for you, then we'll stick to the above dates.

-Barry

From fuzzyman at voidspace.org.uk Fri Jun 25 23:06:00 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:06:00 +0100
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251133.2090505@v.loewis.de>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de>
Message-ID: <4C251A38.3090205@voidspace.org.uk>

On 25/06/2010 21:27, "Martin v. Löwis" wrote:
>>>> I am extremely keen for this to happen. Does anyone have ownership of this
>>>> project? There was some discussion of it up-list but the discussion fizzled.
>>>>
>>> Can you please explain what "this project" is, in the context of your
>>> message? GSoC? GHOP?
>>>
>> Oh, I thought this was quite clear.
>> I was specifically meaning the large
>> "Python 2 or 3" button on python.org. It would help users who want to know
>> what version of python to use if they had a clear guide as to what version
>> to download.
>>
> Ah, ok. No, nobody has taken ownership of that project, and likely,
> nobody actually will - unless you volunteer.
>

What page were we suggesting linking to? IIRC someone made a good start
in the wiki.

I'll move the discussion to pydotorg-www (still need the question about
answering) and see if we can get it done.

All the best,

Michael

> Regards,
> Martin

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

READ CAREFULLY. By accepting and reading this email you agree, on behalf of
your employer, to release me from all obligations and waivers arising from
any and all NON-NEGOTIATED agreements, licenses, terms-of-service,
shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure,
non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have
entered into with your employer, its partners, licensors, agents and
assigns, in perpetuity, without prejudice to my ongoing rights and
privileges. You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.
From martin at v.loewis.de Fri Jun 25 23:14:53 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 23:14:53 +0200
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251A38.3090205@voidspace.org.uk>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk>
Message-ID: <4C251C4D.50806@v.loewis.de>

> What page were we suggesting linking to?

I don't think anybody proposed anything specific. Steve Holden
suggested it should go to "reasoned discussion of the
pros and cons as evinced in this thread". Stephen Thorne didn't
propose anything specific but to have a large button.

> I'll move the discussion to pydotorg-www

I'll predict that this is its death :-(

Regards,
Martin

From martin at v.loewis.de Fri Jun 25 23:16:23 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 25 Jun 2010 23:16:23 +0200
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <20100625170322.5ece724f@heresy>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy>
Message-ID: <4C251CA7.3070902@v.loewis.de>

> Would that be bad or good (slipping into September)? I'd like to get a
> release out as soon after 2.7 final as possible, but it's an entirely
> self-imposed deadline. There's no reason why we can't push the whole 2.6.6
> thing later if that works better for you. OTOH, I can't go much earlier so if
> September is bad for you, then we'll stick to the above dates.

I think we can strive for your original proposal. If it slips, we let it
slip by a month or two.
Regards,
Martin

From fuzzyman at voidspace.org.uk Fri Jun 25 23:31:45 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:31:45 +0100
Subject: [Python-Dev] Creating APIs that work as both decorators and context managers
In-Reply-To: <4C24F6F7.4040200@voidspace.org.uk>
References: <4C24F6F7.4040200@voidspace.org.uk>
Message-ID: <4C252041.7000808@voidspace.org.uk>

On 25/06/2010 19:35, Michael Foord wrote:
> Hello all,
>
> I've put a recipe up on the Python cookbook for creating APIs that
> work as both decorators and context managers and wonder if it would be
> considered a useful addition to the functools module.
>
> http://code.activestate.com/recipes/577273-decorator-and-context-manager-from-a-single-api/

Actually contextlib would be a much more sensible home for it.

Michael

>
> I wrote this after writing almost identical code the second time for
> "patch" in the mock module. (The patch decorator can be used as a
> decorator or as a context manager and I was writing a new variant.)
> Both py.test and django have similar code in places, so it is not an
> uncommon pattern.
>
> It is only 40 odd lines (ignore the ugly Python 2 & 3 compatibility
> hack), so I'm fine with it living on the cookbook - but it is at least
> slightly fiddly to write and has the added niceness of providing the
> optional exception handling semantics of __exit__ for decorators as well.
>
> Example use (really hope email doesn't swallow the whitespace - my
> apologies in advance if it does):
>
> from context import Context
>
> class mycontext(Context):
>     def __init__(self, *args):
>         """Normal initialiser"""
>
>     def start(self):
>         """
>         Called on entering the with block or starting the decorated
>         function.
>
>         If used in a with statement whatever this method returns will
>         be the context manager.
>         """
>
>     def finish(self, *exc):
>         """
>         Called on exit. Arguments and return value of this method have
>         the same meaning as the __exit__ method of a normal context
>         manager.
>         """
>
> @mycontext('some', 'args')
> def function():
>     pass
>
> with mycontext('some', 'args') as something:
>     pass
>
> I'm not entirely happy with the name of the class or the start and
> finish methods, so open to suggestions there. start and finish *could*
> be __enter__ and __exit__ - but that would make the class you
> implement *look* like a normal context manager and I thought it was
> better to distinguish them. Perhaps before and after?
>
> All the best,
>
> Michael Foord
> --
> http://www.ironpythoninaction.com/
> http://www.voidspace.org.uk/blog

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog

From glyph at twistedmatrix.com Fri Jun 25 23:40:34 2010
From: glyph at twistedmatrix.com (Glyph Lefkowitz)
Date: Fri, 25 Jun 2010 17:40:34 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com>

On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote:

> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption.

So, yes, I am mainly worried about memory consumption, but don't
underestimate the pure CPU cost of doing all the copying. It's quite a bit
faster to simply scan through a string than to scan and, while you're
scanning, keep faulting out the L2 cache while you're accessing some other
area of memory to store the copy.

Plus, if I am decoding with the surrogateescape error handler (or its
effective equivalent), then no, I don't need to validate it in advance;
interpretation can be done lazily as necessary.
I realize that this is just GIGO, but I wouldn't be doing this on data that
didn't have an explicitly declared or required encoding in the first place.

> I'd like to see a lot of hard memory profiling data before I got overly worried about that.

I know of several Python applications that are already constrained by
memory. I don't have a lot of hard memory profiling data, but in an
environment where you're spawning as many processes as you can in order to
consume _all_ the physically available RAM for string processing, it stands
to reason that properly decoding everything and thereby exploding everything
out into 4x as much data (or 2x, if you're lucky) would result in a
commensurate decrease in throughput.

I don't think I could even reasonably _propose_ that such a project stop
treating textual data as bytes, because there's no optimization strategy
once that sort of architecture has been put into place. If your function
says "this takes unicode", then you just have to bite the bullet and decode
it, or rewrite it again to have a different requirement.

So, right now, I don't know where I'd get the data with which to make the
argument in the first place :). If there were some abstraction in the core's
treatment of strings, though, I could decode things and note their encoding
without immediately paying this cost (or, alternately, pay the cost to see if
it's so bad, but with the option of managing it or optimizing it
separately).

This is why I'm asking for a way for me to implement my own string type, and
not for a change of behavior or an optimization in the stdlib itself: I
could be wrong, I don't have a particularly high level of certainty in my
performance estimates, but I think that my concerns are realistic enough
that I don't want to embark on a big re-architecture of text-handling only
to have it become a performance nightmare that needs to be reverted.

As Robert Collins pointed out, they already have performance issues related
to encoding in Bazaar.
I know they've done a lot of profiling in that area, so I hope eventually
someone from that project will show up with some data to demonstrate it :).
And I've definitely heard many, many anecdotes (some of them in this thread)
about people distorting their data structures in various ways to avoid
paying decoding cost in the ASCII/latin1 case, whether it's *actually* a
significant performance issue or not. I would very much like to tell those
people "Just call .decode(), and if it turns out to actually be a
performance issue, you can always deal with it later, with a custom string
type." I'm confident that in *most* cases, it would not be.

Anyway, this may be a serious issue, but I increasingly feel like I'm
veering into python-ideas territory, so perhaps I'll just have to burn this
bridge when I come to it. Hopefully after the moratorium.

From barry at python.org Fri Jun 25 23:53:06 2010
From: barry at python.org (Barry Warsaw)
Date: Fri, 25 Jun 2010 17:53:06 -0400
Subject: [Python-Dev] Schedule for Python 2.6.6
In-Reply-To: <4C251CA7.3070902@v.loewis.de>
References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy> <4C251CA7.3070902@v.loewis.de>
Message-ID: <20100625175306.6fa9e1eb@heresy>

On Jun 25, 2010, at 11:16 PM, Martin v. Löwis wrote:

>> Would that be bad or good (slipping into September)? I'd like to get a
>> release out as soon after 2.7 final as possible, but it's an entirely
>> self-imposed deadline. There's no reason why we can't push the whole 2.6.6
>> thing later if that works better for you. OTOH, I can't go much earlier so if
>> September is bad for you, then we'll stick to the above dates.
>
>I think we can strive for your original proposal. If it slips, we let it
>slip by a month or two.

Cool, thanks Martin.
-Barry
From fuzzyman at voidspace.org.uk Fri Jun 25 23:53:29 2010
From: fuzzyman at voidspace.org.uk (Michael Foord)
Date: Fri, 25 Jun 2010 22:53:29 +0100
Subject: [Python-Dev] "2 or 3" link on python.org
In-Reply-To: <4C251C4D.50806@v.loewis.de>
References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de>
Message-ID: <4C252559.5060800@voidspace.org.uk>

On 25/06/2010 22:14, "Martin v. Löwis" wrote:
>> What page were we suggesting linking to?
>>
> I don't think anybody proposed anything specific. Steve Holden
> suggested it should go to "reasoned discussion of the
> pros and cons as evinced in this thread". Stephen Thorne didn't
> propose anything specific but to have a large button.
>
>

Earlier in this discussion *someone* did start a page on the wiki, with
this use case in mind... You forced me to actually look it up:

http://wiki.python.org/moin/Python2orPython3

>> I'll move the discussion to pydotorg-www
>>
> I'll predict that this is its death :-(
>

Heh.

Michael

> Regards,
> Martin
>

--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog
From tseaver at palladion.com Sat Jun 26 00:12:10 2010
From: tseaver at palladion.com (Tres Seaver)
Date: Fri, 25 Jun 2010 18:12:10 -0400
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

Guido van Rossum wrote:

> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.

I do know for a fact that using a UCS2-compiled Python instead of the
system's UCS4-compiled Python leads to a measurable, noticeable drop in
memory consumption of long-running webserver processes using Unicode
(Zope, repoze.bfg, etc). We routinely build Python from source for
deployments precisely because of this fact (in part -- the absurd
choices made by packagers to exclude crucial bits on various platforms
is the other part).

Tres.
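As a rough illustration of why a UCS2 build needs less memory than a UCS4 build for the same text, the sketch below compares encoded sizes. It assumes that UTF-16 and UTF-32 encodings are a fair stand-in for the two internal representations, which only holds for text inside the Basic Multilingual Plane; the sample string is made up. On CPython 3.3 and later the narrow/wide build option no longer exists, so the difference cannot be measured directly on those versions.

```python
# Sample text made only of BMP code points (ASCII, Latin-1, CJK).
text = "Zope \u00e9t\u00e9 \u65e5\u672c\u8a9e " * 1000

utf8_size = len(text.encode("utf-8"))       # variable width
ucs2_size = len(text.encode("utf-16-le"))   # 2 bytes per BMP code point
ucs4_size = len(text.encode("utf-32-le"))   # 4 bytes per code point

print("code points:", len(text))
print("utf-8 bytes:", utf8_size)
print("2-byte units:", ucs2_size)
print("4-byte units:", ucs4_size)
```

For BMP-only text the 4-byte form is exactly twice the size of the 2-byte form, which is consistent with the kind of saving reported above, although real process memory also includes per-object overhead that this sketch ignores.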
--
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com

From ianb at colorstudy.com Sat Jun 26 00:26:20 2010
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 25 Jun 2010 17:26:20 -0500
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID:

On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote:

> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz wrote:
> > I'd like a version of 'decode' which would give me a type that was, in every
> > respect, unicode, and responded to all protocols exactly as other
> > unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
> > but wouldn't actually copy any of that memory unless it really needed to
> > (for example, to pass to a C API that expected native wide characters), and
> > that would hold on to the original bytes so that it could produce them on
> > demand if encoded to the same encoding again. So, as others in this thread
> > have mentioned, the 'ABC' really implies some stuff about C APIs as well.
> > I'm not sure about the exact performance impact of such a class, which is
> > why I'd like the ability to implement it *outside* of the stdlib and see how
> > it works on a project, and return with a proposal along with some data.
> > There are also different ways to implement this, and other optimizations
> > (like ropes) which might be better.
> > You can almost do this today, but the lack of things like the hypothetical
> > "__rcontains__" does make it impossible to be totally transparent about it.
>
> But you'd still have to validate it, right? You wouldn't want to go on
> using what you thought was wrapped UTF-8 if it wasn't actually valid
> UTF-8 (or you'd be worse off than in Python 2). So you're really just
> worried about space consumption. I'd like to see a lot of hard memory
> profiling data before I got overly worried about that.
>

It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically
benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and
found that using Python 2 strs in some cases provided notable advantages. I
know Stefan copied ElementTree in this regard in lxml, maybe he also did a
benchmark or knows of one?

--
Ian Bicking | http://blog.ianbicking.org

From pje at telecommunity.com Sat Jun 26 00:27:04 2010
From: pje at telecommunity.com (P.J. Eby)
Date: Fri, 25 Jun 2010 18:27:04 -0400
Subject: [Python-Dev] bytes / unicode
In-Reply-To: <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <20100625222722.594D23A4099@sparrow.telecommunity.com>

At 01:18 AM 6/26/2010 +0900, Stephen J. Turnbull wrote:
>It seems to me what is wanted here is something like Perl's taint
>mechanism, for *both* kinds of strings. Am I missing something?

You could certainly view it as a kind of tainting. The part where the
type would be bytes-based is indeed somewhat incidental to the actual
use case -- it's just that if you already have the bytes, and all you
want to do is tag them (e.g. the WSGI headers case), the extra
encoding step seems pointless.

A string coercion protocol (that would be used by .join(), .format(),
__contains__, __mod__, etc.)
would allow you to do whatever sort of tainted-string or
tainted-bytes implementations one might wish to have. I suppose that
tainting user inputs (as in Perl) would be just as useful an
application of the same coercion protocol.

Actually, I have another use case for this custom string coercion,
which is that I once wrote a string subclass whose purpose was to
track the original file and line number of some text. Even though
only my code was manipulating the strings, it was very difficult to
get the tainting to work correctly without extreme care as to the
string methods used. (For example, I had to use string addition
rather than %-formatting.)

>But with your architecture, it seems to me that you actually don't
>want polymorphic functions in the stdlib. You want the stdlib
>functions to be bytes-oriented if and only if they are reliable. (This
>is what I was saying to Guido elsewhere.)

I'm not sure I follow you. What I want is for the stdlib to create
stringlike objects of a type determined by the types of the inputs --
where the logic for deciding this coercion can be controlled by the
input objects' types, rather than putting this in the hands of the
stdlib function.

And of course, this applies to non-stdlib functions, too -- anything
that simply manipulates user-defined string classes, should allow the
user-defined classes to determine the coercion of the result.

>BTW, this was a little unclear to me:
>
> > [Collisions will] be with other *unicode* strings. Ones coming
> > from other code, and literals embedded in the stdlib.
>
>What about the literals in the stdlib? Are you saying they contain
>invalid code points for your known output encoding? Or are you saying
>that with non-polymorphic unicode stdlib, you get lots of false
>positives when combining with your validated bytes?

No, I mean that the current string coercion rules cause everything to
be converted to unicode, thereby discarding the tainting information,
so to speak.
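A minimal sketch of the effect described here, assuming Python 3 and a plain str subclass as a stand-in for a tagged string type. The class name Tainted is hypothetical and is not an API proposed in this thread. Ordinary string operations hand back a plain str, so the tag carried by the subclass is lost.

```python
class Tainted(str):
    """Hypothetical tagged string: the subclass itself is the 'tag'."""


tagged = Tainted("user input")

# Each of these builds its result through str's own methods, which
# always produce an exact str, never the subclass.
results = {
    "concatenation": tagged + " suffix",
    "%-formatting": "%s!" % tagged,
    "str.join": ", ".join([tagged, tagged]),
    "str.upper": tagged.upper(),
}

for operation, value in results.items():
    print(operation, "->", type(value).__name__)
```

Keeping the tag today means overriding every relevant method on the subclass, and even then operations driven by the other operand (such as `"%s" % tagged` or `sep.join(...)`) bypass the subclass, which is the gap a coercion protocol would be meant to close.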
This applies equally to other tainting use cases, and
other uses for custom stringlike objects.

From steve at holdenweb.com Sat Jun 26 00:38:38 2010
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 25 Jun 2010 18:38:38 -0400
Subject: [Python-Dev] Signs of neglect?
Message-ID:

I was pretty stunned when I tried this. Remember that the Tools
subdirectory is distributed with Windows, so this means we got through
almost two releases without anyone realizing that 2to3 does not appear
to have touched this code.

Yes, I have: http://bugs.python.org/issue9083

When's 3.2 due out?

regards
 Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
See Python Video! http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS: http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" - Ian Dury, 1942-2000

From janssen at parc.com Sat Jun 26 00:40:52 2010
From: janssen at parc.com (Bill Janssen)
Date: Fri, 25 Jun 2010 15:40:52 PDT
Subject: [Python-Dev] thoughts on the bytes/string discussion
In-Reply-To:
References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com>
Message-ID: <26215.1277505652@parc.com>

Guido van Rossum wrote:

> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz
> wrote:
> >
> > On Jun 24, 2010, at 4:59 PM, Guido van Rossum wrote:
> >
> > Regarding the proposal of a String ABC, I hope this isn't going to
> > become a backdoor to reintroduce the Python 2 madness of allowing
> > equivalency between text and bytes for *some* strings of bytes and not
> > others.

I never actually replied to this...

Absolutely right, which is why you might really want another kind of
string, rather than a way to treat some bytes values as strings in some
places. Both Python 2 and Python 3 are missing one of the three types.
Python 1 and 2 didn't have "bytes", and this caused problems because
"str" was pressed into use to hold arbitrary byte sequences.
(Python 2 "str" has other problems as well, like losing track of the encoding.) Python 3 doesn't have Python 2's "str" (encoded string), and bytes are being pressed into use for that. Each of these uses is an ad hoc hijack of an inappropriate type, and additional frameworks not directly supported by the Python language are being jury-rigged to try to support the uses. On the other hand, this is all in the eye of the beholder. Both byte sequences and strings are horrible formless things; they remind me of BLISS. You seldom really have a byte sequence; what you have is an XDR float or an encoded string or an IP header or an email message. Similarly for strings; they are really file names or city names or English sentences or URIs or other things with significant semantic constraints not captured by the typical type system. So, yes, there *is* an inescapable equivalency between text and bytes for *some* sequences of bytes (those that represent encoded strings) and not others (those that contain the XDR float, for instance). Creating a separate encoded string type would be one way to keep that straight. > > For my part, what I want out of a string ABC is simply the ability to do > > application-specific optimizations. > > There are many applications where all input and output is text, but _must_ > > be UTF-8. Even GTK uses UTF-8 as its native text representation, so > > "output" could just be display. > > Right now, in Python 3, the only way to be "correct" about this is to copy > > every byte of input into 4 bytes of output, then copy each code point *back* > > into a single byte of output. If all your application does is rewrite the > > occasional XML attribute, for example, this cost can be significant, if not > > overwhelming.
> > I'd like a version of 'decode' which would give me a type that was, in every > > respect, unicode, and responded to all protocols exactly as other > > unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do, > > but wouldn't actually copy any of that memory unless it really needed to > > (for example, to pass to a C API that expected native wide characters), and > > that would hold on to the original bytes so that it could produce them on > > demand if encoded to the same encoding again. So, as others in this thread > > have mentioned, the 'ABC' really implies some stuff about C APIs as well. Seems like it. > > I'm not sure about the exact performance impact of such a class, which is > > why I'd like the ability to implement it *outside* of the stdlib and see how > > it works on a project, and return with a proposal along with some data. Yes, exactly. > > There are also different ways to implement this, and other optimizations > > (like ropes) which might be better. > > You can almost do this today, but the lack of things like the hypothetical > > "__rcontains__" does make it impossible to be totally transparent about it. > > But you'd still have to validate it, right? You wouldn't want to go on > using what you thought was wrapped UTF-8 if it wasn't actually valid > UTF-8 (or you'd be worse off than in Python 2). Yes, but there are different ways to validate it that have different performance impacts. Simply trusting the source of the string, for example, would be appropriate in some cases. > So you're really just worried about space consumption. I'd like to see > a lot of hard memory profiling data before I got overly worried about > that. While I've seen some big Web pages, I think the email folks, who often have to process messages with attachments measuring in the tens of megabytes, have the stronger problems here, and I think speed may be more important than memory.
I've built both a Web server and an IMAP server in Python, and the IMAP server is where the issues of storage management really prevail. If you have to convert a 20 MB encoded string into a Unicode string just to look at the headers as strings, you have issues. (The Python email package doesn't do that, by the way.) Bill From steve at holdenweb.com Sat Jun 26 00:51:53 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:51:53 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: Glyph Lefkowitz wrote: > > On Jun 25, 2010, at 5:02 PM, Guido van Rossum wrote: > >> But you'd still have to validate it, right? You wouldn't want to go on >> using what you thought was wrapped UTF-8 if it wasn't actually valid >> UTF-8 (or you'd be worse off than in Python 2). So you're really just >> worried about space consumption. > > So, yes, I am mainly worried about memory consumption, but don't > underestimate the pure CPU cost of doing all the copying. It's quite a > bit faster to simply scan through a string than to scan and while you're > scanning, keep faulting out the L2 cache while you're accessing some > other area of memory to store the copy. > Yes, but you are already talking about optimizations that might be significant for large-ish strings (where large-ish depends on exactly where Moore's Law is currently delivering computational performance) - the amount of cache consumed by a ten-byte string will slip by unnoticed, but at L2 levels megabytes would effectively flush the cache. > Plus, If I am decoding with the surrogateescape error handler (or its > effective equivalent), then no, I don't need to validate it in advance; > interpretation can be done lazily as necessary. 
I realize that this is > just GIGO, but I wouldn't be doing this on data that didn't have an > explicitly declared or required encoding in the first place. > >> I'd like to see a lot of hard memory profiling data before I got >> overly worried about that. > > I know of several Python applications that are already constrained by > memory. I don't have a lot of hard memory profiling data, but in an > environment where you're spawning as many processes as you can in order > to consume _all_ the physically available RAM for string processing, it > stands to reason that properly decoding everything and thereby exploding > everything out into 4x as much data (or 2x, if you're lucky) would > result in a commensurate decrease in throughput. > Yes, UCS-4's impact does seem like it could be horrible for these use cases. But "knowing of several Python applications that are already constrained by memory" doesn't mean that it's a bad general decision. Most users will never notice the difference, so we should try to accommodate those who do notice a difference without inconveniencing the rest too much. > I don't think I could even reasonably _propose_ that such a project stop > treating textual data as bytes, because there's no optimization strategy > once that sort of architecture has been put into place. If your function > says "this takes unicode", then you just have to bite the bullet and > decode it, or rewrite it again to have a different requirement. > That has always been my understanding. I regard it as a sort of intellectual tax on the United States (and its Western collaborators) for being too dim to realise that eventually they would end up selling computers to people with more than 256 characters in their alphabet. Sorry guys, but your computers are only as fast as you think they are when you only talk to each other. > So, right now, I don't know where I'd get the data with which to make the > argument in the first place :).
If there were some abstraction in the > core's treatment of strings, though, and I could decode things and note > their encoding without immediately paying this cost (or alternately, > paying the cost to see if it's so bad, but with the option of managing > it or optimizing it separately). This is why I'm asking for a way for > me to implement my own string type, and not for a change of behavior or > an optimization in the stdlib itself: I could be wrong, I don't have a > particularly high level of certainty in my performance estimates, but I > think that my concerns are realistic enough that I don't want to embark > on a big re-architecture of text-handling only to have it become a > performance nightmare that needs to be reverted. > Recent experience with the thoroughness of the Python 3 release preparations leads me to believe that *anything* new needs to prove its worth outside the stdlib for a while. > As Robert Collins pointed out, they already have performance issues > related to encoding in Bazaar. I know they've done a lot of profiling > in that area, so I hope eventually someone from that project will show > up with some data to demonstrate it :). And I've definitely heard many, > many anecdotes (some of them in this thread) about people distorting > their data structures in various ways to avoid paying decoding cost in > the ASCII/latin1 case, whether it's *actually* a significant performance > issue or not. I would very much like to tell those people "Just call > .decode(), and if it turns out to actually be a performance issue, you > can always deal with it later, with a custom string type." I'm > confident that in *most* cases, it would not be. > Well that would be a nice win. > Anyway, this may be a serious issue, but I increasingly feel like I'm > veering into python-ideas territory, so perhaps I'll just have to burn > this bridge when I come to it. Hopefully after the moratorium. > Sounds like it's worth pursuing, though. 
I mean after all, we don't want to leave *all* the bit-twiddling to the low-level language users ;-). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:57:10 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:57:10 -0400 Subject: [Python-Dev] docs - Copy In-Reply-To: <4C2511EA.3000200@v.loewis.de> References: <4C2511EA.3000200@v.loewis.de> Message-ID: Martin v. Löwis wrote: > On 25.06.2010 18:57, Terry Reedy wrote: >> On 6/24/2010 8:51 PM, Rich Healey wrote: >>> http://docs.python.org/library/copy.html >> Discussion of the wording of current docs should go to python-list. >> Py-dev is for development of future Python. > > No no no. [...] It isn't always easy to tell, but I think Martin meant "no". regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:54:24 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:54:24 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <4C2512A2.1040404@v.loewis.de> References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> Message-ID: Martin v. Löwis wrote: > On 25.06.2010 18:18, Barry Warsaw wrote: >> Benjamin is still planning to release Python 2.7 final on 2010-07-03, so it's >> time for me to work out the release schedule for Python 2.6.6 - likely the >> last maintenance release for Python 2.6.
>> >> Because summer schedules are crazy, and I want to leave two weeks between >> 2.6.6 rc1 and 2.6.6 final, my current schedule looks like: >> >> * Python 2.6.6 rc 1 on Monday 2010-08-02 >> * Python 2.6.6 final on Monday 2010-08-16 > > That would barely work for me. If schedule slips in any way, we'll have > to move the release into end-of-September (but the days as proposed are > fine). > > Regards, > Martin A six-week slippage wouldn't be good. What's the relevant chaos theory when a one- or two-day hold leads to a six-week delivery slippage? Let's hope things don't slip! regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 01:00:19 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 19:00:19 -0400 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <4C251133.2090505@v.loewis.de> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> Message-ID: <4C253503.6080300@holdenweb.com> Martin v. Löwis wrote: >>>> I am extremely keen for this to happen. Does anyone have ownership of this >>>> project? There was some discussion of it up-list but the discussion fizzled. >>> Can you please explain what "this project" is, in the context of your >>> message? GSoC? GHOP? >> Oh, I thought this was quite clear. I was specifically meaning the large >> "Python 2 or 3" button on python.org. It would help users who want to know >> what version of python to use if they had a clear guide as to what version >> to download. > > Ah, ok. No, nobody has taken ownership of that project, and likely, > nobody actually will - unless you volunteer.
> Or perhaps spur the pydotorg community on with some well-placed encouragement. Nobody ever seems to say "thanks" to those guys except the jobs posters - *they* seem pretty happy. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Sat Jun 26 00:55:16 2010 From: steve at holdenweb.com (Steve Holden) Date: Fri, 25 Jun 2010 18:55:16 -0400 Subject: [Python-Dev] Schedule for Python 2.6.6 In-Reply-To: <4C251CA7.3070902@v.loewis.de> References: <20100625121847.60331d9e@heresy> <4C2512A2.1040404@v.loewis.de> <20100625170322.5ece724f@heresy> <4C251CA7.3070902@v.loewis.de> Message-ID: Martin v. Löwis wrote: >> Would that be bad or good (slipping into September)? I'd like to get a >> release out as soon after 2.7 final as possible, but it's an entirely >> self-imposed deadline. There's no reason why we can't push the whole 2.6.6 >> thing later if that works better for you. OTOH, I can't go much earlier so if >> September is bad for you, then we'll stick to the above dates. > > I think we can strive for your original proposal. If it slips, we let it > slip by a month or two. > > Regards, > Martin I suppose for 2.6.6 it's not really critical. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From benjamin at python.org Sat Jun 26 01:23:02 2010 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 25 Jun 2010 18:23:02 -0500 Subject: [Python-Dev] Signs of neglect?
In-Reply-To: References: Message-ID: 2010/6/25 Steve Holden : > I was pretty stunned when I tried this. Remember that the Tools > subdirectory is distributed with Windows, so this means we got through > almost two releases without anyone realizing that 2to3 does not appear > to have touched this code. I would call it more a sign of no tests rather than one of neglect and perhaps also an indication of the usefulness of those tools. > > Yes, I have: http://bugs.python.org/issue9083 > > When's 3.2 due out? PEP 392. -- Regards, Benjamin From fijall at gmail.com Sat Jun 26 01:27:52 2010 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 25 Jun 2010 17:27:52 -0600 Subject: [Python-Dev] PyPy 1.3 released Message-ID: ======================= PyPy 1.3: Stabilization ======================= Hello. We're pleased to announce the release of PyPy 1.3. This release has two major improvements. First of all, we stabilized the JIT compiler since the 1.2 release, answered user issues, fixed bugs, and generally improved speed. We're also pleased to announce alpha support for loading CPython extension modules written in C. While the main purpose of this release is increased stability, this feature is in alpha stage and it is not yet suited for production environments. Highlights of this release ========================== * We introduced support for CPython extension modules written in C. As of now, this support is in alpha, and it's very unlikely unaltered C extensions will work out of the box, due to missing functions or refcounting details. The support is disabled by default, so you have to do:: import cpyext before trying to import any .so file. Also, libraries are source-compatible and not binary-compatible. That means you need to recompile binaries, using for example:: python setup.py build Details may vary, depending on your build system. Make sure you include the above line at the beginning of setup.py or put it in your PYTHONSTARTUP. This is an alpha feature. It'll likely segfault.
You have been warned! * JIT bugfixes. A lot of bugs reported for the JIT have been fixed, and its stability greatly improved since 1.2 release. * Various small improvements have been added to the JIT code, as well as a great speedup of compiling time. Cheers, Maciej Fijalkowski, Armin Rigo, Alex Gaynor, Amaury Forgeot d'Arc and the PyPy team From ncoghlan at gmail.com Sat Jun 26 02:19:51 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Jun 2010 10:19:51 +1000 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight wrote: > However, then you have to also consider python packages made up of multiple > distro packages -- like twisted or zope. Twisted includes some C extensions > in the core package. But then there are other twisted modules (installed > under a "twisted.foo" name) which do not include C extensions. If the base > twisted package is installed under a version-specific directory, then all of > the submodule packages need to also be installed under the same > version-specific directory (and thus built for all versions). > > In the past, it has proven somewhat tricky to coordinate which directory the > modules for package "foo" should be installed in, because you need to know > whether *any* of the related packages includes a native ".so" file, not just > the current package. > > The converse situation, where a base package did *not* get installed into a > version-specific directory because it includes no native code, but a > submodule *does* include a ".so" file, is even trickier. 
I think there are two major ways to tackle this: - allow multiple versions of a .so file within a single directory (i.e Barry's current suggestion) - enhanced namespace packages, allowing a single package to be spread across multiple directories, some of which may be Python version specific (i.e. modifications to PEP 382 to support references to version-specific directories) I think a new PEP is definitely in order, especially to explain why enhancing PEP 382 to support saying "look over here for the .so files for this version" isn't a preferable approach. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at thorne.id.au Sat Jun 26 02:41:34 2010 From: stephen at thorne.id.au (Stephen Thorne) Date: Sat, 26 Jun 2010 10:41:34 +1000 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <4C251C4D.50806@v.loewis.de> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de> Message-ID: <20100626004134.GB16084@thorne.id.au> On 2010-06-25, "Martin v. L?wis" wrote: > > What page were we suggesting linking to? > > I don't think anybody proposed anything specific. Steve Holden > suggested it should go to "reasoned discussion of the > pros and cons as evinced in this thread". Stephen Thorne didn't > propose anything specific but to have a large button. I didn't propose anything, I heard a good idea that I'd like to see followed through. 
-- Regards, Stephen Thorne From martin at v.loewis.de Sat Jun 26 02:49:49 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sat, 26 Jun 2010 02:49:49 +0200 Subject: [Python-Dev] "2 or 3" link on python.org In-Reply-To: <20100626004134.GB16084@thorne.id.au> References: <20100624232821.GB10805@thorne.id.au> <4C23F1AD.9040809@v.loewis.de> <20100625003149.GA16084@thorne.id.au> <4C251133.2090505@v.loewis.de> <4C251A38.3090205@voidspace.org.uk> <4C251C4D.50806@v.loewis.de> <20100626004134.GB16084@thorne.id.au> Message-ID: <4C254EAD.4060006@v.loewis.de> On 26.06.2010 02:41, Stephen Thorne wrote: > On 2010-06-25, "Martin v. Löwis" wrote: >>> What page were we suggesting linking to? >> >> I don't think anybody proposed anything specific. Steve Holden >> suggested it should go to "reasoned discussion of the >> pros and cons as evinced in this thread". Stephen Thorne didn't >> propose anything specific but to have a large button. > > I didn't propose anything, I heard a good idea that I'd like to see followed > through. Ah, ok. I thought "I am extremely keen for this to happen" indicated that you would be willing to volunteer time to make it happen. Regards, Martin From ncoghlan at gmail.com Sat Jun 26 04:59:31 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 26 Jun 2010 12:59:31 +1000 Subject: [Python-Dev] Signs of neglect? In-Reply-To: References: Message-ID: On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote: > 2010/6/25 Steve Holden : > I would call it more a sign of no tests rather than one of neglect and > perhaps also an indication of the usefulness of those tools. Less than useful tools with no tests probably qualify as neglected... An assessment of the contents of the Py3k tools directory is probably in order, with at least a basic "will it run?" check added for those we decide to keep. Cheers, Nick.
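A basic "will it run?" check of the kind suggested above could start as little more than a syntax pass over the scripts. The following is only a minimal sketch under stated assumptions: `find_broken_scripts` is a hypothetical helper name, not an existing tool, and it only catches code that no longer parses under Python 3 (for example, leftover print statements). It would not notice a script that parses but imports a module that no longer exists.

```python
import pathlib


def find_broken_scripts(tools_dir):
    """Return (path, message) pairs for .py files that fail to parse.

    Hypothetical helper for illustration: it checks syntax only and
    never executes the scripts it examines.
    """
    broken = []
    for path in sorted(pathlib.Path(tools_dir).rglob("*.py")):
        try:
            # compile() accepts bytes and honours encoding declarations.
            compile(path.read_bytes(), str(path), "exec")
        except (SyntaxError, ValueError) as exc:
            broken.append((path, str(exc)))
    return broken
```

Running it over a checkout's Tools directory would give a first list of candidates to fix or drop; anything that passes still needs a real test before it can be called working.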
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Sat Jun 26 05:42:25 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 Jun 2010 12:42:25 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100625222722.594D23A4099@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> Message-ID: <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > it's just that if you already have the bytes, and all you want to > do is tag them (e.g. the WSGI headers case), the extra encoding > step seems pointless. Well, I'll have to concede that unless and until I get involved in the WSGI development effort. > >But with your architecture, it seems to me that you actually don't > >want polymorphic functions in the stdlib. You want the stdlib > >functions to be bytes-oriented if and only if they are reliable. (This > >is what I was saying to Guido elsewhere.) > > I'm not sure I follow you. What I'm saying here is that if bytes are the signal of validity, and the stdlib functions preserve validity, then it's better to have the stdlib functions object to unicode data as an argument. Compare the alternative: it returns a unicode object which might get passed around for a while before one of your functions receives it and identifies it as unvalidated data. But you agree that there are better mechanisms for validation (although not available in Python yet), so I don't see this as a potential obstacle to polymorphism now. > What I want is for the stdlib to create stringlike objects of a > type determined by the types of the inputs -- In general this is a hard problem, though.
Polymorphism, OK, one-way tainting OK, but in general combining related types is pretty arbitrary, and as in the encoded-bytes case, the result type often varies depending on expectations of callers, not the types of the data. From greg.ewing at canterbury.ac.nz Sat Jun 26 09:58:17 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 26 Jun 2010 19:58:17 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> Message-ID: <4C25B319.8040804@canterbury.ac.nz> Tres Seaver wrote: > I do know for a fact that using a UCS2-compiled Python instead of the > system's UCS4-compiled Python leads to a measurable, noticeable drop in > memory consumption of long-running webserver processes using Unicode Would there be any sanity in having an option to compile Python with UTF-8 as the internal string representation? -- Greg From stefan_ml at behnel.de Sat Jun 26 11:34:56 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 26 Jun 2010 11:34:56 +0200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> Message-ID: Ian Bicking, 26.06.2010 00:26: > On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote: >> On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz >>> I'd like a version of 'decode' which would give me a type that was, in >> every >>> respect, unicode, and responded to all protocols exactly as other >>> unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) >> do, >>> but wouldn't actually copy any of that memory unless it really needed to >>> (for example, to pass to a C API that expected native wide characters), >> and >>> that would hold on to the original bytes so that it could produce them on >>> demand if encoded to the same encoding again.
So, as others in this >> thread >>> have mentioned, the 'ABC' really implies some stuff about C APIs as well. Well, there's the buffer API, so you can already create something that refers to an existing C buffer. However, with respect to a string, you will have to make sure the underlying buffer doesn't get freed while the string is still in use. That will be hard and sometimes impossible to do at the C-API level, even if the string is allowed to keep a reference to something that holds the buffer. At least in lxml, such a feature would be completely worthless, as text is never held by any ref-counted Python wrapper object. It's only part of the XML tree, which is allowed to change at (more or less) any time, so the underlying char* buffer could just get freed without further notice. Adding a guard against that would likely have a larger impact on the performance than the decoding operations. >>> I'm not sure about the exact performance impact of such a class, which is >>> why I'd like the ability to implement it *outside* of the stdlib and see >> how >>> it works on a project, and return with a proposal along with some data. >>> There are also different ways to implement this, and other optimizations >>> (like ropes) which might be better. >>> You can almost do this today, but the lack of things like the >> hypothetical >>> "__rcontains__" does make it impossible to be totally transparent about >> it. >> >> But you'd still have to validate it, right? You wouldn't want to go on >> using what you thought was wrapped UTF-8 if it wasn't actually valid >> UTF-8 (or you'd be worse off than in Python 2). So you're really just >> worried about space consumption. I'd like to see a lot of hard memory >> profiling data before I got overly worried about that. 
> > It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically > benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and > found that using Python 2 strs in some cases provided notable advantages. I > know Stefan copied ElementTree in this regard in lxml, maybe he also did a > benchmark or knows of one? Actually, bytes vs. unicode doesn't make that a big difference in Py2 for lxml. ElementTree is a lot older, so I guess it made a larger difference when its code was written (and I even think I recall seeing numbers for lxml where it seemed to make a notable difference). In lxml, text content is stored in the C tree of libxml2 as UTF-8 encoded char* text. On request, lxml creates a string object from it and returns it. In Py2, it checks for plain ASCII content first and returns a byte string for that. Only non-ASCII strings are returned as decoded unicode strings. In Py3, it always returns unicode strings. When I run a little benchmark on lxml in Py2.6.5 that just reads some short text content from an Element object, I only see a tiny difference between unicode strings and byte strings. The gap obviously increases when the text gets longer, e.g. when I serialise the complete text content of an XML document to either a byte string or a unicode string. But even for documents in the megabyte range we are still talking about single milliseconds here, and the difference stays well below 10%. It's seriously hard to make that the performance bottleneck in an XML application. Also, since the string objects are only instantiated at request, memory isn't an issue either. That's different for (c)ElementTree again, where string content is stored as Python objects. Four times the size even for plain ASCII strings (e.g. numbers, IDs or even trailing whitespace!) can well become a problem there, and can easily dominate the overall size of the in-memory tree. Plain ASCII content is surprisingly common in XML documents. 
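The Py2-era return rule described above (hand back a byte string for plain ASCII content, a decoded unicode string otherwise) can be illustrated in a few lines. This is only a sketch of the rule, not lxml's actual code: `text_from_utf8` is a hypothetical name, the real logic lives in C/Cython, and `bytes.isascii()` assumes Python 3.7 or later.

```python
def text_from_utf8(raw):
    """Mimic the described rule for UTF-8 encoded tree content.

    Hypothetical helper for illustration only.
    """
    if raw.isascii():
        # Plain ASCII (numbers, IDs, whitespace): no decoding needed.
        return raw
    # Anything else is decoded into a text string.
    return raw.decode("utf-8")
```

For example, `text_from_utf8(b"12345")` gives back the same bytes object contents, while `text_from_utf8("é".encode("utf-8"))` gives a `str`. The saving in the ASCII case is the decode step and the wider in-memory representation, which is where the size difference for object-based trees comes from.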
Stefan From stefan_ml at behnel.de Sat Jun 26 11:41:48 2010 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 26 Jun 2010 11:41:48 +0200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C25B319.8040804@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: Greg Ewing, 26.06.2010 09:58: > Tres Seaver wrote: > >> I do know for a fact that using a UCS2-compiled Python instead of the >> system's UCS4-compiled Python leads to a measurable, noticeable drop in >> memory consumption of long-running webserver processes using Unicode > > Would there be any sanity in having an option to compile > Python with UTF-8 as the internal string representation? It would break Py_UNICODE, because the internal size of a unicode character would no longer be fixed. Stefan From steve at holdenweb.com Sat Jun 26 13:18:37 2010 From: steve at holdenweb.com (Steve Holden) Date: Sat, 26 Jun 2010 07:18:37 -0400 Subject: [Python-Dev] Signs of neglect? In-Reply-To: References: Message-ID: <4C25E20D.2040007@holdenweb.com> Nick Coghlan wrote: > On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote: >> 2010/6/25 Steve Holden : >> I would call it more a sign of no tests rather than one of neglect and >> perhaps also an indication of the usefulness of those tools. > > Less than useful tools with no tests probably qualify as neglected... > > An assessment of the contents of the Py3k tools directory is probably > in order, with at least a basic "will it run?" check added for those > we decide to keep. > Neither webchecker nor wcgui.py will run - the former breaks because sgmllib is missing, the latter because it uses the wrong name for "tkinter" (but overcoming this will throw it back to an sgmllib dependency too). Guido thinks it's OK to abandon at least some of them, so I don't see the rest getting much love in the future.
They do need sorting through - I don't see anyone wanting xxci.py, for example ("check in files for which rcsdiff returns nonzero exit status"). But I'm grateful you agree with my diagnosis of neglect (not that a diagnosis in itself is going to help in fixing things). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From nagle at animats.com Sat Jun 26 08:11:49 2010 From: nagle at animats.com (John Nagle) Date: Fri, 25 Jun 2010 23:11:49 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL Message-ID: <4C259A25.1060705@animats.com> We have just released a proof-of-concept implementation of a new approach to thread management - "newthreading". It is available for download at https://sourceforge.net/projects/newthreading/ The user's guide is at http://www.animats.com/papers/languages/newthreadingintro.html This is a pure Python implementation of synchronized objects, along with a set of restrictions which make programs race-condition free, even without a Global Interpreter Lock. The basic idea is that classes derived from SynchronizedObject are automatically locked at entry and unlocked at exit. They're also unlocked when a thread blocks within the class. So at no time can two threads be active in such a class at one time. In addition, only "frozen" objects can be passed in and out of synchronized objects. (This is somewhat like the multiprocessing module, where you can only pass objects that can be "pickled". But it's not as restrictive; multiple threads can access the same synchronized object, one at a time. This pure Python implementation is usable, but does not improve performance. 
It's a proof of concept implementation so that programmers can try out synchronized classes and see what it's like to work within those restrictions. The semantics of Python don't change for single-thread programs. But when the program forks off the first new thread, the rules change, and some of the dynamic features of Python are disabled. Some of the ideas are borrowed from Java, and some are from "safethreading". The point is to come up with a set of liveable restrictions which would allow getting rid of the GIL. This is becoming essential as Unladen Swallow starts to work and the number of processors per machine keeps climbing. This may in time become a Python Enhancement Proposal. We'd like to get some experience with it first. Try it out and report back. The SourceForge forum for the project is the best place to report problems. John Nagle From arigo at tunes.org Sat Jun 26 10:34:57 2010 From: arigo at tunes.org (Armin Rigo) Date: Sat, 26 Jun 2010 10:34:57 +0200 Subject: [Python-Dev] [pypy-dev] PyPy 1.3 released In-Reply-To: References: Message-ID: <20100626083457.GA14816@code0.codespeak.net> Hi, On Fri, Jun 25, 2010 at 05:27:52PM -0600, Maciej Fijalkowski wrote: > python setup.py build As corrected on the blog (http://morepypy.blogspot.com/), this line should read: pypy setup.py build Armin. From fuzzyman at voidspace.org.uk Sat Jun 26 15:29:24 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 26 Jun 2010 14:29:24 +0100 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C259A25.1060705@animats.com> References: <4C259A25.1060705@animats.com> Message-ID: <4C2600B4.5020503@voidspace.org.uk> On 26/06/2010 07:11, John Nagle wrote: > We have just released a proof-of-concept implementation of a new > approach to thread management - "newthreading". 
It is available > for download at > > https://sourceforge.net/projects/newthreading/ > > The user's guide is at > > http://www.animats.com/papers/languages/newthreadingintro.html The user guide says: The suggested import is from newthreading import * The import * form is considered bad practise in *general* and should not be recommended unless there is a good reason. This is slightly off-topic for python-dev, although I appreciate that you want feedback with the eventual goal of producing a PEP - however the introduction of free-threading in Python has not been hampered by lack of synchronization primitives but by the difficulty of changing the interpreter without unduly impacting single threaded code. Providing an alternative garbage collection mechanism other than reference counting would be a more interesting first-step as far as I can see, as that removes the locking required around every access to an object (which currently touches the reference count). Introducing free-threading by *changing* the threading semantics (so you can't share non-frozen objects between threads) would not be acceptable. That comment is likely to be based on a misunderstanding of your future intentions though. :-) All the best, Michael Foord > > This is a pure Python implementation of synchronized objects, along > with a set of restrictions which make programs race-condition free, > even without a Global Interpreter Lock. The basic idea is that > classes derived from SynchronizedObject are automatically locked > at entry and unlocked at exit. They're also unlocked when a thread > blocks within the class. So at no time can two threads be active > in such a class at one time. > > In addition, only "frozen" objects can be passed in and out of > synchronized objects. (This is somewhat like the multiprocessing > module, where you can only pass objects that can be "pickled". > But it's not as restrictive; multiple threads can access the > same synchronized object, one at a time. 
> > This pure Python implementation is usable, but does not improve > performance. It's a proof of concept implementation so that > programmers can try out synchronized classes and see what it's > like to work within those restrictions. > > The semantics of Python don't change for single-thread programs. > But when the program forks off the first new thread, the rules > change, and some of the dynamic features of Python are disabled. > > Some of the ideas are borrowed from Java, and some are from > "safethreading". The point is to come up with a set of liveable > restrictions which would allow getting rid of the GIL. This > is becoming essential as Unladen Swallow starts to work and the > number of processors per machine keeps climbing. > > This may in time become a Python Enhancement Proposal. We'd like > to get some experience with it first. Try it out and report back. > The SourceForge forum for the project is the best place to report > problems. > > John Nagle > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.
From jnoller at gmail.com Sat Jun 26 16:28:50 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 26 Jun 2010 10:28:50 -0400 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C2600B4.5020503@voidspace.org.uk> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote: > On 26/06/2010 07:11, John Nagle wrote: >> >> We have just released a proof-of-concept implementation of a new >> approach to thread management - "newthreading". It is available >> for download at >> >> https://sourceforge.net/projects/newthreading/ >> >> The user's guide is at >> >> http://www.animats.com/papers/languages/newthreadingintro.html > > The user guide says: > > The suggested import is > > from newthreading import * > > The import * form is considered bad practise in *general* and should not be > recommended unless there is a good reason. This is slightly off-topic for > python-dev, although I appreciate that you want feedback with the eventual > goal of producing a PEP - however the introduction of free-threading in > Python has not been hampered by lack of synchronization primitives but by > the difficulty of changing the interpreter without unduly impacting single > threaded code. > I asked John to drop a message here for this project - so feel free to flame me if anyone. This *is* relevant, and I'd guess fairly interesting to the group as a whole. 
jesse From solipsis at pitrou.net Sat Jun 26 16:34:12 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 26 Jun 2010 16:34:12 +0200 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: <20100626163412.25b68be6@pitrou.net> On Sat, 26 Jun 2010 14:29:24 +0100 Michael Foord wrote: > > the introduction of > free-threading in Python has not been hampered by lack of > synchronization primitives but by the difficulty of changing the > interpreter without unduly impacting single threaded code. Exactly what I think too. cheers Antoine. From jnoller at gmail.com Sat Jun 26 16:44:15 2010 From: jnoller at gmail.com (Jesse Noller) Date: Sat, 26 Jun 2010 10:44:15 -0400 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C2600B4.5020503@voidspace.org.uk> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord wrote: > On 26/06/2010 07:11, John Nagle wrote: >> >> We have just released a proof-of-concept implementation of a new >> approach to thread management - "newthreading". It is available >> for download at >> >> https://sourceforge.net/projects/newthreading/ >> >> The user's guide is at >> >> http://www.animats.com/papers/languages/newthreadingintro.html > > The user guide says: > > The suggested import is > > from newthreading import * > > The import * form is considered bad practise in *general* and should not be > recommended unless there is a good reason. 
This is slightly off-topic for > python-dev, although I appreciate that you want feedback with the eventual > goal of producing a PEP - however the introduction of free-threading in > Python has not been hampered by lack of synchronization primitives but by > the difficulty of changing the interpreter without unduly impacting single > threaded code. > > Providing an alternative garbage collection mechanism other than reference > counting would be a more interesting first-step as far as I can see, as that > removes the locking required around every access to an object (which > currently touches the reference count). Introducing free-threading by > *changing* the threading semantics (so you can't share non-frozen objects > between threads) would not be acceptable. That comment is likely to be based > on a misunderstanding of your future intentions though. :-) > > All the best, > > Michael Foord I'd also like to point out that one of the projects John cites is Adam Olsen's Safethread work: http://code.google.com/p/python-safethread/ which, in and of itself, is a good read. From stephen at xemacs.org Sat Jun 26 19:24:50 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 27 Jun 2010 02:24:50 +0900 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C25B319.8040804@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: <87d3vdn1ul.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Would there be any sanity in having an option to compile > Python with UTF-8 as the internal string representation? Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the beginning of the pain. If Emacs's experience is any guide, the cost in speed and complexity of a variable-width internal representation is high.
There are a number of tricks you can use, but basically everything becomes O(n) for the natural implementation of most operations (such as indexing by character). You can get around that with a position cache, of course, but that adds complexity, and really cuts into the space saving (and worse, adds another chunk that may or may not be paged in when you need it). What we're considering is a system where buffers come in 1-, 2-, and 4-octet widechars, with automatic translation depending on content. But the buffer is the primary random-access structure in Emacsen, so optimizing it is probably worth our effort. I doubt it would be worth it for Python, but my intuitions here are not reliable. From nagle at animats.com Sat Jun 26 18:39:19 2010 From: nagle at animats.com (John Nagle) Date: Sat, 26 Jun 2010 09:39:19 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> Message-ID: <4C262D37.7020807@animats.com> On 6/26/2010 7:44 AM, Jesse Noller wrote: > On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord > wrote: >> On 26/06/2010 07:11, John Nagle wrote: >>> >>> We have just released a proof-of-concept implementation of a new >>> approach to thread management - "newthreading". .... >> The import * form is considered bad practise in *general* and >> should not be recommended unless there is a good reason. I agree. I just did that to make the examples cleaner. >> however the introduction of free-threading in Python has not been >> hampered by lack of synchronization primitives but by the >> difficulty of changing the interpreter without unduly impacting >> single threaded code. That's what I'm trying to address here. 
>> Providing an alternative garbage collection mechanism other than >> reference counting would be a more interesting first-step as far as >> I can see, as that removes the locking required around every access >> to an object (which currently touches the reference count). >> Introducing free-threading by *changing* the threading semantics >> (so you can't share non-frozen objects between threads) would not >> be acceptable. That comment is likely to be based on a >> misunderstanding of your future intentions though. :-) This work comes out of a discussion a few of us had at a restaurant in Palo Alto after a Stanford talk by the group at Facebook which is building a JIT compiler for PHP. We were discussing how to make threading both safe for the average programmer and efficient. Javascript and PHP don't have threads at all; Python has safe threading, but it's slow. C/C++/Java all have race condition problems, of course. The Facebook guy pointed out that you can't redefine a function dynamically in PHP, and they get a performance win in their JIT by exploiting this. I haven't gone into the memory model in enough detail in the technical paper. The memory model I envision for this has three memory zones: 1. Shared fully-immutable objects: primarily strings, numbers, and tuples, all of whose elements are fully immutable. These can be shared without locking, and reclaimed by a concurrent garbage collector like Boehm's. They have no destructors, so finalization is not an issue. 2. Local objects. These are managed as at present, and require no locking. These can either be thread-local, or local to a synchronized object. There are no links between local objects under different "ownership". Whether each thread and object has its own private heap, or whether there's a common heap with locks at the allocator is an implementation decision. 3. 
Shared mutable objects: mostly synchronized objects, but also immutable objects like tuples which contain references to objects that aren't fully immutable. These are the high-overhead objects, and require locking during reference count updates, or atomic reference count operations if supported by the hardware. The general idea is to minimize the number of objects in this zone. The zone of an object is determined when the object is created, and never changes. This is relatively simple to implement. Tuples (and frozensets, frozendicts, etc.) are normally zone 2 objects. Only "freeze" creates collections in zones 1 and 3. Synchronized objects are always created in zone 3. There are no difficult handoffs, where an object that was previously thread-local now has to be shared and has to acquire locks during the transition. Existing interlinked data structures, like parse trees and GUIs, are by default zone 2 objects, with the same semantics as at present. They can be placed inside a SynchronizedObject if desired, which makes them usable from multiple threads. That's optional; they're thread-local otherwise. The rationale behind "freezing" some of the language semantics when the program goes multi-thread comes from two sources - Adam Olsen's Safethread work, and the acceptance of the multiprocessing module. Olsen tried to retain all the dynamism of the language in a multithreaded environment, but locking all the underlying dictionaries was a boat-anchor on the whole system, and slowed things down so much that he abandoned the project. The Unladen Swallow documentation indicates that early thinking on the project was that Olsen's approach would allow getting rid of the GIL, but later notes indicate that no path to a GIL-free JIT system is currently in development. The multiprocessing module provides semantics similar to threading with "freezing". Data passed between processes is "frozen" by pickling. Processes can't modify each other's code. 
Restrictive though the multiprocessing module is, it appears to be useful. It is sometimes recommended as the Pythonic approach to multi-core CPUs. This is an indication that "freezing" is not unacceptable to the user community. Most of the real-world use cases for extreme dynamism involve events that happen during startup. Configuration files are read, modules are selectively included, functions are overridden, tables of references to functions are set up, regular expressions are compiled, and the code is brought into the appropriately configured state. Then the worker threads are started and the real work starts. The "newthreading" approach allows all that. After two decades of failed attempts to remove the Global Interpreter Lock without making performance worse, it is perhaps time to take a harder look at scalable threading semantics. John Nagle Animats From pje at telecommunity.com Sat Jun 26 20:17:44 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 26 Jun 2010 14:17:44 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20100626181753.601473A4108@sparrow.telecommunity.com> At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote: >What I'm saying here is that if bytes are the signal of validity, and >the stdlib functions preserve validity, then it's better to have the >stdlib functions object to unicode data as an argument. Compare the >alternative: it returns a unicode object which might get passed around >for a while before one of your functions receives it and identifies it >as unvalidated data. I still don't follow, since passing in bytes should return bytes. Returning unicode would be an error, in the case of a "polymorphic" function (per Guido).
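A minimal sketch of what "polymorphic" means in the paragraph above, using Python 3 types: bytes in gives bytes out, str in gives str out, and a mix is rejected rather than silently converted. The function name `join_segments` is hypothetical and chosen only for this illustration.

```python
def join_segments(base, leaf):
    """Join two '/'-separated segments, preserving the input string type."""
    if isinstance(base, bytes) and isinstance(leaf, bytes):
        sep = b"/"
    elif isinstance(base, str) and isinstance(leaf, str):
        sep = "/"
    else:
        # Mixing the two types is treated as a caller error.
        raise TypeError("cannot mix bytes and str segments")
    # The separator has the same type as the inputs, so the result does too.
    return base.rstrip(sep) + sep + leaf.lstrip(sep)
```

In Python 3, `os.path.join` follows the same convention: it returns bytes for bytes arguments, str for str arguments, and raises `TypeError` when the two are mixed.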
>But you agree that there are better mechanisms for validation >(although not available in Python yet), so I don't see this as a >potential obstacle to polymorphism now. Nope. I'm just saying that, given two bytestrings to url-join or path join or whatever, a polymorph should hand back a bytestring. This seems pretty uncontroversial. > > What I want is for the stdlib to create stringlike objects of a > > type determined by the types of the inputs -- > >In general this is a hard problem, though. Polymorphism, OK, one-way >tainting OK, but in general combining related types is pretty >arbitrary, and as in the encoded-bytes case, the result type often >varies depending on expectations of callers, not the types of the >data. But the caller can enforce those expectations by passing in arguments whose types do what they want in such cases, as long as the string literals used by the function don't get to override the relevant parts of the string protocol(s). The idea that I'm proposing is that the basic string and byte types should defer to "user-defined" string types for mixed type operations, so that polymorphism of string-manipulation functions is the *default* case, rather than a *special* case. This makes tainting easier to implement, as well as optimizing and other special cases (like my "source string w/file and line info", or a string with font/formatting attributes).
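To make the "defer to user-defined string types" idea concrete, here is a small sketch in present-day Python 3. For the `+` operator the language already defers: when the right operand is an instance of a subclass that defines `__radd__`, that method is tried before the left operand's own concatenation. The class `Tainted` and the function `add_scheme` are hypothetical names for this illustration, not anything proposed in the thread.

```python
class Tainted(str):
    """A 'user-defined' string type that tries to survive concatenation."""

    def __add__(self, other):
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Called for `plain_str + tainted`, because Tainted subclasses str.
        return Tainted(str(other) + str(self))


def add_scheme(host):
    # The literal on the left is a plain str, yet the result keeps the
    # type of the argument when that argument is a Tainted instance.
    return "http://" + host
```

Other built-in operations do not defer in this way; `"/".join(...)` over `Tainted` items, for example, returns a plain `str`. As I read the message above, that is the kind of gap the proposal is about.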
From doko at ubuntu.com Sat Jun 26 22:06:30 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:06:30 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C265DC6.4080600@ubuntu.com> On 25.06.2010 22:12, James Y Knight wrote: > > On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: > >> On 6/24/2010 8:23 PM, James Y Knight wrote: >>> On Jun 24, 2010, at 5:53 PM, Scott Dial wrote: >>>> If the package has .so files that aren't compatible with other version >>>> of python, then what is the motivation for placing that in a shared >>>> location (since it can't actually be shared) >>> >>> Because python looks for .so files in the same place it looks for the >>> .py files of the same package. >> >> My suggestion was that a package that contains .so files should not be >> shared (e.g., the entire lxml package should be placed in a >> version-specific path). The motivation for this PEP was to simplify the >> installation of python packages for distros; it was not to reduce the >> number of .py files on the disk. >> >> Placing .so files together does not simplify that install process in any >> way. You will still have to handle such packages in a special way. > > > This is a good point, but I think still falls short of a solution. For a > package like lxml, indeed you are correct. Since debian needs to build > it once per version, it could just put the entire package (.py files and > .so files) into a different per-python-version directory. This is what is currently done.
This will increase the size of packages by duplicating the .py files, or you have to install the .py in a common location (irrelevant to sys.path), and provide (sym)links to the expected location. A "different per-python-version directory" also has the disadvantage that file conflicts between (distribution) packages cannot be detected. > However, then you have to also consider python packages made up of > multiple distro packages -- like twisted or zope. Twisted includes some > C extensions in the core package. But then there are other twisted > modules (installed under a "twisted.foo" name) which do not include C > extensions. If the base twisted package is installed under a > version-specific directory, then all of the submodule packages need to > also be installed under the same version-specific directory (and thus > built for all versions). > > In the past, it has proven somewhat tricky to coordinate which directory > the modules for package "foo" should be installed in, because you need > to know whether *any* of the related packages includes a native ".so" > file, not just the current package. > > The converse situation, where a base package did *not* get installed > into a version-specific directory because it includes no native code, > but a submodule *does* include a ".so" file, is even trickier. I don't think that installation into different locations based on the presence of extension will work. Should a location really change if an extension is added as an optimization? Splitting a (python) package into different installation locations should be avoided. 
Matthias From doko at ubuntu.com Sat Jun 26 22:14:54 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:14:54 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C265FBE.9070809@ubuntu.com> On 26.06.2010 02:19, Nick Coghlan wrote: > On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight wrote: >> However, then you have to also consider python packages made up of multiple >> distro packages -- like twisted or zope. Twisted includes some C extensions >> in the core package. But then there are other twisted modules (installed >> under a "twisted.foo" name) which do not include C extensions. If the base >> twisted package is installed under a version-specific directory, then all of >> the submodule packages need to also be installed under the same >> version-specific directory (and thus built for all versions). >> >> In the past, it has proven somewhat tricky to coordinate which directory the >> modules for package "foo" should be installed in, because you need to know >> whether *any* of the related packages includes a native ".so" file, not just >> the current package. >> >> The converse situation, where a base package did *not* get installed into a >> version-specific directory because it includes no native code, but a >> submodule *does* include a ".so" file, is even trickier. > > I think there are two major ways to tackle this: > - allow multiple versions of a .so file within a single directory (i.e > Barry's current suggestion) we already do this, see the naming of the extensions of a python debug build on Windows. Several distributions (Debian, Fedora, Ubuntu) do use this as well to provide extensions for python debug builds. 
> - enhanced namespace packages, allowing a single package to be spread > across multiple directories, some of which may be Python version > specific (i.e. modifications to PEP 382 to support references to > version-specific directories) this is not what I want to use in a distribution. package management systems like rpm and dpkg do handle conflicts and replacements of files pretty well, having the same file in potentially different locations in the file system doesn't help detecting conflicts and duplicate packages. Matthias From doko at ubuntu.com Sat Jun 26 22:22:29 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:22:29 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <20100624164637.22fd9160@heresy> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> <20100624164637.22fd9160@heresy> Message-ID: <4C266185.7080509@ubuntu.com> On 24.06.2010 22:46, Barry Warsaw wrote: > On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote: > >> On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote: >> >>> 2010/6/24 Barry Warsaw: >>>> On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote: >>>> >>>>> 2010/6/24 Barry Warsaw: >>>>>> Please let me know what you think. I'm happy to just commit this to the >>>>>> py3k branch if there are no objections. I don't think a new PEP is >>>>>> in order, but an update to PEP 3147 might make sense. >>>>> >>>>> How will this interact with PEP 384 if that is implemented? >>>> I'm trying to come up with something that will work immediately while PEP 384 >>>> is being adopted. >>> >>> But how will modules specify that they support multiple ABIs then? >> >> I didn't understand, so asked Benjamin for clarification in IRC. >> >> barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports >> the stable abi, will it load it? 
[14:25] >> gutworth: thanks, now i get it :) [14:26] >> gutworth: i think it should, but it wouldn't under my scheme. let me >> think about it > > So, we could say that PEP 384 compliant extension modules would get written > without a version specifier. IOW, we'd treat foo.so as using the ABI. It > would then be up to the Python runtime to throw ImportErrors if in fact we > were loading a legacy, non-PEP 384 compliant extension. Is it realistic to never break the ABI? I would think of having the ABI encoded in the file name as well, and only bump the ABI if it does change. With the "versioned .so files" proposal an ABI bump is necessary with every python version; with PEP 384 the ABI bump will be decoupled from the python version. Matthias From doko at ubuntu.com Sat Jun 26 22:25:28 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:25:28 +0200 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <876318lynt.fsf_-_@benfinney.id.au> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> Message-ID: <4C266238.2020107@ubuntu.com> On 25.06.2010 02:54, Ben Finney wrote: > James Y Knight writes: > >> Really, python should store the .py files in /usr/share/python/, the >> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >> files in /var/lib/python2.5-debug. But python doesn't work like that. > > +1 > > So who's going to draft the "Filesystem Hierarchy Standard compliance" > PEP? :-) This has nothing to do with the FHS. The FHS talks about data, not code. From ctb at msu.edu Sat Jun 26 22:30:27 2010 From: ctb at msu.edu (C.
Titus Brown) Date: Sat, 26 Jun 2010 13:30:27 -0700 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <4C266238.2020107@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> Message-ID: <20100626203024.GA19754@idyll.org> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: > On 25.06.2010 02:54, Ben Finney wrote: >> James Y Knight writes: >> >>> Really, python should store the .py files in /usr/share/python/, the >>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >>> files in /var/lib/python2.5-debug. But python doesn't work like that. >> >> +1 >> >> So who's going to draft the "Filesystem Hierarchy Standard compliance" >> PEP? :-) > > This has nothing to do with the FHS. The FHS talks about data, not code. Really? It has some guidelines here for object files, etc., at least as of 2004. http://www.pathname.com/fhs/pub/fhs-2.3.html A quick scan suggests /usr/lib is the right place to look: http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA cheers, --titus -- C. Titus Brown, ctb at msu.edu From doko at ubuntu.com Sat Jun 26 22:35:40 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:35:40 +0200 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <20100626203024.GA19754@idyll.org> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> <20100626203024.GA19754@idyll.org> Message-ID: <4C26649C.1000507@ubuntu.com> On 26.06.2010 22:30, C.
Titus Brown wrote: > On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: >> On 25.06.2010 02:54, Ben Finney wrote: >>> James Y Knight writes: >>> >>>> Really, python should store the .py files in /usr/share/python/, the >>>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and the .pyc >>>> files in /var/lib/python2.5-debug. But python doesn't work like that. >>> >>> +1 >>> >>> So who's going to draft the "Filesystem Hierarchy Standard compliance" >>> PEP? :-) >> >> This has nothing to do with the FHS. The FHS talks about data, not code. > > Really? It has some guidelines here for object files, etc., at least as > of 2004. > > http://www.pathname.com/fhs/pub/fhs-2.3.html > > A quick scan suggests /usr/lib is the right place to look: > > http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA Agreed for object files, but http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA explicitly states "The /usr/share hierarchy is for all read-only architecture independent *data* files". From doko at ubuntu.com Sat Jun 26 22:45:54 2010 From: doko at ubuntu.com (Matthias Klose) Date: Sat, 26 Jun 2010 22:45:54 +0200 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <4C266702.4010102@ubuntu.com> On 25.06.2010 20:58, Brett Cannon wrote: > On Fri, Jun 25, 2010 at 01:53, Scott Dial >> Placing .so files together does not simplify that install process in any >> way. You will still have to handle such packages in a special way.
You >> must still compile the package multiple times for each relevant version >> of python (with special tagging that I imagine distutils can take care >> of) and, worse yet, you have created a more trick install than merely >> having multiple search paths (e.g., installing/uninstalling lxml for >> *one* version of python is actually more difficult in this scheme). > > This is meant to be used by distros in a programmatic fashion, so my > response is "so what?" Their package management system is going to > maintain the directory, not a person. You and I are not going to be > using this for anything. This is purely meant for Linux OS vendors > (maybe OS X) to manage their installs through their package software. > I honestly do not expect human beings to be mucking around with these > installs (and I suspect Barry doesn't either). Placing files for a distribution in a version-independent path does help distributions handling file conflicts, detecting duplicates and with moving files between different (distribution) packages. Having non-conflicting extension names is a schema which already is used on some platforms (debug builds on Windows). The question for me is, if just a renaming of the .so files is acceptable for upstream, or if distributors should implement this on their own, as something like: if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'): load_ext('foo.2.6.so') else: load_ext('foo.so') I fear this will cause issues when e.g. virtualenv environments start copying parts from the system installation instead of symlinking it. Matthias From bugtrack at roumenpetrov.info Sat Jun 26 22:40:07 2010 From: bugtrack at roumenpetrov.info (Roumen Petrov) Date: Sat, 26 Jun 2010 23:40:07 +0300 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? 
In-Reply-To: References: Message-ID: <4C2665A7.6080601@roumenpetrov.info> Brett Cannon wrote: > I finally realized why clang has not been silencing its warnings about > unused return values: I have -Wno-unused-value set in CFLAGS which > comes before OPT (which defines -Wall) as set in PY_CFLAGS in > Makefile.pre.in. > > I could obviously set OPT in my environment, but that would override > the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, > but the README says that's for stuff that tweak binary compatibility. > > So basically what I am asking is what environment variable should I > use? If CFLAGS is correct then does anyone have any issues if I change > the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes > after OPT? It is not important to me as flags set to BASECFLAGS, CFLAGS, OPT or EXTRA_CFLAGS will set makefile macros CFLAGS and after distribution python distutil will use them to build extension modules. So all variable are equal for builds. Also after configure without OPT variable set we could check what script select for build platform and to rerun configure with OPT+own_flags set on command line (! ;) ) . Roumen From foom at fuhm.net Sat Jun 26 23:10:42 2010 From: foom at fuhm.net (James Y Knight) Date: Sat, 26 Jun 2010 17:10:42 -0400 Subject: [Python-Dev] FHS compliance of Python installation In-Reply-To: <4C26649C.1000507@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <876318lynt.fsf_-_@benfinney.id.au> <4C266238.2020107@ubuntu.com> <20100626203024.GA19754@idyll.org> <4C26649C.1000507@ubuntu.com> Message-ID: On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote: > On 26.06.2010 22:30, C. 
Titus Brown wrote: >> On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote: >>> On 25.06.2010 02:54, Ben Finney wrote: >>>> James Y Knight writes: >>>> >>>>> Really, python should store the .py files in /usr/share/python/, >>>>> the >>>>> .so files in /usr/lib/x86_64-linux-gnu/python2.5-debug/, and >>>>> the .pyc >>>>> files in /var/lib/python2.5-debug. But python doesn't work like >>>>> that. >>>> >>>> +1 >>>> >>>> So who's going to draft the "Filesystem Hierarchy Standard >>>> compliance" >>>> PEP? :-) >>> >>> This has nothing to do with the FHS. The FHS talks about data, >>> not code. >> >> Really? It has some guidelines here for object files, etc., at >> least as >> of 2004. >> >> http://www.pathname.com/fhs/pub/fhs-2.3.html >> >> A quick scan suggests /usr/lib is the right place to look: >> >> http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA > > agreed for object files, but > http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA > explicitly states "The /usr/share hierarchy is for all read-only > architecture independent *data* files". I always figured the "read-only architecture independent" bit was the important part there, and "code is data". Emacs's el files go into /usr/share/emacs, for instance. James From tjreedy at udel.edu Sun Jun 27 00:11:03 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 26 Jun 2010 18:11:03 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: The several posts in this and other threads got me thinking about text versus number computing (which I am more familiar with). For numbers, we have in Python three builtins, the general purpose ints and floats and the more specialized complex.
Two other rational types can be imported for specialized uses. And then there are 3rd-party libraries like mpz and numpy with more number and array-of-number types. What makes these all potentially work together is the special method system, including, in particular, the rather complete set of __rxxx__ number methods. The latter allow non-commutative operations to be mixed either way and ease mixed commutative operations. For text, we have general purpose str and encoded bytes (and bytearray). I think these are sufficient for general use and I am not sure there should even be anything else in the stdlib. But I think it should be possible to experiment with and use specialized 3rd-party text classes just as one can with number classes. I can imagine that inter-operation, when appropriate, might work better with addition of a couple of missing __rxxx__ methods, such as the mentioned __rcontains__. Although adding such would affect the implementation of a core syntax feature, it would not affect syntax as such as seen by the user. -- Terry Jan Reedy From brett at python.org Sun Jun 27 00:30:43 2010 From: brett at python.org (Brett Cannon) Date: Sat, 26 Jun 2010 15:30:43 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: Message-ID: On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: > I finally realized why clang has not been silencing its warnings about > unused return values: I have -Wno-unused-value set in CFLAGS which > comes before OPT (which defines -Wall) as set in PY_CFLAGS in > Makefile.pre.in. > > I could obviously set OPT in my environment, but that would override > the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, > but the README says that's for stuff that tweak binary compatibility. > > So basically what I am asking is what environment variable should I > use?
If CFLAGS is correct then does anyone have any issues if I change > the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes > after OPT? > Since no one objected I swapped the order in r82259. In case anyone else uses clang to compile Python, this means that -Wno-unused-value will now work to silence the warning about unused return values that is caused by some macros. Probably using -Wno-empty-body is also good to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. From scott+python-dev at scottdial.com Sun Jun 27 00:50:27 2010 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sat, 26 Jun 2010 18:50:27 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C265DC6.4080600@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C265DC6.4080600@ubuntu.com> Message-ID: <4C268433.30405@scottdial.com> On 6/26/2010 4:06 PM, Matthias Klose wrote: > On 25.06.2010 22:12, James Y Knight wrote: >> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote: >>> Placing .so files together does not simplify that install process in any >>> way. You will still have to handle such packages in a special way. >> >> This is a good point, but I think still falls short of a solution. For a >> package like lxml, indeed you are correct. Since debian needs to build >> it once per version, it could just put the entire package (.py files and >> .so files) into a different per-python-version directory. > > This is what is currently done. This will increase the size of packages > by duplicating the .py files, or you have to install the .py in a common > location (irrelevant to sys.path), and provide (sym)links to the > expected location. "This is what is currently done" and "provide (sym)links to the expected location" are conflicting statements. 
If you are symlinking .py files from a shared location, then that is not the same as "just install the package into a version-specific location". What motivation is there for preferring symlinks? Who cares if a distro package install yields duplicate .py files? Nor am I motivated by having to carry duplicate .py files in a distribution package (I imagine the compression of duplicate .py files is amazing). > A "different per-python-version directory" also has the disadvantage > that file conflicts between (distribution) packages cannot be detected. Why? That sounds like a broken tool; maybe I am naive, please explain. If two packages install /usr/lib/python2.6/foo.so, that should be just as detectable as two installing /usr/lib/python-shared/foo.cpython-26.so. If you *must* compile .so files for every supported version of python at packaging time, then you are already saying the set of python versions is known. I fail to see the difference between a package that installs .py and .so files into many directories and one that has many .so files in a single directory, except that many directories *already* works. The only gain I can see is that you save duplicate .py files in the package and on the filesystem, and I don't feel that gain alone warrants this fundamental change. I would appreciate a proper explanation of why/how a single directory is better for your distribution. Also, I haven't heard anyone who wasn't using Debian tools chime in with support for any of this, so I would like to know how this can help RPMs and ebuilds and the like. > I don't think that installation into different locations based on the > presence of extension will work. Should a location really change if an > extension is added as an optimization? Splitting a (python) package > into different installation locations should be avoided.
I'm not sure why changing paths would matter; any package that writes data in its install location would be considered broken by your distro already, so what harm is there in having the packaging tool move it later? Your tool will remove the old path and place it in a new path. All of these shenanigans seem to manifest from your distro's python-support/-central design, which seems to be entirely motivated by reducing duplicate files and *not* simplifying the packaging. While this plan works rather well with .py files, the devil is in the details. I don't think Python should be getting involved in what I believe is a flawed design. What happens to the distro packaging if a python package splits the codebase between 2.x and 3.x (meaning they have distinct .py files)? As someone else mentioned, how is virtualenv going to interact with packages that install like this? -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From mal at egenix.com Sun Jun 27 01:37:02 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 27 Jun 2010 01:37:02 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: Message-ID: <4C268F1E.5070506@egenix.com> Brett Cannon wrote: > On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >> I finally realized why clang has not been silencing its warnings about >> unused return values: I have -Wno-unused-value set in CFLAGS which >> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >> Makefile.pre.in. >> >> I could obviously set OPT in my environment, but that would override >> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >> but the README says that's for stuff that tweak binary compatibility. >> >> So basically what I am asking is what environment variable should I >> use? If CFLAGS is correct then does anyone have any issues if I change >> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >> after OPT? 
>> > > Since no one objected I swapped the order in r82259. In case anyone > else uses clang to compile Python, this means that -Wno-unused-value > will now work to silence the warning about unused return values that > is caused by some macros. Probably using -Wno-empty-body is also good > to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. I think you need to come up with a different solution and revert the change... OPT has historically been the only variable to use for adjusting the Python C compiler settings. As the name implies this was usually used to adjust the optimizer settings, including raising the optimization level from the default or disabling it. With your change CFLAGS will always override OPT and thus any optimization definitions made in OPT will no longer have an effect. Note that CFLAGS defines -O2 on many platforms. In your particular case, you should try setting OPT to "... -Wno-unused-value ..." (ie. replace -Wall with your setting). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 27 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From brett at python.org Sun Jun 27 02:13:20 2010 From: brett at python.org (Brett Cannon) Date: Sat, 26 Jun 2010 17:13:20 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? 
In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. Just found the relevant section in the README. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. It meant optional to me, not optimization. I hate abbreviations sometimes. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. 
That was the point; OPT defines defaults through configure.in and I simply wanted to add to those instead of having OPT completely overwritten by me. > > Note that CFLAGS defines -O2 on many platforms. So then wouldn't that mean they want that to be the optimization level? Or does that default exist historically only so that some default exists, with the expectation that the application overrides it as desired? > > In your particular case, you should try setting OPT to > "... -Wno-unused-value ..." (ie. replace -Wall with your > setting). So what is CFLAGS for then? ``configure -h`` says it's for "C compiler flags"; that's extremely ambiguous. And it doesn't help that OPT is not mentioned by ``configure -h`` as that is what I have always gone by to know what flags are available for compilation. -Brett From ncoghlan at gmail.com Sun Jun 27 04:43:23 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 12:43:23 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100626181753.601473A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby wrote: > The idea that I'm proposing is that the basic string and byte types should > defer to "user-defined" string types for mixed type operations, so that > polymorphism of string-manipulation functions is the *default* case, rather > than a *special* case. This makes tainting easier to implement, as well as > optimizing and other special cases (like my "source string w/file and line > info", or a string with font/formatting attributes). Rather than building this into the base string type, perhaps it would be better (at least initially) to add in a polymorphic str subtype that worked along the following lines: 1. Has an encoded argument in the constructor (e.g. poly_str("/", encoded=b"/")) 2. If given objects with an encode() method, assumes they're strings and uses its own parent class methods 3. If given objects with a decode() method, assumes they're encoded and delegates to the encoded attribute str/bytes agnostic functions would need to invoke poly_str deliberately, while bytes-only and text-only algorithms could just use the appropriate literals.
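A minimal sketch of how the three points above might look in pure Python, offered only as an illustration. It covers just join() and the + operator; the helper name _literal_for and the decision to make encoded a required argument are assumptions made for this sketch, not part of the proposal.

```python
class poly_str(str):
    """Text literal that also carries its encoded (bytes) counterpart."""

    def __new__(cls, text, encoded):
        self = super().__new__(cls, text)
        self.encoded = encoded
        return self

    def _literal_for(self, other):
        # Hypothetical helper: objects with decode() are treated as encoded
        # data, everything else is treated as text (points 2 and 3 above).
        if hasattr(other, "decode"):
            return self.encoded
        return str(self)  # plain str copy, so the operators below do not recurse

    def join(self, parts):
        parts = list(parts)
        sep = self._literal_for(parts[0]) if parts else str(self)
        return sep.join(parts)

    def __add__(self, other):
        return self._literal_for(other) + other

    def __radd__(self, other):
        return other + self._literal_for(other)


SLASH = poly_str("/", encoded=b"/")
text_path = SLASH.join(["usr", "lib"])      # text in, text out
bytes_path = SLASH.join([b"usr", b"lib"])   # bytes in, bytes out
```

A str/bytes agnostic function would use SLASH wherever it would otherwise need two separate literals; whether the remaining string methods can be covered as cleanly is left open here.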
Third party types would be supported to some degree (by having either encode or decode methods), although they could still run into trouble with some operations (while full support for third party strings and byte sequence implementations is an interesting idea, I think it's overkill for the specific problem of making it easier to write str/bytes agnostic functions for tasks like URL parsing). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 27 04:59:07 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 12:59:07 +1000 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <51EFE211-DBCA-497E-9BC5-CC0D2256173E@twistedmatrix.com> Message-ID: On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy wrote: > I can imagine that inter-operation, when appropriate, might work better with > addition of a couple of missing __rxxx__ methods, such as the mentioned > __rcontains__. Although adding such would affect the implementation of a > core syntax feature, it would not affect syntax as such as seen by the user. The problem with strings isn't really the binary operations like __contains__ - adding __rcontains__ would be a fairly simple extrapolation of the existing approaches. Where it gets really messy for strings is the fact that whereas invoking named methods directly on numbers is rare, invoking them on strings is very common, and some of those methods (e.g. split(), join(), __mod__()) allow or require an iterable rather than a single object. This extends the range of use cases to be covered beyond those with syntactic support to potentially include all string methods that take arguments. Creating minimally surprising semantics for the methods which accept iterables is also rather challenging.
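The distinction drawn above can be seen with a small sketch. The Tagged class is hypothetical (it does not come from the thread) and is deliberately not a str subclass. The + operator consults the right-hand operand's __radd__, so a third-party type can take part in it; str.join() accepts only str instances and offers no comparable hook.

```python
class Tagged:
    """Hypothetical string-like wrapper carrying an extra tag."""

    def __init__(self, text, tag):
        self.text = text
        self.tag = tag

    def __add__(self, other):
        # Accept either another Tagged or a plain str on the right.
        return Tagged(self.text + getattr(other, "text", other), self.tag)

    def __radd__(self, other):
        # Reached for "plain str" + Tagged, through the reflected-operand protocol.
        return Tagged(other + self.text, self.tag)


# The operator form defers to the user-defined type.
combined = "abc" + Tagged("def", tag="user input")

# The method form does not: str.join() rejects non-str items.
try:
    ", ".join([Tagged("a", tag="x"), Tagged("b", tag="x")])
except TypeError as exc:
    join_error = str(exc)
else:
    join_error = None
```

Here combined is a Tagged instance that keeps its tag, while the join() call fails with a TypeError, which is the gap any coercion protocol for string methods would have to close.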
It's an interesting idea, but I think it's overkill for the specific problem of making it easier to perform more text-like manipulations in a bytes-only domain. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pje at telecommunity.com Sun Jun 27 05:49:11 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sat, 26 Jun 2010 23:49:11 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: <20100627034922.31A663A4108@sparrow.telecommunity.com> At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote: >While full support for third party strings and >byte sequence implementations is an interesting idea, I think it's >overkill for the specific problem of making it easier to write >str/bytes agnostic functions for tasks like URL parsing. OTOH, to write your partial implementation is almost as complex - it still must take into account joining and formatting, and so by that point, you've just proposed a new protocol for coercion... so why not just make the coercion protocol explicit in the first place, rather than hardwiring a third type's worth of special cases? Remember, bytes and strings already have to detect mixed-type operations. If there was an API for that, then the hardcoded special cases would just be replaced, or supplemented with type slot checks and calls after the special cases. To put it another way, if you already have two types special-casing their interactions with each other, then rather than add a *third* type to that mix, maybe it's time to have a protocol instead, so that the types that care can do the special-casing themselves, and you generalize to N user types.
(Btw, those who are saying that the resulting potential for N*N interaction makes the feature unworkable seem to be overlooking metaclasses and custom numeric types -- two Python features that in principle have the exact same problem, when you use them beyond a certain scope. At least with those features, though, you can generally mix your user-defined metaclasses or numeric types with the Python-supplied basic ones and call arbitrary Python functions on them, without as much heartbreak as you'll get with a from-scratch stringlike object.) All that having been said, a new protocol probably falls under the heading of the language moratorium, unless it can be considered "new methods on builtins"? (But that seems like a stretch even to me.) I just hate the idea that functions taking strings should have to be *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... like if all the bitmasking functions you'd ever written using 32-bit int constants had to be rewritten just because we added longs to the language, and you had to upcast them to be compatible or something. Sounds too much like C or Java or some other non-Python language, where dynamism and polymorphy are the special case, instead of the general rule. From jyasskin at gmail.com Sun Jun 27 07:46:24 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sat, 26 Jun 2010 22:46:24 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sat, Jun 26, 2010 at 4:37 PM, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. 
>>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... > > OPT has historically been the only variable to use for > adjusting the Python C compiler settings. > > As the name implies this was usually used to adjust the > optimizer settings, including raising the optimization level > from the default or disabling it. > > With your change CFLAGS will always override OPT and thus > any optimization definitions made in OPT will no longer > have an effect. > > Note that CFLAGS defines -O2 on many platforms. > > In your particular case, you should try setting OPT to > "... -Wno-unused-value ..." (ie. replace -Wall with your > setting). The python configure environment variables are really confused. If OPT is intended to be user-overridden for optimization settings, it shouldn't be used to set -Wall and -Wstrict-prototypes. If it's intended to set warning options, it shouldn't also set optimization options. Setting the user-visible customization option on the configure command line shouldn't stomp unrelated defaults. 
In configure-based systems, CFLAGS is traditionally (http://sources.redhat.com/automake/automake.html#Flag-Variables-Ordering) the way to tack options onto the end of the command line. Python breaks this by threading flags through CFLAGS in the makefile, which means they all get stomped if the user sets CFLAGS on the make command line. We should instead use another spelling ("CFlags"?) for the internal variable, and append $(CFLAGS) to it. AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). If Python's configure.in sets an otherwise-empty CFLAGS to -g before calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just preserve the user's CFLAGS setting across AC_PROG_CC regardless of whether it's set, to let the user set CFLAGS on the configure line without stomping any defaults. From ncoghlan at gmail.com Sun Jun 27 07:53:59 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jun 2010 15:53:59 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100627034922.31A663A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby wrote: > I just hate the idea that functions taking strings should have to be > *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic... > like if all the bitmasking functions you'd ever written using 32-bit int > constants had to be rewritten just because we added longs to the language, > and you had to upcast them to be compatible or something.
Sounds too much > like C or Java or some other non-Python language, where dynamism and > polymorphy are the special case, instead of the general rule. The difference is that we have three classes of algorithm here: - those that work only on octet sequences - those that work only on character sequences - those that can work on either. Python 2 lumped all 3 classes of algorithm together through the multi-purpose 8-bit str type. The unicode type provided some scope to separate out the second category, but the divisions were rather blurry. Python 3 forces the first two to be separated by using either octets (bytes/bytearray) or characters (str). There are a *very small* number of APIs where it is appropriate to be polymorphic, but this is currently difficult due to the need to supply literals of the appropriate type for the objects being operated on. This isn't ever going to happen automagically due to the need to explicitly provide two literals (one for octet sequences, one for character sequences). The virtues of a separate poly_str type are that: 1. It can be simple and implemented in Python, dispatching to str or bytes as appropriate (probably in the strings module) 2. No chance of impacting the performance of the core interpreter (as builtins are not affected) 3. Lower impact if it turns out to have been a bad idea We could talk about this even longer, but the most effective way forward is going to be a patch that improves the URL parsing situation. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Jun 27 11:10:59 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jun 2010 11:10:59 +0200 Subject: [Python-Dev] bytes / unicode References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627111059.49cdb698@pitrou.net> On Sat, 26 Jun 2010 23:49:11 -0400 "P.J. Eby" wrote: > > Remember, bytes and strings already have to detect mixed-type > operations. Not in Python 3. They just raise a TypeError on bad ("mixed-type") arguments. Regards Antoine. From greg.ewing at canterbury.ac.nz Sun Jun 27 11:48:22 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 27 Jun 2010 21:48:22 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> Message-ID: <4C271E66.5050902@canterbury.ac.nz> Stefan Behnel wrote: > Greg Ewing, 26.06.2010 09:58: > >> Would there be any sanity in having an option to compile >> Python with UTF-8 as the internal string representation? > > It would break Py_UNICODE, because the internal size of a unicode > character would no longer be fixed. It's not fixed anyway with the 2-char build -- some characters are represented using a pair of surrogates. -- Greg From g.brandl at gmx.net Sun Jun 27 11:41:56 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 27 Jun 2010 11:41:56 +0200 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: Am 26.06.2010 00:38, schrieb Steve Holden: > I was pretty stunned when I tried this. 
Remember that the Tools > subdirectory is distributed with Windows, so this means we got through > almost two releases without anyone realizing that 2to3 does not appear > to have touched this code. > > Yes, I have: http://bugs.python.org/issue9083 > > When's 3.2 due out? The alpha stage is beginning next week; still enough time to fix the Tools and Demos. I can do some of the work, however, if I have to do it all, I'll just throw out the majority of that stuff. So -- if every dev "adopted" a Tool or Demo, that would be quite a manageable piece of work, and maybe a few demos can be brought up to scratch instead of be deleted. I'll go ahead and promise to care for the "Demo/classes" subdir. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sun Jun 27 11:44:31 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 27 Jun 2010 11:44:31 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: Am 22.06.2010 01:01, schrieb Terry Reedy: > On 6/21/2010 3:59 PM, Steve Holden wrote: >> Terry Reedy wrote: >>> On 6/21/2010 8:33 AM, Nick Coghlan wrote: >>> >>>> P.S. (We're going to have a tough decision to make somewhere along the >>>> line where docs.python.org is concerned, too - when do we flick the >>>> switch and make a 3.x version of the docs the default? >>> >>> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. >>> Trunk released always take over docs.python.org. 
To do otherwise would >>> be to say that 3.2 is not a real trunk release and not yet ready for >>> real use -- a major slam. >>> >>> Actually, I thought this was already discussed and decided ;-). >>> >> This also gives the 2.7 release it's day in the sun before relegation to >> maintenance status. > > Every new version (except 3.0 and 3.1) has gone to maintenance status > *and* becomes the featured release on docs.python.org the day it was > released. 2.7 would just spend less time as the featured release on > that page. I'm not sure 3.2 should take over in December just yet. (There's also docs3.python.org that always lands at the latest 3.x documentation). However, there will be enough time to discuss this when 3.2 is actually about to be released. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From dickinsm at gmail.com Sun Jun 27 11:57:08 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 10:57:08 +0100 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: > So -- if every dev "adopted" a Tool or Demo, that would be quite a > manageable piece of work, and maybe a few demos can be brought up > to scratch instead of be deleted. > > I'll go ahead and promise to care for the "Demo/classes" subdir. Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like it could be fun. 
Mark From eric at trueblade.com Sun Jun 27 12:53:00 2010 From: eric at trueblade.com (Eric Smith) Date: Sun, 27 Jun 2010 06:53:00 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C271E66.5050902@canterbury.ac.nz> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> <4C271E66.5050902@canterbury.ac.nz> Message-ID: <4C272D8C.6010406@trueblade.com> On 6/27/2010 5:48 AM, Greg Ewing wrote: > Stefan Behnel wrote: >> Greg Ewing, 26.06.2010 09:58: >> >>> Would there be any sanity in having an option to compile >>> Python with UTF-8 as the internal string representation? >> >> It would break Py_UNICODE, because the internal size of a unicode >> character would no longer be fixed. > > It's not fixed anyway with the 2-char build -- some > characters are represented using a pair of surrogates. > But isn't this currently ignored everywhere in python's code? Eric. From stephen at xemacs.org Sun Jun 27 16:03:06 2010 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 27 Jun 2010 23:03:06 +0900 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100626181753.601473A4108@sparrow.telecommunity.com> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> Message-ID: <877hlkmv39.fsf@uwakimon.sk.tsukuba.ac.jp> P.J. Eby writes: > At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote: > >What I'm saying here is that if bytes are the signal of validity, and > >the stdlib functions preserve validity, then it's better to have the > >stdlib functions object to unicode data as an argument. 
Compare the > >alternative: it returns a unicode object which might get passed around > >for a while before one of your functions receives it and identifies it > >as unvalidated data. > > I still don't follow, OK, I give up, since it was your use case that concerned me. I obviously misunderstood. Sorry for the confusion. Sign me, +1 on polymorphic functions in Tsukuba Japan > >In general this is a hard problem, though. Polymorphism, OK, one-way > >tainting OK, but in general combining related types is pretty > >arbitrary, and as in the encoded-bytes case, the result type often > >varies depending on expectations of callers, not the types of the > >data. > > But the caller can enforce those expectations by passing in arguments > whose types do what they want in such cases, as long as the string > literals used by the function don't get to override the relevant > parts of the string protocol(s). This simply isn't true for encoded bytes as proposed. For encoded text, the current encoding has no deterministic relationship to the desired encoding (at the level of generality of the stdlib; of course in specific applications it may be mandated by a standard or private convention). I will have to pass on your other user-defined string types. I've never tried to implement one. I only wanted to point out that a user-controllable tainted string type would be preferable to confounding "unicode" with "tainted". From alexander.belopolsky at gmail.com Sun Jun 27 16:47:08 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 27 Jun 2010 10:47:08 -0400 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: On Sun, Jun 27, 2010 at 5:57 AM, Mark Dickinson wrote: > On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: >> So -- if every dev "adopted" a Tool or Demo, that would be quite a >> manageable piece of work, and maybe a few demos can be brought up >> to scratch instead of be deleted. 
>> >> I'll go ahead and promise to care for the "Demo/classes" subdir. > > Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like > it could be fun. I have a patch for pybench attached to a not so related issue at http://bugs.python.org/issue5180 . All it took was a 2to3 run and a one line change. Of course it needs a review before it can go in, but I am surprised that something like pybench was not updated a long time ago. Is it supposed to be single source? That would make sense given the nature of the tool. > > Mark From mal at egenix.com Sun Jun 27 18:33:53 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 27 Jun 2010 18:33:53 +0200 Subject: [Python-Dev] Adopt A Demo [was: Signs of neglect?] In-Reply-To: References: Message-ID: <4C277D71.1010802@egenix.com> Alexander Belopolsky wrote: > On Sun, Jun 27, 2010 at 5:57 AM, Mark Dickinson wrote: >> On Sun, Jun 27, 2010 at 10:41 AM, Georg Brandl wrote: >>> So -- if every dev "adopted" a Tool or Demo, that would be quite a >>> manageable piece of work, and maybe a few demos can be brought up >>> to scratch instead of be deleted. >>> >>> I'll go ahead and promise to care for the "Demo/classes" subdir. >> >> Bagsy the Demo/parser subdirectory. Fixing up unparse.py looks like >> it could be fun. > > I have a patch for pybench attached to a not so related issue at > http://bugs.python.org/issue5180 . All it took was a 2to3 run and a > one line change. Of course it needs a review before it can go in, but > I am surprised that something like pybench was not updated a long time > ago. Is it supposed to be single source? Yes, the idea was to keep the number of changes to a minimum and to have the Python3 version work with Python 2.6, 2.7 and 3.x.
Antoine worked on that, AFAIR. The Python2 version of pybench needs to work with more than just Python 2.6 and 2.7 to be able to compare performance of the various releases back to version 2.3. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 27 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 21 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From pje at telecommunity.com Sun Jun 27 19:02:28 2010 From: pje at telecommunity.com (P.J. Eby) Date: Sun, 27 Jun 2010 13:02:28 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627170805.1785F3A4099@sparrow.telecommunity.com> At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote: >We could talk about this even longer, but the most effective way >forward is going to be a patch that improves the URL parsing >situation. Certainly, it's the only practical solution for the immediate problems in 3.2. I only mentioned that I "hate the idea" because I'd be more comfortable if it was explicitly declared to be a temporary hack to work around the absence of a string coercion protocol, due to the moratorium on language changes. 
But, since the moratorium *is* in effect, I'll try to make this my last post on string protocols for a while... and maybe wait until I've looked at the code (str/bytes C implementations) in more detail and can make a more concrete proposal for what the protocol would be and how it would work. (Not to mention closer to the end of the moratorium.) >There are a *very small* number of APIs where it is appropriate to >be polymorphic This is only true if you focus exclusively on bytes vs. unicode, rather than the general issue that it's currently impractical to pass *any* sort of user-defined string type through code that you don't directly control (stdlib or third-party). >The virtues of a separate poly_str type are that: >1. It can be simple and implemented in Python, dispatching to str or >bytes as appropriate (probably in the strings module) >2. No chance of impacting the performance of the core interpreter (as >builtins are not affected) Note that adding a string coercion protocol isn't going to change core performance for existing cases, since any place where the protocol would be invoked would be a code branch that either throws an error or *already* falls back to some other protocol (e.g. the buffer protocol). >3. Lower impact if it turns out to have been a bad idea How many protocols have been added that turned out to be bad ideas? The only ones that have been removed in 3.x, IIRC, are three-way compare, slice-specific operations, and __coerce__... and I'm going to miss __cmp__. ;-) However, IIUC, the reason these protocols were dropped isn't because they were "bad ideas". Rather, they're things that can be implemented in terms of a finer-grained protocol. i.e., if you want __cmp__ or __getslice__ or __coerce__, you can always implement them via a mixin that converts the newer fine-grained protocols into invocations of the older protocol. (As I plan to do for __cmp__ in the handful of places I use it.) 
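A minimal sketch of the kind of mixin described here, for Python 3: the rich comparison methods are derived from a single __cmp__ method. The names `CmpMixin` and `Version` are hypothetical and are not taken from any existing code mentioned in this thread.

```python
class CmpMixin:
    """Derive the fine-grained rich comparisons from one __cmp__ method.

    Subclasses implement __cmp__(other), returning a negative number, zero,
    or a positive number, or NotImplemented for unsupported operands.
    Because __eq__ is defined here, subclasses are unhashable unless they
    also define __hash__.
    """

    def __cmp__(self, other):
        raise NotImplementedError

    def _compare(self, other, test):
        result = self.__cmp__(other)
        if result is NotImplemented:
            return NotImplemented
        return test(result)

    def __eq__(self, other):
        return self._compare(other, lambda r: r == 0)

    def __ne__(self, other):
        return self._compare(other, lambda r: r != 0)

    def __lt__(self, other):
        return self._compare(other, lambda r: r < 0)

    def __le__(self, other):
        return self._compare(other, lambda r: r <= 0)

    def __gt__(self, other):
        return self._compare(other, lambda r: r > 0)

    def __ge__(self, other):
        return self._compare(other, lambda r: r >= 0)


class Version(CmpMixin):
    """Example user: orders version tuples through __cmp__ alone."""

    def __init__(self, parts):
        self.parts = tuple(parts)

    def __cmp__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        # Replacement for the Python 2 built-in cmp(a, b).
        return (self.parts > other.parts) - (self.parts < other.parts)
```

With this in place, Version((1, 2)) < Version((1, 10)) and sorted() on a list of Version objects both go through __cmp__.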
At the moment, however, this isn't possible for multi-string operations outside of __add__/__radd__ and comparison -- the coercion rules are hard-wired and can't be overridden by user-defined types. From solipsis at pitrou.net Sun Jun 27 19:50:33 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jun 2010 19:50:33 +0200 Subject: [Python-Dev] pybench References: Message-ID: <20100627195033.224713c2@pitrou.net> On Sun, 27 Jun 2010 10:47:08 -0400 Alexander Belopolsky wrote: > > I have a patch for pybench attached to a not so related issue at > http://bugs.python.org/issue5180 . All it took was a 2to3 run and a > one line change. Of course it need a review before it can go in, but > I am surprised that something like pybench was not updated long time > ago. Why do you say that? pybench works fine under Python 3 (the py3k branch version of pybench, that is). The patch doesn't look necessary to me. From tjreedy at udel.edu Sun Jun 27 21:03:31 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 27 Jun 2010 15:03:31 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: On 6/27/2010 5:44 AM, Georg Brandl wrote: > Am 22.06.2010 01:01, schrieb Terry Reedy: >> On 6/21/2010 3:59 PM, Steve Holden wrote: >>> Terry Reedy wrote: >>>> On 6/21/2010 8:33 AM, Nick Coghlan wrote: >>>> >>>>> P.S. (We're going to have a tough decision to make somewhere along the >>>>> line where docs.python.org is concerned, too - when do we flick the >>>>> switch and make a 3.x version of the docs the default? >>>> >>>> Easy. When 3.2 is released. When 2.7 is released, 3.2 becomes 'trunk'. >>>> Trunk released always take over docs.python.org. 
To do otherwise would >>>> be to say that 3.2 is not a real trunk release and not yet ready for >>>> real use -- a major slam. >>>> >>>> Actually, I thought this was already discussed and decided ;-). >>>> >>> This also gives the 2.7 release it's day in the sun before relegation to >>> maintenance status. >> >> Every new version (except 3.0 and 3.1) has gone to maintenance status >> *and* becomes the featured release on docs.python.org the day it was >> released. 2.7 would just spend less time as the featured release on >> that page. > > I'm not sure 3.2 should take over in December just yet. (There's also > docs3.python.org that always lands at the latest 3.x documentation). > > However, there will be enough time to discuss this when 3.2 is actually > about to be released. Sure. Since I expect that the argument for treating 3.2 as a regular production-use-ready release will be stronger then than now, I agree on differing discussion. -- Terry Jan Reedy From martin at v.loewis.de Sun Jun 27 21:25:06 2010 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 27 Jun 2010 21:25:06 +0200 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: <201006211113.06767.stephan.richter@gmail.com> References: <20100618050712.GC20639@thorne.id.au> <201006211113.06767.stephan.richter@gmail.com> Message-ID: <4C27A592.8010206@v.loewis.de> Am 21.06.2010 17:13, schrieb Stephan Richter: > On Monday, June 21, 2010, Nick Coghlan wrote: >> A decent listing of major packages that already support Python 3 would >> be very handy for the new Python2orPython3 page I created on the wiki, >> and easier to keep up-to-date. (the old Early2to3Migrations page >> didn't look particularly up to date, but hopefully we can keep the new >> list in a happier state). > > I really just want to be able to go to PyPI, Click on "Browse packages" and > then select "Python 3" (it can currently be accomplished by clicking "Python" > and then "3"). 
Or you can use the link "Python 3 packages" on PyPI's main menu. Regards, Martin From bugtrack at roumenpetrov.info Sun Jun 27 21:25:16 2010 From: bugtrack at roumenpetrov.info (Roumen Petrov) Date: Sun, 27 Jun 2010 22:25:16 +0300 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: <4C27A59C.6040005@roumenpetrov.info> Brett Cannon wrote: > On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg wrote: >> Brett Cannon wrote: >>> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: [SKIP] >>> Since no one objected I swapped the order in r82259. In case anyone >>> else uses clang to compile Python, this means that -Wno-unused-value >>> will now work to silence the warning about unused return values that >>> is caused by some macros. Probably using -Wno-empty-body is also good >>> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. Right now you cannot change order of CFLAGS and OPT >> I think you need to come up with a different solution and revert >> the change... >> >> OPT has historically been the only variable to use for >> adjusting the Python C compiler settings. > > Just found the relevant section in the README. > >> >> As the name implies this was usually used to adjust the >> optimizer settings, including raising the optimization level >> from the default or disabling it. > > It meant optional to me, not optimization. I hate abbreviations sometimes. > >> >> With your change CFLAGS will always override OPT and thus >> any optimization definitions made in OPT will no longer >> have an effect. > > That was the point; OPT defines defaults through configure.in and I > simply wanted to add to those instead of having OPT completely > overwritten by me. Now if you confirm that (see configure.in ) : # Optimization messes up debuggers, so turn it off for # debug builds. 
OPT="-g -O0 -Wall $STRICT_PROTO" is not an issue for py3k, then leave your commit as is (note that Mark pointed this out). But if optimization "messes up debuggers", you may want to revert the change. I know that it is difficult to reach consensus on compiler/preprocessor flags for the Python build process. Here is a short list of issues about this: - "Python 2.5 64 bit compile fails on Solaris 10/gcc 4.1.1" : http://bugs.python.org/issue1628484 (committed/rejected) - "Python does not honor "CFLAGS" environment variable" : http://bugs.python.org/issue1453 (wont fix) - "configure: allow user-provided CFLAGS to override AC_PROG_CC defaults" : http://bugs.python.org/issue8211 (fixed) This one is still open: "configure doesn't set up CFLAGS properly" ( http://bugs.python.org/issue1104249 ); it should be closed as fixed. >> Note that CFLAGS defines -O2 on many platforms. > > So then wouldn't that mean they want that to be the optimization > level? Or is the historical reason that default exists is so that some > default exists but to expect the application to override as desired? > >> >> In your particular case, you should try setting OPT to >> "... -Wno-unused-value ..." (ie. replace -Wall with your >> setting). > > So what is CFLAGS for then? ``configure -h`` says it's for "C compiler > flags"; that's extremely ambiguous. And it doesn't help that OPT is > not mentioned by ``configure -h`` as that is what I have always gone > by to know what flags are available for compilation. > > -Brett If you would like to see some flags there, you could look at http://bugs.python.org/issue3718 for how to make an option visible in configure --help. In addition, AC_ARG_VAR allows an environment variable to be cached for subsequent runs of config.status; otherwise you must specify it on the configure command line. As for all the XXflags variables, it would be good to simplify the configure script to use only CPPFLAGS and CFLAGS, to minimize configuration troubles and other build failures.
A good example of what can go wrong when configure sets preprocessor/compiler flags in variables other than CPPFLAGS/CFLAGS is the issue "OSX: duplicate -arch flags in CFLAGS breaks sysconfig" ( http://bugs.python.org/issue8607 ) Roumen From dickinsm at gmail.com Sun Jun 27 21:43:34 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 20:43:34 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C268F1E.5070506@egenix.com> References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 12:37 AM, M.-A. Lemburg wrote: > Brett Cannon wrote: >> On Wed, Jun 23, 2010 at 14:53, Brett Cannon wrote: >>> I finally realized why clang has not been silencing its warnings about >>> unused return values: I have -Wno-unused-value set in CFLAGS which >>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in >>> Makefile.pre.in. >>> >>> I could obviously set OPT in my environment, but that would override >>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS, >>> but the README says that's for stuff that tweak binary compatibility. >>> >>> So basically what I am asking is what environment variable should I >>> use? If CFLAGS is correct then does anyone have any issues if I change >>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes >>> after OPT? >>> >> >> Since no one objected I swapped the order in r82259. In case anyone >> else uses clang to compile Python, this means that -Wno-unused-value >> will now work to silence the warning about unused return values that >> is caused by some macros. Probably using -Wno-empty-body is also good >> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs. > > I think you need to come up with a different solution and revert > the change... Agreed; this needs more thought.
For one thing, Brett's change has the result that --with-pydebug builds end up being built with -O2 instead of -O0, which can make debugging (e.g., with gdb) somewhat awkward. Mark From dickinsm at gmail.com Sun Jun 27 22:04:56 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 21:04:56 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: > AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based > systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). > If Python's configure.in sets an otherwise-empty CFLAGS to -g before > calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just > preserve the users CFLAGS setting across AC_PROG_CC regardless of > whether it's set, to let the user set CFLAGS on the configure line > without stomping any defaults. I think saving and restoring CFLAGS across AC_PROG_CC was attempted in http://bugs.python.org/issue8211 . It turned out that it broke OS X universal builds. I'm not sure I understand the importance of allowing AC_PROG_CC to set CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can someone give an example of why this is necessary? Mark From tjreedy at udel.edu Sun Jun 27 22:07:56 2010 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 27 Jun 2010 16:07:56 -0400 Subject: [Python-Dev] #Python3 ! ? (was Python Library Support in 3.x) In-Reply-To: References: <20100618050712.GC20639@thorne.id.au> <20100619121256.2412.244251859.divmod.xquotient.130@localhost.localdomain> <63486FB9-866D-47D3-AF04-0A621AB416A4@ikanobori.jp> Message-ID: > Sure. Since I expect that the argument for treating 3.2 as a regular > production-use-ready release will be stronger then than now, I agree on > differing discussion. 
I meant 'deferring' -- Terry Jan Reedy From jyasskin at gmail.com Sun Jun 27 22:37:48 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 27 Jun 2010 13:37:48 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: > On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). >> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >> preserve the users CFLAGS setting across AC_PROG_CC regardless of >> whether it's set, to let the user set CFLAGS on the configure line >> without stomping any defaults. > > I think saving and restoring CFLAGS across AC_PROG_CC was attempted in > http://bugs.python.org/issue8211 . It turned out that it broke OS X > universal builds. Thanks for the link to the issue. http://bugs.python.org/issue8366 says Ronald Oussoren fixed the universal builds without reverting the CFLAGS propagation. > I'm not sure I understand the importance of allowing AC_PROG_CC to set > CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can > someone give an example of why this is necessary? Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds other flags as well (it currently doesn't, but that may well change in future versions of autoconf)." That seems a little weak to constrain fixing actual problems today. If it ever adds more arguments, we'll need to inspect them anyway to see if they're more like -g or -O2 (wanted or harmful).
Jeffrey From brett at python.org Sun Jun 27 22:50:23 2010 From: brett at python.org (Brett Cannon) Date: Sun, 27 Jun 2010 13:50:23 -0700 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: > On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >> On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >>> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >>> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842). >>> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >>> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >>> preserve the users CFLAGS setting across AC_PROG_CC regardless of >>> whether it's set, to let the user set CFLAGS on the configure line >>> without stomping any defaults. >> >> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >> http://bugs.python.org/issue8211 . It turned out that it broke OS X >> universal builds. > > Thanks for the link to the issue. http://bugs.python.org/issue8366 > says Ronald Oussoren fixed the universal builds without reverting the > CFLAGS propagation. > >> I'm not sure I understand the importance of allowing AC_PROG_CC to set >> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >> someone give an example of why this is necessary? > > Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds > other flags as well (it currently doesn't, but that may well change in > future versions of autoconf)." That seems a little weak to constrain > fixing actual problems today. If it ever adds more arguments, we'll > need to inspect them anyway to see if they're more like -g or -O2 > (wanted or harmful). > I went ahead and reverted the change, but it does seem like the build environment could use a cleanup.
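The ordering problem behind the reverted change can be modelled with a small sketch. It relies on gcc's documented rule that when several -O options are given, the last one is effective. The helper `effective_opt_level` and the flag strings are illustrative assumptions; in particular, -Wstrict-prototypes stands in for $STRICT_PROTO, and the values are not copied from a real build.

```python
import re


def effective_opt_level(flags):
    """Return the -O flag that wins on a gcc command line, or None."""
    level = None
    for flag in flags.split():
        if re.fullmatch(r"-O([0-3sg]|fast)?", flag):
            level = flag  # a later -O flag overrides any earlier one
    return level


# Illustrative values only (assumptions, not taken from a real Makefile):
DEBUG_OPT = "-g -O0 -Wall -Wstrict-prototypes"  # OPT in a --with-pydebug build
AUTOCONF_CFLAGS = "-g -O2"                      # AC_PROG_CC default with gcc

# Order before the change: CFLAGS first, then OPT, so -O0 wins.
cflags_then_opt = AUTOCONF_CFLAGS + " " + DEBUG_OPT
# Order after the swap: OPT first, then CFLAGS, so -O2 wins.
opt_then_cflags = DEBUG_OPT + " " + AUTOCONF_CFLAGS
```

Under these assumptions, the swapped order yields -O2 for a debug build, which matches the behaviour Mark Dickinson reported for --with-pydebug.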
From dickinsm at gmail.com Sun Jun 27 22:54:06 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jun 2010 21:54:06 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: On Sun, Jun 27, 2010 at 9:37 PM, Jeffrey Yasskin wrote: > On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >> http://bugs.python.org/issue8211 . It turned out that it broke OS X >> universal builds. > > Thanks for the link to the issue. http://bugs.python.org/issue8366 > says Ronald Oussoren fixed the universal builds without reverting the > CFLAGS propagation. Yes, you're right (of course). Thanks. Looking at the current configure.in, CFLAGS *does* get saved and restored across the AC_PROG_CC call if it's non-empty; I'm not sure whether this actually (currently) has any effect, since as I understand the documentation CFLAGS won't be touched by AC_PROG_CC if it's already set. >> I'm not sure I understand the importance of allowing AC_PROG_CC to set >> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >> someone give an example of why this is necessary? > > Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds > other flags as well (it currently doesn't, but that may well change in > future versions of autoconf)." That seems a little weak to constrain > fixing actual problems today. If it ever adds more arguments, we'll > need to inspect them anyway to see if they're more like -g or -O2 > (wanted or harmful). Okay; thanks for the explanation.
Mark From greg.ewing at canterbury.ac.nz Mon Jun 28 00:35:36 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Jun 2010 10:35:36 +1200 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <4C272D8C.6010406@trueblade.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <4C25B319.8040804@canterbury.ac.nz> <4C271E66.5050902@canterbury.ac.nz> <4C272D8C.6010406@trueblade.com> Message-ID: <4C27D238.3060100@canterbury.ac.nz> Eric Smith wrote: > But isn't this currently ignored everywhere in python's code? It's true that code using a utf-8 build would have to be aware of the fact much more often. But I'm thinking of applications that would otherwise want to keep all their strings encoded to save memory. If they do that, they also need to deal with sequence items not corresponding to characters. If they can handle that, they may be able to handle utf-8 just as well. -- Greg From rdmurray at bitdance.com Mon Jun 28 01:31:21 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 19:31:21 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> Message-ID: <20100627233121.E1E0821948D@kimball.webabinitio.net> I've been watching this discussion with intense interest, but have been so lagged in following the thread that I haven't replied. I got caught up today.... 
On Sun, 27 Jun 2010 15:53:59 +1000, Nick Coghlan wrote: > The difference is that we have three classes of algorithm here: > - those that work only on octet sequences > - those that work only on character sequences > - those that can work on either > > Python 2 lumped all 3 classes of algorithm together through the > multi-purpose 8-bit str type. The unicode type provided some scope to > separate out the second category, but the divisions were rather > blurry. > > Python 3 forces the first two to be separated by using either octets > (bytes/bytearray) or characters (str). There are a *very small* number > of APIs where it is appropriate to be polymorphic, but this is > currently difficult due to the need to supply literals of the > appropriate type for the objects being operated on. > > This isn't ever going to happen automagically due to the need to > explicitly provide two literals (one for octet sequences, one for > character sequences). In email6 I'm currently handling this by putting the algorithm on a base class and the literals on 'Bytes...' and 'String...' subclasses as class variables. Slightly ugly, but it works. The current design also speaks to an earlier point someone made about the fact that we are really dealing with more complex, and domain specific, data, not simply "byte strings". A "BytesMessage" contains lots of structured encoding information as well as the possibility of 'garbage' bytes. A StringMessage contains text and data decoded into objects (ex: an image object), possibly with some PEP 383 surrogates included (haven't quite figured that part out yet). So, a BytesMessage object isn't just a byte string, it's a load of structured data that requires the associated algorithms to convert into meaningful text and objects. Going the other way, the decisions made about character encodings need to be encoded into the structured bytes representation that could ultimately go out on the wire. 
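A minimal sketch of the pattern described above, with the algorithm on a base class and the literals supplied by 'Bytes...' and 'String...' subclasses as class variables. The class, method, and attribute names are hypothetical and are not taken from email6; they only illustrate the idea.

```python
class _HeaderParserBase:
    """Shared algorithm; subclasses supply literals of the matching type."""

    colon = None        # ":" or b":"
    whitespace = None   # " \t" or b" \t"

    def split_header(self, line):
        """Split 'Name: value' into (name, value) without mixing types."""
        name, _sep, value = line.partition(self.colon)
        return name.strip(), value.strip()

    def is_continuation(self, line):
        """Return True if the line starts with folding whitespace.

        line[:1] is used instead of line[0] because indexing bytes
        yields an int, while slicing keeps the sequence type.
        """
        first = line[:1]
        return bool(first) and first in self.whitespace


class BytesHeaderParser(_HeaderParserBase):
    colon = b":"
    whitespace = b" \t"


class StringHeaderParser(_HeaderParserBase):
    colon = ":"
    whitespace = " \t"
```

For instance, `StringHeaderParser().split_header("Subject: hi")` and `BytesHeaderParser().split_header(b"Subject: hi")` run the same code path and each return a pair of the input's own type.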
I suspect that the same thing needs to be done for URIs/IRIs, and html/MIME and the corresponding text and objects. It is my hope that the email6 work will lay a firm foundation for the latter, but URI/IRI is a whole different protocol that I'm glad I don't have to deal with :) > The virtues of a separate poly_str type are that: Having such a poly_str type would probably make my life easier. I also would like to just vent a little frustration at having to use single-character-slice notation when I want to index a character in a string in my algorithms.... -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Mon Jun 28 01:41:48 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 19:41:48 -0400 Subject: [Python-Dev] thoughts on the bytes/string discussion In-Reply-To: <26215.1277505652@parc.com> References: <11597.1277401099@parc.com> <96ADD4CE-3A24-45A7-B219-2940195DC3D0@twistedmatrix.com> <26215.1277505652@parc.com> Message-ID: <20100627234148.9618021948F@kimball.webabinitio.net> On Fri, 25 Jun 2010 15:40:52 -0700, Bill Janssen wrote: > Guido van Rossum wrote: > > So you're really just worried about space consumption. I'd like to see > > a lot of hard memory profiling data before I got overly worried about > > that. > > While I've seen some big Web pages, I think the email folks, who often > have to process messages with attachments measuring in the tens of > megabytes, have the stronger problems here, and I think speed may be > more important than memory. I've built both a Web server and an IMAP > server in Python, and the IMAP server is where the issues of storage > management really prevail. If you have to convert a 20 MB encoded > string into a Unicode string just to look at the headers as strings, you > have issues. (The Python email package doesn't do that, by the way.) Unfortunately in the current Python3 email package (email5), this is no longer true.
You have to decode everything *first* in order to pass it through email (which presents a few problems when dealing with 8bit data, as has been mentioned here before). email6 intends to fix this, and once again allow you to decode to text only the bits you actually need to access and manipulate. -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Mon Jun 28 02:00:17 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Sun, 27 Jun 2010 20:00:17 -0400 Subject: [Python-Dev] email package status in 3.X In-Reply-To: References: Message-ID: <20100628000017.F3B732194BA@kimball.webabinitio.net> On Fri, 18 Jun 2010 18:52:45 -0000, lutz at rmi.net wrote: > What I'm suggesting is that extreme caution be exercised from > this point forward with all things 3.X-related. Whether you > wish to accept this or not, 3.X has a negative image to many. > This suggestion specifically includes not abandoning current > 3.X email package users as a case in point. Ripping the rug > out from new 3.X users after they took the time to port seems > like it may be just enough to tip the scales altogether. Catching up on my python-dev email, I just want to clarify this with respect to email. (1) I suspect that the new API will be enough of a carrot that they won't mind converting to it, BUT, (2) the plan is to provide a compatibility API that will fully support the current Python3 email5 API (but with fewer bugs in areas such as header folding and unfolding). -- R. David Murray www.bitdance.com From greg at krypto.org Mon Jun 28 06:33:36 2010 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 27 Jun 2010 21:33:36 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C262D37.7020807@animats.com> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: fyi - newthreading has been picked up by lwn.
http://lwn.net/Articles/393822/#Comments From greg.ewing at canterbury.ac.nz Mon Jun 28 10:28:45 2010 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 28 Jun 2010 20:28:45 +1200 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100627233121.E1E0821948D@kimball.webabinitio.net> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> Message-ID: <4C285D3D.80907@canterbury.ac.nz> R. David Murray wrote: > Having such a poly_str type would probably make my life easier. A thought on this poly_str type: perhaps it could be called "ascii", since that's what it would have to be restricted to, and have a'xxx' as a literal syntax for it, seeing as literals seem to be one of its main use cases. > I also would like just vent a little frustration at having to > use single-character-slice notation when I want to index a character > in a string in my algorithms....
Thinking way outside the square, and probably the pale as well, maybe @ could be pressed into service as an infix operator, with s @ i being equivalent to s[i:i+1] -- Greg From orsenthil at gmail.com Mon Jun 28 10:25:26 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Mon, 28 Jun 2010 13:55:26 +0530 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C285D3D.80907@canterbury.ac.nz> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> Message-ID: <20100628082526.GA6509@remy> On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote: > A thought on this poly_str type: perhaps it could be > called "ascii", since that's what it would have to be > restricted to, and have > > a'xxx' > > as a literal syntax for it, seeing as literals seem to > be one of its main use cases. This seems like a good idea. > > Thinking way outside the square, and probably the pale > as well, maybe @ could be pressed into service as an > infix operator, with > > s @ i > > being equivalent to > > s[i:i+1] > And this is way beyond being intuitive. -- Senthil From rdmurray at bitdance.com Mon Jun 28 13:24:48 2010 From: rdmurray at bitdance.com (R.
David Murray) Date: Mon, 28 Jun 2010 07:24:48 -0400 Subject: [Python-Dev] bytes / unicode In-Reply-To: <20100628082526.GA6509@remy> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> <20100628082526.GA6509@remy> Message-ID: <20100628112448.348771FD0CD@kimball.webabinitio.net> On Mon, 28 Jun 2010 13:55:26 +0530, Senthil Kumaran wrote: > On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote: > > Thinking way outside the square, and probably the pale > > as well, maybe @ could be pressed into service as an > > infix operator, with > > > > s @ i > > > > being equivalent to > > > > s[i:i+1] > > > > And this is way beyond being intuitive. Agreed, -1 on that. Like I said, I was just venting. The decision to have indexing bytes return an int is set in stone now and I just have to live with it. -- R. David Murray www.bitdance.com From mal at egenix.com Mon Jun 28 13:38:31 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 13:38:31 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> Message-ID: <4C2889B7.2060105@egenix.com> Brett Cannon wrote: > On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>> On Sun, Jun 27, 2010 at 6:46 AM, Jeffrey Yasskin wrote: >>>> AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based >>>> systems (http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842).
>>>> If Python's configure.in sets an otherwise-empty CFLAGS to -g before >>>> calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just >>>> preserve the users CFLAGS setting across AC_PROG_CC regardless of >>>> whether it's set, to let the user set CFLAGS on the configure line >>>> without stomping any defaults. >>> >>> I think saving and restoring CFLAGS across AC_PROG_CC was attempted in >>> http://bugs.python.org/issue8211 . It turned out that it broke OS X >>> universal builds. >> >> Thanks for the link to the issue. http://bugs.python.org/issue8366 >> says Ronald Oussoren fixed the universal builds without reverting the >> CFLAGS propagation. >> >>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>> someone give an example of why this is necessary? >> >> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >> other flags as well (it currently doesn't, but that may well change in >> future versions of autoconf)." That seems a little weak to constrain >> fixing actual problems today. If it ever adds more arguments, we'll >> need to inspect them anyway to see if they're more like -g or -O2 >> (wanted or harmful). Please see the discussion on the ticket for details. AC_PROG_CC provides the basic defaults for the CFLAGS compiler settings depending on which compiler is chosen/found: http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html > I went ahead and reverted the change, but it does seem like the build > environment could use a cleanup. Thanks and, indeed, the build system environment variable usage does need a cleanup. It's a larger project, though, and one that will likely break existing build setups. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... 
http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Mon Jun 28 14:13:53 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 28 Jun 2010 22:13:53 +1000 Subject: [Python-Dev] bytes / unicode In-Reply-To: <4C285D3D.80907@canterbury.ac.nz> References: <20100625130801.1E9A83A4099@sparrow.telecommunity.com> <8739wbnl0m.fsf@uwakimon.sk.tsukuba.ac.jp> <20100625222722.594D23A4099@sparrow.telecommunity.com> <87oceympcu.fsf@uwakimon.sk.tsukuba.ac.jp> <20100626181753.601473A4108@sparrow.telecommunity.com> <20100627034922.31A663A4108@sparrow.telecommunity.com> <20100627233121.E1E0821948D@kimball.webabinitio.net> <4C285D3D.80907@canterbury.ac.nz> Message-ID: On Mon, Jun 28, 2010 at 6:28 PM, Greg Ewing wrote: > R. David Murray wrote: > >> Having such a poly_str type would probably make my life easier. > > A thought on this poly_str type: perhaps it could be > called "ascii", since that's what it would have to be > restricted to, and have > > a'xxx' > > as a literal syntax for it, seeing as literals seem to > be one of its main use cases. One of the virtues of doing this as a helper type in a module somewhere (probably string) is that we can defer that kind of decision until later. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dickinsm at gmail.com Mon Jun 28 15:50:37 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 28 Jun 2010 14:50:37 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags?
In-Reply-To: <4C2889B7.2060105@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> Message-ID: On Mon, Jun 28, 2010 at 12:38 PM, M.-A. Lemburg wrote: >> On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >>> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>>> someone give an example of why this is necessary? >>> >>> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >>> other flags as well (it currently doesn't, but that may well change in >>> future versions of autoconf)." That seems a little weak to constrain >>> fixing actual problems today. If it ever adds more arguments, we'll >>> need to inspect them anyway to see if they're more like -g or -O2 >>> (wanted or harmful). > > Please see the discussion on the ticket for details. Yes, I've done that. It's repeatedly asserted in that discussion that AC_PROG_CC should be allowed to initialize an otherwise empty CFLAGS, but nowhere in that discussion does it explain *why* this is desirable. What would be so bad about not allowing AC_PROG_CC to initialize CFLAGS? (E.g., by setting an otherwise empty CFLAGS to '-g' before the AC_PROG_CC invocation.) That would fix the issue of the unwanted -O2 flag that AC_PROG_CC otherwise adds. Mark From mal at egenix.com Mon Jun 28 16:04:04 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 16:04:04 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> Message-ID: <4C28ABD4.1030000@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 12:38 PM, M.-A.
Lemburg wrote: >>> On Sun, Jun 27, 2010 at 13:37, Jeffrey Yasskin wrote: >>>> On Sun, Jun 27, 2010 at 1:04 PM, Mark Dickinson wrote: >>>>> I'm not sure I understand the importance of allowing AC_PROG_CC to set >>>>> CFLAGS (if CFLAGS is undefined at the point of the AC_PROG_CC); can >>>>> someone give an example of why this is necessary? >>>> >>>> Marc-Andre's argument seems to be "it's possible that AC_PROG_CC adds >>>> other flags as well (it currently doesn't, but that may well change in >>>> future versions of autoconf)." That seems a little weak to constrain >>>> fixing actual problems today. If it ever adds more arguments, we'll >>>> need to inspect them anyway to see if they're more like -g or -O2 >>>> (wanted or harmful). >> >> Please see the discussion on the ticket for details. > > Yes, I've done that. It's repeatedly asserted in that discussion that > AC_PROG_CC should be allowed to initialize an otherwise empty CFLAGS, > but nowhere in that discussion does it explain *why* this is > desirable. What would be so bad about not allowing AC_PROG_CC to > initialize CFLAGS? (E.g., by setting an otherwise empty CFLAGS to > '-g' before the AC_PROG_CC invocation.) That would fix the issue of > the unwanted -O2 flag that AC_PROG_CC otherwise adds. Why do you think that the default -O2 is unwanted and how do you know whether the compiler accepts -g as option ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dickinsm at gmail.com Mon Jun 28 16:22:19 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 28 Jun 2010 15:22:19 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28ABD4.1030000@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> Message-ID: On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: > Why do you think that the default -O2 is unwanted Because it can cause debug builds of Python to be built with optimization enabled, as we've already seen at least twice. > and how do you know > whether the compiler accepts -g as option ? I don't. It could easily be tested for, though. Alternatively, setting an empty CFLAGS to '-g' could be done just for gcc, since this is the only compiler for which AC_PROG_CC adds -O2. Mark From mal at egenix.com Mon Jun 28 17:28:03 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 17:28:03 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> Message-ID: <4C28BF83.9080903@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >> Why do you think that the default -O2 is unwanted > > Because it can cause debug builds of Python to be built with > optimization enabled, as we've already seen at least twice. Then let me put it this way: How many Python users will compile Python in debug mode ? The point is that the default build of Python should use the correct production settings for the C compiler out of the box and that's what AC_PROG_CC is all about. 
I'm pretty sure that Python developers who want to use a debug build have enough code foo to get the -O2 turned into a -O0 either by adjust OPT and/or by providing their own CFLAGS env var. Also note that in some cases you may actually want to have a debug build with optimizations turned on, e.g. to track down a compiler optimization bug. >> and how do you know >> whether the compiler accepts -g as option ? > > I don't. It could easily be tested for, though. Alternatively, > setting an empty CFLAGS to '-g' could be done just for gcc, since this > is the only compiler for which AC_PROG_CC adds -O2. ... and then end up with default Python builds which don't have debug symbols available to track down core dumps, etc. ? AC_PROG_CC checks whether the compiler supports -g and always uses it in that case. The option is supported by more compilers than just GCC. E.g. IBM's xlC and Intel's icl compilers support that option as well. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From mal at egenix.com Mon Jun 28 17:31:40 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 17:31:40 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? 
In-Reply-To: <4C28BF83.9080903@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <4C28C05C.80008@egenix.com> M.-A. Lemburg wrote: > Mark Dickinson wrote: >> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >>> Why do you think that the default -O2 is unwanted >> >> Because it can cause debug builds of Python to be built with >> optimization enabled, as we've already seen at least twice. > > Then let me put it this way: > > How many Python users will compile Python in debug mode ? > > The point is that the default build of Python should use > the correct production settings for the C compiler out of > the box and that's what AC_PROG_CC is all about. > > I'm pretty sure that Python developers who want to use a > debug build have enough code foo to get the -O2 turned into a -O0 > either by adjust OPT and/or by providing their own CFLAGS env var. > > Also note that in some cases you may actually want to have > a debug build with optimizations turned on, e.g. to track down > a compiler optimization bug. > >>> and how do you know >>> whether the compiler accepts -g as option ? >> >> I don't. It could easily be tested for, though. Alternatively, >> setting an empty CFLAGS to '-g' could be done just for gcc, since this >> is the only compiler for which AC_PROG_CC adds -O2. > > ... and then end up with default Python builds which don't have > debug symbols available to track down core dumps, etc. ? > > AC_PROG_CC checks whether the compiler supports -g and always > uses it in that case. The option is supported by more compilers > than just GCC. E.g. IBM's xlC and Intel's icl compilers support > that option as well. 
Sorry, Intel's compiler is called "icc", not "icl": http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/mac/man/icc.txt IBM's compiler: http://publib.boulder.ibm.com/infocenter/macxhelp/v6v81/index.jsp?topic=/com.ibm.vacpp6m.doc/compiler/ref/ruoptlst.htm -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Mon Jun 28 17:39:22 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Jun 2010 08:39:22 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: On Sun, Jun 27, 2010 at 9:33 PM, Gregory P. Smith wrote: > fyi - newthreading has been picked up by lwn. > http://lwn.net/Articles/393822/#Comments Do you know if any of the commenters is Nagle himself (and if so, which)? The discussion is hard to follow since the context of replies isn't always clear. There also seems to be a bunch of C++ thinking (and some knee-jerk responses by people who aren't actually all that familiar with Python) although I admit I don't have much of an intuition about memory models for fully free threading myself. It's a brave new world... 
--Guido -- --Guido van Rossum (python.org/~guido) From dickinsm at gmail.com Mon Jun 28 17:44:00 2010 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 28 Jun 2010 16:44:00 +0100 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28BF83.9080903@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: On Mon, Jun 28, 2010 at 4:28 PM, M.-A. Lemburg wrote: > Mark Dickinson wrote: >> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >>> Why do you think that the default -O2 is unwanted >> >> Because it can cause debug builds of Python to be built with >> optimization enabled, as we've already seen at least twice. > > Then let me put it this way: > > How many Python users will compile Python in debug mode ? > > The point is that the default build of Python should use > the correct production settings for the C compiler out of > the box and that's what AC_PROG_CC is all about. > > I'm pretty sure that Python developers who want to use a > debug build have enough code foo to get the -O2 turned into a -O0 > either by adjust OPT and/or by providing their own CFLAGS env var. Shrug. Clearly someone at some point in the past thought it was a good idea to have --with-pydebug builds use -O0. If there's going to be a deliberate decision to drop that now, then that's fine with me. >> I don't. It could easily be tested for, though. Alternatively, >> setting an empty CFLAGS to '-g' could be done just for gcc, since this >> is the only compiler for which AC_PROG_CC adds -O2. > > ... and then end up with default Python builds which don't have > debug symbols available to track down core dumps, etc. ? No, I don't see how that follows. I was suggesting that *for gcc only*, an empty CFLAGS be set to '-g' before calling AC_PROG_CC.
The *only* effect this would have would be that for gcc, if the user hasn't specified CFLAGS, then CFLAGS ends up being '-g' rather than '-g -O2' after the AC_PROG_CC call. But I'm really not looking for an argument here; I just wanted to understand why you thought AC_PROG_CC setting CFLAGS was important, and you've explained that. Thanks. Mark From mal at egenix.com Mon Jun 28 18:03:23 2010 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 28 Jun 2010 18:03:23 +0200 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <4C28C7CB.8030600@egenix.com> Mark Dickinson wrote: > On Mon, Jun 28, 2010 at 4:28 PM, M.-A. Lemburg wrote: >> Mark Dickinson wrote: >>> On Mon, Jun 28, 2010 at 3:04 PM, M.-A. Lemburg wrote: >>>> Why do you think that the default -O2 is unwanted >>> >>> Because it can cause debug builds of Python to be built with >>> optimization enabled, as we've already seen at least twice. >> >> Then let me put it this way: >> >> How many Python users will compile Python in debug mode ? >> >> The point is that the default build of Python should use >> the correct production settings for the C compiler out of >> the box and that's what AC_PROG_CC is all about. >> >> I'm pretty sure that Python developers who want to use a >> debug build have enough code foo to get the -O2 turned into a -O0 >> either by adjust OPT and/or by providing their own CFLAGS env var. > > Shrug. Clearly someone at some point in the past thought it was a > good idea to have --with-pydebug builds use -O0. If there's going to > be a deliberate decision to drop that now, then that's fine with me. Ah right, the time machine again :-) OPT already uses -O0 if --with-pydebug is used and the compiler supports -g. Since OPT gets added after CFLAGS, the override already happens... >>> I don't. 
It could easily be tested for, though. Alternatively, >>> setting an empty CFLAGS to '-g' could be done just for gcc, since this >>> is the only compiler for which AC_PROG_CC adds -O2. >> >> ... and then end up with default Python builds which don't have >> debug symbols available to track down core dumps, etc. ? > > No, I don't see how that follows. I was suggesting that *for gcc > only*, an empty CFLAGS be set to '-g' before calling AC_PROG_CC. The > *only* effect this would have would be that for gcc, if the user > hasn't specified CFLAGS, then CFLAGS ends up being '-g' rather than > '-g -O2' after the AC_PROG_CC call. But I'm really not looking for an > argument here; I just wanted to understand why you thought AC_PROG_CC > setting CFLAGS was important, and you've explained that. Thanks. Sorry, that was a misunderstanding on my part. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 28 2010) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2010-07-19: EuroPython 2010, Birmingham, UK 20 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Mon Jun 28 18:05:13 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 28 Jun 2010 19:05:13 +0300 Subject: [Python-Dev] WPython 1.1 was released In-Reply-To: References: <201006232112.41047.steve@pearwood.info> Message-ID: It would be interesting to see benchmark diagrams inline on one page with overall summaries.
I've posted an enhancement to http://code.google.com/p/unladen-swallow/issues/detail?id=145 if somebody is going to look at that. I wonder whether a 32-bit version could bring more speedups? -- anatoly t. From techtonik at gmail.com Mon Jun 28 20:09:56 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 28 Jun 2010 21:09:56 +0300 Subject: [Python-Dev] Pickle security and remote logging Message-ID: Hello, I need to send logging module output over the network. The module has everything to make this happen, except security. The SocketHandler and DatagramHandler examples use the pickle module, which is said to be insecure. The SocketHandler and DatagramHandler docs should at least contain a warning about the danger of exposing unpickling interfaces to insecure networks. The pickle documentation mentions that it is possible to control what gets unpickled, but there is no example and no security analysis of whether the proposed solution is actually secure. Is there any way to implement secure network logging? I do not care about data encryption - I just do not want my server exploited by malformed data. -- anatoly t. From zohair_ms at hotmail.com Mon Jun 28 20:09:35 2010 From: zohair_ms at hotmail.com (Zohair) Date: Mon, 28 Jun 2010 11:09:35 -0700 (PDT) Subject: [Python-Dev] Access a function Message-ID: <29008798.post@talk.nabble.com> I am very new to Python and have a small question. I have a function: set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it. Asking for your help, please. Cheers, Zoh -- View this message in context: http://old.nabble.com/Access-a-function-tp29008798p29008798.html Sent from the Python - python-dev mailing list archive at Nabble.com.
From fuzzyman at voidspace.org.uk Mon Jun 28 20:39:08 2010 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Mon, 28 Jun 2010 19:39:08 +0100 Subject: [Python-Dev] Access a function In-Reply-To: <29008798.post@talk.nabble.com> References: <29008798.post@talk.nabble.com> Message-ID: <4C28EC4C.7030905@voidspace.org.uk> On 28/06/2010 19:09, Zohair wrote: > I am very new to Python and have a small question. > > I have a function: > set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it. > Asking for your help, please. > Hi Zoh, This mailing list is for the development *of* Python, not for questions about developing *with* Python. You should ask your question on a mailing list / newsgroup like python-list or python-tutor. python-list is available via Google Groups: https://groups.google.com/group/comp.lang.python/topics You haven't given enough information to answer the question, however. The first argument 'self' means that the function is probably a method of a class, and should be called on a class instance. The *args / **kwargs means that the function can take any number of positional or keyword arguments, which doesn't tell us anything about how the function should be used. You can find out more about Python functions in the tutorial: http://docs.python.org/tutorial/controlflow.html#more-on-defining-functions All the best, Michael Foord > Cheers, > > Zoh > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog READ CAREFULLY. By accepting and reading this email you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer. From phd at phd.pp.ru Mon Jun 28 20:42:28 2010 From: phd at phd.pp.ru (Oleg Broytman) Date: Mon, 28 Jun 2010 22:42:28 +0400 Subject: [Python-Dev] Access a function In-Reply-To: <29008798.post@talk.nabble.com> References: <29008798.post@talk.nabble.com> Message-ID: <20100628184228.GA17475@phd.pp.ru> Hello. We're sorry, but we cannot help you. This mailing list is for work on developing Python (fixing bugs and adding new features to Python itself); if you're having problems using Python, please find another forum. The python-list (comp.lang.python) newsgroup/mailing list is probably the best place. See http://www.python.org/community/lists/ for other lists/newsgroups/fora. Thank you for understanding. On Mon, Jun 28, 2010 at 11:09:35AM -0700, Zohair wrote: > > I am very new to Python and have a small question. > > I have a function: > set_time_at_next_pps(self, *args, **kwargs) but don't know how to use it. > Asking for your help, please. > > Cheers, > > Zoh > -- > View this message in context: http://old.nabble.com/Access-a-function-tp29008798p29008798.html > Sent from the Python - python-dev mailing list archive at Nabble.com. > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/phd%40phd.pp.ru Oleg. -- Oleg Broytman http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From alexander.belopolsky at gmail.com Mon Jun 28 21:59:00 2010 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 28 Jun 2010 15:59:00 -0400 Subject: [Python-Dev] How to spell PyInstance_NewRaw in py3k?
Message-ID: Issue #5180 [1] presented an interesting challenge: how to unpickle instances of old-style classes when a pickle created with 2.x is loaded in 3.x Python? The problem is that the pickle protocol requires that unpickled instances be created without calling the __init__ method. This is necessary because a pickle file may not contain information about how the __init__ method should be invoked. Instead, implementations are required to bypass __init__ and populate the instance's __dict__ directly using data found in the pickle. The pure Python implementation uses the following trick, which happens to work in 3.x:

    class Empty:
        pass

    pickled = Empty()
    pickled.__class__ = Pickled

This, of course, creates a new-style class in 3.x, but if the 3.x version of Pickled behaves similarly to its 2.x predecessor, it should work. The cPickle implementation, on the other hand, uses a 2.x C API that is not available in 3.x, namely the PyInstance_NewRaw function. In order to fix the bug described in issue #5180, I had to emulate PyInstance_NewRaw using type->tp_alloc. I considered and rejected the idea of using tp_new instead. [2] Is this the right way to proceed? The patch is attached to the issue. [3] [1] http://bugs.python.org/issue5180 [2] http://bugs.python.org/issue5180#msg108846 [3] http://bugs.python.org/file17792/issue5180.diff From lvh at laurensvh.be Mon Jun 28 23:33:05 2010 From: lvh at laurensvh.be (Laurens Van Houtven) Date: Mon, 28 Jun 2010 23:33:05 +0200 Subject: [Python-Dev] Access a function In-Reply-To: <20100628184228.GA17475@phd.pp.ru> References: <29008798.post@talk.nabble.com> <20100628184228.GA17475@phd.pp.ru> Message-ID: Of course I concur with the two posters above me, but in order to advertise for my own shop... If you're stuck with a lot of newbie questions like these you might want to try #python (the IRC channel on irc.freenode.net). You're more likely to get quick successive responses there than on other media (which are more suitable for bigger, more complex questions).
cheers Laurens From guido at python.org Tue Jun 29 01:09:55 2010 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Jun 2010 16:09:55 -0700 Subject: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL In-Reply-To: <4C262D37.7020807@animats.com> References: <4C259A25.1060705@animats.com> <4C2600B4.5020503@voidspace.org.uk> <4C262D37.7020807@animats.com> Message-ID: I'm moving this thread to python-ideas, where it belongs. I've looked at the implementation code (even stepped through it with pdb!), read the sample/test code, and read the two papers on animats.com fairly closely (they have a lot of overlap, and the memory model described below seems copied verbatim from http://www.animats.com/papers/languages/pythonconcurrency.html version 0.8). Some reactions (trying to hide my responses to the details of the code): - First of all, I'm very happy to see radical ideas proposed, even if they are at present unrealistic. We need a big brainstorm to come up with ideas from which an eventual solution to the multicore problem might be chosen. (Jesse Noller's multiprocessing is another; Adam Olsen's work yet another, at a different end of the spectrum.) - The proposed new semantics (frozen objects, memory model, auto-freezing of globals, enforcement of naming conventions) are radically different from Python's current semantics. They will break every 3rd party library in many more ways than Python 3. This is not surprising given the goals of the proposal (and its roots in Adam Olsen's work) but places a huge roadblock for acceptance. I see no choice but to keep trying to come up with a compromise that is more palatable and compatible without throwing away all the advantages. As it now stands, the proposal might as well be a new and different language. 
- SynchronizedObject looks like a mixture of a Java synchronized class (a non-standard concept in Java but easily understood as a class all whose public methods are synchronized) and a condition variable (which has the same semantics of releasing the lock while waiting but without crawling the stack for other locks to release). It looks like the examples showing off SynchronizedObject could be implemented just as elegantly using a condition variable (and voluntary abstention from using shared mutable objects). - If the goal is to experiment with new control structures, I recommend decoupling them from the memory model and frozen objects, instead relying (as is traditional in Python) on programmer caution to avoid races. This would make it much easier to see how programmers respond to the new control structures. - You could add the freeze() function for voluntary use, and you could even add automatic wrapping of arguments and return values for certain classes using a class decorator or a metaclass, but the performance overhead makes this unlikely to win over many converts. I don't see much use for the "whole program freezing" done by the current prototype -- there are way too many backdoors in Python for the prototype approach to be anywhere near foolproof, and if we want a non-foolproof approach, voluntary constraint (and, in some cases, voluntary, i.e. explicit, wrapping of modules or classes) would work just as well. - For a larger-scale experiment with the new memory model and semantic restrictions (or would it be better to call them syntactic restrictions? -- after all they are about statically detectable properties like naming conventions) I recommend looking at PyPy, which has as one of its explicitly stated project goals easy experimentation with different object models. - I'm sure I've forgotten something, but I wanted to keep my impressions fresh. - Again, John, thanks for taking the time to come up with an implementation of your idea! 
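As an illustration of the condition-variable point in the list above, here is a minimal sketch of a producer/consumer buffer built only on the standard library's threading.Condition. It is not the newthreading API and does not use SynchronizedObject; the names BoundedBuffer and run_demo are made up for this example, and avoiding shared mutable state outside the buffer is left to programmer discipline, as the message suggests.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Hypothetical producer/consumer queue guarded by one condition variable."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            # wait() releases the lock while blocked and reacquires it on wakeup,
            # so the predicate must be rechecked in a loop.
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item


def run_demo(count=20, capacity=4):
    """Move `count` integers from one producer thread to one consumer thread."""
    buf = BoundedBuffer(capacity)
    received = []

    def producer():
        for i in range(count):
            buf.put(i)

    def consumer():
        for _ in range(count):
            received.append(buf.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

With a single producer and a single consumer the buffer is first-in, first-out, so run_demo() returns the integers in the order they were produced.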
--Guido On Sat, Jun 26, 2010 at 9:39 AM, John Nagle wrote: > On 6/26/2010 7:44 AM, Jesse Noller wrote: >> >> On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord >> wrote: >>> >>> On 26/06/2010 07:11, John Nagle wrote: >>>> >>>> We have just released a proof-of-concept implementation of a new >>>> approach to thread management - "newthreading". > > .... > >>> The import * form is considered bad practise in *general* and >>> should not be recommended unless there is a good reason. > > I agree. I just did that to make the examples cleaner. > >>> however the introduction of free-threading in Python has not been >>> hampered by lack of synchronization primitives but by the >>> difficulty of changing the interpreter without unduly impacting >>> single threaded code. > > That's what I'm trying to address here. > >>> Providing an alternative garbage collection mechanism other than >>> reference counting would be a more interesting first-step as far as >>> I can see, as that removes the locking required around every access >>> to an object (which currently touches the reference count). >>> Introducing free-threading by *changing* the threading semantics >>> (so you can't share non-frozen objects between threads) would not >>> be acceptable. That comment is likely to be based on a >>> misunderstanding of your future intentions though. :-) > > This work comes out of a discussion a few of us had at a restaurant > in Palo Alto after a Stanford talk by the group at Facebook which > is building a JIT compiler for PHP. We were discussing how to > make threading both safe for the average programmer and efficient. > Javascript and PHP don't have threads at all; Python has safe > threading, but it's slow. C/C++/Java all have race condition > problems, of course. The Facebook guy pointed out that you > can't redefine a function dynamically in PHP, and they get > a performance win in their JIT by exploiting this. > >
I haven't gone into the memory model in enough detail in the > technical paper. The memory model I envision for this has three > memory zones: > > 1. Shared fully-immutable objects: primarily strings, numbers, > and tuples, all of whose elements are fully immutable. These can > be shared without locking, and reclaimed by a concurrent garbage > collector like Boehm's. They have no destructors, so finalization > is not an issue. > > 2. Local objects. These are managed as at present, and > require no locking. These can either be thread-local, or local > to a synchronized object. There are no links between local > objects under different "ownership". Whether each thread and > object has its own private heap, or whether there's a common heap with > locks at the allocator is an implementation decision. > > 3. Shared mutable objects: mostly synchronized objects, but > also immutable objects like tuples which contain references > to objects that aren't fully immutable. These are the high-overhead > objects, and require locking during reference count updates, or > atomic reference count operations if supported by the hardware. > The general idea is to minimize the number of objects in this > zone. > > The zone of an object is determined when the object is created, > and never changes. This is relatively simple to implement. > Tuples (and frozensets, frozendicts, etc.) are normally zone 2 > objects. Only "freeze" creates collections in zones 1 and 3. > Synchronized objects are always created in zone 3. > There are no difficult handoffs, where an object that was previously > thread-local now has to be shared and has to acquire locks during > the transition. > > Existing interlinked data structures, like parse trees and GUIs, > are by default zone 2 objects, with the same semantics as at > present. They can be placed inside a SynchronizedObject if > desired, which makes them usable from multiple threads.
> That's optional; they're thread-local otherwise. > > The rationale behind "freezing" some of the language semantics > when the program goes multi-thread comes from two sources - > Adam Olsen's Safethread work, and the acceptance of the > multiprocessing module. Olsen tried to retain all the dynamism of > the language in a multithreaded environment, but locking all the > underlying dictionaries was a boat-anchor on the whole system, > and slowed things down so much that he abandoned the project. > The Unladen Swallow documentation indicates that early thinking > on the project was that Olsen's approach would allow getting > rid of the GIL, but later notes indicate that no path to a > GIL-free JIT system is currently in development. > > The multiprocessing module provides semantics similar to > threading with "freezing". Data passed between processes is "frozen" > by pickling. Processes can't modify each other's code. Restrictive > though the multiprocessing module is, it appears to be useful. > It is sometimes recommended as the Pythonic approach to multi-core CPUs. > This is an indication that "freezing" is not unacceptable to the > user community. > > Most of the real-world use cases for extreme dynamism > involve events that happen during startup. Configuration files are > read, modules are selectively included, functions are overridden, tables > of references to functions are set up, regular expressions are compiled, > and the code is brought into the appropriately configured state. Then > the worker threads are started and the real work starts. The > "newthreading" approach allows all that. > > After two decades of failed attempts to remove the Global > Interpreter Lock without making performance worse, it is perhaps > time to take a harder look at scaleable threading semantics. > > John Nagle >
Animats -- --Guido van Rossum (python.org/~guido) From steve at holdenweb.com Tue Jun 29 15:56:11 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 09:56:11 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes Message-ID: I hope this is an appropriate dev topic. It seems to me that the Unicode discussions of recent days are well highlighted by difficulties I am having using the mailbox module (hardly surprising given the difficulties of handling email generally) even though it passes its tests. I can't find anything related in the issue tracker (symptoms: one program that works fine under Python 2 in under twenty seconds takes forever (over ten minutes) to fail while creating the (start, stop) index to the mailbox). My code reads Thunderbird mailboxen from the file store on my Windows Vista system under 3.1. The failures I am experiencing could easily be encoding issues so I won't post any detail yet, but I am concerned about the timing - even when the code is "fixed", if it needs to be, the performance may still make the module of dubious value. Can someone who is set up to do so easily just do a timing of test_mailbox under 2.6 and 3.2, to verify they see the same disparity as me? The test takes about twice as long under 3.1 here (and I am concerned that unexercised aspects of the code may extend real-world problem run times by an order of magnitude or more). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From miki.tebeka at gmail.com Tue Jun 29 16:10:20 2010 From: miki.tebeka at gmail.com (Miki Tebeka) Date: Tue, 29 Jun 2010 07:10:20 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: Hello Steve, > Can someone who is set up to do easily just do a timing of test_mailbox > under 2.6 and 3.2, to verify they see the same disparity as me? The test > takes about twice as long under 3.1 here On Ubuntu timing was: Python 2.6.5: 23.8sec Python 2.7rc2: 32.7sec Python 3.1.2: 32.3sec All the best, -- Miki From orsenthil at gmail.com Tue Jun 29 16:11:20 2010 From: orsenthil at gmail.com (Senthil Kumaran) Date: Tue, 29 Jun 2010 19:41:20 +0530 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: <20100629141120.GA7448@remy> On Tue, Jun 29, 2010 at 09:56:11AM -0400, Steve Holden wrote: > Can someone who is set up to do easily just do a timing of test_mailbox > under 2.6 and 3.2, to verify they see the same disparity as me? The test Actually, No. Python 2.7b2+ (trunk:81685M, Jun 4 2010, 21:52:06) Ran 274 tests in 27.231s OK real 0m27.769s user 0m1.110s sys 0m0.440s Python 3.2a0 (py3k:82364M, Jun 29 2010, 19:37:27 Ran 268 tests in 24.444s OK real 0m25.126s user 0m2.810s sys 0m0.270s 07:39 PM:senthil@:~/python/py3k This is under Ubuntu 64 Bit. Perhaps, the problem you are observing is Windows Only? -- Senthil Banectomy, n.: The removal of bruises on a banana. 
-- Rich Hall, "Sniglets" From ncoghlan at gmail.com Tue Jun 29 16:14:31 2010 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Jun 2010 00:14:31 +1000 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: Command line: ./python -m test.regrtest -v test_mailbox trunk: Ran 274 tests in 25.239s py3k: Ran 268 tests in 26.263s So I don't see any substantial difference on a Kubuntu 10.04 box (both builds are recent'ish, but not completely up to date). However, the underlying IO access is significantly different between POSIX and Windows, so there could still be something pathological happening at the filesystem manipulation layer. My comparisons are also 2.7 vs 3.2 rather than 2.6 vs 3.1. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at holdenweb.com Tue Jun 29 16:26:28 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 10:26:28 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: Message-ID: <4C2A0294.3070806@holdenweb.com> Nick Coghlan wrote: > Command line: ./python -m test.regrtest -v test_mailbox > > trunk: Ran 274 tests in 25.239s > py3k: Ran 268 tests in 26.263s > > So I don't see any substantial difference on a Kubuntu 10.04 box (both > builds are recent'ish, but not completely up to date). > > However, the underlying IO access is significantly different between > POSIX and Windows, so there could still be something pathological > happening at the filesystem manipulation layer. My comparisons are > also 2.7 vs 3.2 rather than 2.6 vs 3.1. > > Cheers, > Nick. > Thanks for all the timings! If a Windows user could do the same thing that would help ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Tue Jun 29 16:49:00 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 10:49:00 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0294.3070806@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> Message-ID: Steve Holden wrote: > Nick Coghlan wrote: >> Command line: ./python -m test.regrtest -v test_mailbox >> >> trunk: Ran 274 tests in 25.239s >> py3k: Ran 268 tests in 26.263s >> >> So I don't see any substantial difference on a Kubuntu 10.04 box (both >> builds are recent'ish, but not completely up to date). >> >> However, the underlying IO access is significantly different between >> POSIX and Windows, so there could still be something pathological >> happening at the filesystem manipulation layer. My comparisons are >> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >> >> Cheers, >> Nick. >> > Thanks for all the timings! If a Windows user could do the same thing > that would help ... > And there is *definitely* a performance issue. I created a Thunderbird folder of 26 Google alerts and just parsed them all after reading them in from the mailbox. 2.5 (!): 0.78 sec 3.1 : 42.80 sec Rather than debate the code here perhaps I should just open an issue for this? I can then provide both a program and some data, which can be added to the tests if appropriate. The issue can clearly stand some investigation. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video!
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From barry at python.org Tue Jun 29 16:50:12 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Jun 2010 10:50:12 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28BF83.9080903@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> Message-ID: <20100629105012.341adc7b@heresy> On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote: >How many Python users will compile Python in debug mode ? How many Python users compile Python at all? :) >The point is that the default build of Python should use >the correct production settings for the C compiler out of >the box and that's what AC_PROG_CC is all about. Sure. >I'm pretty sure that Python developers who want to use a >debug build have enough code foo to get the -O2 turned into a -O0 >either by adjust OPT and/or by providing their own CFLAGS env var. Yes, but it's a PITA for several reasons, IMO: * It's pretty underdocumented * It's obscure * It's hard to remember the exact fu needed because you do it infrequently * I usually only remember my mistake when gdb acts funny I strongly suggest that --with-pydebug should be all you need to ensure the best debugging environment, which means turning off compiler optimization. Last time I tried, the -O0 was added and it worked well. (I know this has been in flux though.) >Also note that in some cases you may actually want to have >a debug build with optimizations turned on, e.g. to track down >a compiler optimization bug. Yes, but that's *much* more rare than wanting to step through some bit of C code without going crazy. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From mail at timgolden.me.uk Tue Jun 29 16:51:00 2010 From: mail at timgolden.me.uk (Tim Golden) Date: Tue, 29 Jun 2010 15:51:00 +0100 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0294.3070806@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> Message-ID: <4C2A0854.5060004@timgolden.me.uk> On 29/06/2010 15:26, Steve Holden wrote: > Nick Coghlan wrote: >> Command line: ./python -m test.regrtest -v test_mailbox >> >> trunk: Ran 274 tests in 25.239s >> py3k: Ran 268 tests in 26.263s >> >> So I don't see any substantial difference on a Kubuntu 10.04 box (both >> builds are recent'ish, but not completely up to date). >> >> However, the underlying IO access is significantly different between >> POSIX and Windows, so there could still be something pathological >> happening at the filesystem manipulation layer. My comparisons are >> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >> >> Cheers, >> Nick. >> > Thanks for all the timings! If a Windows user could do the same thing > that would help ... WinXP SP3 2.6 Ran 272 tests in 13.172s 3.1 Ran 267 tests in 15.735s py3k A *lot* of ERROR and FAIL tests WinXP SP3 TJG From barry at python.org Tue Jun 29 16:51:35 2010 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Jun 2010 10:51:35 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <4C28C7CB.8030600@egenix.com> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <4C28C7CB.8030600@egenix.com> Message-ID: <20100629105135.245bf5d7@heresy> On Jun 28, 2010, at 06:03 PM, M.-A. Lemburg wrote: >OPT already uses -O0 if --with-pydebug is used and the >compiler supports -g. Since OPT gets added after CFLAGS, the override >already happens... So nobody's proposing to drop that? Good! 
Ignore my last message then. :) -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Tue Jun 29 16:56:22 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 07:56:22 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> Message-ID: On Tue, Jun 29, 2010 at 7:49 AM, Steve Holden wrote: > Steve Holden wrote: >> Nick Coghlan wrote: >>> Command line: ./python -m test.regrtest -v test_mailbox >>> >>> trunk: Ran 274 tests in 25.239s >>> py3k: Ran 268 tests in 26.263s >>> >>> So I don't see any substantial difference on a Kubuntu 10.04 box (both >>> builds are recent'ish, but not completely up to date). >>> >>> However, the underlying IO access is significantly different between >>> POSIX and Windows, so there could still be something pathological >>> happening at the filesystem manipulation layer. My comparisons are >>> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >>> >>> Cheers, >>> Nick. >>> >> Thanks for all the timings! If a Windows user could do the same thing >> that would help ... >> > And there is *definitely* a performance issue. I created a Thunderbird > folder of 26 Google alerts and just parsed them all after reading them > in from the mailbox. > > 2.5 (!): 0.78 sec > 3.1 : 42.80 sec > > Rather than debate the code here perhaps I should just open an issue for > this? I can then provide both a program and some data, which can be > added to the tests if appropriate. The issue can clearly stand some > investigation. Since you have such a great reproducible test case, could you point the profiler at it? (Perhaps on a reduced dataset... The profiler multiplies your run time by some number between 2 and 10 IIRC.)
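For readers who want to try the profiling suggestion above, the following is a rough sketch of pointing cProfile at a mailbox read. It is not Steve's test program (which is not reproduced in this archive): the helper names build_sample_mbox, parse_all and profile_parse are invented here, and the tiny generated mbox file merely stands in for a real Thunderbird folder.

```python
import cProfile
import io
import mailbox
import os
import pstats
import tempfile


def build_sample_mbox(path, count=5):
    """Write a tiny mbox file so the sketch does not need real mail."""
    box = mailbox.mbox(path)
    try:
        for i in range(count):
            msg = mailbox.mboxMessage()
            msg["From"] = "sender@example.com"
            msg["Subject"] = "alert %d" % i
            msg.set_payload("body of message %d\n" % i)
            box.add(msg)
        box.flush()
    finally:
        box.close()


def parse_all(path):
    """Read every message back; iterating the keys forces the index to be built."""
    box = mailbox.mbox(path)
    try:
        return [box[key]["Subject"] for key in box.iterkeys()]
    finally:
        box.close()


def profile_parse(path, top=10):
    """Run parse_all under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(parse_all, path)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sorting by cumulative time shows which callers dominate the run.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "sample.mbox")
    build_sample_mbox(sample)
    subjects, report = profile_parse(sample)
    print(report)
```

Running an existing script unchanged with python -m cProfile script.py is the lighter-weight alternative when no programmatic access to the statistics is needed.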
-- --Guido van Rossum (python.org/~guido) From mail at timgolden.me.uk Tue Jun 29 17:04:48 2010 From: mail at timgolden.me.uk (Tim Golden) Date: Tue, 29 Jun 2010 16:04:48 +0100 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A0854.5060004@timgolden.me.uk> References: <4C2A0294.3070806@holdenweb.com> <4C2A0854.5060004@timgolden.me.uk> Message-ID: <4C2A0B90.9020705@timgolden.me.uk> On 29/06/2010 15:51, Tim Golden wrote: > On 29/06/2010 15:26, Steve Holden wrote: >> Nick Coghlan wrote: >>> Command line: ./python -m test.regrtest -v test_mailbox >>> >>> trunk: Ran 274 tests in 25.239s >>> py3k: Ran 268 tests in 26.263s >>> >>> So I don't see any substantial difference on a Kubuntu 10.04 box (both >>> builds are recent'ish, but not completely up to date). >>> >>> However, the underlying IO access is significantly different between >>> POSIX and Windows, so there could still be something pathological >>> happening at the filesystem manipulation layer. My comparisons are >>> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >>> >>> Cheers, >>> Nick. >>> >> Thanks for all the timings! If a Windows user could do the same thing >> that would help ... > > WinXP SP3 > > 2.6 Ran 272 tests in 13.172s > 3.1 Ran 267 tests in 15.735s > py3k A *lot* of ERROR and FAIL tests py3k HEAD on Win7 Ran 268 tests in 34.055s TJG From vinay_sajip at yahoo.co.uk Tue Jun 29 17:15:22 2010 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 29 Jun 2010 15:15:22 +0000 (UTC) Subject: [Python-Dev] Pickle security and remote logging References: Message-ID: anatoly techtonik gmail.com> writes: > insecure. SocketHandler and DatagramHandler docs should at least > contain a warning about danger of exposing unpickling interfaces to > insecure networks. I've updated the documentation of SocketHandler.makePickle to mention security concerns, and that the method can be overridden to use a more secure implementation (e.g. HMAC-signed pickles). 
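The HMAC-signed approach mentioned above could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code in the logging module or its documentation: the function names sign_payload and verify_and_load and the SECRET_KEY value are hypothetical, both ends are assumed to share the key out of band, and hmac.compare_digest is only available in Python versions newer than the ones discussed in this thread.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice it would come from configuration.
SECRET_KEY = b"replace-with-a-real-shared-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_payload(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle."""
    data = pickle.dumps(obj, protocol=2)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data


def verify_and_load(blob, key=SECRET_KEY):
    """Check the tag first; unpickle only if it matches."""
    if len(blob) < DIGEST_SIZE:
        raise ValueError("payload too short to contain a signature")
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to unpickle")
    return pickle.loads(data)
```

The point of the design is that pickle.loads is never reached for data that was not produced by a holder of the key, so malformed or hostile input from the network is rejected before any unpickling happens. It does nothing for confidentiality, which matches the original request.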
Regards, Vinay Sajip From steve at holdenweb.com Tue Jun 29 17:29:55 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 11:29:55 -0400 Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags? In-Reply-To: <20100629105012.341adc7b@heresy> References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <20100629105012.341adc7b@heresy> Message-ID: Barry Warsaw wrote: > On Jun 28, 2010, at 05:28 PM, M.-A. Lemburg wrote: > >> How many Python users will compile Python in debug mode ? > > How many Python users compile Python at all? :) > >> The point is that the default build of Python should use >> the correct production settings for the C compiler out of >> the box and that's what AC_PROG_CC is all about. > > Sure. > >> I'm pretty sure that Python developers who want to use a >> debug build have enough code foo to get the -O2 turned into a -O0 >> either by adjust OPT and/or by providing their own CFLAGS env var. > > Yes, but it's a PITA for several reasons, IMO: > > * It's pretty underdocumented > * It's obscure > * It's hard to remember the exact fu needed because you do it infrequently > * I usually only remember my mistake when gdb acts funny > > I strongly suggest that --with-pydebug should be all you need to ensure the > best debugging environment, which means turning off compiler optimization. > Last time I tried, the -O0 was added and it worked well. (I know this has > been in flux though.) > >> Also note that in some cases you may actually want to have >> a debug build with optimizations turned on, e.g. to track down >> a compiler optimization bug. > > Yes, but that's *much* more rare than wanting to step through some bit of C > code without going crazy. I agree - trying to step through -O2 optimized code isn't going to help debug your code, it's going to help you debug the optimizer. That's a very rare use case. 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From steve at holdenweb.com Tue Jun 29 17:40:50 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 11:40:50 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> Message-ID: Guido van Rossum wrote: > On Tue, Jun 29, 2010 at 7:49 AM, Steve Holden wrote: >> Steve Holden wrote: >>> Nick Coghlan wrote: >>>> Command line: ./python -m test.regrtest -v test_mailbox >>>> >>>> trunk: Ran 274 tests in 25.239s >>>> py3k: Ran 268 tests in 26.263s >>>> >>>> So I don't see any substantial difference on a Kubuntu 10.04 box (both >>>> builds are recent'ish, but not completely up to date). >>>> >>>> However, the underlying IO access is significantly different between >>>> POSIX and Windows, so there could still be something pathological >>>> happening at the filesystem manipulation layer. My comparisons are >>>> also 2.7 vs 3.2 rather than 2.6 vs 3.1. >>>> >>>> Cheers, >>>> Nick. >>>> >>> Thanks for all the timings! If a Windows user could do the same thing >>> that would help ... >>> >> And there is *definitely a performance issue. I created a Thunderbird >> folder of 26 Google alerts and just parsed then all after reading them >> in from the mailbox. >> >> 2.5 (!): 0.78 sec >> 3.1 : 42.80 sec >> >> Rather than debate the code here perhaps I should just open an issue for >> this? I can then provide both a program and some data, which can be >> added to the tests if appropriate. The issue can clearly stand some >> investigation. > > Since you have such a great reproducible test case, could you point > the profiler at it? (Perhaps on a reduced dataset... 
The profiler > multiples your run time by some number between 2 and 10 IIRC.) > Sure. I attach the outputs of both files, as well as the program and the data. With profiling (python -m cProfile test3.py) the run took less than a third of a second under 2.5, and 168 seconds under 3.1. I'd say that was problematical :) I will leave the profiler output to speak for itself, since I can find nothing much to say about it except that there's a hell of a lot of decoding going on inside mailbox.iterkeys(). regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test3.1.out URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test2.5.out URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test3.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test.mailbox URL: From solipsis at pitrou.net Tue Jun 29 18:34:22 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jun 2010 18:34:22 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> Message-ID: <20100629183422.00f1997d@pitrou.net> On Tue, 29 Jun 2010 11:40:50 -0400 Steve Holden wrote: > Sure. I attach the outputs of both files, as well as the program and the > data. With profiling (python -m cProfile test3.py) the run took less > than a third of a second under 2.5, and 168 seconds under 3.1. 
I'd say > that was problematical :) > > I will leave the profiler output to speak for itself, since I can find > nothing much to say about it except that there's a hell of a lot of > decoding going on inside mailbox.iterkeys(). Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but still too much of it, is spent in TextIOWrapper.tell(). This seems to imply that mailbox files are opened in text mode, which sounds wrong to me. Perhaps Andrew can shed more light on this? From amk at amk.ca Tue Jun 29 18:34:42 2010 From: amk at amk.ca (A.M. Kuchling) Date: Tue, 29 Jun 2010 12:34:42 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> Message-ID: <20100629163442.GA5051@amk-desktop.matrixgroup.net> On Tue, Jun 29, 2010 at 07:56:22AM -0700, Guido van Rossum wrote: > Since you have such a great reproducible test case, could you point > the profiler at it? (Perhaps on a reduced dataset... The profiler > multiples your run time by some number between 2 and 10 IIRC.) Let me underline Guido's suggestion. Steve, I've done a lot of mailbox.py stuff and can look at your problem, but off the top of my head, my suspicion would be that I/O is the culprit, and a profile could confirm that. My thought is that mailbox.py is opening the file in some reading mode that ends up doing a lot more processing on Windows than on Unix because of universal newlines or something like that. --amk From amk at amk.ca Tue Jun 29 18:52:28 2010 From: amk at amk.ca (A.M. 
Kuchling)
Date: Tue, 29 Jun 2010 12:52:28 -0400
Subject: [Python-Dev] Mailbox module - timings and functionality changes
In-Reply-To:
References: <4C2A0294.3070806@holdenweb.com>
Message-ID: <20100629165228.GA5350@amk-desktop.matrixgroup.net>

On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote:
> I will leave the profiler output to speak for itself, since I can find
> nothing much to say about it except that there's a hell of a lot of
> decoding going on inside mailbox.iterkeys().

The problem is actually in _generate_toc(), which is reading through
the entire file to figure out where all the 'From' lines that start
messages are located.  TextIOWrapper()'s tell() method seems to be
very slow, so one help is to only call tell() when necessary; patch:

-> svn diff Lib/
Index: Lib/mailbox.py
===================================================================
--- Lib/mailbox.py    (revision 82346)
+++ Lib/mailbox.py    (working copy)
@@ -775,13 +775,14 @@
         starts, stops = [], []
         self._file.seek(0)
         while True:
-            line_pos = self._file.tell()
             line = self._file.readline()
             if line.startswith('From '):
+                line_pos = self._file.tell()
                 if len(stops) < len(starts):
                     stops.append(line_pos - len(os.linesep))
                 starts.append(line_pos)
             elif not line:
+                line_pos = self._file.tell()
                 stops.append(line_pos)
                 break
         self._toc = dict(enumerate(zip(starts, stops)))

But should mailboxes really be opened in a UTF-8 encoding, or should
they be treated as 7-bit text?  I'll have to think about this.

--amk

From rdmurray at bitdance.com Tue Jun 29 19:20:35 2010
From: rdmurray at bitdance.com (R.
David Murray) Date: Tue, 29 Jun 2010 13:20:35 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629183422.00f1997d@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629183422.00f1997d@pitrou.net> Message-ID: <20100629172035.8348D21A2AF@kimball.webabinitio.net> On Tue, 29 Jun 2010 18:34:22 +0200, Antoine Pitrou wrote: > On Tue, 29 Jun 2010 11:40:50 -0400 > Steve Holden wrote: > > Sure. I attach the outputs of both files, as well as the program and the > > data. With profiling (python -m cProfile test3.py) the run took less > > than a third of a second under 2.5, and 168 seconds under 3.1. I'd say > > that was problematical :) > > > > I will leave the profiler output to speak for itself, since I can find > > nothing much to say about it except that there's a hell of a lot of > > decoding going on inside mailbox.iterkeys(). > > Ok, a lot of time is spent in cp1252 decoding. Somewhat less time, but > still too much of it, is spent in TextIOWrapper.tell(). This seems to > imply that mailbox files are opened in text mode, which sounds wrong to > me. Perhaps Andrew can shed more light on this? Given the current state of the email package for python3, it makes sense that it would open them in text mode. email can't currently process bytes, only text. -- R. David Murray www.bitdance.com From solipsis at pitrou.net Tue Jun 29 19:30:53 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jun 2010 19:30:53 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> Message-ID: <20100629193053.750991e1@pitrou.net> On Tue, 29 Jun 2010 12:52:28 -0400 "A.M. Kuchling" wrote: > > But should mailboxes really be opened in a UTF-8 encoding, or should > they be treated as 7-bit text? I'll have to think about this. 
I don't see how you can assume UTF-8 for mailbox files, given that each message will have its particular encoding. Besides, Steve's profile results show that you are not using UTF-8, but rather the local encoding, which is cp1252 under his Windows setup. Regards Antoine. From steve at holdenweb.com Tue Jun 29 19:54:09 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 13:54:09 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629165228.GA5350@amk-desktop.matrixgroup.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> Message-ID: <4C2A3341.4010705@holdenweb.com> A.M. Kuchling wrote: > On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >> I will leave the profiler output to speak for itself, since I can find >> nothing much to say about it except that there's a hell of a lot of >> decoding going on inside mailbox.iterkeys(). > > The problem is actually in _generate_toc(), which is reading through > the entire file to figure out where all the 'From' lines that start > messages are located. TextIOWrapper()'s tell() method seems to be > very slow, so one help is to only call tell() when necessary; patch: > > -> svn diff Lib/ > Index: Lib/mailbox.py > =================================================================== > --- Lib/mailbox.py (revision 82346) > +++ Lib/mailbox.py (working copy) > @@ -775,13 +775,14 @@ > starts, stops = [], [] > self._file.seek(0) > while True: > - line_pos = self._file.tell() > line = self._file.readline() > if line.startswith('From '): > + line_pos = self._file.tell() > if len(stops) < len(starts): > stops.append(line_pos - len(os.linesep)) > starts.append(line_pos) > elif not line: > + line_pos = self._file.tell() > stops.append(line_pos) > break > self._toc = dict(enumerate(zip(starts, stops))) > > But should mailboxes really be opened in a UTF-8 encoding, or should > they be treated as 7-bit text? 
I'll have to think about this. Neither! You can't open them as 7-bit text, because real-world email does contain bytes whose ordinal value exceeds 127. You can't open them using a text encoding because theoretically there might be ASCII headers that indicate that parts of the content are in specific character sets or encodings. If only we had a data structure that easily allowed us to manipulate 8-bit characters ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From guido at python.org Tue Jun 29 21:26:31 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 12:26:31 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A3341.4010705@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> Message-ID: It should probably be opened in binary mode. Binary files do have a .readline() method (returning a bytes object), and bytes objects have a .startswith() method. The tell positions computed this way are even compatible with those used by the text file. So you could do it this way: - open binary stream - compute TOC by reading through it using .readline() and .tell() - rewind (don't close) - wrap the binary stream in a text stream - use that for the rest of the code --Guido On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden wrote: > A.M. Kuchling wrote: >> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >>> I will leave the profiler output to speak for itself, since I can find >>> nothing much to say about it except that there's a hell of a lot of >>> decoding going on inside mailbox.iterkeys(). 
>>
>> The problem is actually in _generate_toc(), which is reading through
>> the entire file to figure out where all the 'From' lines that start
>> messages are located.  TextIOWrapper()'s tell() method seems to be
>> very slow, so one help is to only call tell() when necessary; patch:
>>
>> -> svn diff Lib/
>> Index: Lib/mailbox.py
>> ===================================================================
>> --- Lib/mailbox.py    (revision 82346)
>> +++ Lib/mailbox.py    (working copy)
>> @@ -775,13 +775,14 @@
>>          starts, stops = [], []
>>          self._file.seek(0)
>>          while True:
>> -            line_pos = self._file.tell()
>>              line = self._file.readline()
>>              if line.startswith('From '):
>> +                line_pos = self._file.tell()
>>                  if len(stops) < len(starts):
>>                      stops.append(line_pos - len(os.linesep))
>>                  starts.append(line_pos)
>>              elif not line:
>> +                line_pos = self._file.tell()
>>                  stops.append(line_pos)
>>                  break
>>          self._toc = dict(enumerate(zip(starts, stops)))
>>
>> But should mailboxes really be opened in a UTF-8 encoding, or should
>> they be treated as 7-bit text?  I'll have to think about this.
>
> Neither! You can't open them as 7-bit text, because real-world email
> does contain bytes whose ordinal value exceeds 127. You can't open them
> using a text encoding because theoretically there might be ASCII headers
> that indicate that parts of the content are in specific character sets
> or encodings.
>
> If only we had a data structure that easily allowed us to manipulate
> 8-bit characters ...
>
> regards
>  Steve
> --
> Steve Holden           +1 571 484 6266   +1 800 494 3119
> See Python Video!       http://python.mirocommunity.org/
> Holden Web LLC                 http://www.holdenweb.com/
> UPCOMING EVENTS:        http://holdenweb.eventbrite.com/
> "All I want for my birthday is another birthday" -
>                                      Ian Dury, 1942-2000
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)

From steve at holdenweb.com Tue Jun 29 23:02:14 2010
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 29 Jun 2010 17:02:14 -0400
Subject: [Python-Dev] Mailbox module - timings and functionality changes
In-Reply-To:
References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com>
Message-ID: <4C2A5F56.2010700@holdenweb.com>

Guido van Rossum wrote:
> It should probably be opened in binary mode. Binary files do have a
> .readline() method (returning a bytes object), and bytes objects have
> a .startswith() method. The tell positions computed this way are even
> compatible with those used by the text file. So you could do it this
> way:
>
> - open binary stream
> - compute TOC by reading through it using .readline() and .tell()
> - rewind (don't close)

Because closing is inefficient, or because it breaks the algorithm?

> - wrap the binary stream in a text stream

"wrap" how? The ultimate destiny of the text is twofold:

1) To be stored as some kind of LOB in a database, and

2) Therefrom to be reconstituted and parsed into email.Message objects.

Is the wrapping a one-off operation or a software layer? Sorry, being a
bit dense here, I know.

regards
 Steve

> - use that for the rest of the code
>
> --Guido
>
> On Tue, Jun 29, 2010 at 10:54 AM, Steve Holden wrote:
>> A.M.
Kuchling wrote: >>> On Tue, Jun 29, 2010 at 11:40:50AM -0400, Steve Holden wrote: >>>> I will leave the profiler output to speak for itself, since I can find >>>> nothing much to say about it except that there's a hell of a lot of >>>> decoding going on inside mailbox.iterkeys(). >>> The problem is actually in _generate_toc(), which is reading through >>> the entire file to figure out where all the 'From' lines that start >>> messages are located. TextIOWrapper()'s tell() method seems to be >>> very slow, so one help is to only call tell() when necessary; patch: >>> >>> -> svn diff Lib/ >>> Index: Lib/mailbox.py >>> =================================================================== >>> --- Lib/mailbox.py (revision 82346) >>> +++ Lib/mailbox.py (working copy) >>> @@ -775,13 +775,14 @@ >>> starts, stops = [], [] >>> self._file.seek(0) >>> while True: >>> - line_pos = self._file.tell() >>> line = self._file.readline() >>> if line.startswith('From '): >>> + line_pos = self._file.tell() >>> if len(stops) < len(starts): >>> stops.append(line_pos - len(os.linesep)) >>> starts.append(line_pos) >>> elif not line: >>> + line_pos = self._file.tell() >>> stops.append(line_pos) >>> break >>> self._toc = dict(enumerate(zip(starts, stops))) >>> >>> But should mailboxes really be opened in a UTF-8 encoding, or should >>> they be treated as 7-bit text? I'll have to think about this. >> Neither! You can't open them as 7-bit text, because real-world email >> does contain bytes whose ordinal value exceeds 127. You can't open them >> using a text encoding because theoretically there might be ASCII headers >> that indicate that parts of the content are in specific character sets >> or encodings. >> >> If only we had a data structure that easily allowed us to manipulate >> 8-bit characters ... >> >> regards >> Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From techtonik at gmail.com Wed Jun 30 01:22:59 2010 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 30 Jun 2010 02:22:59 +0300 Subject: [Python-Dev] Pickle security and remote logging In-Reply-To: References: Message-ID: On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote: > > I've updated the documentation of SocketHandler.makePickle to mention security > concerns, and that the method can be overridden to use a more secure > implementation (e.g. HMAC-signed pickles). Thanks. But I doubt HMAC complication helps to protect logging server. If shared key is compromised -server becomes vulnerable. I would prefer approach when no code execution is possible. Some alternative serialization way for transmitting log data structures over network. Protocol buffers first come in mind, but they seem to be an overkill, and stdlib doesn't include any implementation. -- anatoly t. From guido at python.org Wed Jun 30 01:41:52 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jun 2010 16:41:52 -0700 Subject: [Python-Dev] Pickle security and remote logging In-Reply-To: References: Message-ID: On Tue, Jun 29, 2010 at 4:22 PM, anatoly techtonik wrote: > On Tue, Jun 29, 2010 at 6:15 PM, Vinay Sajip wrote: >> >> I've updated the documentation of SocketHandler.makePickle to mention security >> concerns, and that the method can be overridden to use a more secure >> implementation (e.g. HMAC-signed pickles). > > Thanks. But I doubt HMAC complication helps to protect logging server. > If shared key is compromised -server becomes vulnerable. I would > prefer approach when no code execution is possible. Some alternative > serialization way for transmitting log data structures over network. 
> Protocol buffers first come in mind, but they seem to be an overkill, > and stdlib doesn't include any implementation. You could use marshal by default. It does not execute code when unmarshalling. A limitation is that it only supports built-in types like list, dict, string etc. but that might be just fine for logging data. Another option would be JSON. (Or XML, if you want bulky. :-) As for protocol buffers, assuming its absence (so far :-) from the stdlib is the only objection, how hard would it be to make the logging package "prepared" so that if one *did* have protocol buffers installed, it would be a one-line config setting to use them? -- --Guido van Rossum (python.org/~guido) From rdmurray at bitdance.com Wed Jun 30 01:56:30 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 29 Jun 2010 19:56:30 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A3341.4010705@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> Message-ID: <20100629235630.E02B61FDDBE@kimball.webabinitio.net> On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote: > A.M. Kuchling wrote: > > But should mailboxes really be opened in a UTF-8 encoding, or should > > they be treated as 7-bit text? I'll have to think about this. > > Neither! You can't open them as 7-bit text, because real-world email > does contain bytes whose ordinal value exceeds 127. You can't open them > using a text encoding because theoretically there might be ASCII headers > that indicate that parts of the content are in specific character sets > or encodings. > > If only we had a data structure that easily allowed us to manipulate > 8-bit characters ... email6 *will* handle this use case. When it exists :) But note that it is *not* just a matter of easily handling 8 bit characters. There are a whole bunch of algorithms needed for interpreting that 7 and 8 bit data. 
All the info is there in the email headers, but being able to do string operations on 8 bit byte strings doesn't get you the answers you need by itself. It really is the case that the Python3 bytes/unicode split forces us to redo most of the algorithms so that they handle bytes and text *correctly*. This isn't a trivial undertaking, but the end result will be well worth it. -- R. David Murray www.bitdance.com From rdmurray at bitdance.com Wed Jun 30 02:05:29 2010 From: rdmurray at bitdance.com (R. David Murray) Date: Tue, 29 Jun 2010 20:05:29 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <4C2A5F56.2010700@holdenweb.com> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> Message-ID: <20100630000529.3AA351FF08C@kimball.webabinitio.net> On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden wrote: > Guido van Rossum wrote: > > > - wrap the binary stream in a text stream > > "wrap" how? The ultimate destiny of the text is twofold: I would imagine Guido is talking about an io.TextIOWrapper...in other words, take the binary file you've just finished grabbing info from, and reread it as a text file in order to grab the actual message content. If you have messages in your files that are using an 8bit content transfer encoding, then you (currently) will have some problems unless the charset happens to be the one you use when you wrap the binary stream as a text stream. -- R. 
David Murray www.bitdance.com From steve at holdenweb.com Wed Jun 30 02:31:59 2010 From: steve at holdenweb.com (Steve Holden) Date: Tue, 29 Jun 2010 20:31:59 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100629235630.E02B61FDDBE@kimball.webabinitio.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <20100629235630.E02B61FDDBE@kimball.webabinitio.net> Message-ID: <4C2A907F.1010409@holdenweb.com> R. David Murray wrote: > On Tue, 29 Jun 2010 13:54:09 -0400, Steve Holden wrote: >> A.M. Kuchling wrote: >>> But should mailboxes really be opened in a UTF-8 encoding, or should >>> they be treated as 7-bit text? I'll have to think about this. >> Neither! You can't open them as 7-bit text, because real-world email >> does contain bytes whose ordinal value exceeds 127. You can't open them >> using a text encoding because theoretically there might be ASCII headers >> that indicate that parts of the content are in specific character sets >> or encodings. >> >> If only we had a data structure that easily allowed us to manipulate >> 8-bit characters ... > > email6 *will* handle this use case. When it exists :) But note that it > is *not* just a matter of easily handling 8 bit characters. There are > a whole bunch of algorithms needed for interpreting that 7 and 8 bit data. > All the info is there in the email headers, but being able to do string > operations on 8 bit byte strings doesn't get you the answers you need > by itself. > > It really is the case that the Python3 bytes/unicode split forces us > to redo most of the algorithms so that they handle bytes and text > *correctly*. This isn't a trivial undertaking, but the end result > will be well worth it. > I completely agree. The unusual thing here is that I of all people should find himself running into these issues, since my use of Python is normally pretty conservative. 
Since the course I am currently writing is already overdue I have to find answers now to problems that were present in the initial 3.0 release and have not received much attention since. You know that I support your work to revise the email package. I hope that we can eventually have it incorporate mailbox readers as well. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 See Python Video! http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ UPCOMING EVENTS: http://holdenweb.eventbrite.com/ "All I want for my birthday is another birthday" - Ian Dury, 1942-2000 From janssen at parc.com Wed Jun 30 04:55:12 2010 From: janssen at parc.com (Bill Janssen) Date: Tue, 29 Jun 2010 19:55:12 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? Message-ID: <71728.1277866512@parc.com> My Leopard and Tiger PPC buildbots are momentarily green! But I'm looking into why I'm skipping some tests. My buildbots are up-to-date OS-wise and very vanilla, with the latest applicable Xcode. 4 skips unexpected on darwin: test_gdb test_ioctl test_readline test_ttk_guionly Three of these (gdb, readline, ttk_guionly) are just bad predictions of which tests should skip on Darwin, I think -- gdb is only version 6, so that test won't run, readline doesn't get built, ttk doesn't work without Tcl/Tk 8.5. But the the skip of test_ioctl baffles me. "test_ioctl skipped -- Unable to open /dev/tty" But when I log in via ssh and try it with the system python: ~ wjanssen$ python python Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> open("/dev/tty") open("/dev/tty") >>> Seems to work fine. So this I don't understand. Any ideas, anyone? Bill From stephen at xemacs.org Wed Jun 30 04:55:02 2010 From: stephen at xemacs.org (Stephen J. 
Turnbull)
Date: Wed, 30 Jun 2010 11:55:02 +0900
Subject: [Python-Dev] what environment variable should contain compiler warning suppression flags?
In-Reply-To:
References: <4C268F1E.5070506@egenix.com> <4C2889B7.2060105@egenix.com> <4C28ABD4.1030000@egenix.com> <4C28BF83.9080903@egenix.com> <20100629105012.341adc7b@heresy>
Message-ID: <87y6dxb56h.fsf@uwakimon.sk.tsukuba.ac.jp>

Steve Holden writes:

 > I agree - trying to step through -O2 optimized code isn't going to
 > help debug your code, it's going to help you debug the
 > optimizer. That's a very rare use case.

Not really. I don't have a lot of practice in debugging at that level, so take it with a grain of salt, but what I've found with XEmacs code is that debugging at -O0 is less often helpful than debugging at -O2. Quite often a naive compilation strategy is used which basically turns those C statements into macros for the underlying assembler, and the code works the way the author thinks it should. But his assumptions are invalid, and when optimized it fails. So I guess you can call that "debugging the optimizer" if you like....

From guido at python.org Wed Jun 30 05:57:09 2010
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jun 2010 20:57:09 -0700
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <71728.1277866512@parc.com>
References: <71728.1277866512@parc.com>
Message-ID:

On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen wrote:
> My Leopard and Tiger PPC buildbots are momentarily green!  But I'm
> looking into why I'm skipping some tests.  My buildbots are up-to-date
> OS-wise and very vanilla, with the latest applicable Xcode.
>
> 4 skips unexpected on darwin:
>    test_gdb test_ioctl test_readline test_ttk_guionly
>
> Three of these (gdb, readline, ttk_guionly) are just bad predictions of
> which tests should skip on Darwin, I think -- gdb is only version 6, so
> that test won't run, readline doesn't get built, ttk doesn't work
> without Tcl/Tk 8.5.

So it looks like you could get readline and ttk to run and pass by separately downloading and installing readline (I've done this many times before) and Tcl/Tk (no idea but I suppose it should work).

>  But the skip of test_ioctl baffles me.
>
> "test_ioctl skipped -- Unable to open /dev/tty"
>
> But when I log in via ssh and try it with the system python:
>
> ~ wjanssen$ python
> python
> Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34)
> [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
>>>> open("/dev/tty")
> open("/dev/tty")
>
>>>>
>
> Seems to work fine.  So this I don't understand.  Any ideas, anyone?

Maybe the buildbot runs the tests as a tty-less daemon process. If you ask me it's pretty crazy to have a test that requires a tty. But there you have it -- and it's the same in Python 3. (But then again, who knows, I might have written that test. ;-)

--
--Guido van Rossum (python.org/~guido)

From martin at v.loewis.de Wed Jun 30 07:24:33 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 30 Jun 2010 07:24:33 +0200
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <71728.1277866512@parc.com>
References: <71728.1277866512@parc.com>
Message-ID: <4C2AD511.5020709@v.loewis.de>

> Seems to work fine. So this I don't understand. Any ideas, anyone?

Didn't we discuss this before? The buildbot slave has no controlling terminal anymore, hence it cannot open /dev/tty. If you are curious, just patch your checkout to output the exact errno (e.g. to stdout), and trigger a build through the web.
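
As a rough illustration of "output the exact errno", here is a small sketch. The helper describe_open_failure() is a hypothetical name, not something in test_ioctl; it only reports which errno the open() call produced, so that a build log would show whether the failure is, say, ENXIO (no controlling terminal) or something else.

```python
import errno


def describe_open_failure(path):
    """Return None if path can be opened, else the symbolic errno name.

    Hypothetical helper for illustration only.
    """
    try:
        f = open(path)
    except OSError as exc:
        # errno.errorcode maps numeric codes to names such as 'ENXIO'.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    f.close()
    return None


if __name__ == "__main__":
    print("open /dev/tty:", describe_open_failure("/dev/tty") or "ok")
```

Run from an interactive ssh session this would be expected to print "ok"; under a daemonized buildbot it should print whatever errno the skip is hiding.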

Regards,
Martin

From martin at v.loewis.de Wed Jun 30 07:37:18 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 30 Jun 2010 07:37:18 +0200
Subject: [Python-Dev] Taking over the Mercurial Migration
Message-ID: <4C2AD80E.9010404@v.loewis.de>

It seems that both Dirkjan and Brett are very caught up
with real life for the coming months. So I suggest that
some other committer who favors the Mercurial transition
steps forward and takes over this project.

If nobody volunteers, I propose that we release 3.2
from Subversion, and reconsider Mercurial migration
next year.

Regards,
Martin

From stephen at xemacs.org Wed Jun 30 08:19:37 2010
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 30 Jun 2010 15:19:37 +0900
Subject: [Python-Dev] Taking over the Mercurial Migration
In-Reply-To: <4C2AD80E.9010404@v.loewis.de>
References: <4C2AD80E.9010404@v.loewis.de>
Message-ID: <87sk45avpi.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > It seems that both Dirkjan and Brett are very caught up
 > with real life for the coming months. So I suggest that
 > some other committer who favors the Mercurial transition
 > steps forward and takes over this project.

I am not a committer, and am not intimately familiar with PEP 385, so not appropriate to become the proponent, I think. However, I am one of the PEP 374 co-authors, and have experience with previous transition to Mercurial of similar scale (XEmacs). I can promise to devote time to the transition in July and August, in support of whoever might step forward. I hope someone does.

 > If nobody volunteers, I propose that we release 3.2
 > from Subversion, and reconsider Mercurial migration
 > next year.

In the absence of a volunteer, I think that's probably necessary.
From g.brandl at gmx.net Wed Jun 30 10:41:51 2010 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 30 Jun 2010 10:41:51 +0200 Subject: [Python-Dev] Taking over the Mercurial Migration In-Reply-To: <4C2AD80E.9010404@v.loewis.de> References: <4C2AD80E.9010404@v.loewis.de> Message-ID: On 30.06.2010 07:37, "Martin v. Löwis" wrote: > It seems that both Dirkjan and Brett are very caught up > with real life for the coming months. So I suggest that > some other committer who favors the Mercurial transition > steps forward and takes over this project. > > If nobody volunteers, I propose that we release 3.2 > from Subversion, and reconsider Mercurial migration > next year. IIUC, Dirkjan is only caught up for another month. I have no problems with releasing a first 3.2 alpha from SVN and then switching, so I propose that we target the migration for August -- I can help in the second half of August if needed. Georg From vinay_sajip at yahoo.co.uk Wed Jun 30 11:23:37 2010 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Wed, 30 Jun 2010 09:23:37 +0000 (UTC) Subject: [Python-Dev] Pickle security and remote logging References: Message-ID: Guido van Rossum python.org> writes: > As for protocol buffers, assuming its absence (so far) from the > stdlib is the only objection, how hard would it be to make the logging > package "prepared" so that if one *did* have protocol buffers > installed, it would be a one-line config setting to use them? I envisage that if protocol buffers were available, and if support for them in logging was to be added, this could be done via an optional keyword arg to the SocketHandler which sets a handler attribute, which would then be used in makePickle to make the required serialized form.
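A minimal sketch of that idea, done today in a subclass rather than in the stdlib. The class name `SerializingSocketHandler`, its `serializer` keyword, and the `json_serializer` helper are all hypothetical illustrations; `SocketHandler` itself has no such parameter. JSON stands in for protocol buffers so the example needs nothing outside the standard library, and the 4-byte big-endian length prefix mirrors the framing the stock `makePickle` uses.

```python
import json
import logging
import logging.handlers
import struct


def json_serializer(record):
    """Hypothetical serializer: turn a LogRecord into UTF-8 JSON bytes."""
    payload = {
        "name": record.name,
        "level": record.levelname,
        "msg": record.getMessage(),
    }
    return json.dumps(payload).encode("utf-8")


class SerializingSocketHandler(logging.handlers.SocketHandler):
    """SocketHandler whose wire format is pluggable (illustrative only)."""

    def __init__(self, host, port, serializer=None):
        super().__init__(host, port)
        # Handler attribute consulted by makePickle below.
        self.serializer = serializer

    def makePickle(self, record):
        if self.serializer is None:
            # Fall back to the stock pickle-based format.
            return super().makePickle(record)
        data = self.serializer(record)
        # Same framing as the default: length prefix, then the body.
        return struct.pack(">L", len(data)) + data
```

A receiver would of course need to know which format to expect; nothing here negotiates that.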
@anatoly: The documentation just mentions HMAC as an example; the levels of paranoia to be applied are different for different people, different times and different situations ;-) I assume that someone reading the docs could readily see that they could substitute "sign the pickle" with some alternative strategy in makePickle. You could implement marshal, protocol buffers etc. right now just by overriding SocketHandler.makePickle in your custom class. An alternative strategy would be to provide an optional serializer=None callable in the SocketHandler constructor. If specified, then makePickle would call this serializer with the LogRecord instance as the only argument, and use the return value as the serialized form, instead of calling pickle.dumps. Regards, Vinay Sajip From exarkun at twistedmatrix.com Wed Jun 30 13:32:32 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 11:32:32 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2AD511.5020709@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> Message-ID: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> On 05:24 am, martin at v.loewis.de wrote: >>Seems to work fine. So this I don't understand. Any ideas, anyone? > >Didn't we discuss this before? The buildbot slave has no controlling >terminal anymore, hence it cannot open /dev/tty. If you are curious, >just patch your checkout to output the exact errno (e.g. to stdout), >and trigger a build through the web. Could the test be rewritten (or supplemented) to use a pty? Most or perhaps all of the same operations should be supported. 
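A rough sketch of what a pty-based variant could look like, assuming a POSIX platform. This is not the existing test: the function name `pty_window_size` is hypothetical, and the window-size request `TIOCGWINSZ` is chosen here only because it is a harmless query that fills a caller-supplied buffer. Whether it covers what the real test is meant to check would need to be verified.

```python
import array
import fcntl
import os
import termios


def pty_window_size():
    """Query the window size of a freshly created pty.

    Returns (rows, cols), or None if no pty can be allocated or the
    ioctl is refused. No controlling terminal is required.
    """
    try:
        master_fd, slave_fd = os.openpty()
    except OSError:
        return None
    try:
        # struct winsize is four unsigned shorts; ioctl fills it in place.
        buf = array.array("H", [0, 0, 0, 0])
        fcntl.ioctl(slave_fd, termios.TIOCGWINSZ, buf, True)
        return buf[0], buf[1]
    except OSError:
        return None
    finally:
        os.close(master_fd)
        os.close(slave_fd)


if __name__ == "__main__":
    print(pty_window_size())
```

Because the pty is created by the test process itself, this approach does not depend on how the build slave was started.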
Jean-Paul From steve at holdenweb.com Wed Jun 30 14:42:05 2010 From: steve at holdenweb.com (Steve Holden) Date: Wed, 30 Jun 2010 08:42:05 -0400 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100630000529.3AA351FF08C@kimball.webabinitio.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> Message-ID: <4C2B3B9D.3080200@holdenweb.com> R. David Murray wrote: > On Tue, 29 Jun 2010 17:02:14 -0400, Steve Holden wrote: >> Guido van Rossum wrote: >> >>> - wrap the binary stream in a text stream >> "wrap" how? The ultimate destiny of the text is twofold: > > I would imagine Guido is talking about an io.TextIOWrapper...in other > words, take the binary file you've just finished grabbing info > from, and reread it as a text file in order to grab the actual > message content. > > If you have messages in your files that are using an 8bit content > transfer encoding, then you (currently) will have some problems > unless the charset happens to be the one you use when you wrap > the binary stream as a text stream. > http://bugs.python.org/issue9124 regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 DjangoCon US September 7-9, 2010 http://djangocon.us/ See Python Video! 
http://python.mirocommunity.org/ Holden Web LLC http://www.holdenweb.com/ From janssen at parc.com Wed Jun 30 18:00:09 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:00:09 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: References: <71728.1277866512@parc.com> Message-ID: <68796.1277913609@parc.com> Guido van Rossum wrote: > On Tue, Jun 29, 2010 at 7:55 PM, Bill Janssen wrote: > > My Leopard and Tiger PPC buildbots are momentarily green!
But I'm > > looking into why I'm skipping some tests. My buildbots are up-to-date > > OS-wise and very vanilla, with the latest applicable Xcode. > > > > 4 skips unexpected on darwin: > > test_gdb test_ioctl test_readline test_ttk_guionly > > > > Three of these (gdb, readline, ttk_guionly) are just bad predictions of > > which tests should skip on Darwin, I think -- gdb is only version 6, so > > that test won't run, readline doesn't get built, ttk doesn't work > > without Tcl/Tk 8.5. > > So it looks like you could get readline and ttk to run and pass by > separately downloading and installing readline (I've done this many > times before) and Tcl/Tk (no idea but I suppose it should work). Sure. But the skips should be expected "on Darwin", since a vanilla OS X system apparently won't have the necessary bits. At the very least, regrtest.py should test for these conditions and add them to the "expected skips" list if necessary. I'll work up a patch. > > But the skip of test_ioctl baffles me. > > > > "test_ioctl skipped -- Unable to open /dev/tty" > > > > But when I log in via ssh and try it with the system python: > > > > ~ wjanssen$ python > > python > > Python 2.5.1 (r251:54863, Jun 17 2009, 20:37:34) > > [GCC 4.0.1 (Apple Inc. build 5465)] on darwin > > Type "help", "copyright", "credits" or "license" for more information. > >>>> open("/dev/tty") > > open("/dev/tty") > > > >>>> > > > > Seems to work fine. So this I don't understand. Any ideas, anyone? > > Maybe the buildbot runs the tests as a tty-less daemon process. If you > ask me it's pretty crazy to have a test that requires a tty. But there > you have it -- and it's the same in Python 3. (But then again, who > knows, I might have written that test. ;-) So, my question then is, why are these skips "unexpected"? Seems to me that if this is the case, this test will never run on any platform.
Bill From janssen at parc.com Wed Jun 30 18:03:15 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:03:15 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <4C2AD511.5020709@v.loewis.de> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> Message-ID: <68821.1277913795@parc.com> Martin v. Löwis wrote: > > Seems to work fine. So this I don't understand. Any ideas, anyone? > > Didn't we discuss this before? Possibly, but I don't recall doing so. > The buildbot slave has no controlling > terminal anymore, hence it cannot open /dev/tty. If you are curious, > just patch your checkout to output the exact errno (e.g. to stdout), > and trigger a build through the web. So, why is skipping this test "unexpected"? I see "x86 Tiger" is also showing this as an unexpected skip. Should I just add it to the list of expected skips on Darwin? Actually, will it run on any platform? Bill From janssen at parc.com Wed Jun 30 18:26:24 2010 From: janssen at parc.com (Bill Janssen) Date: Wed, 30 Jun 2010 09:26:24 PDT Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> Message-ID: <69334.1277915184@parc.com> exarkun at twistedmatrix.com wrote: > Could the test be rewritten (or supplemented) to use a pty? Most or > perhaps all of the same operations should be supported. Buildbot seems to be explicitly not using a PTY. From the top of the test output: make buildbottest in dir /Users/buildbot/buildarea/trunk.parc-leopard-1/build (timeout 1800 secs) watching logfiles {} argv: ['make', 'buildbottest'] [...] closing stdin using PTY: False I believe this is specified by the build master. This test seems to work on Ubuntu and FreeBSD, though.
Bill From solipsis at pitrou.net Wed Jun 30 18:42:58 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 18:42:58 +0200 Subject: [Python-Dev] Mailbox module - timings and functionality changes References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> Message-ID: <20100630184258.473d8535@pitrou.net> On Tue, 29 Jun 2010 20:05:29 -0400 "R. David Murray" wrote: > > I would imagine Guido is talking about an io.TextIOWrapper...in other > words, take the binary file you've just finished grabbing info > from, and reread it as a text file in order to grab the actual > message content. This sounds a bit suboptimal to me (and introduces race conditions if e.g. the file is replaced with another one before you reopen it). You could instead decode the binary data by yourself, especially if you have already stored that data somewhere. Also, please note that values used by seek() and tell() on text I/O are "opaque cookies". While they can happen to match the raw binary file position, it is a mere coincidence (or an implementation detail, at your will). Therefore, reusing tell() values of a binary file to seek() a TextIOWrapper accessing the same file is wrong. From solipsis at pitrou.net Wed Jun 30 18:44:57 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 18:44:57 +0200 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> Message-ID: <20100630184457.10067764@pitrou.net> On Wed, 30 Jun 2010 09:00:09 PDT Bill Janssen wrote: > > So, my question then is, why are these skips "unexpected"? Seems to me > that if this is the case, this test will never run on any platform. You can change the value of the "usepty" option in your buildbot.tac. 
(you will also have to restart the buildslave process) Regards Antoine. From guido at python.org Wed Jun 30 19:03:49 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Jun 2010 10:03:49 -0700 Subject: [Python-Dev] Mailbox module - timings and functionality changes In-Reply-To: <20100630184258.473d8535@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> Message-ID: On Wed, Jun 30, 2010 at 9:42 AM, Antoine Pitrou wrote: > On Tue, 29 Jun 2010 20:05:29 -0400 > "R. David Murray" wrote: >> >> I would imagine Guido is talking about an io.TextIOWrapper...in other >> words, take the binary file you've just finished grabbing info >> from, and reread it as a text file in order to grab the actual >> message content. > > This sounds a bit suboptimal to me (and introduces race conditions if > e.g. the file is replaced with another one before you reopen it). You > could instead decode the binary data by yourself, especially if you > have already stored that data somewhere. That's why I proposed not reopening but wrapping. Of course the contents of the file could still change, but that's a limitation of how the mailbox module works -- it builds a TOC and expects the file not to change. > Also, please note that values used by seek() and tell() on > text I/O are "opaque cookies". While they can happen to match the > raw binary file position, it is a mere coincidence (or an > implementation detail, at your will). Therefore, reusing tell() values > of a binary file to seek() a TextIOWrapper accessing the same file > is wrong. Well, um, I actually designed it carefully so that bytes offsets *would* work as text offsets in those cases where they make sense at all. 
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Wed Jun 30 19:20:34 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jun 2010 19:20:34 +0200 Subject: [Python-Dev] TextIOWrapper.tell() In-Reply-To: References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> Message-ID: <20100630192034.5740825b@pitrou.net> On Wed, 30 Jun 2010 10:03:49 -0700 Guido van Rossum wrote: > > > Also, please note that values used by seek() and tell() on > > text I/O are "opaque cookies". While they can happen to match the > > raw binary file position, it is a mere coincidence (or an > > implementation detail, at your will). Therefore, reusing tell() values > > of a binary file to seek() a TextIOWrapper accessing the same file > > is wrong. > > Well, um, I actually designed it carefully so that bytes offsets > *would* work as text offsets in those cases where they make sense at > all. Ah, this is embarrassing. I always assumed it was an implementation detail since neither the PEP nor the module docs say otherwise. PEP 3116 clearly says: "Unlike with raw I/O, the units for .seek() are not specified - some implementations (e.g. StringIO) use characters and others (e.g. TextIOWrapper) use bytes." And also: ".seek(pos: object, whence: int = 0) -> int Seek to position pos. If pos is non-zero, it must be a cookie returned from .tell() and whence must be zero." "it must be a cookie returned from .tell()" here seems to imply that non-zero values of other origin should not be used. Regards Antoine.
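A small sketch of the documented usage discussed above, where every non-zero position passed to seek() is a cookie previously returned by tell() on the same text stream. The helper name `line_cookies` is hypothetical. The example deliberately makes no claim about whether a cookie equals a byte offset; it only shows the round trip that the PEP wording guarantees.

```python
import io


def line_cookies(raw_bytes, encoding="utf-8"):
    """Wrap raw bytes in a text stream and record a tell() cookie per line.

    Returns the text stream plus a list of (cookie, line) pairs.
    readline() is used instead of iteration, because tell() is
    disabled while a text stream is being iterated with next().
    """
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    pairs = []
    while True:
        cookie = text.tell()
        line = text.readline()
        if not line:
            break
        pairs.append((cookie, line))
    return text, pairs


if __name__ == "__main__":
    stream, pairs = line_cookies("héllo\nwörld\n".encode("utf-8"))
    for cookie, line in reversed(pairs):
        stream.seek(cookie)  # only cookies obtained from tell() are used
        assert stream.readline() == line
```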
From guido at python.org Wed Jun 30 19:28:10 2010 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Jun 2010 10:28:10 -0700 Subject: [Python-Dev] TextIOWrapper.tell() In-Reply-To: <20100630192034.5740825b@pitrou.net> References: <4C2A0294.3070806@holdenweb.com> <20100629165228.GA5350@amk-desktop.matrixgroup.net> <4C2A3341.4010705@holdenweb.com> <4C2A5F56.2010700@holdenweb.com> <20100630000529.3AA351FF08C@kimball.webabinitio.net> <20100630184258.473d8535@pitrou.net> <20100630192034.5740825b@pitrou.net> Message-ID: On Wed, Jun 30, 2010 at 10:20 AM, Antoine Pitrou wrote: > On Wed, 30 Jun 2010 10:03:49 -0700 > Guido van Rossum wrote: >> >> > Also, please note that values used by seek() and tell() on >> > text I/O are "opaque cookies". While they can happen to match the >> > raw binary file position, it is a mere coincidence (or an >> > implementation detail, at your will). Therefore, reusing tell() values >> > of a binary file to seek() a TextIOWrapper accessing the same file >> > is wrong. >> >> Well, um, I actually designed it carefully so that bytes offsets >> *would* work as text offsets in those cases where they make sense at >> all. > > Ah, this is embarrassing. I always assumed it was an implementation > detail since neither the PEP nor the module docs say otherwise. > > PEP 3116 clearly says: > > "Unlike with raw I/O, the units for .seek() are not specified - some > implementations (e.g. StringIO) use characters and others (e.g. > TextIOWrapper) use bytes." > > And also: > > ".seek(pos: object, whence: int = 0) -> int > > Seek to position pos. If pos is non-zero, it must be a cookie > returned from .tell() and whence must be zero." > > "it must be a cookie returned from .tell()" here seems to imply that > non-zero values of other origin should not be used. Guilty as charged. I really did take care that it would work, but forgot to mention it.
I guess we can depend on this property *inside* the stdlib (as long as there are tests for each piece of code depending on it that would break if it ever changed) but should not advertise it widely. Note that it doesn't go the other way -- due to encoding state, text streams can certainly return cookies that make no sense to binary streams. But text streams take byte offsets too and do the best they can. (Obviously if a byte offset points in the middle of a multibyte character all bets are off.) The C stdlib has a similar thing -- while AFAIK POSIX lseek() really is required to return and take byte offsets, this is not required for fseek() and ftell() according to the C std -- but I think it's still a pretty safe bet, and I betcha lots of apps are making this assumption. -- --Guido van Rossum (python.org/~guido) From martin at v.loewis.de Wed Jun 30 19:29:36 2010 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 30 Jun 2010 19:29:36 +0200 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> Message-ID: <4C2B7F00.2010602@v.loewis.de> On 30.06.2010 13:32, exarkun at twistedmatrix.com wrote: > On 05:24 am, martin at v.loewis.de wrote: >>> Seems to work fine. So this I don't understand. Any ideas, anyone? >> >> Didn't we discuss this before? The buildbot slave has no controlling >> terminal anymore, hence it cannot open /dev/tty. If you are curious, >> just patch your checkout to output the exact errno (e.g. to stdout), >> and trigger a build through the web. > > Could the test be rewritten (or supplemented) to use a pty? Most or > perhaps all of the same operations should be supported. I'm not sure.
It uses TIOCGPGRP, basically to establish that ioctl can also put results into a Python array (IIUC). This goes back to http://bugs.python.org/555817 Somebody rewriting it would need to make sure the original test purpose is still met. Regards, Martin From barry at python.org Wed Jun 30 20:16:14 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:16:14 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C23D3C2.1060500@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> Message-ID: <20100630141614.10dbccde@heresy> I'm trying to catch up on this thread, so I may collapse some responses or refer to points others have brought up. On Jun 24, 2010, at 05:53 PM, Scott Dial wrote: >If the package has .so files that aren't compatible with other version >of python, then what is the motivation for placing that in a shared >location (since it can't actually be shared)? I think Matthias has described the motivation for the Debian/Ubuntu case, and James describes Python's current search algorithm for a packages .py[c] and .so files. There are a few points that you've made that I want to respond to. You claim that versioned .so files scheme is "more complicated" than multiple version-specific search paths (if I understand your counter proposal correctly). It all depends on your point of view. From mine, a 100 line patch that almost nobody but (some) distros will care about or be affected by, and that only changes a fairly obscure build-time configuration, is much simpler than trying to make version-specific search paths work. If you build Python from source, you do not care about this patch and you'll never see its effects. If you get Python on a distribution that only gives you one version of Python at a time, you also will probably never care or see the effects of this patch. 
If you're a Debian or Ubuntu user who wants to use Python 3.2 and 3.3, you *might* care about it, but most likely it'll just work behind the scenes. If you're a Python packager or work on the Python infrastructure for one of those platforms, then you will care. About just sharing the py files. You say that would be acceptable to you, but it's actually a pretty big deal. If you're supporting two versions of Python, then every distro Python package doubles in size. Even with compression, you're talking longer download times and probably more critically, you've greatly increased CDROM space pressures. The Ubuntu CDROM is already essentially at capacity so doubling the size of all Python packages (most of which btw do not have extension modules) makes such an approach impossible. Moving to a DVD image has been discussed, but it is currently believed not in the best interest of users, especially on slow links, to do so at this time. The versioned .so approach will of course increase the size of packages by twice the contained .so file size, and that's already an uncomfortable but acceptable increase. It's acceptable because of the gain users get by having multiple versions of Python available and the fact that there aren't nearly as many extension modules as there are Python files. Doubling the size of .py files as well isn't acceptable. >But the only motivation for doing this with .pyc files is that the .py >files are able to be shared, since the .pyc is an on-demand-generated, >version-specific artifact (and not the source). The .so file is created >offline by another toolchain, is version-specific, and presumably you >are not suggesting that Python generate it on-demand. Definitely not. pyc files are generated upon installation of the distro package, but of course the .so files must be compiled on a build machine and included in the distro package. The whole process is much simpler if the versioned .so files can just live in the same directory. 
>For packages that have .so files, won't the distro already have to build >multiple copies of that package for all version of Python? So, why can't >it place them in separate directories that are version-specific at that >time? This is not the same as placing .py files that are >version-agnostic into a version-agnostic location. It's not a matter of "could", it's a matter of simplicity, and I think versioned .so files are the simplest solution given all the constraints. -Barry From barry at python.org Wed Jun 30 20:31:05 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:31:05 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C266185.7080509@ubuntu.com> References: <20100624115048.4fd152e3@heresy> <20100624135119.00b9ac5c@heresy> <20100624142830.4c859faf@limelight.wooz.org> <20100624164637.22fd9160@heresy> <4C266185.7080509@ubuntu.com> Message-ID: <20100630143105.37e1225e@heresy> On Jun 26, 2010, at 10:22 PM, Matthias Klose wrote: >On 24.06.2010 22:46, Barry Warsaw wrote: >> So, we could say that PEP 384 compliant extension modules would get written >> without a version specifier. IOW, we'd treat foo.so as using the ABI. It >> would then be up to the Python runtime to throw ImportErrors if in fact we >> were loading a legacy, non-PEP 384 compliant extension. > >Is it realistic to never break the ABI? I would think of having the ABI >encoded in the file name as well, and only bump the ABI if it does change. >With the "versioned .so files" proposal an ABI bump is necessary with every >python version, with PEP 384 the ABI bump will be decoupled from the python >version.
You're right that the ABI will break, requiring a bump, and I think you're right that this means that PEP 384 compliant shared libraries would have to have a version number in their file name (assuming the versioned .so proposal is accepted). The problem is that we would need two version numbers, one for extension modules that are not PEP 384 compliant (and thus get bumped for every new Python version), and one for modules that are PEP 384 compliant (and thus only get bumped once in a while). The reason is that I think it will always be the case that we will have PEP 384 compliant and non-compliant extension modules. Perhaps identifying the underlying problems will lead to a more acceptable patch for Python. My patch tries to take a simple (perhaps too simplistic) solution, and I'm not married to it, but I think the general idea of versioned .so files is the right one. 1. The file name extensions that Python searches for are hardcoded and compiled in. dyload_shlib.c hard codes the file name pattern that extension modules must have in order for Python to load them. They must be .so or module.so. This gets compiled into Python at build time and there's no way for a distro (or anyone else who builds Python from source) to extend the file name patterns without modifying the source code. 2. The extension that distutils writes for shared libraries is dictated by build-time options and cannot be overridden. When you ./configure Python, autoconf figures out what shared library extension your platform uses. It substitutes this into a Makefile variable. That Makefile gets installed into your system with the base Python package and distutils parses the Makefile looking for this variable. When distutils calls your platform compiler, it uses this Makefile variable as the file name extension to use for your shared library. You cannot change this or override it to get distutils to write some other file name extension.
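For readers who want to see the two values described in points 1 and 2 on their own interpreter, here is a small inspection sketch. It relies on `importlib.machinery.EXTENSION_SUFFIXES` and the `EXT_SUFFIX` build variable exposed through `sysconfig`, both of which belong to Python versions later than the 3.2 development state discussed in this thread, so treat it as an illustration on a current interpreter rather than a description of the code being debated. The function names are hypothetical.

```python
import importlib.machinery
import sysconfig


def extension_suffixes():
    """File name suffixes the running interpreter accepts for extension modules."""
    return list(importlib.machinery.EXTENSION_SUFFIXES)


def build_suffix():
    """Suffix recorded at build time for newly built extension modules.

    May be None on builds that do not define the variable.
    """
    return sysconfig.get_config_var("EXT_SUFFIX")


def candidate_filenames(module_name):
    """File names that would be tried for an extension module of this name."""
    return [module_name + suffix for suffix in extension_suffixes()]


if __name__ == "__main__":
    print("search suffixes:", extension_suffixes())
    print("build suffix:   ", build_suffix())
    print("candidates:     ", candidate_filenames("foo"))
```

The first list corresponds to the search side (point 1) and the single build suffix to the writing side (point 2).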
Of these two problems, #1 is more serious because we have to modify the Python source code to hack in additional shared library search suffixes. #2 can be worked around by renaming the .so file after the build. The disadvantage of this though is that if you're a local packager, you'll have to remember to do the same thing if you want multiple Python version support, because distutils won't take care of it for you. Maybe that's okay, in which case it would still be good to address #1. -Barry From barry at python.org Wed Jun 30 20:39:50 2010 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jun 2010 14:39:50 -0400 Subject: [Python-Dev] versioned .so files for Python 3.2 In-Reply-To: <4C246E81.3020302@scottdial.com> References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> Message-ID: <20100630143950.60da41f7@heresy> On Jun 25, 2010, at 04:53 AM, Scott Dial wrote: >My suggestion was that a package that contains .so files should not be >shared (e.g., the entire lxml package should be placed in a >version-specific path). Matthias outlined some of the pitfalls with this approach. >The motivation for this PEP was to simplify the installation python packages >for distros; it was not to reduce the number of .py files on the disk. As others have pointed out, versioned so files are not part of PEP 3147. That PEP does reduce the number of py files on disk, which as I explained in a previous follow-up, is an important consideration. >Placing .so files together does not simplify that install process in any >way. I disagree of course. :) >You will still have to handle such packages in a special way.
You must still >compile the package multiple times for each relevant version of python (with >special tagging that I imagine distutils can take care of) and, worse yet, No, distutils cannot take care of this. There is no way currently to tell distutils to generate a .so file with anything but the platform-specific way of spelling "shared library". >you have created a more trick install than merely having multiple search >paths (e.g., installing/uninstalling lxml for *one* version of python is >actually more difficult in this scheme). That's not a use case we care about. If you have Python 3.2 and 3.3 installed on your system, why would you want lxml installed for one but not the other? And even if for some reason you did, the only way to do that would be in a way similar to handling the PEP 3147 pyc files. -Barry From exarkun at twistedmatrix.com Wed Jun 30 20:46:02 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 18:46:02 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <20100630184457.10067764@pitrou.net> References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> <20100630184457.10067764@pitrou.net> Message-ID: <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain> On 04:44 pm, solipsis at pitrou.net wrote: >On Wed, 30 Jun 2010 09:00:09 PDT >Bill Janssen wrote: >> >>So, my question then is, why are these skips "unexpected"? Seems to >>me >>that if this is the case, this test will never run on any platform. > >You can change the value of the "usepty" option in your buildbot.tac. >(you will also have to restart the buildslave process) But don't do this. The usepty option is completely unrelated to the suggestion I was making.
Flipping it to True will only cause other things to break and have no impact on this test. Jean-Paul From exarkun at twistedmatrix.com Wed Jun 30 20:49:54 2010 From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com) Date: Wed, 30 Jun 2010 18:49:54 -0000 Subject: [Python-Dev] OS X buildbots: why am I skipping these tests? In-Reply-To: <69334.1277915184@parc.com> References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> <69334.1277915184@parc.com> Message-ID: <20100630184954.1937.1956849777.divmod.xquotient.577@localhost.localdomain> On 04:26 pm, janssen at parc.com wrote: >exarkun at twistedmatrix.com wrote: >>Could the test be rewritten (or supplemented) to use a pty? Most or >>perhaps all of the same operations should be supported. > >Buildbot seems to be explicitly not using a PTY. From the the top of >the test output: > >make buildbottest >in dir /Users/buildbot/buildarea/trunk.parc-leopard-1/build (timeout >1800 secs) >watching logfiles {} >argv: ['make', 'buildbottest'] >[...] >closing stdin >using PTY: False This output is telling you that the build slave isn't giving the child processes it creates a pty. What I had in mind was writing the test to create a new pty, instead of trying to use the controlling tty. So basically, the two things are completely unrelated and this buildbot configuration isn't hurting anything (and in fact is likely helping quite a few things, so I suggest leaving it alone). > >I believe this is specified by the build master. > >This test seems to work on Ubuntu and FreeBSD, though. That's interesting. I wonder if those slaves are able to open /dev/tty for some reason? The slave is supposed to detach from the controlling terminal when it daemonizes. There could be a bug in that code, I suppose, or the slaves could be running without daemonization for some reason. The operators would have to tell us about that, I think. 
Or, another possibility is that /dev/tty doesn't work how I expect it to and on Ubuntu and FreeBSD it can be opened even if you don't have a controlling terminal. Hopefully not, though.

Jean-Paul

From barry at python.org Wed Jun 30 20:53:29 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 14:53:29 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C268433.30405@scottdial.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C265DC6.4080600@ubuntu.com> <4C268433.30405@scottdial.com>
Message-ID: <20100630145329.736f2aab@heresy>

On Jun 26, 2010, at 06:50 PM, Scott Dial wrote:
>On 6/26/2010 4:06 PM, Matthias Klose wrote:
>> On 25.06.2010 22:12, James Y Knight wrote:
>>> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
>>>> Placing .so files together does not simplify that install process in any
>>>> way. You will still have to handle such packages in a special way.
>>>
>>> This is a good point, but I think still falls short of a solution. For a
>>> package like lxml, indeed you are correct. Since debian needs to build
>>> it once per version, it could just put the entire package (.py files and
>>> .so files) into a different per-python-version directory.
>>
>> This is what is currently done. This will increase the size of packages
>> by duplicating the .py files, or you have to install the .py in a common
>> location (irrelevant to sys.path), and provide (sym)links to the
>> expected location.
>
>"This is what is currently done" and "provide (sym)links to the
>expected location" are conflicting statements.

I think Matthias meant "what is currently done" to apply to your statement "debian needs to build it once per version". Providing symlinks is how we are able to make it appear that there are version-specific py files without actually duplicating them.
>If you are symlinking .py files from a shared location, then that is not the
>same as "just install the package into a version-specific location". What
>motivation is there for preferring symlinks?

This reduces .py file duplications in distro packages.

>Who cares if a distro package install yields duplicate .py files? Nor am
>I motivated by having to carry duplicate .py files in a distribution
>package (I imagine the compression of duplicate .py files is amazing).

It might be amazing, but it's still a significant overhead. As I've described, multiply that by all the py files in all the distro packages containing Python source code, and then still try to fit it on a CDROM.

>What happens to the distro packaging if a python package splits the
>codebase between 2.x and 3.x (meaning they have distinct .py files)?

The Debian/Ubuntu approach to Python 2/3 support is to provide them in separate distro packages. E.g. for Python package foo, you would have Debuntu package python-foo (for the Python 2.x version) and python3-foo. We do not share source between Python 2 and 3 versions, at least not yet. This doesn't hurt us much because the number of Python packages that are source compatible between the two is pretty low (Benjamin's 'six' package might change that :), and not much depends on Python 3 yet.

>As someone else mentioned, how is virtualenv going to interact with packages
>that install like this?

This is a good question, but I *think* it won't affect it much at all. To test for sure I'd either need a Python 3 compatible virtualenv or backport my patch to Python 2.6 and 2.7. But still, I'm not sure it would matter since the same shared library import suffix is used in either case. I actually think version-specific search paths would have a greater impact on virtualenv.

-Barry

From barry at python.org Wed Jun 30 20:55:16 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 14:55:16 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To:
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com>
Message-ID: <20100630145516.08b5b2ec@heresy>

On Jun 25, 2010, at 11:58 AM, Brett Cannon wrote:
>> Placing .so files together does not simplify that install process in any
>> way. You will still have to handle such packages in a special way. You
>> must still compile the package multiple times for each relevant version
>> of python (with special tagging that I imagine distutils can take care
>> of) and, worse yet, you have created a more tricky install than merely
>> having multiple search paths (e.g., installing/uninstalling lxml for
>> *one* version of python is actually more difficult in this scheme).
>
>This is meant to be used by distros in a programmatic fashion, so my
>response is "so what?" Their package management system is going to
>maintain the directory, not a person. You and I are not going to be
>using this for anything. This is purely meant for Linux OS vendors
>(maybe OS X) to manage their installs through their package software.
>I honestly do not expect human beings to be mucking around with these
>installs (and I suspect Barry doesn't either).

Spot on.

-Barry

From barry at python.org Wed Jun 30 20:58:00 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 14:58:00 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C266702.4010102@ubuntu.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C266702.4010102@ubuntu.com>
Message-ID: <20100630145800.7658936e@heresy>

On Jun 26, 2010, at 10:45 PM, Matthias Klose wrote:
>Having non-conflicting extension names is a schema which already is used on
>some platforms (debug builds on Windows). The question for me is, if just a
>renaming of the .so files is acceptable for upstream, or if distributors
>should implement this on their own, as something like:
>
>    if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
>        load_ext('foo.2.6.so')
>    else:
>        load_ext('foo.so')
>
>I fear this will cause issues when e.g. virtualenv environments start copying
>parts from the system installation instead of symlinking it.

I concur. I think my patch will have much less impact on virtualenv and similar tools because there's nothing much magical about it. It just says "oh there's another file suffix you should consider when looking for a shared library", which as you point out is already done on Windows.

-Barry

From barry at python.org Wed Jun 30 21:03:28 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 15:03:28 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C2506AE.3060002@scottdial.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C246E81.3020302@scottdial.com> <4C2506AE.3060002@scottdial.com>
Message-ID: <20100630150328.281f5d5f@heresy>

On Jun 25, 2010, at 03:42 PM, Scott Dial wrote:
>On 6/25/2010 2:58 PM, Brett Cannon wrote:
>> I assume you are talking about PEP 3147. You're right that the PEP was
>> for pyc files and that's it. No one is talking about rewriting the
>> PEP.
>
>Yes, I am making reference to PEP 3147. I make reference to that PEP
>because this change is of the same order of magnitude as the .pyc
>change, and we asked for a PEP for that, and if this .so stuff is an
>extension of that thought process, then it should either be reflected by
>that PEP or a new PEP.

I think it's not nearly on the order of magnitude as PEP 3147. One way to measure that is the size of the patch required to implement the feature and ensure the test suite still works. My versioned so patch is *way* smaller.

I actually think because this is almost exclusively an extension to a build-time configuration option, and doesn't really change the language, a PEP shouldn't be necessary. But by the same token, I'm willing to write a new one (and *not* touch PEP 3147) just so that we have a point of reference to record the discussion and decision. So I'll do that.

-Barry

From barry at python.org Wed Jun 30 21:06:10 2010
From: barry at python.org (Barry Warsaw)
Date: Wed, 30 Jun 2010 15:06:10 -0400
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <4C23DD99.9050604@egenix.com>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C23DD99.9050604@egenix.com>
Message-ID: <20100630150610.7ae4ac6a@heresy>

On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote:
>Scott Dial wrote:
>> On 6/24/2010 5:09 PM, Barry Warsaw wrote:
>>>> What use case does this address?
>>>
>>>> If you want to make it so a system can install a package in just one
>>>> location to be used by multiple Python installations, then the version
>>>> number isn't enough. You also need to distinguish debug builds, profiling
>>>> builds, Unicode width (see issue8654), and probably several other
>>>> ./configure options.
>>>
>>> This is a good point, but more easily addressed. Let's say a distro makes
>>> three Python 3.2 variants available, one "normal" build, a debug build, and
>>> UCS2 and UCS4 versions of the above. All we need to do is choose a different
>>> .so ABI tag (see previous follow-up) for each of those builds. My updated patch
>>> (coming soon) allows you to define that tag to configure. So e.g.
>>
>> Why is this use case not already addressed by having independent
>> directories? And why is there an incentive to co-mingle these
>> version-punned files with version-agnostic ones?
>
>I don't think this is a good idea. After a while your Python
>lib directories would need some serious dusting off to make them
>maintainable again.
>
>Disk space is cheap so setting up dedicated directories for each
>variant will result in a much easier to manage installation.
>
>If you want a really clever setup, use hard links between those
>directory (you can also use symlinks if you like).
>Then a change in one Python file will automatically
>propagate to all other variant dirs without any maintenance
>effort. Together with PYTHONHOME this makes a really nice
>virtualenv-like environment.

Note that I do believe there is a difference between what users maintaining their own Python installations might want, and what a distro needs to maintain its entire Python stack. So while dedicated directories might make more sense if you're maintaining your own Python built from source, it doesn't make as much sense for a distro, as described in previous responses by Matthias.

-Barry

From exarkun at twistedmatrix.com Wed Jun 30 21:10:05 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 30 Jun 2010 19:10:05 -0000
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <4C2B7F00.2010602@v.loewis.de>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <20100630113232.1937.151974582.divmod.xquotient.556@localhost.localdomain> <4C2B7F00.2010602@v.loewis.de>
Message-ID: <20100630191005.1937.1474314461.divmod.xquotient.617@localhost.localdomain>

On 05:29 pm, martin at v.loewis.de wrote:
>On 30.06.2010 13:32, exarkun at twistedmatrix.com wrote:
>>On 05:24 am, martin at v.loewis.de wrote:
>>>>Seems to work fine. So this I don't understand. Any ideas, anyone?
>>>
>>>Didn't we discuss this before? The buildbot slave has no controlling
>>>terminal anymore, hence it cannot open /dev/tty. If you are curious,
>>>just patch your checkout to output the exact errno (e.g. to stdout),
>>>and trigger a build through the web.
>>
>>Could the test be rewritten (or supplemented) to use a pty? Most or
>>perhaps all of the same operations should be supported.
>
>I'm not sure.
It uses TIOCGPGRP, basically to establish that ioctl
>can also put results into a Python array (IIUC). This goes back to
>http://bugs.python.org/555817
>
>Somebody rewriting it would need to make sure the original test purpose
>is still met.

Absolutely. And even so, it may still make sense to run the test against both /dev/tty and a pty (or whatever subset of those things can be acquired in the testing environment).

You can do a TIOCGPGRP on a new pty (created by os.openpty) but it produces somewhat less interesting results than doing it on /dev/tty. FIONREAD might be a nice alternative. It produces interesting (ie, non-zero) values in an easily predictable/controllable way (it tells you how many bytes are in the read buffer).

Jean-Paul

From exarkun at twistedmatrix.com Wed Jun 30 21:11:22 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Wed, 30 Jun 2010 19:11:22 -0000
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain>
References: <71728.1277866512@parc.com> <68796.1277913609@parc.com> <20100630184457.10067764@pitrou.net> <20100630184602.1937.1550858232.divmod.xquotient.569@localhost.localdomain>
Message-ID: <20100630191122.1937.493523511.divmod.xquotient.619@localhost.localdomain>

On 06:46 pm, exarkun at twistedmatrix.com wrote:
>
>On 04:44 pm, solipsis at pitrou.net wrote:
>>On Wed, 30 Jun 2010 09:00:09 PDT
>>Bill Janssen wrote:
>>>
>>>So, my question then is, why are these skips "unexpected"? Seems to me
>>>that if this is the case, this test will never run on any platform.
>>
>>You can change the value of the "usepty" option in your buildbot.tac.
>>(you will also have to restart the buildslave process)
>
>But don't do this. The usepty option is completely unrelated to the
>suggestion I was making. Flipping it to True will only cause other
>things to break and have no impact on this test.

Ah, sorry. I confused myself.
The option is related. But it will also break other things, so I still would recommend looking for other solutions.

Jean-Paul

From brett at python.org Wed Jun 30 21:28:03 2010
From: brett at python.org (Brett Cannon)
Date: Wed, 30 Jun 2010 12:28:03 -0700
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <68821.1277913795@parc.com>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com>
Message-ID:

On Wed, Jun 30, 2010 at 09:03, Bill Janssen wrote:
> Martin v. Löwis wrote:
>
>> > Seems to work fine. So this I don't understand. Any ideas, anyone?
>>
>> Didn't we discuss this before?
>
> Possibly, but I don't recall doing so.
>
>> The buildbot slave has no controlling
>> terminal anymore, hence it cannot open /dev/tty. If you are curious,
>> just patch your checkout to output the exact errno (e.g. to stdout),
>> and trigger a build through the web.
>
> So, why is skipping this test "unexpected"? I see "x86 Tiger" is also
> showing this as an unexpected skip. Should I just add it to the list of
> expected skips on Darwin? Actually, will it run on any platform?

The whole "unexpected" skipping is somewhat of a mess. In an ideal situation modules that are optionally built should be allowed to skip, and on a per-platform basis certain OS-specific tests (whether they be exclusive to a specific OS or run on all OSs except Windows) should be skipped. Otherwise any import failure should be a test failure.

The "unexpected" test skipping was meant to solve both of these situations, but in an imperfect way. My PSF grant proposal to work on Python full-time for two to three months after my Ph.D. is complete (assuming the PSF gives me the grant this would start most likely in November or December) includes cleaning up the test suite and this would be the first thing I tackle.
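
A rough sketch of the pty-based check suggested earlier in this thread: create a fresh pty with os.openpty(), queue a few bytes, and let ioctl write the FIONREAD count into a Python array. The helper name pty_pending_bytes, the payload, the timeout, and the choice to put the slave side into raw mode are assumptions made for this illustration and are not taken from the existing test.

```python
import array
import fcntl
import os
import select
import termios
import tty


def pty_pending_bytes(payload=b"abc", timeout=5.0):
    """Return the FIONREAD count on a fresh pty, or None if unavailable.

    Hypothetical helper for illustration only.
    """
    try:
        master, slave = os.openpty()
    except OSError:
        # No pty can be acquired in this environment.
        return None
    try:
        # Raw mode: no line buffering and no echo, so the bytes written
        # to the master become readable on the slave as they are.
        tty.setraw(slave)
        os.write(master, payload)
        # Delivery through the pty may be asynchronous; wait until the
        # slave side reports readable data before asking for the count.
        readable, _, _ = select.select([slave], [], [], timeout)
        if not readable:
            return None
        # ioctl stores its result in the mutable array buffer.
        buf = array.array("i", [0])
        fcntl.ioctl(slave, termios.FIONREAD, buf, True)
        return buf[0]
    finally:
        os.close(master)
        os.close(slave)
```

Because os.openpty() does not depend on a controlling terminal, a check along these lines would not need /dev/tty; returning None lets a caller skip rather than fail where no pty is available.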
From martin at v.loewis.de Wed Jun 30 21:53:08 2010
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Wed, 30 Jun 2010 21:53:08 +0200
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To:
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com>
Message-ID: <4C2BA0A4.9070002@v.loewis.de>

> The whole "unexpected" skipping is somewhat of a mess. In an ideal
> situation modules that are optionally built should be allowed to skip,

While this may be the wide-spread interpretation, it is definitely *not* the original intention of the feature.

When Tim Peters added it, he wanted it to tell him whether he did the Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can possibly work on Windows. If you try to generalize this beyond Windows, then the only skips that are expected are the ones for tests that absolutely cannot work on the platform - i.e. Unix tests on Windows, and Windows tests on Unix. Otherwise, if you can get it to pass by installing additional software, Tim did *not* mean this to be an expected skip.

Regards,
Martin

From janssen at parc.com Wed Jun 30 22:21:51 2010
From: janssen at parc.com (Bill Janssen)
Date: Wed, 30 Jun 2010 13:21:51 PDT
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <4C2BA0A4.9070002@v.loewis.de>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de>
Message-ID: <76469.1277929311@parc.com>

Martin v. Löwis wrote:

> > The whole "unexpected" skipping is somewhat of a mess. In an ideal
> > situation modules that are optionally built should be allowed to skip,
>
> While this may be the wide-spread interpretation, it is definitely *not*
> the original intention of the feature.
>
> When Tim Peters added it, he wanted it to tell him whether he did the
> Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can
> possibly work on Windows. If you try to generalize this beyond Windows,
> then the only skips that are expected are the ones for tests that
> absolutely cannot work on the platform - i.e. Unix tests on Windows,
> and Windows tests on Unix. Otherwise, if you can get it to pass by
> installing additional software, Tim did *not* mean this to be an
> expected skip.

Perfectly reasonable, good to know. So on my OS X buildbots I should update gdb, tcl/tk, and readline, so that those tests can run. It would probably be good to put a note in the regrtest.py comments to this effect, as I don't see a PEP about testing or buildbots.

Bill

From mal at egenix.com Wed Jun 30 22:35:56 2010
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 30 Jun 2010 22:35:56 +0200
Subject: [Python-Dev] versioned .so files for Python 3.2
In-Reply-To: <20100630150610.7ae4ac6a@heresy>
References: <20100624115048.4fd152e3@heresy> <20100624170944.7e68ad21@heresy> <4C23D3C2.1060500@scottdial.com> <4C23DD99.9050604@egenix.com> <20100630150610.7ae4ac6a@heresy>
Message-ID: <4C2BAAAC.5090101@egenix.com>

Barry Warsaw wrote:
> On Jun 25, 2010, at 12:35 AM, M.-A. Lemburg wrote:
>
>> Scott Dial wrote:
>>> On 6/24/2010 5:09 PM, Barry Warsaw wrote:
>>>>> What use case does this address?
>>>>
>>>>> If you want to make it so a system can install a package in just one
>>>>> location to be used by multiple Python installations, then the version
>>>>> number isn't enough. You also need to distinguish debug builds, profiling
>>>>> builds, Unicode width (see issue8654), and probably several other
>>>>> ./configure options.
>>>>
>>>> This is a good point, but more easily addressed. Let's say a distro makes
>>>> three Python 3.2 variants available, one "normal" build, a debug build, and
>>>> UCS2 and UCS4 versions of the above.
All we need to do is choose a different
>>>> .so ABI tag (see previous follow-up) for each of those builds. My updated patch
>>>> (coming soon) allows you to define that tag to configure. So e.g.
>>>
>>> Why is this use case not already addressed by having independent
>>> directories? And why is there an incentive to co-mingle these
>>> version-punned files with version-agnostic ones?
>>
>> I don't think this is a good idea. After a while your Python
>> lib directories would need some serious dusting off to make them
>> maintainable again.
>>
>> Disk space is cheap so setting up dedicated directories for each
>> variant will result in a much easier to manage installation.
>>
>> If you want a really clever setup, use hard links between those
>> directory (you can also use symlinks if you like).
>> Then a change in one Python file will automatically
>> propagate to all other variant dirs without any maintenance
>> effort. Together with PYTHONHOME this makes a really nice
>> virtualenv-like environment.
>
> Note that I do believe there is a difference between what users maintaining
> their own Python installations might want, and what a distro needs to maintain
> its entire Python stack. So while dedicated directories might make more sense
> if you're maintaining your own Python built from source, it doesn't make as
> much sense for a distro, as described in previous responses by Matthias.

Fair enough.

I haven't followed the thread closely, so Matthias will probably already have answered this:

The Python default installation dir for libs (including site-packages) is $prefix/lib/pythonX.X, so you already have separate and properly versioned directory paths.

What difference would the extra version on the .so file make in such a setup?

--
Marc-Andre Lemburg
eGenix.com

From brett at python.org Wed Jun 30 23:12:59 2010
From: brett at python.org (Brett Cannon)
Date: Wed, 30 Jun 2010 14:12:59 -0700
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <4C2BA0A4.9070002@v.loewis.de>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de>
Message-ID:

On Wed, Jun 30, 2010 at 12:53, "Martin v. Löwis" wrote:
>> The whole "unexpected" skipping is somewhat of a mess. In an ideal
>> situation modules that are optionally built should be allowed to skip,
>
> While this may be the wide-spread interpretation, it is definitely *not*
> the original intention of the feature.
>
> When Tim Peters added it, he wanted it to tell him whether he did the
> Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can
> possibly work on Windows. If you try to generalize this beyond Windows,
> then the only skips that are expected are the ones for tests that
> absolutely cannot work on the platform - i.e. Unix tests on Windows,
> and Windows tests on Unix. Otherwise, if you can get it to pass by
> installing additional software, Tim did *not* mean this to be an
> expected skip.

Interesting. Do you use it that way when you make the Windows build?

From ncoghlan at gmail.com Wed Jun 30 23:52:30 2010
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 1 Jul 2010 07:52:30 +1000
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To: <4C2BA0A4.9070002@v.loewis.de>
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de>
Message-ID:

On Thu, Jul 1, 2010 at 5:53 AM, "Martin v. Löwis" wrote:
> When Tim Peters added it, he wanted it to tell him whether he did the
> Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can
> possibly work on Windows. If you try to generalize this beyond Windows,
> then the only skips that are expected are the ones for tests that
> absolutely cannot work on the platform - i.e. Unix tests on Windows,
> and Windows tests on Unix. Otherwise, if you can get it to pass by
> installing additional software, Tim did *not* mean this to be an
> expected skip.

Note that it works this way on Linux as well. On Kubuntu (for example) you need another half dozen or so additional *-dev packages installed to avoid unexpected test skips.

Cheers,
Nick.

P.S. For anyone curious, I posted the list of extra packages you need here:
http://boredomandlaziness.blogspot.com/2010/01/kubuntu-dev-packages-to-build-python.html

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From brett at python.org Wed Jun 30 23:55:14 2010
From: brett at python.org (Brett Cannon)
Date: Wed, 30 Jun 2010 14:55:14 -0700
Subject: [Python-Dev] OS X buildbots: why am I skipping these tests?
In-Reply-To:
References: <71728.1277866512@parc.com> <4C2AD511.5020709@v.loewis.de> <68821.1277913795@parc.com> <4C2BA0A4.9070002@v.loewis.de>
Message-ID:

On Wed, Jun 30, 2010 at 14:52, Nick Coghlan wrote:
> On Thu, Jul 1, 2010 at 5:53 AM, "Martin v. Löwis" wrote:
>> When Tim Peters added it, he wanted it to tell him whether he did the
>> Windows build correctly, INCLUDING ALL OPTIONAL PACKAGES that can
>> possibly work on Windows. If you try to generalize this beyond Windows,
>> then the only skips that are expected are the ones for tests that
>> absolutely cannot work on the platform - i.e.
Unix tests on Windows,
>> and Windows tests on Unix. Otherwise, if you can get it to pass by
>> installing additional software, Tim did *not* mean this to be an
>> expected skip.
>
> Note that it works this way on Linux as well. On Kubuntu (for example)
> you need another half dozen or so additional *-dev packages installed
> to avoid unexpected test skips.

So it isn't that it's "unexpected", it's that a dependency is missing. So it seems the terminology needs to get tweaked.
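
The distinction drawn in this thread (a skip is expected only when the test cannot possibly work on the platform; otherwise a skipped test points at something that is not installed) could be expressed roughly as follows. The function classify_skip, its return labels, and the unsupported_platforms parameter are hypothetical names invented for this sketch; they are not regrtest APIs.

```python
import importlib.util
import sys


def classify_skip(module_name, unsupported_platforms=()):
    """Label why a test for module_name would be skipped, if at all.

    Hypothetical helper for illustration only.
    """
    # A test that cannot possibly work on this platform is an expected skip.
    if any(sys.platform.startswith(p) for p in unsupported_platforms):
        return "expected skip: unsupported platform"
    # Otherwise, a module that cannot be found means a dependency is missing.
    if importlib.util.find_spec(module_name) is None:
        return "missing dependency"
    return "runnable"
```

Under this reading, a missing readline or tcl/tk on a platform that could support them would be reported as a missing dependency rather than as an expected skip.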