From aleaxit at gmail.com Mon Dec 1 02:14:08 2008
From: aleaxit at gmail.com (Alex Martelli)
Date: Sun, 30 Nov 2008 17:14:08 -0800
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
 <4932F901.6070803@gmail.com>
 <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
 <49330AA9.7070005@gmail.com>
 <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
Message-ID:

On Sun, Nov 30, 2008 at 2:02 PM, Filip Gruszczyński wrote:
>> Yeah, any time someone implements their own attribute lookup process for
>> a class (be it via __getattr__, __getattribute__ or the C equivalents),
>> it is up to the reimplementation to appropriately format their error
>> message if they raise AttributeError directly.
>
> I guess, this means that I have to go to Phil Thompson at Riverbank
> and try to convince him to change the message.

Yes, but he should be able to change it in one place (in sip, the C++
to Python wrapper generator he's also authored and uses for PyQt) AND
it would make sip even better, so he may want to put it on his
backlog.

Alex

From jyasskin at gmail.com Mon Dec 1 02:54:02 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sun, 30 Nov 2008 17:54:02 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
 PyEval_EvalFrameEx (2% on pybench)
Message-ID: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>

Tracing support shows up fairly heavily in a Python profile, even
though it's nearly always turned off. The attached patch against the
trunk speeds up PyBench by 2% for me. All tests pass. I have 2
questions:

1) Can other people corroborate this speedup on their machines? I'm
running on a Macbook Pro (Intel Core2 processor, probably Merom) with
a 32-bit build from Apple's gcc-4.0.1. (Apple's gcc consistently
produces a faster python than gcc-4.3.)

2) Assuming this speeds things up for most people, should I check it
in anywhere besides the trunk? I assume it's out for 3.0; is it in for
2.6.1 or 3.0.1?


Pybench output:

-------------------------------------------------------------------------------
PYBENCH 2.0
-------------------------------------------------------------------------------
* using CPython 2.7a0 (trunk:67458M, Nov 30 2008, 17:14:10) [GCC 4.0.1
(Apple Inc. build 5488)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.time

-------------------------------------------------------------------------------
Benchmark: pybench.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.5.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/jyasskin/src/python/trunk-fast-tracing/build/python.exe
       Version:        2.7.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5488)
       Bits:           32bit
       Build:          Nov 30 2008 17:14:10 (#trunk:67458M)
       Unicode:        UCS2

-------------------------------------------------------------------------------
Comparing with: ../build_orig/pybench.out
-------------------------------------------------------------------------------

    Rounds: 10
    Warp:   10
    Timer:  time.time

    Machine Details:
       Platform ID:    Darwin-9.5.0-i386-32bit
       Processor:      i386

    Python:
       Implementation: CPython
       Executable:     /Users/jyasskin/src/python/trunk-fast-tracing/build_orig/python.exe
       Version:        2.7.0
       Compiler:       GCC 4.0.1 (Apple Inc. build 5488)
       Bits:           32bit
       Build:          Nov 30 2008 13:51:09 (#trunk:67458)
       Unicode:        UCS2

                        Test    minimum run-time        average run-time
                                this   other    diff    this   other    diff
-------------------------------------------------------------------------------
       BuiltinFunctionCalls:   127ms   130ms   -2.4%   129ms   132ms   -2.1%
        BuiltinMethodLookup:    90ms    93ms   -3.2%    91ms    94ms   -3.1%
              CompareFloats:    88ms    91ms   -3.3%    89ms    93ms   -4.3%
      CompareFloatsIntegers:    97ms    99ms   -2.1%    97ms   100ms   -2.4%
            CompareIntegers:    79ms    82ms   -4.2%    79ms    85ms   -6.1%
     CompareInternedStrings:    90ms    92ms   -2.4%    94ms    94ms   -0.9%
               CompareLongs:    86ms    83ms   +3.6%    87ms    84ms   +3.5%
             CompareStrings:    80ms    82ms   -3.1%    81ms    83ms   -2.3%
             CompareUnicode:   103ms   105ms   -2.3%   106ms   108ms   -1.5%
 ComplexPythonFunctionCalls:   139ms   137ms   +1.3%   140ms   139ms   +0.1%
              ConcatStrings:   142ms   151ms   -6.0%   156ms   154ms   +1.1%
              ConcatUnicode:    87ms    92ms   -5.4%    89ms    94ms   -5.7%
            CreateInstances:   142ms   144ms   -1.4%   144ms   145ms   -1.1%
         CreateNewInstances:   107ms   109ms   -2.3%   108ms   111ms   -2.1%
    CreateStringsWithConcat:   114ms   137ms  -17.1%   117ms   139ms  -16.0%
    CreateUnicodeWithConcat:    92ms   101ms   -9.2%    95ms   102ms   -7.2%
               DictCreation:    77ms    81ms   -4.4%    80ms    85ms   -5.9%
          DictWithFloatKeys:    91ms   107ms  -14.5%    93ms   109ms  -14.6%
        DictWithIntegerKeys:    95ms    94ms   +1.4%   108ms    96ms  +12.3%
         DictWithStringKeys:    83ms    88ms   -5.8%    84ms    88ms   -4.7%
                   ForLoops:    72ms    72ms   -0.1%    79ms    74ms   +5.8%
                 IfThenElse:    83ms    80ms   +3.9%    85ms    80ms   +5.3%
                ListSlicing:   117ms   118ms   -0.7%   118ms   121ms   -1.8%
             NestedForLoops:   116ms   119ms   -2.4%   121ms   121ms   +0.0%
       NormalClassAttribute:   106ms   115ms   -7.7%   108ms   117ms   -7.7%
    NormalInstanceAttribute:    96ms    98ms   -2.3%    97ms   100ms   -3.1%
        PythonFunctionCalls:    92ms    95ms   -3.7%    94ms    99ms   -5.2%
          PythonMethodCalls:   147ms   147ms   +0.1%   152ms   149ms   +2.1%
                  Recursion:   135ms   136ms   -0.3%   140ms   144ms   -2.9%
               SecondImport:   101ms    99ms   +2.1%   103ms   101ms   +2.2%
        SecondPackageImport:   107ms   103ms   +3.5%   108ms   104ms   +3.3%
      SecondSubmoduleImport:   134ms   134ms   +0.3%   136ms   136ms   -0.0%
    SimpleComplexArithmetic:   105ms   111ms   -5.0%   110ms   112ms   -1.4%
     SimpleDictManipulation:    95ms   106ms  -10.6%    96ms   109ms  -12.0%
      SimpleFloatArithmetic:    90ms    99ms   -9.3%    93ms   102ms   -8.2%
   SimpleIntFloatArithmetic:    78ms    76ms   +2.3%    79ms    77ms   +2.0%
    SimpleIntegerArithmetic:    78ms    77ms   +1.8%    79ms    77ms   +2.0%
     SimpleListManipulation:    80ms    78ms   +2.4%    80ms    79ms   +1.9%
       SimpleLongArithmetic:   110ms   113ms   -2.0%   111ms   113ms   -2.1%
                 SmallLists:   128ms   117ms   +9.5%   130ms   124ms   +4.9%
                SmallTuples:   115ms   114ms   +1.7%   117ms   114ms   +2.2%
      SpecialClassAttribute:   101ms   112ms  -10.3%   104ms   114ms   -8.9%
   SpecialInstanceAttribute:   173ms   177ms   -1.9%   176ms   179ms   -1.6%
             StringMappings:   165ms   167ms   -1.2%   168ms   169ms   -0.5%
           StringPredicates:   126ms   134ms   -5.7%   127ms   134ms   -5.6%
              StringSlicing:   125ms   123ms   +1.9%   131ms   130ms   +0.7%
                  TryExcept:    79ms    80ms   -0.6%    80ms    80ms   -0.8%
                 TryFinally:   110ms   107ms   +3.0%   111ms   112ms   -1.1%
             TryRaiseExcept:    99ms   101ms   -1.6%   100ms   102ms   -1.7%
               TupleSlicing:   127ms   127ms   +0.6%   137ms   137ms   +0.0%
            UnicodeMappings:   144ms   144ms   -0.3%   145ms   145ms   -0.4%
          UnicodePredicates:   116ms   114ms   +1.3%   117ms   115ms   +1.1%
          UnicodeProperties:   106ms   102ms   +3.6%   107ms   104ms   +3.1%
             UnicodeSlicing:    95ms   111ms  -14.0%    99ms   112ms  -11.8%
                WithFinally:   157ms   152ms   +3.3%   159ms   154ms   +3.3%
            WithRaiseExcept:   123ms   125ms   -1.1%   125ms   126ms   -1.2%
-------------------------------------------------------------------------------
                     Totals:  6043ms  6182ms   -2.2%  6185ms  6301ms   -1.9%

(this=pybench.out, other=../build_orig/pybench.out)


2to3 times:

Before:
$ time ./python.exe ~/src/2to3/2to3 -f all ~/src/2to3/ >/dev/null
real    0m56.685s
user    0m55.620s
sys     0m0.380s

After:
$ time ./python.exe ~/src/2to3/2to3 -f all ~/src/2to3/ >/dev/null
real    0m55.067s
user    0m53.843s
sys     0m0.376s

== 3% faster


Gory details:

The meat of the patch is:
@@ -884,11 +891,12 @@
        fast_next_opcode:
                f->f_lasti = INSTR_OFFSET();

                /* line-by-line tracing support */

-               if (tstate->c_tracefunc != NULL && !tstate->tracing) {
+               if (_Py_TracingPossible &&
+                   tstate->c_tracefunc != NULL && !tstate->tracing) {


This converts the generated assembly (produced with `gcc -S -dA ...`,
then manually annotated a bit) from:

        # basic block 17
        # ../Python/ceval.c:885
LM541:
        movl    8(%ebp), %ecx
LVL319:
        subl    -316(%ebp), %edx
        movl    %edx, 60(%ecx)
        # ../Python/ceval.c:889
LM542:
        # %esi = tstate
        movl    -336(%ebp), %esi
LVL320:
        # %eax = tstate->c_tracefunc
        movl    28(%esi), %eax
LVL321:
        # if tstate->c_tracefunc == 0
        testl   %eax, %eax
        # goto past-if ()
        je      L567
        # more if conditions here

to:

        # basic block 17
        # ../Python/ceval.c:889
LM542:
        movl    8(%ebp), %ecx
LVL319:
        subl    -316(%ebp), %edx
        movl    %edx, 60(%ecx)
        # ../Python/ceval.c:893
LM543:
        # %eax = _Py_TracingPossible
        movl    __Py_TracingPossible-"L00000000033$pb"(%ebx), %eax
LVL320:
        # if _Py_TracingPossible != 0
        testl   %eax, %eax
        # goto rest-of-if (nearby)
        jne     L2321
        # opcode = NEXTOP(); continues here


The branch should be predicted accurately either way, so there are 2
things that may be contributing to the performance change.

First, adding the global caching variable halves the amount of memory
that has to be read to check the prediction. The memory that is read
is still read one instruction before it's used, but adding a local
variable to read the memory earlier doesn't affect the performance.

Without the global variable, the compiler puts the tracing code
immediately after the if; with the global, it moves it away and puts
the non-tracing code immediately after the first test in the if. This
may affect branch prediction and may affect the icache. I tried using
gcc's __builtin_expect() to ensure that the tracing code is always
out-of-line. This moved it much farther away and cost about 1% in
performance (i.e. 1% instead of 2% faster than "before"). I don't know
why the __builtin_expect() version would be slower. If anyone feels
inspired to test this out on another processor or compiler version,
let me know how it goes.

Jeffrey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fast-tracing.diff
Type: application/octet-stream
Size: 1658 bytes
Desc: not available
URL:

From brett at python.org Mon Dec 1 05:14:45 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 30 Nov 2008 20:14:45 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
 PyEval_EvalFrameEx (2% on pybench)
In-Reply-To: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
References: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
Message-ID:

Can you toss the patch into the issue tracker, Jeffrey, so that any
patch comments can be done there?

-Brett

From jyasskin at gmail.com Mon Dec 1 05:34:27 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Sun, 30 Nov 2008 20:34:27 -0800
Subject: [Python-Dev] Patch to speed up non-tracing case in
 PyEval_EvalFrameEx (2% on pybench)
In-Reply-To:
References: <5d44f72f0811301754jffacbe7ubf4864049ff6d09e@mail.gmail.com>
Message-ID: <5d44f72f0811302034u541a5021l6420c8bdd3f2b0ba@mail.gmail.com>

Done: http://bugs.python.org/issue4477

On Sun, Nov 30, 2008 at 8:14 PM, Brett Cannon wrote:
> Can you toss the patch into the issue tracker, Jeffrey, so that any
> patch comments can be done there?
>
> -Brett

--
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

From gruszczy at gmail.com Mon Dec 1 10:30:27 2008
From: gruszczy at gmail.com (Filip Gruszczyński)
Date: Mon, 1 Dec 2008 10:30:27 +0100
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To:
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
 <4932F901.6070803@gmail.com>
 <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
 <49330AA9.7070005@gmail.com>
 <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
Message-ID: <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>

> Yes, but he should be able to change it in one place (in sip, the C++
> to Python wrapper generator he's also authored and uses for PyQt) AND
> it would make sip even better, so he may want to put it on his
> backlog.

He does. It is supposed to appear in 4.8. So I guess that's it, thanks
a lot for your help.

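The point made earlier in this thread, that a class with its own attribute lookup has to format its own AttributeError message, can be illustrated with a minimal sketch. The class name `Wrapper` and its `_fields` dict are hypothetical and have nothing to do with sip or PyQt; the message simply mimics the wording of the default error so that the type name is included.

```python
class Wrapper(object):
    """Hypothetical class that resolves unknown attributes from a dict."""

    def __init__(self, **fields):
        # Stored as a normal instance attribute, so __getattr__ is not
        # consulted when this name itself is looked up.
        self._fields = dict(fields)

    def __getattr__(self, name):
        # __getattr__ is only called after normal lookup has failed.
        try:
            return self._fields[name]
        except KeyError:
            # Include the type name, as the default message does.
            raise AttributeError(
                "%r object has no attribute %r" % (type(self).__name__, name)
            )


w = Wrapper(x=1)
try:
    w.missing
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Without the explicit formatting in `__getattr__`, the caller would only see whatever text the reimplementation chose to pass, which is the situation the original report was about.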
--
Filip Gruszczyński

From kristjan at ccpgames.com Mon Dec 1 16:32:24 2008
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Mon, 1 Dec 2008 15:32:24 +0000
Subject: [Python-Dev] Python under valgrind
In-Reply-To: <492FF774.1050101@avl.com>
References: <492FE636.50905@avl.com> <492FF774.1050101@avl.com>
Message-ID: <4E9372E6B2234D4F859320D896059A9510E0D7D122@exchis.ccp.ad.local>

Probably because of the object memory allocator. It reads the start of
memory pages to see if a block belongs to the obmalloc system or not.
You want to remove the following line:
#define WITH_PYMALLOC 1
from pyconfig.h if you intend to run using valgrind or, say, Purify.

K

-----Original Message-----
From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Hrvoje Niksic
Sent: 28. nóvember 2008 13:52
Cc: Python-Dev
Subject: Re: [Python-Dev] Python under valgrind

Amaury Forgeot d'Arc wrote:
> Did you use the suppressions file as suggested in Misc/README.valgrind?

Thanks for the suggestion (as well as to Gustavo and Victor), but my
question wasn't about how to suppress the messages, but about why the
messages appear in the first place. I think my last paragraph answers
my own question, but I'm not sure.
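As a small companion to the WITH_PYMALLOC note above, the following sketch asks a running interpreter whether its build recorded that setting. It assumes a Python that ships the `sysconfig` module (which postdates this thread), and the helper name `built_with_pymalloc` is made up for illustration. Builds that do not record the variable return None, so the result is only a hint.

```python
import sysconfig


def built_with_pymalloc():
    """Return True or False if the build recorded WITH_PYMALLOC, else None."""
    value = sysconfig.get_config_var("WITH_PYMALLOC")
    if value is None:
        # The build configuration does not expose the setting.
        return None
    return bool(value)


print("WITH_PYMALLOC: %r" % (built_with_pymalloc(),))
```

On Python 3.6 and later, setting the environment variable PYTHONMALLOC=malloc selects the plain C allocator at startup, which avoids rebuilding just to run under valgrind.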
_______________________________________________
Python-Dev mailing list
Python-Dev at python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com

From aleaxit at gmail.com Mon Dec 1 16:35:34 2008
From: aleaxit at gmail.com (Alex Martelli)
Date: Mon, 1 Dec 2008 07:35:34 -0800
Subject: [Python-Dev] Attribute error: providing type name
In-Reply-To: <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com>
 <4932F901.6070803@gmail.com>
 <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com>
 <49330AA9.7070005@gmail.com>
 <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com>
 <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com>
Message-ID:

I wonder if there are some desiderata left for future Python versions to
make this standard behavior easier (for C-coded, Python-coded, and
Cython-coded classes, ones made by SWIG, etc) without too much black
magic...

Alex

On Mon, Dec 1, 2008 at 1:30 AM, Filip Gruszczyński wrote:
>> Yes, but he should be able to change it in one place (in sip, the C++
>> to Python wrapper generator he's also authored and uses for PyQt) AND
>> it would make sip even better, so he may want to put it on his
>> backlog.
>
> He does. It is supposed to appear in 4.8. So I guess that's it, thanks
> a lot for your help.
>
> --
> Filip Gruszczyński
>

From dinov at microsoft.com Mon Dec 1 18:56:08 2008
From: dinov at microsoft.com (Dino Viehland)
Date: Mon, 1 Dec 2008 09:56:08 -0800
Subject: [Python-Dev] format specification mini-language docs...
In-Reply-To: <492BF1C4.4050807@trueblade.com>
References: <350E7D38B6D819428718949920EC235556486D00A7@NA-EXMSG-C102.redmond.corp.microsoft.com>
 <492BF1C4.4050807@trueblade.com>
Message-ID: <350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>

Yep, after the Thanksgiving delay I've opened bug #4482 (http://bugs.python.org/issue4482).

I either don't know how to or don't have the power to change who a bug is assigned to, so it appears to be currently unassigned.

-----Original Message-----
From: Eric Smith [mailto:eric at trueblade.com]
Sent: Tuesday, November 25, 2008 4:38 AM
To: Dino Viehland
Cc: python-dev at python.org dev
Subject: Re: [Python-Dev] format specification mini-language docs...

Dino Viehland wrote:

> Finally providing any sign character seems to cause +1.0#INF and friends to be returned instead of inf as is documented:
>
> >>> 10e667.__format__('+')
> '+1.0#INF'
> >>> 10e667.__format__('')
> 'inf'
>
>
> Are these just doc bugs? The inf issue is the only one that seems particularly weird to me.

I think the inf one is a bug. Would you mind opening a bug and assigning
it to me? Thanks.

Eric.

From eric at trueblade.com Mon Dec 1 19:15:03 2008
From: eric at trueblade.com (Eric Smith)
Date: Mon, 01 Dec 2008 13:15:03 -0500
Subject: [Python-Dev] format specification mini-language docs...
In-Reply-To: <350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>
References: <350E7D38B6D819428718949920EC235556486D00A7@NA-EXMSG-C102.redmond.corp.microsoft.com>
 <492BF1C4.4050807@trueblade.com>
 <350E7D38B6D819428718949920EC2355564A332869@NA-EXMSG-C102.redmond.corp.microsoft.com>
Message-ID: <493429A7.4020004@trueblade.com>

Dino Viehland wrote:
> Yep, after the thanksgiving delay I've opened bug #4482 (http://bugs.python.org/issue4482).

Thanks!

> I either don't know how to or don't have the power to change who a bug is assigned to so it appears to be currently unassigned.

I'll take care of it.

Eric.
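For readers who want to check the behavior reported in this exchange on their own interpreter, here is a minimal sketch. It uses the built-in `format()` rather than calling `__format__` directly, and the variable names are only illustrative. The literal `10e667` from the report overflows to positive infinity, which is why `float("inf")` is used here.

```python
# Positive infinity; the literal 10e667 from the report has the same value.
inf = float("inf")

plain = format(inf, "")       # empty format spec
signed = format(inf, "+")     # explicit sign requested
negative = format(-inf, "+")  # sign of negative infinity

print(plain, signed, negative)
```

On CPython releases where the sign handling is consistent with the documentation, the three results are 'inf', '+inf' and '-inf'; a result such as '+1.0#INF' would indicate the platform-dependent behavior described in the report.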
From gerald.koenig at hp.com Mon Dec 1 20:05:01 2008 From: gerald.koenig at hp.com (Koenig, Gerald) Date: Mon, 1 Dec 2008 19:05:01 +0000 Subject: [Python-Dev] Python for windows. In-Reply-To: <49331277.10003@v.loewis.de> References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com> <238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net> <238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net> <042401c95012$3bff99d0$b3fecd70$@com.au> <492DCE5E.5080602@v.loewis.de> <043e01c95019$9955a0a0$cc00e1e0$@com.au> <492DDE40.2040206@v.loewis.de> <045801c9503a$8e85d2f0$ab9178d0$@com.au> <492F2788.7040300@canterbury.ac.nz> <04ce01c9510d$3132d750$939885f0$@com.au> <492FBB2C.5000309@gmail.com> <053201c95337$fa09e930$ee1dbb90$@com.au> <49331277.10003@v.loewis.de> Message-ID: <238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net> Hi all, I didn't look at the thread until this morning. The OEM Ready program requires that the installer force installation into Program Files. But as we preinstall, we use your MSI with a normal parameter: python-2.5.2.msi TARGETDIR="c:\program files\python" That's why I didn't ask you about that. We have already done a few weeks of testing and nothing has broken so far :) Now, about the two other issues: what would be the easiest way to fix them properly? - For the executable without a manifest: as we are on Vista only, I can add a Vista manifest outside the executable; it should work. - For python_icon.exe: I do not know what is calling it in the Start menu; can you help me with that? Gerald -----Original Message----- From: python-dev-bounces+gerald.koenig=hp.com at python.org [mailto:python-dev-bounces+gerald.koenig=hp.com at python.org] On Behalf Of "Martin v. Löwis" Sent: Sunday, November 30, 2008 2:24 PM To: mhammond at skippinet.com.au Cc: 'Nick Coghlan'; python-dev at python.org Subject: Re: [Python-Dev] Python for windows. > Of course, I don't object to that and still think we should help where we > can, but if that is true it would make the premise of this thread a little > misleading, as obviously HP could then make *any* necessary changes without > our agreement or even knowledge. Perhaps. However, "help where we can" is about right. If it's only the changes HP discussed so far, I think we should be able to help. For the Program Files issue, without going into the discussion whether Python's defaults are good or not, I think there would be still a number of technical solutions (such as providing a merge module which changes the default). Regards, Martin From gerald.koenig at hp.com Mon Dec 1 20:08:10 2008 From: gerald.koenig at hp.com (Koenig, Gerald) Date: Mon, 1 Dec 2008 19:08:10 +0000 Subject: [Python-Dev] Python for windows.
References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com> <238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net> <238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net> <042401c95012$3bff99d0$b3fecd70$@com.au> <492DCE5E.5080602@v.loewis.de> <043e01c95019$9955a0a0$cc00e1e0$@com.au> <492DDE40.2040206@v.loewis.de> <045801c9503a$8e85d2f0$ab9178d0$@com.au> <492F2788.7040300@canterbury.ac.nz> <04ce01c9510d$3132d750$939885f0$@com.au> <492FBB2C.5000309@gmail.com> <053201c95337$fa09e930$ee1dbb90$@com.au> <49331277.10003@v.loewis.de> Message-ID: <238A96A773B3934685A7269CC8A8D0423F80619F9F@GVW0436EXB.americas.hpqcorp.net> Mark, We do not install that on first boot. I cannot tell how it is installed, but on first boot Python is already there and installed properly. Gerald From ncoghlan at gmail.com Mon Dec 1 22:20:36 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 02 Dec 2008 07:20:36 +1000 Subject: [Python-Dev] Attribute error: providing type name In-Reply-To: References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com> <4932F901.6070803@gmail.com> <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com> <49330AA9.7070005@gmail.com> <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com> <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com> Message-ID: <49345524.2090409@gmail.com> Alex Martelli wrote: > I wonder if there's some desiderata left for future Python versions to > make this standard behavior easier (for C-coded, Python-coded, and > Cython-coded classes, ones made by SWIG, etc) without too much black > magic...
Perhaps adding something like the following to the C API:

    void
    PyErr_FormatAttributeError(PyTypeObject *type, const char *attr)
    {
        PyErr_Format(PyExc_AttributeError,
                     "object of type %.100s has no attribute '%.200s'",
                     type->tp_name, attr);
    }

This could also be exposed as a class method of AttributeError itself for use in Python code. (Interestingly, I noticed that there are still quite a few attribute errors at least in typeobject.c that don't provide any information on the type of the object that is missing an attribute - they appeared to mostly be obscure errors that will only turn up if something has gone very strange, but they're there) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Mon Dec 1 23:55:37 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 01 Dec 2008 23:55:37 +0100 Subject: [Python-Dev] Python for windows. In-Reply-To: <238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net> References: <90bb445a0811260928u5a6b5c36ib4b6947472d2b2be@mail.gmail.com> <238A96A773B3934685A7269CC8A8D0423F7FBF71E9@GVW0436EXB.americas.hpqcorp.net> <238A96A773B3934685A7269CC8A8D0423F7FBF728C@GVW0436EXB.americas.hpqcorp.net> <042401c95012$3bff99d0$b3fecd70$@com.au> <492DCE5E.5080602@v.loewis.de> <043e01c95019$9955a0a0$cc00e1e0$@com.au> <492DDE40.2040206@v.loewis.de> <045801c9503a$8e85d2f0$ab9178d0$@com.au> <492F2788.7040300@canterbury.ac.nz> <04ce01c9510d$3132d750$939885f0$@com.au> <492FBB2C.5000309@gmail.com> <053201c95337$fa09e930$ee1dbb90$@com.au> <49331277.10003@v.loewis.de> <238A96A773B3934685A7269CC8A8D0423F80619F94@GVW0436EXB.americas.hpqcorp.net> Message-ID: <49346B69.1030901@v.loewis.de> > The OEM ready program required that the installed force to program > files. But as we preinstalled we use your msi with a normal > parameter: python-2.5.2.msi TARGETDIR=c:\program files\python" I think the debate was about whether it can be "OEM ready", even though you still need to pass the TARGETDIR parameter. If it works for you, it works for me, of course. > Now about the 2 others issues what will be the easier way to fix them > properly ? - for the executable without manifest as we are on vista > OS only I can add a manifest for vista outside the executable it > should work. Please do submit an issue in the bug tracker at least, asking that the files be renamed. Please confirm explicitly that renaming them would also solve the problem (assuming you are still talking about the files in distutils). > for python_icon.exe I do not know what is calling it > in start menu can you help me on that ? Please look in Tools/msi/msi.py for all occurrences of python_icon.exe. Regards, Martin From martin at v.loewis.de Tue Dec 2 00:10:57 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Tue, 02 Dec 2008 00:10:57 +0100 Subject: [Python-Dev] Attribute error: providing type name In-Reply-To: References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com> <4932F901.6070803@gmail.com> <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com> <49330AA9.7070005@gmail.com> <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com> <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com> Message-ID: <49346F01.2030607@v.loewis.de> Alex Martelli wrote: > I wonder if there's some desiderata left for future Python versions to > make this standard behavior easier (for C-coded, Python-coded, and > Cython-coded classes, ones made by SWIG, etc) without too much black > magic... I think the standard exception hierarchy should grow additional standard fields. E.g. AttributeError should have attributes 'type','name', or perhaps even 'object','name'.
TypeError should have attributes 'expected', 'actual' (or, again, 'expected', 'object'). Also, some languages support nested exceptions (attribute 'inner'); usefulness of this concept should be reviewed. And so on - that might produce quite a large PEP. As 3.0 missed the chance to fix this, compatibility is also an issue. It might be possible to overload exception constructors on the number of parameters, or using keyword parameters for the new way of filling the exception. And no, I don't volunteer to write this PEP :-) Regards, Martin From tom at vector-seven.com Tue Dec 2 07:57:11 2008 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 02 Dec 2008 17:57:11 +1100 Subject: [Python-Dev] Move encoding_decl to the top of Grammar/Grammar? Message-ID: <4934DC47.6040508@vector-seven.com> Hi all, Currently, Parser/parsetok.c has a dependency on graminit.h. This can cause headaches when rebuilding after adding new syntax to Grammar/Grammar because parsetok.c is part of pgen, which is responsible for *generating* graminit.h. This circular dependency can result in parsetok.c using a different value for encoding_decl to what is used in ast.c, which causes PyAST_FromNode to fall over at runtime. It effectively looks something like this: * Grammar/Grammar is modified * build begins -- pgen compiles, parsetok.c uses encoding_decl=X * graminit.h is rebuilt with encoding_decl=Y * ast.c is compiled using encoding_decl=Y * when python runs, parsetok() emits encoding_decl nodes that PyAST_FromNode can't recognize: SystemError: invalid node XXX for PyAST_FromNode A nice, easy short term solution that doesn't require unwinding this dependency would be to simply move encoding_decl to the top of Grammar/Grammar and add a big warning noting that it needs to come before everything else. This will help to ensure its value never changes when syntax is added/removed. 
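To illustrate the failure mode described above, here is a toy model in Python, not pgen itself. It assumes that grammar rules are simply numbered in order of appearance starting from an offset of 256; the helper number_rules and every rule name other than encoding_decl are hypothetical and exist only for this sketch.

```python
NT_OFFSET = 256  # assumed starting number for grammar rules in this toy model


def number_rules(rule_names):
    """Number rules in order of appearance (a toy model, not pgen itself)."""
    return {name: NT_OFFSET + i for i, name in enumerate(rule_names)}


# Hypothetical, heavily shortened grammars; only encoding_decl is from the post.
before = ["single_input", "file_input", "encoding_decl"]
after = ["single_input", "file_input", "new_stmt", "encoding_decl"]

# Value compiled into the tool that was built before the header was regenerated.
stale = number_rules(before)["encoding_decl"]
# Value found in the regenerated header that the rest of the build uses.
fresh = number_rules(after)["encoding_decl"]
print(stale, fresh)

# With encoding_decl listed first, adding rules after it leaves its number alone.
pinned_before = number_rules(["encoding_decl"] + before[:-1])["encoding_decl"]
pinned_after = number_rules(["encoding_decl"] + after[:-1])["encoding_decl"]
print(pinned_before, pinned_after)
```

Under these assumptions the first pair of numbers differs while the second pair is identical, which is the property the proposed reordering relies on.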
I'm happy to provide a patch for this (including some additional dependency info for files dependent upon graminit.h and Python-ast.h), but was wondering if there were any opinions about how this should be resolved. Cheers, Tom From tom at vector-seven.com Tue Dec 2 09:00:16 2008 From: tom at vector-seven.com (Thomas Lee) Date: Tue, 02 Dec 2008 19:00:16 +1100 Subject: [Python-Dev] Move encoding_decl to the top of Grammar/Grammar? In-Reply-To: <4934DC47.6040508@vector-seven.com> References: <4934DC47.6040508@vector-seven.com> Message-ID: <4934EB10.2010100@vector-seven.com> Here's the corresponding tracker issue: http://bugs.python.org/issue4347 I've uploaded a patch there anyway, since I'm going to need this stuff working for a presentation I'm giving tomorrow. Cheers, T From ncoghlan at gmail.com Tue Dec 2 11:37:56 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 02 Dec 2008 20:37:56 +1000 Subject: [Python-Dev] Attribute error: providing type name In-Reply-To: <49346F01.2030607@v.loewis.de> References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com> <4932F901.6070803@gmail.com> <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com> <49330AA9.7070005@gmail.com> <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com> <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com> <49346F01.2030607@v.loewis.de> Message-ID: <49351004.9090208@gmail.com> Martin v. Löwis wrote: > Alex Martelli wrote: > I think the standard exception hierarchy should grow additional standard > fields. E.g. AttributeError should have attributes 'type','name', or > perhaps even 'object','name'. TypeError should have attributes > 'expected', 'actual' (or, again, 'expected', 'object'). > And so on - that might produce quite a large PEP. I don't think there's any reason to do it in one big bang. And approached individually, each such alternate constructor is a small RFE consisting of: 1. Specific C API for creating exceptions of that type with a standard message and attributes 2. Python level class method 3. New attributes on the affected object Point 3 would be optional really, since most of the gain comes from the better error messages.
If extra attributes were included in such an RFE, the potential lifecycle implications of including references to actual objects rather than merely their types makes the better choice fairly obvious to me (i.e. just include the type information, since it generally tells you everything you need to know for TypeErrors and AttributeErrors). > As 3.0 missed the > chance to fix this, compatibility is also an issue. It might be possible > to overload exception constructors on the number of parameters, or using > keyword parameters for the new way of filling the exception. Or go the traditional "multiple constructor" route and provide class methods for the alternative mechanisms. > And no, I don't volunteer to write this PEP :-) Assuming I understand what you mean by "nested exceptions" correctly, they should be covered by the __context__ and __cause__ attributes in Py3k:

Exception context:
===========================
>>> try:
...     raise IOError
... except:
...     raise AttributeError
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
AttributeError
===========================

Exception cause:
===========================
>>> raise AttributeError from KeyError
KeyError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError
===========================

Putting it all together:
===========================
>>> try:
...     raise IOError
... except:
...     try:
...         raise KeyError
...     except Exception as ex:
...         raise AttributeError from ex
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
IOError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 5, in <module>
KeyError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 7, in <module>
AttributeError
===========================

Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From barry at python.org Tue Dec 2 21:31:58 2008 From: barry at python.org (Barry Warsaw) Date: Tue, 2 Dec 2008 15:31:58 -0500 Subject: [Python-Dev] Tomorrow's releases Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I believe we are on track for releasing Python 3.0 final and 2.6.1 tomorrow. There is just one release blocker for 3.0 left -- Guido needs to finish the What's New for 3.0. This is bug 2306. So that Martin can have something to work with when he wakes up tomorrow morning, I would like to tag and branch the tree some time today, Tuesday 02-Dec US/Eastern. Therefore I am freezing both the 2.6 and 3.0 trees, with special dispensation to Guido for the updated What's New. Ping me on irc @ freenode #python-dev if you have anything else to check in to either tree before then. As soon as I hear from Guido, or issue 2306 is closed, I'm branching 3.0 and tagging it for release. Great work everyone, we're almost there!
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTWbPnEjvBPtnXfVAQKtOgP9EZgGkE8/UY1IRn7j0l6vX6uqbPapg+9H MlBIZrA6mEbGiaDSvPRiwuo71jP5cg0u/xFRdDlGYl0GAzOEWvKCZVlVsndM4kbh 7UxHjHfkIOo4MUw4zz1NrJ4GRNgBQa52OOtiOKKkIhr/oMsg+GWv8Y9hRXYA9xue s8as2AQe2QU= =5j55 -----END PGP SIGNATURE----- From steve at holdenweb.com Wed Dec 3 04:39:23 2008 From: steve at holdenweb.com (Steve Holden) Date: Tue, 02 Dec 2008 22:39:23 -0500 Subject: [Python-Dev] Attribute error: providing type name In-Reply-To: <49351004.9090208@gmail.com> References: <1be78d220811301041o6f737b6q9088b4b8266cf56f@mail.gmail.com> <4932F901.6070803@gmail.com> <1be78d220811301339l407ba8advfe146dc8c1511370@mail.gmail.com> <49330AA9.7070005@gmail.com> <1be78d220811301402p4281e8b3wd05122dd4ea87a6@mail.gmail.com> <1be78d220812010130r6a9fe6afx9da597a168acf873@mail.gmail.com> <49346F01.2030607@v.loewis.de> <49351004.9090208@gmail.com> Message-ID: Nick Coghlan wrote: > Martin v. Löwis wrote: >> Alex Martelli wrote: >> I think the standard exception hierarchy should grow additional standard >> fields. E.g. AttributeError should have attributes 'type','name', or >> perhaps even 'object','name'. TypeError should have attributes >> 'expected', 'actual' (or, again, 'expected', 'object'). > >> And so on - that might produce quite a large PEP. > > I don't think there's any reason to do it in one big bang. And > approached individually, each such alternate constructor is a small RFE > consisting of: > > 1. Specific C API for creating exceptions of that type with a standard > message and attributes > 2. Python level class method > 3. New attributes on the affected object > > Point 3 would be optional really, since most of the gain comes from the > better error messages. If extra attributes were included in such an RFE, > the potential lifecycle implications of including references to actual > objects rather than merely their types makes the better choice fairly > obvious to me (i.e.
just include the type information, since it > generally tells you everything you need to know for TypeErrors and > AttributeErrors). > >> As 3.0 missed the >> chance to fix this, compatibility is also an issue. It might be possible >> to overload exception constructors on the number of parameters, or using >> keyword parameters for the new way of filling the exception. > > Or go the traditional "multiple constructor" route and provide class > methods for the alternative mechanisms. > Bear in mind, though, that as new functionality none of this can go in before 3.1/2.7. So a PEP might not be a bad idea if only to establish best practices. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From alexander.belopolsky at gmail.com Wed Dec 3 04:44:43 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 2 Dec 2008 22:44:43 -0500 Subject: [Python-Dev] Accessing source code in zipped packages Message-ID: About a month ago, I submitted two patches that address Pdb and doctest inability to load source code from modules with custom loaders such as modules loaded from zip files: http://bugs.python.org/issue4201 http://bugs.python.org/issue4197 The patches are very simple, basically calls to linecache.getline() need to be provided with the module's dict to enable linecache to find the module's __loader__. Is there a chance that these patches could make it to 2.6.1? 
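As a rough illustration of the mechanism the message above relies on, the following sketch shows linecache.getline() fetching a source line through a module's __loader__ when it is given the module's globals. The FakeLoader class, the file name and the module name are hypothetical stand-ins for a zipimport-style loader; this is not the code from the patches, only a small demonstration of the module_globals argument under those assumptions.

```python
import linecache


class FakeLoader:
    """Hypothetical stand-in for a PEP 302 loader such as zipimporter."""

    def get_source(self, fullname):
        # A real loader would read this text out of the archive.
        return "def answer():\n    return 42\n"


# Hypothetical path that does not exist on disk, like a file inside a zip.
filename = "no_such_archive.zip/demo.py"

# What a module imported by such a loader would carry in its __dict__.
module_globals = {"__name__": "demo", "__loader__": FakeLoader()}

# Without the module's globals, linecache can only look on disk and gives up.
without_globals = linecache.getline(filename, 2)

# With the globals, linecache can ask __loader__.get_source() for the text.
with_globals = linecache.getline(filename, 2, module_globals)

print(repr(without_globals))
print(repr(with_globals))
```

The first lookup comes back empty because nothing on disk matches the path, while the second returns the requested line from the text supplied by the loader.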
From g.brandl at gmx.net Wed Dec 3 07:52:02 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 03 Dec 2008 07:52:02 +0100 Subject: [Python-Dev] Accessing source code in zipped packages In-Reply-To: References: Message-ID: Alexander Belopolsky wrote: > About a month ago, I submitted two patches that address Pdb and > doctest inability to load source code from modules with custom loaders > such as modules loaded from zip files: > > http://bugs.python.org/issue4201 > http://bugs.python.org/issue4197 > > The patches are very simple, basically calls to linecache.getline() > need to be provided with the module's dict to enable linecache to find > the module's __loader__. There is also http://bugs.python.org/issue4223 which goes in the same direction. Georg From amk at amk.ca Wed Dec 3 16:31:28 2008 From: amk at amk.ca (A.M. Kuchling) Date: Wed, 3 Dec 2008 10:31:28 -0500 Subject: [Python-Dev] Holding a Python Language Summit at PyCon Message-ID: <20081203153128.GA6161@amk-desktop.matrixgroup.net> The PyCon organizers are planning a Python Language Summit to be held in Chicago just before the conference, on Thursday March 26 2009. (This is the second day of tutorials, and the day before PyCon officially starts.) The purpose of the Python Language Summit is to let the developers of Python implementations discuss issues that affect us all, and to let the developers of a particular implementation discuss their own project-specific issues. PyCon brings a lot of the core developers together into one place and there's been a "Python core" sprint for a long time, but we haven't had a formal time and place for *discussion* among core developers. Attending the summit will be free; registration for PyCon is *not* included, but won't be required to attend the summit. I e-mailed some CPython, Jython, IronPython, PyPy, etc.
developers asking for topic suggestions, and assembled a draft of a schedule from some of the most commonly mentioned topics; the current draft schedule is below. The schedule is very 'loose', leaving a fair bit of open space so that we can hopefully begin working on ideas arising from the discussion.

* What do you think of the selected topics?
* I'd like to have a champion for each session, who will make a brief presentation about the session's topic at the start, laying out the issues and possible courses of action to guide the resulting discussion. If you wish to volunteer as the champion for a session, please let me know. (Preference will be given to people actively working on the particular topic.)
* For CPython, invitations will be sent to everyone with committer status (plus a few book authors, significant patch contributors who aren't committers yet, etc.). If you're not a committer but think you can contribute, please let me know privately. Also, please suggest other
* There will probably be summit-related sponsorship opportunities for interested companies.

Andrew M. Kuchling amk at amk.ca Registration Manager, PyCon 2009 http://us.pycon.org

9:00 - 10:30
=============
Open discussion session

11:00 - 12:30
=============
Transition plan for rest of 2.x series; goals for 2.7/3.1.
- New features & future plans?
- Is 2.7 last of the 2.x releases?
- Unicode issues
- Stdlib plans?
Champion needed.

12:30 - 14:00
=============
Lunch (probably provided by the PSF or a sponsor).

14:00 - 15:30
=============
Two tracks:

Cross-implementation issues: What do the various VMs want/need from CPython to help with their implementations?
* Marking CPython-specific tests in the test suite?
* Getting an implementation agnostic test suite for the Python language?
* Separating the language tests and the pure Python part of the stdlib into a separate project? (Or publish them as a separate package.)
* Transition plans for 3.0?
Champion needed.

Package distribution & installation.
* setting up an organized network of mirrors à la CPAN
* adding a commenting system on PyPI
* think about a reference implementation for a PyPI client in the stdlib (XML-RPC client+upload and register)
* improvements on packaging matters - this includes distutils but also setuptools.
Champion needed.

16:00 - 17:30
=============
Free space for sprinting, hacking, further discussion, etc.

18:00-ish
=============
Group dinners.

From ziade.tarek at gmail.com Wed Dec 3 16:48:12 2008 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Wed, 3 Dec 2008 16:48:12 +0100 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <20081203153128.GA6161@amk-desktop.matrixgroup.net> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> Message-ID: <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com> On Wed, Dec 3, 2008 at 4:31 PM, A.M. Kuchling wrote: > The PyCon organizers are planning a Python Language Summit to be held > in Chicago just before the conference, on Thursday March 26 2009. > (This is the second day of tutorials, and the day before PyCon > officially starts.) > [cut] > > Package distribution & installation. > > * setting up an organized network of mirrors à la CPAN > * adding a commenting system on PyPI > * think about a reference implementation for a PyPI client in the > stdlib (XML-RPC client+upload and register) > * improvements on packaging matters - this includes distutils but > also setuptools. Hello, I'd like to volunteer for that part given the fact that I am currently working on the patches for the mirroring thing in a branch of PyPI. The work is described here : http://wiki.python.org/moin/PEP%20374 It changed a bit and I need to update it, but you get the idea there. I also have some work going on for distutils. You have a summary of the work going on in my blog http://tarekziade.wordpress.com/2008/11/26/python-package-distribution-my-current-work/ Regards Tarek > > Champion needed.
> > > 16:00 - 17:30 > ============= > > Free space for sprinting, hacking, further discussion, etc. > > > 18:00-ish > ============= > > Group dinners. > -- Tarek Ziadé | Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/ From ncoghlan at gmail.com Wed Dec 3 21:34:56 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 04 Dec 2008 06:34:56 +1000 Subject: [Python-Dev] Accessing source code in zipped packages In-Reply-To: References: Message-ID: <4936ED70.2040005@gmail.com> Georg Brandl wrote: > Alexander Belopolsky wrote: >> About a month ago, I submitted two patches that address Pdb and >> doctest inability to load source code from modules with custom loaders >> such as modules loaded from zip files: >> >> http://bugs.python.org/issue4201 >> http://bugs.python.org/issue4197 >> >> The patches are very simple, basically calls to linecache.getline() >> need to be provided with the module's dict to enable linecache to find >> the module's __loader__. > > There is also http://bugs.python.org/issue4223 which goes in the same > direction. I've assigned all 3 of those to myself, since I've been meaning to look at some zipimport related stuff anyway (the things I'm looking at are 2.7/3.1 related though, so I was waiting for the 3.0 release to be cut first). We already missed the 2.6.1 deadline though. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Dec 3 21:44:38 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 04 Dec 2008 06:44:38 +1000 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com> Message-ID: <4936EFB6.1080808@gmail.com> Tarek Ziadé wrote: > Hello, > > I'd like to volunteer for that part given the fact that I am currently > working on the patches > for the mirroring thing in a branch of PyPI. > > The work is described here : http://wiki.python.org/moin/PEP%20374 > It changed a bit and I need to update it, but you get the idea there. For the record, when working on a PEP draft on the Wiki or Google docs, it's worth asking the PEP editors (or any of the SVN committers really) to reserve a PEP number once things start to progress to the point where folks need a common shorthand reference to the document. PEP 374 for example, is already a placeholder for the SVN to DVCS migration PEP: http://www.python.org/dev/peps/pep-0374/ We aren't going to run out of PEP numbers anytime soon - it's OK if some of them get "wasted" on draft PEPs that end up getting abandoned. (Better that than having multiple draft PEPs being referred to with the same number as appears to be the case at the moment). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ziade.tarek at gmail.com Wed Dec 3 23:02:07 2008 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 3 Dec 2008 23:02:07 +0100 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <4936EFB6.1080808@gmail.com> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <94bdd2610812030748q28484a76j9c583d541ec076eb@mail.gmail.com> <4936EFB6.1080808@gmail.com> Message-ID: <94bdd2610812031402x152d5c7bjfb18c2c14fa5d411@mail.gmail.com> On Wed, Dec 3, 2008 at 9:44 PM, Nick Coghlan wrote: > Tarek Ziad? wrote: >> Hello, >> >> I'd like to volunteer for that part given the fact that I am currently >> working on the patches >> for the mirroring thing in a branch of PyPI. >> >> The work is described here : http://wiki.python.org/moin/PEP%20374 >> It changed a bit and I need to update it, but you get the idea there. > > For the record, when working on a PEP draft on the Wiki or Google docs, > it's worth asking the PEP editors (or any of the SVN committers really) > to reserve a PEP number once things start to progress to the point where > folks need a common shorthand reference to the document. > > PEP 374 for example, is already a placeholder for the SVN to DVCS > migration PEP: > http://www.python.org/dev/peps/pep-0374/ > Right, I'll ask for a number and change it accordingly; Regards Tarek From barry at python.org Thu Dec 4 02:51:33 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Dec 2008 20:51:33 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On behalf of the Python development team and the Python community, I am happy to announce the release of Python 3.0 final. Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major milestone in Python's history, and was nearly three years in the making. 
This is a new version of the language that is incompatible with the 2.x line of releases, while remaining true to BDFL Guido van Rossum's vision. Some things you will notice include: * Fixes to many old language warts * Removal of long deprecated features and redundant syntax * Improvements in, and a reorganization of, the standard library * Changes to the details of how built-in objects like strings and dicts work * ...and many more new features While these changes were made without concern for backward compatibility, Python 3.0 still remains very much "Pythonic". We are confident that Python 3.0 is of the same high quality as our previous releases, such as the recently announced Python 2.6. We will continue to support and develop both Python 3 and Python 2 for the foreseeable future, and you can safely choose either version (or both) to use in your projects. Which you choose depends on your own needs and the availability of third-party packages that you depend on. Some other things to consider: * Python 3 has a single Unicode string type; there are no more 8-bit strings * The C API has changed considerably in Python 3.0 and third-party extension modules you rely on may not yet be ported * Tools are available in both Python 2.6 and 3.0 to help you migrate your code * Python 2.6 is backward compatible with earlier Python 2.x releases We encourage you to participate in Python 3.0's development process by joining its mailing list: http://mail.python.org/mailman/listinfo/python-3000 If you find things in Python 3.0 that are broken or incorrect, please submit bug reports at: http://bugs.python.org/ For more information, links to documentation, and downloadable distributions, see the Python 3.0 website: http://www.python.org/download/releases/3.0/ Enjoy, - -Barry Barry Warsaw barry at python.org Python 2.6/3.0 Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) 
iQCVAwUBSTc3pXEjvBPtnXfVAQI69wP/dPHh8IL3GxziEV9QzlveKG+KyZb2X16x fxJnTCiXAbiAhT5C+m43OEnbF1PJgMDKtcZ5b7aQb4TQ0mJxISTQh0RfLCpArmlo tdTbzCLnh13KzB+3sUHCx+MeQNXERoWDV8hLz+4Ae71UsuUGynhtyP7ZJMJDue8j so2gv3fOMSs= =vkiy -----END PGP SIGNATURE----- From guido at python.org Thu Dec 4 03:19:09 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Dec 2008 18:19:09 -0800 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: References: Message-ID: Thanks so much for seeing this one through, Barry and co! Champagne!!! On Wed, Dec 3, 2008 at 5:51 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On behalf of the Python development team and the Python community, I am > happy to announce the release of Python 3.0 final. > > Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major milestone in > Python's history, and was nearly three years in the making. This is a new > version of the language that is incompatible with the 2.x line of releases, > while remaining true to BDFL Guido van Rossum's vision. Some things you > will notice include: > > * Fixes to many old language warts > * Removal of long deprecated features and redundant syntax > * Improvements in, and a reorganization of, the standard library > * Changes to the details of how built-in objects like strings and dicts work > * ...and many more new features > > While these changes were made without concern for backward compatibility, > Python 3.0 still remains very much "Pythonic". > > We are confident that Python 3.0 is of the same high quality as our previous > releases, such as the recently announced Python 2.6. We will continue to > support and develop both Python 3 and Python 2 for the foreseeable future, > and you can safely choose either version (or both) to use in your projects. > Which you choose depends on your own needs and the availability of > third-party packages that you depend on. 
Some other things to consider: > > * Python 3 has a single Unicode string type; there are no more 8-bit strings > * The C API has changed considerably in Python 3.0 and third-party extension > modules you rely on may not yet be ported > * Tools are available in both Python 2.6 and 3.0 to help you migrate your > code > * Python 2.6 is backward compatible with earlier Python 2.x releases > > We encourage you to participate in Python 3.0's development process by > joining its mailing list: > > http://mail.python.org/mailman/listinfo/python-3000 > > If you find things in Python 3.0 that are broken or incorrect, please submit > bug reports at: > > http://bugs.python.org/ > > For more information, links to documentation, and downloadable > distributions, see the Python 3.0 website: > > http://www.python.org/download/releases/3.0/ > > Enjoy, > - -Barry > > Barry Warsaw > barry at python.org > Python 2.6/3.0 Release Manager > (on behalf of the entire python-dev team) > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (Darwin) > > iQCVAwUBSTc3pXEjvBPtnXfVAQI69wP/dPHh8IL3GxziEV9QzlveKG+KyZb2X16x > fxJnTCiXAbiAhT5C+m43OEnbF1PJgMDKtcZ5b7aQb4TQ0mJxISTQh0RfLCpArmlo > tdTbzCLnh13KzB+3sUHCx+MeQNXERoWDV8hLz+4Ae71UsuUGynhtyP7ZJMJDue8j > so2gv3fOMSs= > =vkiy > -----END PGP SIGNATURE----- > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From barry at python.org Thu Dec 4 03:24:23 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Dec 2008 21:24:23 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> Message-ID: <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> 
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 3, 2008, at 9:13 PM, Dotan Cohen wrote: > On this page: > http://www.python.org/download/releases/3.0/ > > The text "This is a proeuction release" should probably read "This is > a production release". It would give a better first impression :) Fixed, thanks! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTc/WHEjvBPtnXfVAQL8TwP+M2Ryv7WY36ICEvzGU4EzlRG/gI4MolQe cD8DJUJfQuR6INTot/t7vTcL8oDHq7q9OHbfvd3jmSwH/ZytsMz2OvJUYlKDQjwG BcQRpioprcesoU6cufSmKAUiUP+L0RTAMmT0WDbbeCzzMZRq3Humd4Zs43nL26NT uFb83Dk6yWA= =qPjn -----END PGP SIGNATURE----- From barry at python.org Thu Dec 4 03:25:05 2008 From: barry at python.org (Barry Warsaw) Date: Wed, 3 Dec 2008 21:25:05 -0500 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 3, 2008, at 9:19 PM, Guido van Rossum wrote: > Thanks so much for seeing this one through, Barry and co! Champagne!!! Now if only I could go on vacation. :) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTc/gXEjvBPtnXfVAQKZGgP/Y41JSlU6bQlGQKQmrjxv2jUWf2AWDLSu 4HG45m5plX/r6z1bZlxdqvpVqVRGgInoe+uw96WEgjW+F5NomU4ZKQ+YVOZFjkJY izAWQllxZRkErdIBq158DOKTTyiJpUpRnGvwx2J67/pIBGLfFLZ+yPAu+4jT4fJ+ qFq/oGKCKIY= =wiBX -----END PGP SIGNATURE----- From ed at leafe.com Thu Dec 4 04:29:41 2008 From: ed at leafe.com (Ed Leafe) Date: Wed, 3 Dec 2008 21:29:41 -0600 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: Message-ID: <20739B1C-8B6F-4A7A-B699-76DD938DA2E3@leafe.com> On Dec 3, 2008, at 7:51 PM, Barry Warsaw wrote: > On behalf of the Python development team and the Python community, I > am happy to announce the release of Python 3.0 final. Props to all the folks whose hard work made this possible! You guys rock! 
-- Ed Leafe From martin at v.loewis.de Thu Dec 4 08:26:35 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 04 Dec 2008 08:26:35 +0100 Subject: [Python-Dev] 2.5.3 and 2.4.6 release schedule Message-ID: <4937862B.8000403@v.loewis.de> I would like to create 2.5.3 and 2.4.6 release candidates next week, December 12, and final releases on December 19. If there are any open issues that you think need to be considered, please create a bug in the bug tracker, mark it as release blocker, and label it with version 2.5.3 (or 2.4). Of course, a number of such issues are already in the tracker, some already being worked on. Remember: 2.5.3 will be the last bug fix release for Python 2.5; afterwards, only security patches will be accepted for the 2.5 branch. The 2.4 branch is already in that state (the 2.3 branch is not maintained anymore; 2.4 security patches will be produced until November 2009). Regards, Martin From martin at v.loewis.de Thu Dec 4 08:36:11 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 04 Dec 2008 08:36:11 +0100 Subject: [Python-Dev] Merging mailing lists Message-ID: <4937886B.4000002@v.loewis.de> I would like to merge mailing lists, now that the design and first implementation of Python 3000 is complete. In particular, I would like to merge the python-3000 mailing list back into python-dev, and the python-3000-checkins mailing list back into python-checkins. The rationale is to simplify usage of the lists, and to avoid cross-postings. To implement this, all subscribers of the 3000 mailing lists would be added to the trunk mailing lists (avoiding duplicates, of course), and all automated messages going to python-3000-checkins would then be directed to the trunk lists. The 3000 mailing lists would change into read-only mode (i.e. primarily leaving the archives behind). Any objections? 
Regards,
Martin

From fdrake at gmail.com Thu Dec 4 09:04:27 2008
From: fdrake at gmail.com (Fred Drake)
Date: Thu, 4 Dec 2008 03:04:27 -0500
Subject: [Python-Dev] [Python-checkins] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID: <9cee7ab80812040004r54cce844lbd3728d99dc780d8@mail.gmail.com>

On Thu, Dec 4, 2008 at 2:36 AM, "Martin v. Löwis" wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would

+1

-Fred

-- 
Fred L. Drake, Jr.
"Chaos is the score upon which reality is written." --Henry Miller

From ondrej at certik.cz Thu Dec 4 10:42:52 2008
From: ondrej at certik.cz (Ondrej Certik)
Date: Thu, 4 Dec 2008 10:42:52 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>
References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org>
Message-ID: <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>

On Thu, Dec 4, 2008 at 3:24 AM, Barry Warsaw wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Dec 3, 2008, at 9:13 PM, Dotan Cohen wrote:
>
>> On this page:
>> http://www.python.org/download/releases/3.0/
>>
>> The text "This is a proeuction release" should probably read "This is
>> a production release". It would give a better first impression :)
>
> Fixed, thanks!

I tried to find the documentation here:

http://python.org/doc/

but clicking on the links:

http://docs.python.org/whatsnew/3.0.html
http://docs.python.org/3.0

gives me:

404 Not Found

Ondrej

From ncoghlan at gmail.com Thu Dec 4 11:59:25 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 20:59:25 +1000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>
References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com>
Message-ID: <4937B80D.9070309@gmail.com>

Ondrej Certik wrote:
> I tried to find the documentation here:
>
> http://python.org/doc/
>
> but clicking on the links:
>
> http://docs.python.org/whatsnew/3.0.html
> http://docs.python.org/3.0

These 404 for me as well, but the dev links have already rolled over to
3.1a0.

There are also no cross-links from the main 2.6 docs to the released
py3k docs.

I was going to suggest there needs to be something in PEP 101 about
checking the doc links, but it's already there :)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com Thu Dec 4 12:07:02 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 04 Dec 2008 21:07:02 +1000
Subject: [Python-Dev] [Python-checkins] r67511 - in python/trunk: Doc/library/logging.rst Lib/logging/__init__.py Lib/test/test_logging.py Misc/NEWS
In-Reply-To: <20081203232258.7526A1E4002@bag.python.org>
References: <20081203232258.7526A1E4002@bag.python.org>
Message-ID: <4937B9D6.3020906@gmail.com>

vinay.sajip wrote:
> +def _showwarning(message, category, filename, lineno, file=None, line=None):
> +    """
> +    Implementation of showwarnings which redirects to logging, which will first
> +    check to see if the file parameter is None. If a file is specified, it will
> +    delegate to the original warnings implementation of showwarning. Otherwise,
> +    it will call warnings.formatwarning and will log the resulting string to a
> +    warnings logger named "py.warnings" with level logging.WARNING.
> +    """
> +    if file is not None:
> +        if _warnings_showwarning is not None:
> +            _warnings_showwarning(message, category, filename, lineno, file, line)
> +    else:
> +        import warnings
> +        s = warnings.formatwarning(message, category, filename, lineno, line)
> +        logger = getLogger("py.warnings")
> +        if not logger.handlers:
> +            logger.addHandler(NullHandler())
> +        logger.warning("%s", s)

I'd be careful here - this could deadlock if a thread spawned as a side
effect of importing a module happens to trigger a warning.

warnings is pulled into sys.modules as part of the interpreter startup -
having a global "import warnings" shouldn't have any real effect on
logging's import time.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From lists at cheimes.de Thu Dec 4 12:50:21 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 04 Dec 2008 12:50:21 +0100
Subject: [Python-Dev] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
> Any objections?

+1

From amk at amk.ca Thu Dec 4 13:37:50 2008
From: amk at amk.ca (A.M. Kuchling)
Date: Thu, 4 Dec 2008 07:37:50 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To:
References:
Message-ID: <20081204123750.GA890@amk.local>

On Wed, Dec 03, 2008 at 08:51:33PM -0500, Barry Warsaw wrote:
> On behalf of the Python development team and the Python community, I
> am happy to announce the release of Python 3.0 final.

Yay!

> We are confident that Python 3.0 is of the same high quality as our
> previous releases, such as the recently announced Python 2.6. We will
> continue to support and develop both Python 3 and Python 2 for the
> foreseeable future, and you can safely choose either version (or both)
> to use in your projects. Which you choose depends on your own needs
> and the availability of third-party packages that you depend on. Some
> other things to consider:

I think we should also have a statement up on python.org about future
plans: e.g.

* that there will be a Python 2.7 that will incorporate what we learn from
  people trying to port,
* that 3.1 will rearrange the standard library in mostly-known ways, and
* that we expect people to use 3.0 mostly for compatibility testing,
  not going into serious production use until 3.1 or maybe even 3.2.

(The details are open to discussion, of course.)

--amk

From g.brandl at gmx.net Thu Dec 4 13:40:19 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 04 Dec 2008 13:40:19 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4937B80D.9070309@gmail.com>
References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com>
Message-ID:

Nick Coghlan wrote:
> Ondrej Certik wrote:
>> I tried to find the documentation here:
>>
>> http://python.org/doc/
>>
>> but clicking on the links:
>>
>> http://docs.python.org/whatsnew/3.0.html
>> http://docs.python.org/3.0
>
> These 404 for me as well, but the dev links have already rolled over to
> 3.1a0.
>
> There are also no cross-links from the main 2.6 docs to the released
> py3k docs.
>
> I was going to suggest there needs to be something in PEP 101 about
> checking the doc links, but it's already there :)

I can't find any docs built for Python 3.0 (not 3.1a0). I would have
handled building and uploading the docs if somebody (or at least anybody)
had told me I was to do it. Now we again have the situation that the
docs for the new release are wrecked.
Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From steve at holdenweb.com Thu Dec 4 14:08:47 2008 From: steve at holdenweb.com (Steve Holden) Date: Thu, 04 Dec 2008 08:08:47 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com> Message-ID: Georg Brandl wrote: > Nick Coghlan schrieb: >> Ondrej Certik wrote: >>> I tried to find the documentation here: >>> >>> http://python.org/doc/ >>> >>> but clicking on the links: >>> >>> http://docs.python.org/whatsnew/3.0.html >>> http://docs.python.org/3.0 >> These 404 for me as well. but the dev links have already rolled over to >> 3.1a0. >> >> There are also no cross-links from the main 2.6 docs to the released >> py3k docs. >> >> I was going to suggest there needs to be something in PEP 101 about >> checking the doc links, but it's already there :) > > I can't find any docs built for Python 3.0 (not 3.1a0). I would have > handled building and uploading the docs if somebody (or at least anybody) > had told me I was to do it. Now we again have the situation that the > docs for the new release are wrecked. > Sounds like we need a bot to check the web each new release before the release manager "presses the button" and makes the announcement. 

regards
 Steve
-- 
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

From facundobatista at gmail.com Thu Dec 4 14:18:11 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 4 Dec 2008 11:18:11 -0200
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081204123750.GA890@amk.local>
References: <20081204123750.GA890@amk.local>
Message-ID:

2008/12/4 A.M. Kuchling :

> * that there will be a Python 2.7 that will incorporate what we learn from
>   people trying to port,
> * that 3.1 will rearrange the standard library in mostly-known ways, and
> * that we expect people to use 3.0 mostly for compatibility testing,
>   not going into serious production use until 3.1 or maybe even 3.2.

I think that would be fantastic to have a small set of straightforward
sentences like these, to transmit the most important stuff.

For my part, when it's fixed, I will translate them to Spanish and
propagate them.

> (The details are open to discussion, of course.)

I think those are fine. I would add something about the migration
path, something like "If you want to start testing your library/system
in 3.0, you should first use Python 2.6, see migration details [here]"

Regards,

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From jeremy at alum.mit.edu Thu Dec 4 14:24:57 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 4 Dec 2008 08:24:57 -0500
Subject: [Python-Dev] [Python-3000] Merging mailing lists
In-Reply-To: <4937886B.4000002@v.loewis.de>
References: <4937886B.4000002@v.loewis.de>
Message-ID:

On Thu, Dec 4, 2008 at 2:36 AM, "Martin v. Löwis" wrote:
> I would like to merge mailing lists, now that the design and first
> implementation of Python 3000 is complete. In particular, I would
> like to merge the python-3000 mailing list back into python-dev,
> and the python-3000-checkins mailing list back into python-checkins.
> The rationale is to simplify usage of the lists, and to avoid
> cross-postings.

+1

> To implement this, all subscribers of the 3000 mailing lists would
> be added to the trunk mailing lists (avoiding duplicates, of course),
> and all automated messages going to python-3000-checkins would then
> be directed to the trunk lists. The 3000 mailing lists would change
> into read-only mode (i.e. primarily leaving the archives behind).
>
> Any objections?

No

Jeremy

> Regards,
> Martin
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/jeremy%40alum.mit.edu
>

From lists at cheimes.de Thu Dec 4 16:12:07 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 04 Dec 2008 16:12:07 +0100
Subject: [Python-Dev] Merging flow
Message-ID:

Several people have asked about the patch and merge flow. Now that
Python 3.0 is out it's a bit more complicated.

Flow diagram
------------

trunk ---> release26-maint
   \-> py3k ---> release30-maint

Patches for all versions of Python should land in the trunk. They are
then merged into release26-maint and py3k branches. Changes for Python
3.0 are merged via the py3k branch.

Christian

From tjreedy at udel.edu Thu Dec 4 17:12:23 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 04 Dec 2008 11:12:23 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To:
References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com>
Message-ID:

Georg Brandl wrote:
> I can't find any docs built for Python 3.0 (not 3.1a0).

The Windows installation has new 3.0 doc dated Dec 3, so it was built,
just not posted correctly.
From tjreedy at udel.edu Thu Dec 4 17:47:22 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 11:47:22 -0500 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: References: Message-ID: Guido van Rossum wrote: >> Python 3.0 (a.k.a. "Python 3000" or "Py3k") represents a major milestone in >> Python's history, and was nearly three years in the making. This is a new >> version of the language that is incompatible with the 2.x line of releases, I think this >> while remaining true to BDFL Guido van Rossum's vision. Some things you >> will notice include: >> >> * Fixes to many old language warts >> * Removal of long deprecated features and redundant syntax >> * Improvements in, and a reorganization of, the standard library >> * Changes to the details of how built-in objects like strings and dicts work >> * ...and many more new features >> >> While these changes were made without concern for backward compatibility, and this could give some people a mis-impression, most likely negative, as to the magnitude and nature of the change. Most of the code I am now writing would, I believe, run with 2.5 except for print(..., file=xxx). And I know that there was concern for backward compatibility to the point that some changes were rejected (renaming builtins) or delayed (deleting duplicate test asserts) for that reason. So I would soften the statements to "... version of the language that is partially incompatible with... " and "were made without being bound by backward compatibility," tjr From dickinsm at gmail.com Thu Dec 4 18:23:53 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Thu, 4 Dec 2008 17:23:53 +0000 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com> On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes wrote: > Patches for all versions of Python should land in the trunk. They are then > merged into release26-maint and py3k branches. 
Changes for Python 3.0 are > merged via the py3k branch. Thanks, Christian! Questions: (1) If I commit a change to the trunk that I don't want to go into release26-maint, should I explicitly block it using svnmerge? (2) Same question for trunk -> py3k (3) Same question for py3k -> release30-maint. I'm guessing that the answers are (1) No, (2) Yes, (3) No. Mark From musiccomposition at gmail.com Thu Dec 4 18:30:43 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 4 Dec 2008 11:30:43 -0600 Subject: [Python-Dev] Merging flow In-Reply-To: <5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com> References: <5c6f2a5d0812040923h12e480a2k9512754009274350@mail.gmail.com> Message-ID: <1afaf6160812040930i350d44ffwaeb40b670f3da537@mail.gmail.com> On Thu, Dec 4, 2008 at 11:23 AM, Mark Dickinson wrote: > On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes wrote: >> Patches for all versions of Python should land in the trunk. They are then >> merged into release26-maint and py3k branches. Changes for Python 3.0 are >> merged via the py3k branch. > > Thanks, Christian! > > Questions: > > (1) If I commit a change to the trunk that I don't want to go into > release26-maint, should I explicitly block it using svnmerge? > > (2) Same question for trunk -> py3k > > (3) Same question for py3k -> release30-maint. > > I'm guessing that the answers are (1) No, (2) Yes, (3) No. That is correct. We don't care too much about blocking for the release branches. -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From brett at python.org Thu Dec 4 18:37:05 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Dec 2008 09:37:05 -0800 Subject: [Python-Dev] [Python-3000-checkins] Merging mailing lists In-Reply-To: <4937886B.4000002@v.loewis.de> References: <4937886B.4000002@v.loewis.de> Message-ID: On Wed, Dec 3, 2008 at 23:36, "Martin v. 
L?wis" wrote: > I would like to merge mailing lists, now that the design and first > implementation of Python 3000 is complete. In particular, I would > like to merge the python-3000 mailing list back into python-dev, > and the python-3000-checkins mailing list back into python-checkins. > The rationale is to simplify usage of the lists, and to avoid > cross-postings. > > To implement this, all subscribers of the 3000 mailing lists would > be added to the trunk mailing lists (avoiding duplicates, of course), > and all automated messages going to python-3000-checkins would then > be directed to the trunk lists. The 3000 mailing lists would change > into read-only mode (i.e. primarily leaving the archives behind). > > Any objections? > Nope; +1. -Brett From jeremy at alum.mit.edu Thu Dec 4 19:18:23 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 4 Dec 2008 13:18:23 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: On Thu, Dec 4, 2008 at 10:12 AM, Christian Heimes wrote: > Several people have asked about the patch and merge flow. Now that Python > 3.0 is out it's a bit more complicated. > > Flow diagram > ------------ > > trunk ---> release26-maint > \-> py3k ---> release30-maint > > > Patches for all versions of Python should land in the trunk. They are then > merged into release26-maint and py3k branches. Changes for Python 3.0 are > merged via the py3k branch. You say "they are then merged." Does that mean if I commit something on the trunk, someone else will merge it for me? Or do I need to do it? The library is vastly different between 2.x and 3.x. I'm personally aware of the many changes related to httplib / urllib / xmlrpclib. I'm worried that it will be hard to decide how to "merge" things between the two versions. 
Jeremy > Christian > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu > From musiccomposition at gmail.com Thu Dec 4 19:25:30 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 4 Dec 2008 12:25:30 -0600 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <1afaf6160812041025y113825dfoefbee6c4b69a55f2@mail.gmail.com> On Thu, Dec 4, 2008 at 12:18 PM, Jeremy Hylton wrote: > On Thu, Dec 4, 2008 at 10:12 AM, Christian Heimes wrote: >> Several people have asked about the patch and merge flow. Now that Python >> 3.0 is out it's a bit more complicated. >> >> Flow diagram >> ------------ >> >> trunk ---> release26-maint >> \-> py3k ---> release30-maint >> >> >> Patches for all versions of Python should land in the trunk. They are then >> merged into release26-maint and py3k branches. Changes for Python 3.0 are >> merged via the py3k branch. > > You say "they are then merged." Does that mean if I commit something > on the trunk, someone else will merge it for me? Or do I need to do > it? Generally, somebody else will do it if it is on the trunk and bound for py3k. (Bug fixes should be backported by the original committer.) Of course, if the change required in py3k is complicated and vastly different, I and the other mergers would appreciate it if you did it yourself. > > The library is vastly different between 2.x and 3.x. I'm personally > aware of the many changes related to httplib / urllib / xmlrpclib. > I'm worried that it will be hard to decide how to "merge" things > between the two versions. Feel free to do it yourself. > > Jeremy -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." 
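
[Editorial illustration, not part of the original thread.] The flow
diagram quoted above can be read as a small directed graph. The sketch
below models it that way to show where a change may travel; MERGE_FLOW
and merge_targets are hypothetical names invented for this example and
have nothing to do with svnmerge.py itself.

```python
from collections import deque

# Branch flow as described in the thread: trunk feeds release26-maint
# and py3k, and py3k feeds release30-maint. (Hypothetical data structure.)
MERGE_FLOW = {
    "trunk": ["release26-maint", "py3k"],
    "py3k": ["release30-maint"],
    "release26-maint": [],
    "release30-maint": [],
}


def merge_targets(branch, flow=MERGE_FLOW):
    """Return every branch a change committed to `branch` may reach.

    Branches are listed in breadth-first order, i.e. direct merge
    targets first, then the targets of those targets.
    """
    reached = []
    pending = deque(flow[branch])
    while pending:
        target = pending.popleft()
        if target not in reached:
            reached.append(target)
            pending.extend(flow[target])
    return reached
```

This only captures where a change can flow. As the replies point out,
whether a particular change is actually merged (for example, bug fixes
only on the maintenance branches) is decided by the committer or the
people doing the merges.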
From eric at trueblade.com Thu Dec 4 19:52:05 2008 From: eric at trueblade.com (Eric Smith) Date: Thu, 04 Dec 2008 13:52:05 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <493826D5.3020205@trueblade.com> Christian Heimes wrote: > Several people have asked about the patch and merge flow. Now that > Python 3.0 is out it's a bit more complicated. > > Flow diagram > ------------ > > trunk ---> release26-maint > \-> py3k ---> release30-maint > > > Patches for all versions of Python should land in the trunk. They are > then merged into release26-maint and py3k branches. Changes for Python > 3.0 are merged via the py3k branch. Apologies if this has been discussed before. I looked but didn't see anything. Given that at least 99% of the changes for the trunk will not get merged into release26-maint, doesn't it make more sense to merge the other way? That is, anything that gets checked in to release26-maint would potentially be merged into trunk. That would remove the huge number of merge blocks that will otherwise be required. Same for py3k and release30-maint. Eric. From nicole at cats-muvva.net Thu Dec 4 19:36:48 2008 From: nicole at cats-muvva.net (Nicole King) Date: Thu, 4 Dec 2008 18:36:48 +0000 Subject: [Python-Dev] Taint Mode in Python 3.0 Message-ID: <200812041836.48146.nicole@cats-muvva.net> Dear All, I have published the diff for my implementation of tainted mode in Python for R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the bottom of the page. I apologise for past problems accessing this web site: I hope to have resolved all the issues with it.
Nicole From musiccomposition at gmail.com Thu Dec 4 19:57:34 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 4 Dec 2008 12:57:34 -0600 Subject: [Python-Dev] Merging flow In-Reply-To: <493826D5.3020205@trueblade.com> References: <493826D5.3020205@trueblade.com> Message-ID: <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com> On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith wrote: > Christian Heimes wrote: >> >> Several people have asked about the patch and merge flow. Now that Python >> 3.0 is out it's a bit more complicated. >> >> Flow diagram >> ------------ >> >> trunk ---> release26-maint >> \-> py3k ---> release30-maint >> >> >> Patches for all versions of Python should land in the trunk. They are then >> merged into release26-maint and py3k branches. Changes for Python 3.0 are >> merged via the py3k branch. > > Apologies if this has been discussed before. I looked but didn't see > anything. > > Given that at least 99% of the changes for the trunk will not get merged > into release26-maint, doesn't it make more sense to merge the other way? > That is, anything that gets checked in to release26-maint would potentially > be merged into trunk. That would remove the huge number of merge blocks that > will otherwise be required. Same fore py3k and release30-maint. I think the percentage is a bit lower than that. Also, we haven't been using blocking with the maintenance branch so far; svnmerge.py is just a convenience. (It generates commit messages and has a simpler interface than a simple "svn merge" command.) -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From python at rcn.com Thu Dec 4 20:12:57 2008 From: python at rcn.com (Raymond Hettinger) Date: Thu, 4 Dec 2008 11:12:57 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final References: <20081204123750.GA890@amk.local> Message-ID: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> From: "A.M. 
Kuchling" > I think we should also have a statement upon on python.org about > future plans: e.g. > > * that there will be a Python 2.7 that will incorporate what we learn from > people trying to port, > * that 3.1 will rearrange the standard library in mostly-known ways, and > * that we expect people to use 3.0 mostly for compatibility testing, > not going into serious production use until 3.1 or maybe even 3.2. The latter statement worries me. It seems to unnecessarily undermine adoption of 3.0. It essentially says, "don't use this". Is that what we want? ISTM, 3.0 is in pretty good shape. There is nothing intrinsically wrong with it. The number one adoption issue is external, i.e. how quickly key third-party modules get converted. Raymond From fijall at gmail.com Thu Dec 4 20:31:35 2008 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 4 Dec 2008 20:31:35 +0100 Subject: [Python-Dev] Taint Mode in Python 3.0 In-Reply-To: <200812041836.48146.nicole@cats-muvva.net> References: <200812041836.48146.nicole@cats-muvva.net> Message-ID: <693bc9ab0812041131o63b462e2id0d9783c2c459143@mail.gmail.com> When I try to run this, I get: Fatal Python error: Py_Initialize: can't initialize sys standard streams Traceback (most recent call last): File "/home/fijal/lang/python/Python30/Lib/encodings/__init__.py", line 31, in File "/home/fijal/lang/python/Python30/Lib/codecs.py", line 1060, in TaintError: using tainted data Aborted Are there any tests what it should do? Didn't find it in a diff On Thu, Dec 4, 2008 at 7:36 PM, Nicole King wrote: > Dear All, > > I have published the diff for my implementation of tainted mode in Python for > R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the > bottom the page. I apologise for past problems accessing this web site: I > hope to have resolved all the issues with it. 
> > Nicole > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > From barry at python.org Thu Dec 4 20:41:31 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Dec 2008 14:41:31 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 4, 2008, at 2:12 PM, Raymond Hettinger wrote: > From: "A.M. Kuchling" >> I think we should also have a statement upon on python.org about >> future plans: e.g. >> * that there will be a Python 2.7 that will incorporate what we >> learn from >> people trying to port, >> * that 3.1 will rearrange the standard library in mostly-known >> ways, and * that we expect people to use 3.0 mostly for >> compatibility testing, not going into serious production use until >> 3.1 or maybe even 3.2. > > The latter statement worries me. It seems to unnecessarily undermine > adoption of 3.0. It essentially says, "don't use this". Is that > what we want? > ISTM, 3.0 is in pretty good shape. There is nothing intrinsically > wrong > with it. The number one adoption issue is external, i.e. how quickly > key third-party modules get converted. I agree. I tried to put a positive spin on the announcement, and the backward compatibility issue in particular. I probably failed. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTgybHEjvBPtnXfVAQJPjgP+NeyLY2ACryOmxeRV8qcotKrMJZYBwu6q gtNjax3m0faRr2VrRwVLpiJqBoVkwpr+heKg7z2rR183MstsgQ9QsQpkZXBV+QnH yK1yA18jaVZhLMR0VPT75GN1KPp5KCL+TbuT0cFtJ/SSt1LT5K356jdMYFi/ZbUP t2YtaWoxB5o= =4lo8 -----END PGP SIGNATURE----- From fdrake at acm.org Thu Dec 4 20:00:39 2008 From: fdrake at acm.org (Fred Drake) Date: Thu, 04 Dec 2008 14:00:39 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: <493826D5.3020205@trueblade.com> References: <493826D5.3020205@trueblade.com> Message-ID: <646307E1-3CB4-4538-9C4D-5ADE9C4E69F1@acm.org> On Dec 4, 2008, at 1:52 PM, Eric Smith wrote: > Apologies if this has been discussed before. I looked but didn't see > anything. Probably has, just 'cause everything has been discussed before. > Given that at least 99% of the changes for the trunk will not get > merged into release26-maint, doesn't it make more sense to merge the > other way? That is, anything that gets checked in to release26-maint > would potentially be merged into trunk. That would remove the huge > number of merge blocks that will otherwise be required. Same fore > py3k and release30-maint. The directions of merges were established in the past at some point. Though they feel wrong (at least to you and me), the direction is what it is. I'd asked about the direction mostly because I can never remember after time away from working on the Python tree. That said, don't let Python's decision on the direction keep you from managing your own projects the right way. :-) In fact, it's reasonable to fix bugs on the release26-maint branch, migrate the patch to the trunk, and then use svnmerge.py from there to propagate the changes. 
-Fred

--
Fred Drake

From a.badger at gmail.com Thu Dec 4 21:02:19 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Thu, 04 Dec 2008 12:02:19 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
Message-ID: <4938374B.8000006@gmail.com>

I opened up bug http://bugs.python.org/issue4006 a while ago and it was suggested in the report that it's not a bug but a feature and so I should come here to see about getting the feature changed :-)

I have a specific problem with os.environ and a somewhat less important architectural issue with the unicode/bytes handling in certain os.* modules. I'll start with the important one:

Currently in python3 there's no way to get at environment variables that are not encoded in the system default encoding. My understanding is that this isn't a problem on Windows systems but on *nix this is a huge problem. Environment variables on *nix are a sequence of non-null bytes. These bytes are almost always "characters" but they do not have to be. Further, there is nothing that requires that the characters be in the same encoding; some of the characters could be in the UTF-8 character set while others are in latin-1, shift-jis, or big-5.

These mixed encodings can occur for a variety of reasons. Here's an example that isn't too contrived :-)

Swallow is a multi-user shell server hosted at a university in Japan. The OS installed is Fedora 10 where the encoding of all filenames provided by the OS is UTF-8. The administrator of the OS has kept this convention and, among other things, has created a directory to mount an NFS directory from another computer. He calls that "ネットワーク" ("network" in Japanese). Since it's utf-8, that gets put on the filesystem as

    '\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf'

Now the administrators of the fileserver have been maintaining it since before Unicode was invented.
Furthermore, they don't want to suffer from the space loss of using utf-8 to encode Japanese so they use shift-jis everywhere. They have a directory on the nfs share for programs that are useful for people on the shell server to access. It's called "プログラム" ("programs" in Japanese). Since they're using shift-jis, the bytes on the filesystem are:

    '\x83v\x83\x8d\x83O\x83\x89\x83\x80'

The system administrator of the shell server adds the directory of programs to all his users' default PATH variables so then they have this:

    PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80

(Note: python syntax. In the unix shell you'd likely have octal instead of hex.)

Now comes the problematic part. One of the users on the system wants to write a python3 program that needs to determine if a needed program is in the user's PATH. He tries to code it like this::

    #!/usr/bin/python3.0
    import os
    # PATH is a single string; split it into its directory entries
    for directory in os.environ['PATH'].split(os.pathsep):
        programs = os.listdir(directory)

That code raises a KeyError because python3 has silently discarded the PATH due to the shift-jis encoded path elements. Much more importantly, there's no way the programmer can handle the KeyError and actually get the PATH from within python.

In the bug report I opened, I listed four ways to fix this along with the pros and cons:

1) return mixed unicode and byte types in os.environ and os.getenv
   - I think this one is a bad idea. It's the easiest for simple code to deal with but it's repeating the major problem with python2's Unicode handling: mixing unicode and byte types unpredictably.

2) return only byte types in os.environ
   - This is conceptually correct but the most annoying option. Technically we're receiving bytes from the C libraries and the C libraries expect bytes in return. But in the common case we will be dealing with things in one encoding so this causes needless effort to the application programmer in the common case.
3) silently ignore non-decodable values when accessing os.environ['PATH'] as we do now but allow access to the full information via os.environ[b'PATH'] and os.getenvb().
   - This mirrors the practice of os.listdir('.') vs os.listdir(b'.') and os.getcwd() vs os.getcwdb().

4) raise an exception when non-decodable values are *accessed* and continue as in #3. This means that os.environ wouldn't be a simple dict as it would need to decode the values when keys are accessed (although it could cache the values).
   - This mirrors the practice of open() which is to decode the value for the common case but throw an exception and allow the programmer to decide what to do if all values are not decodable.

Either #3 or #4 will solve the major problem and both have precedent in python3's current implementation. The difference between them is whether to throw an exception when a non-decodable value is encountered. Here's why I think that's appropriate:

One of the things I enjoy about python is the informative tracebacks that make debugging easy. I think that the ease of debugging is lost when we silently ignore an error. If we look at the difference in coding and debugging for problems with files that aren't encoded in the default encoding (where a traceback is issued) and os.listdir() when filenames aren't in the default encoding (where the filenames are silently ignored), I think we'll see that::

    #!/usr/bin/python3.0
    # Code with two unicode problems:
    import os, sys
    directory = sys.stdin.readline().strip()
    for filename in os.listdir(directory):
        myfile = open(os.path.join(directory, filename), 'r')
        print('%s: %s' % (os.path.join(directory, filename), myfile.readline()))
        myfile.close()

Let's say I write the above code and test it on a directory that's all encoded in the default encoding. I release it to the world. Someone uses it on a system that has files and filenames with mixed encodings. They immediately get a traceback like this:

    File "./test.py", line 7, in print(myfile.readline())
    [...]
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 24-26: invalid data

With that information I can diagnose that my program is failing to read a line from a file because the file is not written in the default encoding (utf8 in this case). It points out that myfile on line 7 of test.py is the file object that has issues. I quickly fix it by doing this:

    + unknown_encoded_files = []
    [...]
    +     try:
    -     print(myfile.readline())
    +         print('%s: %s' % (os.path.join(directory, filename), myfile.readline()))
    +     except UnicodeDecodeError:
    +         unknown_encoded_files.append(filename)
          myfile.close()
    +if unknown_encoded_files:
    +    print('These files are not in the default encoding:\n  %s' % '\n  '.join(unknown_encoded_files))

Very simple. The traceback has all the information I need to fix this.

A little later I get another report from that user that my code is failing to list the first line of all the files in their home directory. This time there's no traceback to point out which of my files is failing, just that some files are being ignored. I ask for the list of files in the directory and get back:

    ?.txt
    ?.txt

I create those files in a directory and they're processed fine. I tell the user that and ask if there's anything special about what's in the files or anything that makes them different. No... they're both text files on his machine. One was created there, though, and the other was copied from another machine. Hmm.. do the filenames show up mangled by any chance? Yes, one of them does but he knows it's correct since it shows up correctly on his machine at home. Ah ha! That seems to point at an encoding problem. But where?

After writing a test and perusing my code for a while, I find my os.listdir() call. directory has to be converted to bytes for this to work. So I change the code like so:

    - for filename in os.listdir(directory):
    + for filename in os.listdir(directory.encode()):
    [...]
    -         unknown_encoded_files.append(filename)
    +         unknown_encoded_files.append(str(filename, errors='replace'))

The code for the fix is simple but the debugging to find the problem is not. Raising an exception instead of silently failing is much better for getting code that works correctly.

The bug report I opened suggests creating a PEP to address this issue. I think that's a good idea for whether os.listdir() and friends should be changed to raise an exception, but not having any way to get at some environment variables seems like it's just a bug that needs to be addressed. What do other people think on both these issues?

-Toshio

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: OpenPGP digital signature
URL:

From fwierzbicki at gmail.com Thu Dec 4 21:05:51 2008
From: fwierzbicki at gmail.com (Frank Wierzbicki)
Date: Thu, 4 Dec 2008 15:05:51 -0500
Subject: [Python-Dev] Holding a Python Language Summit at PyCon
In-Reply-To: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
References: <20081203153128.GA6161@amk-desktop.matrixgroup.net>
Message-ID: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com>

On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling wrote:
> 14:00 - 15:30
> =============
>
> Two tracks:
>
> Cross-implementation issues:
>
> What do the various VMs want/need from CPython to help with their
> implementations?
>
> * Marking CPython-specific tests in the test suite?
> * Getting an implementation agnostic test suite for the Python language?
> * Separating the language tests and the pure Python part of the stdlib into
>   a separate project? (Or publish them as a separate package.)
> * Transition plans for 3.0?
>
> Champion needed.

I would like to champion this one.
-Frank From brett at python.org Thu Dec 4 21:16:08 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Dec 2008 12:16:08 -0800 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> Message-ID: On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki wrote: > On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling wrote: >> 14:00 - 15:30 >> ============= >> >> Two tracks: >> >> Cross-implementation issues: >> >> What do the various VMs want/need from CPython to help with their >> implementations? >> >> * Marking CPython-specific tests in the test suite? >> * Getting an implementation agnostic test suite for the Python language? >> * Separating the language tests and the pure Python part of the stdlib into >> a separate project? (Or publish them as a separate package.) >> * Transition plans for 3.0? >> >> Champion needed. > I would like to champion this one. > I told AMK this a while back, but might as well make it more public; I am up for chairing as well. -Brett From amk at amk.ca Thu Dec 4 21:16:27 2008 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 4 Dec 2008 15:16:27 -0500 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> Message-ID: <20081204201627.GA23627@amk-desktop.matrixgroup.net> On Thu, Dec 04, 2008 at 03:05:51PM -0500, Frank Wierzbicki wrote: > > Cross-implementation issues: > > I would like to champion this one. Thanks! You're now listed as the champion for it. 
--amk From p.f.moore at gmail.com Thu Dec 4 21:20:34 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Dec 2008 20:20:34 +0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> Message-ID: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> 2008/12/4 Barry Warsaw : >>> * that 3.1 will rearrange the standard library in mostly-known ways, and >>> * that we expect people to use 3.0 mostly for compatibility testing, not going into serious production >>> use until 3.1 or maybe even 3.2. >> The latter statement worries me. It seems to unnecessarily undermine >> adoption of 3.0. It essentially says, "don't use this". Is that what we >> want? >> ISTM, 3.0 is in pretty good shape. There is nothing intrinsically wrong >> with it. The number one adoption issue is external, i.e. how quickly >> key third-party modules get converted. > > I agree. I tried to put a positive spin on the announcement, and the > backward compatibility issue in particular. I probably failed. Hmm, looking back, the quote Raymond is referring to is just a suggestion for additional text on the 3.0 page. I agree with him that it's a bit too negative. The announcement itself hits just the right note in my view. You (Barry) seem to have got it pretty well on target. One thing I'd like to see more clearly stated is that there's no reason NOT to use Python 3.0 for new code. I don't think that message has really come across yet - in spite of the warnings being all about compatibility issues, no-one has stressed the simple point that if your code is new, it doesn't have compatibility concerns! Paul. 
From p.f.moore at gmail.com Thu Dec 4 21:21:28 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Dec 2008 20:21:28 +0000 Subject: [Python-Dev] [Python-3000] Merging mailing lists In-Reply-To: <4937886B.4000002@v.loewis.de> References: <4937886B.4000002@v.loewis.de> Message-ID: <79990c6b0812041221y6feaae02k87c7133b535e1ece@mail.gmail.com> 2008/12/4 "Martin v. Löwis" : > Any objections? The timing is right, go for it. Paul From rhamph at gmail.com Thu Dec 4 21:54:03 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 13:54:03 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4938374B.8000006@gmail.com> References: <4938374B.8000006@gmail.com> Message-ID: On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi wrote: > I opened up bug http://bugs.python.org/issue4006 a while ago and it was > suggested in the report that it's not a bug but a feature and so I > should come here to see about getting the feature changed :-) > > I have a specific problem with os.environ and a somewhat less important > architectural issue with the unicode/bytes handling in certain os.* > modules. I'll start with the important one: > > Currently in python3 there's no way to get at environment variables that > are not encoded in the system default encoding. My understanding is > that this isn't a problem on Windows systems but on *nix this is a huge > problem. environment variables on *nix are a sequence of non-null > bytes. These bytes are almost always "characters" but they do not have > to be. Further, there is nothing that requires that the characters be > in the same encoding; some of the characters could be in the UTF-8 > character set while others are in latin-1, shift-jis, or big-5. Multiple encoding environments are best described as "batshit insane". It's impossible to handle any of it correctly *as text*, which is why UTF-8 is becoming a universal standard. For everybody's sanity python should continue to push it.
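
To make the mixed-encoding PATH from the original report concrete, here is a small sketch. The two byte strings are copied from that report; try_decode is a hypothetical helper introduced only for this illustration. It shows that each directory name decodes cleanly in its own encoding, while the combined PATH value has no single encoding that fits.

```python
# Directory names exactly as given in the original report.
utf8_dir = b'\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf'
sjis_dir = b'\x83v\x83\x8d\x83O\x83\x89\x83\x80'


def try_decode(raw, encoding):
    """Hypothetical helper: return the decoded text, or None if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The PATH element that mixes both encodings, kept as raw bytes.
mixed_entry = b'/mnt/' + utf8_dir + b'/' + sjis_dir

print(try_decode(utf8_dir, 'utf-8'))      # the "network" directory name
print(try_decode(sjis_dir, 'shift_jis'))  # the "programs" directory name
print(try_decode(sjis_dir, 'utf-8'))      # None: not valid UTF-8
print(try_decode(mixed_entry, 'utf-8'))   # None: the whole entry fails too
```

The point of the sketch is only that the failure comes from the value as a whole: a single undecodable path element is enough to make the entire variable undecodable in the default encoding.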
However, some pragmatism is also possible. Many uses of PATH may allow it to be treated as black-box bytes, rather than text. The minimal solution I see is to make os.getenv() and os.putenv() switch to byte modes when given byte arguments, as os.listdir() does. This use case doesn't require the ability to iterate over all environment variables, as os.environb would allow. I do wonder if controlling the environment given to a subprocess requires os.environb, but it may be too obscure to really matter. -- Adam Olsen, aka Rhamphoryncus From exarkun at divmod.com Thu Dec 4 22:00:46 2008 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Thu, 4 Dec 2008 16:00:46 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> Message-ID: <20081204210046.20272.2138425533.divmod.quotient.15747@ohm> On Thu, 4 Dec 2008 20:20:34 +0000, Paul Moore wrote: >2008/12/4 Barry Warsaw : > [snip] > >One thing I'd like to see more clearly stated is that there's no >reason NOT to use Python 3.0 for new code. I don't think that message >has really come across yet - in spite of the warnings being all about >compatibility issues, no-one has stressed the simple point that if >your code is new, it doesn't have compatibility concerns! New code that wouldn't be more easily written with a dependency on a library that hasn't been ported, you mean. Although beyond that, there may be reasons (for example, the significant performance degradation in the I/O library currently being discussed on python-list). Jean-Paul From ncoghlan at gmail.com Thu Dec 4 22:07:07 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 05 Dec 2008 07:07:07 +1000 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: References: Message-ID: <4938467B.40806@gmail.com> Terry Reedy wrote: > and this could give some people a mis-impression, most likely negative, > as to the magnitude and nature of the change. 
Most of the code I am now > writing would, I believe, run with 2.5 except for print(..., file=xxx). > And I know that there was concern for backward compatibility to the > point that some changes were rejected (renaming builtins) or delayed > (deleting duplicate test asserts) for that reason. So I would soften > the statements to "... version of the language that is partially > incompatible with... " and "were made without being bound by backward > compatibility," I would agree with Terry - while there are backwards incompatibilities, they aren't gratuitous. Then again, Guido does seem to want to discourage people from trying to target the common subset of the two languages instead of using 2to3 as a compilation step from the python3 version. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From nd at perlig.de Thu Dec 4 22:09:34 2008 From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=) Date: Thu, 4 Dec 2008 22:09:34 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> Message-ID: <200812042209.34814.nd@perlig.de> * Adam Olsen wrote: > On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi wrote: > > I opened up bug http://bugs.python.org/issue4006 a while ago and it was > > suggested in the report that it's not a bug but a feature and so I > > should come here to see about getting the feature changed :-) > > > > I have a specific problem with os.environ and a somewhat less important > > architectural issue with the unicode/bytes handling in certain os.* > > modules. I'll start with the important one: > > > > Currently in python3 there's no way to get at environment variables > > that are not encoded in the system default encoding. My understanding > > is that this isn't a problem on Windows systems but on *nix this is a > > huge problem. environment variables on *nix are a sequence of non-null > > bytes. 
These bytes are almost always "characters" but they do not have > > to be. Further, there is nothing that requires that the characters be > > in the same encoding; some of the characters could be in the UTF-8 > > character set while others are in latin-1, shift-jis, or big-5. > > Multiple encoding environments are best described as "batshit insane". > It's impossible to handle any of it correctly *as text*, which is why > UTF-8 is becoming a universal standard. For everybody's sanity python > should continue to push it. Here's an example which will become popular soon, I guess: CGI scripts and, of course WSGI applications. All those get their environment in an unknown encoding. In the worst case one can blow up the application by simply sending strange header lines over the wire. But there's more: consider running the server in C locale, then probably even a single 8 bit char might break something (?). > However, some pragmatism is also possible. Many uses of PATH may > allow it to be treated as black-box bytes, rather than text. The > minimal solution I see is to make os.getenv() and os.putenv() switch > to byte modes when given byte arguments, as os.listdir() does. This > use case doesn't require the ability to iterate over all environment > variables, as os.environb would allow. > > I do wonder if controlling the environment given to a subprocess > requires os.environb, but it may be too obscure to really matter. IMHO, environment variables are no text. They are bytes by definition and should be treated as such. I know, there's windows having unicode enabled env vars on demand, but there's only trouble with those over there in apache's httpd (when passing them to CGI scripts, oh well...). 
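
A minimal sketch of the bytes-first view of the environment, under a loud assumption: it targets a later Python than the 3.0 discussed in this thread. os.environb, os.supports_bytes_environ and os.fsencode do not exist in 3.0; they were added in Python 3.2. path_entries_bytes is a hypothetical helper name, not a standard library function.

```python
import os


def path_entries_bytes(environ=None):
    """Return the PATH entries as bytes, without decoding them.

    `environ` is a mapping with bytes keys and values. When omitted,
    os.environb is used where the platform provides it (Python 3.2+,
    POSIX); otherwise the str environment is encoded with os.fsencode.
    """
    if environ is None:
        if getattr(os, "supports_bytes_environ", False):
            environ = os.environb
        else:
            # Assumption: no native bytes environment is available,
            # so fall back to encoding the str view.
            environ = {os.fsencode(k): os.fsencode(v)
                       for k, v in os.environ.items()}
    raw = environ.get(b"PATH", b"")
    separator = os.fsencode(os.pathsep)
    # Empty entries (for example from a trailing separator) are dropped here.
    return [entry for entry in raw.split(separator) if entry]


if __name__ == "__main__":
    for entry in path_entries_bytes():
        print(repr(entry))
```

Each returned entry can be passed straight to os.listdir(), which returns bytes filenames when given a bytes argument, so no step in the loop has to guess an encoding.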
nd From ncoghlan at gmail.com Thu Dec 4 22:11:47 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 05 Dec 2008 07:11:47 +1000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081204123750.GA890@amk.local> References: <20081204123750.GA890@amk.local> Message-ID: <49384793.2030308@gmail.com> A.M. Kuchling wrote: > * that 3.1 will rearrange the standard library in mostly-known ways, and > * that we expect people to use 3.0 mostly for compatibility testing, > not going into serious production use until 3.1 or maybe even 3.2. As Raymond notes, this is probably too negative: for new projects, 3.0 should be fine so long as they don't need too many external libraries in the short term. For projects migrating from Python 2.x, the 3rd party library support problem is likely to hold a lot of projects back for several months at least, possibly to the point where it makes more sense to just wait for 2.7/3.1 to finalise any migration plans. Such projects are still well-advised to start their porting efforts as soon as possible though so they can identify *which* of their external dependencies don't have python 3.0 compatible versions available yet. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From a.badger at gmail.com Thu Dec 4 22:15:42 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 04 Dec 2008 13:15:42 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> Message-ID: <4938487E.7050809@gmail.com> Adam Olsen wrote: > On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi wrote: >> I opened up bug http://bugs.python.org/issue4006 a while ago and it was >> suggested in the report that it's not a bug but a feature and so I >> should come here to see about getting the feature changed :-) >> >> I have a specific problem with os.environ and a somewhat less important >> architectural issue with the unicode/bytes handling in certain os.* >> modules. I'll start with the important one: >> >> Currently in python3 there's no way to get at environment variables that >> are not encoded in the system default encoding. My understanding is >> that this isn't a problem on Windows systems but on *nix this is a huge >> problem. environment variables on *nix are a sequence of non-null >> bytes. These bytes are almost always "characters" but they do not have >> to be. Further, there is nothing that requires that the characters be >> in the same encoding; some of the characters could be in the UTF-8 >> character set while others are in latin-1, shift-jis, or big-5. > > Multiple encoding environments are best described as "batshit insane". > It's impossible to handle any of it correctly *as text*, which is why > UTF-8 is becoming a universal standard. For everybody's sanity python > should continue to push it. > Amen brother! > However, some pragmatism is also possible. Unfortunately, this is exactly what I'm talking about :-) > Many uses of PATH may > allow it to be treated as black-box bytes, rather than text. 
The > minimal solution I see is to make os.getenv() and os.putenv() switch > to byte modes when given byte arguments, as os.listdir() does. This > use case doesn't require the ability to iterate over all environment > variables, as os.environb would allow. > This would be a partial implementation of my option #3. It allows the programmer to work around problems but does allow subtle bugs to creep in unawares. For instance:: > I do wonder if controlling the environment given to a subprocess > requires os.environb, but it may be too obscure to really matter. > If you wanted to change one variable before passing it on to the subprocess this could lead to head-scratcher bugs. Here's a contrived example: Say I have an app that talks to multiple cvs repositories. It copies os.environ and modifies CVSROOT and CVS_RSH then calls subprocess with env=temp_env. If the PATH variable contains non-decodable elements on some machines, this could lead to mysterious failures. This is particularly bad because we aren't directly modifying PATH anywhere in our code so there won't be an obvious reason in the code that this is failing. -Toshio From ncoghlan at gmail.com Thu Dec 4 22:19:19 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 05 Dec 2008 07:19:19 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4938374B.8000006@gmail.com> References: <4938374B.8000006@gmail.com> Message-ID: <49384957.3030102@gmail.com> Toshio Kuratomi wrote: > The bug report I opened suggests creating a PEP to address this issue. > I think that's a good idea for whether os.listdir() and friends should > be changed to raise an exception but not having any way to get at some > environment variables seems like it's just a bug that needs to be > addressed.
What do other people think on both these issues? I'm pretty sure the discussion on this topic a while back decided that where necessary Python 3 would grow parallel bytes versions of APIs affected by environmental encoding issues (such as os.environb, os.listdirb, os.getcwdb), but that we were OK with the idea of deferring addition of those APIs until 3.1. That is, this was an acknowledged limitation with a fairly straightforward agreed solution, but it wasn't considered a common enough issue to delay the release of 3.0 until all of those parallel APIs had been implemented Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From g.brandl at gmx.net Thu Dec 4 22:21:55 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 04 Dec 2008 22:21:55 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com> References: <493826D5.3020205@trueblade.com> <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com> Message-ID: Benjamin Peterson schrieb: > On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith wrote: >> Christian Heimes wrote: >>> >>> Several people have asked about the patch and merge flow. Now that Python >>> 3.0 is out it's a bit more complicated. >>> >>> Flow diagram >>> ------------ >>> >>> trunk ---> release26-maint >>> \-> py3k ---> release30-maint >>> >>> >>> Patches for all versions of Python should land in the trunk. They are then >>> merged into release26-maint and py3k branches. Changes for Python 3.0 are >>> merged via the py3k branch. >> >> Apologies if this has been discussed before. I looked but didn't see >> anything. >> >> Given that at least 99% of the changes for the trunk will not get merged >> into release26-maint, doesn't it make more sense to merge the other way? >> That is, anything that gets checked in to release26-maint would potentially >> be merged into trunk. 
That would remove the huge number of merge blocks that >> will otherwise be required. Same fore py3k and release30-maint. I've suggested that too; the counter-argument was that "most people don't want to care in which branch to commit something". I'm not too comfortable with this argument as it implies a certain ignorance on the part of our committers. As Fred says, it wasn't discussed anyway. Also, with svnmerge, it is not too late to change merging direction. > I think the percentage is a bit lower than that. Also, we haven't been > using blocking with the maintenance branch so far; svnmerge.py is just > a convenience. (It generates commit messages and has a simpler > interface than a simple "svn merge" command.) I *did* use blocking with the 2.6 branch when I last merged a whole batch of commits. As you say, by using svnmerge without blocking we only get a tool that can generate commit messages. However, with blocking we get something more valuable: we don't overlook backportable fixes anymore. So: yes, blocking is more work, but it gives something important in return. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Thu Dec 4 22:22:50 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 04 Dec 2008 22:22:50 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: Christian Heimes schrieb: > Several people have asked about the patch and merge flow. Now that > Python 3.0 is out it's a bit more complicated. > > Flow diagram > ------------ > > trunk ---> release26-maint > \-> py3k ---> release30-maint > > > Patches for all versions of Python should land in the trunk. They are > then merged into release26-maint and py3k branches. 
Changes for Python > 3.0 are merged via the py3k branch. As a side-note: this merging flow means that bugfix and feature commits may never be merged from trunk to py3k in one svnmerge batch. Else, they cannot be separated when merging from py3k to 30-maint. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From amk at amk.ca Thu Dec 4 22:31:04 2008 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 4 Dec 2008 16:31:04 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> Message-ID: <20081204213104.GA24509@amk-desktop.matrixgroup.net> On Thu, Dec 04, 2008 at 08:20:34PM +0000, Paul Moore wrote: > Hmm, looking back, the quote Raymond is referring to is just a > suggestion for additional text on the 3.0 page. I agree with him that > it's a bit too negative. Actually I want it to be an entirely separate page so that we can point people to it. > has really come across yet - in spite of the warnings being all about > compatibility issues, no-one has stressed the simple point that if > your code is new, it doesn't have compatibility concerns! Well, at least not until you decide you need some particular external library that hasn't been ported to 3.0 yet. For example, if you go to discussion threads such as , you can see people making statements like "I've been holding off learning it until 3000 went gold." But I think starting with Python 3.0 is a bad idea for a newbie, because they'll be limited in what they can do until the libraries have been ported. 
They can do some tasks (command-line tools, Fibonacci functions, Tk GUIs), but can they use the fancy new web framework they've just read about? Write a game? Draw graphs with matplotlib? Use and extend an application such as Roundup? Bzzt, no, not yet! Starting with 3.0 is starting out on an island. While I expect the island will grow in territory over time, I'm worried that new learners will automatically go for the highest version number, find their available tools are highly restricted, and get frustrated. Perhaps the statement could say something like "we do not expect most Python packages will be ported to the 3.x series until around the time 3.1 is released in X months." (where X=12? 6?) --amk From dima at hlabs.spb.ru Thu Dec 4 21:58:40 2008 From: dima at hlabs.spb.ru (Dmitry Vasiliev) Date: Thu, 04 Dec 2008 23:58:40 +0300 Subject: [Python-Dev] [Python-3000] Merging mailing lists In-Reply-To: <4937886B.4000002@v.loewis.de> References: <4937886B.4000002@v.loewis.de> Message-ID: <49384480.8090806@hlabs.spb.ru> Martin v. Löwis wrote: > I would like to merge mailing lists, now that the design and first > implementation of Python 3000 is complete. In particular, I would +1 -- Dmitry Vasiliev (dima at hlabs.spb.ru) http://hlabs.spb.ru From rhamph at gmail.com Thu Dec 4 22:34:01 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 14:34:01 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812042209.34814.nd@perlig.de> References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de> Message-ID: On Thu, Dec 4, 2008 at 2:09 PM, André
Malo wrote: > * Adam Olsen wrote: >> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi > wrote: >> > I opened up bug http://bugs.python.org/issue4006 a while ago and it was >> > suggested in the report that it's not a bug but a feature and so I >> > should come here to see about getting the feature changed :-) >> > >> > I have a specific problem with os.environ and a somewhat less important >> > architectural issue with the unicode/bytes handling in certain os.* >> > modules. I'll start with the important one: >> > >> > Currently in python3 there's no way to get at environment variables >> > that are not encoded in the system default encoding. My understanding >> > is that this isn't a problem on Windows systems but on *nix this is a >> > huge problem. environment variables on *nix are a sequence of non-null >> > bytes. These bytes are almost always "characters" but they do not have >> > to be. Further, there is nothing that requires that the characters be >> > in the same encoding; some of the characters could be in the UTF-8 >> > character set while others are in latin-1, shift-jis, or big-5. >> >> Multiple encoding environments are best described as "batshit insane". >> It's impossible to handle any of it correctly *as text*, which is why >> UTF-8 is becoming a universal standard. For everybody's sanity python >> should continue to push it. > > Here's an example which will become popular soon, I guess: CGI scripts and, > of course WSGI applications. All those get their environment in an unknown > encoding. In the worst case one can blow up the application by simply > sending strange header lines over the wire. But there's more: consider > running the server in C locale, then probably even a single 8 bit char > might break something (?). I think that's an argument that the framework should reencode all input text into the correct system encoding before passing it on to the CGI script or WSGI app. 
If the framework doesn't have a clear way to determine the client's encoding then it's all just gibberish anyway. A HTTP 400 or 500 range error code is appropriate here. >> However, some pragmatism is also possible. Many uses of PATH may >> allow it to be treated as black-box bytes, rather than text. The >> minimal solution I see is to make os.getenv() and os.putenv() switch >> to byte modes when given byte arguments, as os.listdir() does. This >> use case doesn't require the ability to iterate over all environment >> variables, as os.environb would allow. >> >> I do wonder if controlling the environment given to a subprocess >> requires os.environb, but it may be too obscure to really matter. > > IMHO, environment variables are no text. They are bytes by definition and > should be treated as such. > I know, there's windows having unicode enabled env vars on demand, but > there's only trouble with those over there in apache's httpd (when passing > them to CGI scripts, oh well...). Environment variables have textual names, are set via text, frequently contain textual file names or paths, and my shell (bash in gnome-terminal on ubuntu) lets me put unicode text in just fine. The underlying APIs may use bytes, but they're *intended* to be encoded text. -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Thu Dec 4 22:40:05 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 14:40:05 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <49384957.3030102@gmail.com> References: <4938374B.8000006@gmail.com> <49384957.3030102@gmail.com> Message-ID: On Thu, Dec 4, 2008 at 2:19 PM, Nick Coghlan wrote: > Toshio Kuratomi wrote: >> The bug report I opened suggests creating a PEP to address this issue. >> I think that's a good idea for whether os.listdir() and friends should >> be changed to raise an exception but not having any way to get at some >> environment variables seems like it's just a bug that needs to be >> addressed. 
What do other people think on both these issues? > > I'm pretty sure the discussion on this topic a while back decided that > where necessary Python 3 would grow parallel bytes versions of APIs > affected by environmental encoding issues (such as os.environb, > os.listdirb, os.getcwdb), but that we were OK with the idea of deferring > addition of those APIs until 3.1. It looks like most of them got into 3.0. http://docs.python.org/3.0/library/os.html says "All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned." > That is, this was an acknowledged limitation with a fairly > straightforward agreed solution, but it wasn't considered a common > enough issue to delay the release of 3.0 until all of those parallel > APIs had been implemented Aye. IMO it's fairly clear that os.getenv()/os.putenv() should follow suit in 3.1. I'm not so sure about adding os.environb (and making subprocess use it), unless the OP can demonstrate they really need it. -- Adam Olsen, aka Rhamphoryncus From python at rcn.com Thu Dec 4 22:42:35 2008 From: python at rcn.com (Raymond Hettinger) Date: Thu, 4 Dec 2008 13:42:35 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final References: <20081204123750.GA890@amk.local><6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1><79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> Message-ID: From: "A.M. Kuchling" > Perhaps the statement could say something like "we do not expect > most Python packages will be ported to the 3.x series until > around the time 3.1 is released in X months." (where X=12? 6?) I would leave out any discussion of 3.1. Its content and release date have nothing to do with when third party modules get updated. Also, we don't know the timing of the third-party updates. Some may never get converted. Some may convert quickly and easily. 
Someone (perhaps me) may organize a series of funded sprints to get many of the major packages converted. It would be better to simply say that at the present time, most important third-party modules do not yet support 3.0. FWIW, my father is Python newbie and I'm pointing him to 3.0 because it will be easier to learn than 2.6's hodgepodge of new and old features. The 3.0 environment is much cleaner. From brett at python.org Thu Dec 4 22:49:40 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Dec 2008 13:49:40 -0800 Subject: [Python-Dev] Merging flow In-Reply-To: References: <493826D5.3020205@trueblade.com> <1afaf6160812041057v5a7b6381o55513ef9a14b0e02@mail.gmail.com> Message-ID: On Thu, Dec 4, 2008 at 13:21, Georg Brandl wrote: > Benjamin Peterson schrieb: >> On Thu, Dec 4, 2008 at 12:52 PM, Eric Smith wrote: >>> Christian Heimes wrote: >>>> >>>> Several people have asked about the patch and merge flow. Now that Python >>>> 3.0 is out it's a bit more complicated. >>>> >>>> Flow diagram >>>> ------------ >>>> >>>> trunk ---> release26-maint >>>> \-> py3k ---> release30-maint >>>> >>>> >>>> Patches for all versions of Python should land in the trunk. They are then >>>> merged into release26-maint and py3k branches. Changes for Python 3.0 are >>>> merged via the py3k branch. >>> >>> Apologies if this has been discussed before. I looked but didn't see >>> anything. >>> >>> Given that at least 99% of the changes for the trunk will not get merged >>> into release26-maint, doesn't it make more sense to merge the other way? >>> That is, anything that gets checked in to release26-maint would potentially >>> be merged into trunk. That would remove the huge number of merge blocks that >>> will otherwise be required. Same fore py3k and release30-maint. > > I've suggested that too; the counter-argument was that "most people don't > want to care in which branch to commit something". 
I'm not too comfortable > with this argument as it implies a certain ignorance on the part of our > committers. As Fred says, it wasn't discussed anyway. > That would make the rule for choosing which branch to first commit to be the one with the smallest version: 2.6 -> trunk -> 3.0 -> py3k That seems reasonable to me since that is really what the code branching is and how I suspect we will do things with a DVCS. > Also, with svnmerge, it is not too late to change merging direction. > If changing it to be like above is not an issue then I vote for the change. >> I think the percentage is a bit lower than that. Also, we haven't been >> using blocking with the maintenance branch so far; svnmerge.py is just >> a convenience. (It generates commit messages and has a simpler >> interface than a simple "svn merge" command.) > > I *did* use blocking with the 2.6 branch when I last merged a whole batch > of commits. As you say, by using svnmerge without blocking we only get a > tool that can generate commit messages. However, with blocking we get > something more valuable: we don't overlook backportable fixes anymore. > > So: yes, blocking is more work, but it gives something important in return. The other perk of this ordering is you should be able to place a single block along the chain where the patch should stop and potentially be done with the merges if you are in a rush. That way people who do mass merges can just sequentially merge and not worry about where a patch should stop. 
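A rough sketch of the ordering proposed above, offered only as an illustration: it models the chain as a list and shows how a single block keeps a change from travelling further. The branch names release26-maint and release30-maint are assumed to be what "2.6" and "3.0" stand for here, and merge_plan is a hypothetical helper, not part of svnmerge.py.

```python
# Commit to the oldest branch that should receive the fix, then merge forward.
# The order mirrors "2.6 -> trunk -> 3.0 -> py3k" from the message above.
CHAIN = ["release26-maint", "trunk", "release30-maint", "py3k"]


def merge_plan(first_commit, blocked_at=None):
    """List the (action, source, target) steps needed after the first commit.

    A block on `blocked_at` ends the plan: nothing flows past that branch.
    """
    start = CHAIN.index(first_commit)
    plan = []
    for source, target in zip(CHAIN[start:], CHAIN[start + 1:]):
        if target == blocked_at:
            plan.append(("block", source, target))
            break
        plan.append(("merge", source, target))
    return plan
```

With this model a 2.x-only fix committed to release26-maint needs one merge into trunk and one block before release30-maint, while someone doing a mass merge can walk the chain in order without deciding per patch where it should stop.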
-Brett From tjreedy at udel.edu Thu Dec 4 22:51:52 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 16:51:52 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4938374B.8000006@gmail.com> References: <4938374B.8000006@gmail.com> Message-ID: Toshio Kuratomi wrote: > I opened up bug http://bugs.python.org/issue4006 a while ago and it was > suggested in the report that it's not a bug but a feature and so I > should come here to see about getting the feature changed :-) It does you no good (and will irritate others) to conflate 'design decision I do not agree with' with 'mistaken documentation or implementation of a design decision'. The former is opinion, the latter is usually fact (with occasional border cases). The latter is what core developers mean by 'bug'. > Currently in python3 there's no way to get at environment variables that > are not encoded in the system default encoding. My understanding is > that this isn't a problem on Windows systems but on *nix this is a huge > problem. environment variables on *nix are a sequence of non-null > bytes. These bytes are almost always "characters" but they do not have > to be. Further, there is nothing that requires that the characters be > in the same encoding; some of the characters could be in the UTF-8 > character set while others are in latin-1, shift-jis, or big-5. To me, mixing encodings within a string is at least slightly insane. If by design, maybe even a 'design bug' ;-). > These mixed encodings can occur for a variety of reasons. Here's an > example that isn't too contrived :-) > > Swallow is a multi-user shell server hosted at a university in Japan. > The OS installed is Fedora 10 where the encoding of all filenames > provided by the OS are UTF-8. The administrator of the OS has kept this > convention and, among other things has created a directory to mount an > NFS directory from another computer. He calls that "ネットワーク" > ("network" in Japanese).
Since it's utf-8, that gets put on the > filesystem as > '\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf' > > Now the administrators of the fileserver have been maintaining it since > before Unicode was invented. Furthermore, they don't want to suffer > from the space loss of using utf-8 to encode Japanese so they use > shift-jis everywhere. They have a directory on the nfs share for > programs that are useful for people on the shell server to access. It's > called "プログラム" ("programs" in Japanese) Since they're using > shift-jis, the bytes on the filesystem are: > '\x83v\x83\x8d\x83O\x83\x89\x83\x80' > > The system administrator of the shell server adds the directory of > programs to all his users' default PATH variables so then they have this: > > PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80 I would think life would be ultimately easier if either the file server or the shell server automatically translated file names from jis to utf8 and back, so that the PATH on the *nix shell server is entirely utf8. How would you ever display a mixture to users? What if there were an ambiguous component that could be legally decoded more than one way? > Now comes the problematic part. One of the users on the system wants > to write a python3 program that needs to determine if a needed program > is in the user's PATH. He tries to code it like this::
>
>   #!/usr/bin/python3.0
>
>   import os
>
>   for directory in os.environ['PATH'].split(os.pathsep):
>       programs = os.listdir(directory)
>
> That code raises a KeyError because python3 has silently discarded the > PATH due to the shift-jis encoded path elements. Much more importantly, > there's no way the programmer can handle the KeyError and actually get > the PATH from within python. Have you tried os.system or os.popen or the subprocess module to use and get a response from a native *nix command?
On Windows
>>> import subprocess as sp
>>> s=sp.Popen('path', shell=True, stdout=sp.PIPE)
>>> s.stdout.read()
b'PATH=C:\\temp\\WatconPermanent\\binnt;C:\\temp\\WatconPermanent\\binw;C:\\WINDOWS\\System32;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\Program Files\\PC-Doctor for Windows\\services;C:\\Program Files\\ATI Technologies\\ATI.ACE\\Core-Static;C:\\Program Files\\Python25;C:\\Program Files\\QuickTime\\QTSystem\\\r\n'
There are the bytes. This took me 10 minutes and a few mistakes as a first time subprocess user. Another 10 minutes and I figured out how to get the entire environment as bytes *and* convert them to a dict. This is a bit trickier:
s=sp.Popen('set', shell=True, stdout=sp.PIPE) #null set (env) cmd gets
e1= s.stdout.read()
e2=e1.split(b'\r\n')
e2.pop() # get rid of trailing b'' from trailing '\r\n'
e3=[i.split(b'=', 1) for i in e2] # split on the first '=' only; values may contain '='
env = dict(e3)
Whether either of these should be wrapped in os, I'll leave for others to discuss and decide, but if you can do the same in *nix, you should be able to do what you need to for now. Terry Jan Reedy From brett at python.org Thu Dec 4 23:03:52 2008 From: brett at python.org (Brett Cannon) Date: Thu, 4 Dec 2008 14:03:52 -0800 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: <4938467B.40806@gmail.com> References: <4938467B.40806@gmail.com> Message-ID: On Thu, Dec 4, 2008 at 13:07, Nick Coghlan wrote: > Terry Reedy wrote: >> and this could give some people a mis-impression, most likely negative, >> as to the magnitude and nature of the change. Most of the code I am now >> writing would, I believe, run with 2.5 except for print(..., file=xxx). >> And I know that there was concern for backward compatibility to the >> point that some changes were rejected (renaming builtins) or delayed >> (deleting duplicate test asserts) for that reason. So I would soften >> the statements to "... version of the language that is partially >> incompatible with...
" and "were made without being bound by backward >> compatibility," > > I would agree with Terry - while there are backwards incompatibilities, > they aren't gratuitous. > > Then again, Guido does seem to want to discourage people from trying to > target the common subset of the two languages instead of using 2to3 as a > compilation step from the python3 version. > It makes sense if your code would have required jumping through hoops to keep the base use-case. But if the only major difference is something easily covered by a __future__ statement (think print_function or unicode_literals, I believe although that __future__ statement is not documented anywhere according to Google), then I honestly think it's okay to try to target the subset. -Brett From a.badger at gmail.com Thu Dec 4 23:13:35 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 04 Dec 2008 14:13:35 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de> Message-ID: <4938560F.8070903@gmail.com> Adam Olsen wrote: > On Thu, Dec 4, 2008 at 2:09 PM, Andr? Malo wrote: >> * Adam Olsen wrote: >>> On Thu, Dec 4, 2008 at 1:02 PM, Toshio Kuratomi >> wrote: >>>> I opened up bug http://bugs.python.org/issue4006 a while ago and it was >>>> suggested in the report that it's not a bug but a feature and so I >>>> should come here to see about getting the feature changed :-) >>>> >>>> I have a specific problem with os.environ and a somewhat less important >>>> architectural issue with the unicode/bytes handling in certain os.* >>>> modules. I'll start with the important one: >>>> >>>> Currently in python3 there's no way to get at environment variables >>>> that are not encoded in the system default encoding. My understanding >>>> is that this isn't a problem on Windows systems but on *nix this is a >>>> huge problem. environment variables on *nix are a sequence of non-null >>>> bytes. 
These bytes are almost always "characters" but they do not have >>>> to be. Further, there is nothing that requires that the characters be >>>> in the same encoding; some of the characters could be in the UTF-8 >>>> character set while others are in latin-1, shift-jis, or big-5. >>> Multiple encoding environments are best described as "batshit insane". >>> It's impossible to handle any of it correctly *as text*, which is why >>> UTF-8 is becoming a universal standard. For everybody's sanity python >>> should continue to push it. >> Here's an example which will become popular soon, I guess: CGI scripts and, >> of course WSGI applications. All those get their environment in an unknown >> encoding. In the worst case one can blow up the application by simply >> sending strange header lines over the wire. But there's more: consider >> running the server in C locale, then probably even a single 8 bit char >> might break something (?). > > I think that's an argument that the framework should reencode all > input text into the correct system encoding before passing it on to > the CGI script or WSGI app. If the framework doesn't have a clear way > to determine the client's encoding then it's all just gibberish > anyway. A HTTP 400 or 500 range error code is appropriate here. > The framework can't always encode input bytes into the system encoding for text. Sometimes the framework can be dealing with actual bytes. For instance, if the framework is being asked to reference an actual file on a *NIX filesystem the bytes have to match up with the bytes in the filename whether or not those bytes agree with the system encoding. > >>> However, some pragmatism is also possible. Many uses of PATH may >>> allow it to be treated as black-box bytes, rather than text. The >>> minimal solution I see is to make os.getenv() and os.putenv() switch >>> to byte modes when given byte arguments, as os.listdir() does. 
This >>> use case doesn't require the ability to iterate over all environment >>> variables, as os.environb would allow. >>> >>> I do wonder if controlling the environment given to a subprocess >>> requires os.environb, but it may be too obscure to really matter. >> IMHO, environment variables are no text. They are bytes by definition and >> should be treated as such. >> I know, there's windows having unicode enabled env vars on demand, but >> there's only trouble with those over there in apache's httpd (when passing >> them to CGI scripts, oh well...). > > Environment variables have textual names, are set via text, frequently > contain textual file names or paths, and my shell (bash in > gnome-terminal on ubuntu) lets me put unicode text in just fine. The > underlying APIs may use bytes, but they're *intended* to be encoded > text. > The example I've started using recently is this: text files on my system contain character data and I expect them to be read into a string type when I open them in python3. However, if a text file contains text that is not encoded in the system default encoding I should still be able to get at the data and perform my own conversion. So I agree with the default of treating environment variables as text. We just need to be able to treat them as bytes when these corner cases come up. -Toshio From nd at perlig.de Thu Dec 4 23:47:52 2008 From: nd at perlig.de (André Malo) Date: Thu, 4 Dec 2008 23:47:52 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de> Message-ID: <200812042347.52388.nd@perlig.de> * Adam Olsen wrote: > On Thu, Dec 4, 2008 at 2:09 PM, André
Malo wrote: > > Here's an example which will become popular soon, I guess: CGI scripts > > and, of course WSGI applications. All those get their environment in an > > unknown encoding. In the worst case one can blow up the application by > > simply sending strange header lines over the wire. But there's more: > > consider running the server in C locale, then probably even a single 8 > > bit char might break something (?). > > I think that's an argument that the framework should reencode all > input text into the correct system encoding before passing it on to > the CGI script or WSGI app. If the framework doesn't have a clear way > to determine the client's encoding then it's all just gibberish > anyway. A HTTP 400 or 500 range error code is appropriate here. Duh. See, you're already mixing different encodings and creating issues here! You're talking about client encoding (whatever that is) with correct system encoding (whatever that is, too) in the same paragraph and assume they are the same or compatible. There are several points here: - there is no clear way to get a single client encoding for the whole HTTP transaction (headers + body), because *there is none*. If the whole header set matches the same encoding, it's more or less luck. - there is no correct system encoding either. As said, I prefer running my servers in C locale, so it's all ascii. In fact, it shouldn't matter. The locale should not have anything to do with an application called over the network. - A 400 or 500 response for a header containing something like my name is not appropriate. - Octets in HTTP headers are allowed. And they are what they are - octets. The interpretation has to be left to the application, not the framework. > > >> However, some pragmatism is also possible. Many uses of PATH may > >> allow it to be treated as black-box bytes, rather than text. 
The > >> minimal solution I see is to make os.getenv() and os.putenv() switch > >> to byte modes when given byte arguments, as os.listdir() does. This > >> use case doesn't require the ability to iterate over all environment > >> variables, as os.environb would allow. > >> > >> I do wonder if controlling the environment given to a subprocess > >> requires os.environb, but it may be too obscure to really matter. > > > > IMHO, environment variables are no text. They are bytes by definition > > and should be treated as such. > > I know, there's windows having unicode enabled env vars on demand, but > > there's only trouble with those over there in apache's httpd (when > > passing them to CGI scripts, oh well...). > > Environment variables have textual names, are set via text, frequently Well, think about my example again. The friendly way to maintain them is not the issue. The problems arise at least when the variables are set by an attacker. > contain textual file names or paths, and my shell (bash in > gnome-terminal on ubuntu) lets me put unicode text in just fine. The > underlying APIs may use bytes, but they're *intended* to be encoded > text. Yes, encoded text == bytes. No, they're intended to be c-strings. And well, even if we assume that they should contain text (as in encoded unicode), their meaning is application specific and so is the encoding (even if it's mixed). What I'm saying is: I don't see much use for unicode APIs for the environment at all, because I don't know what's in there before inspecting them. And apparently the only reliable way to inspect them is via a byte oriented API. 
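A minimal sketch of the byte-oriented inspection described above, not something Python 3.0 itself offers. It assumes a later Python 3 release in which os.environb and os.supports_bytes_environ exist; read_env_bytes, decode_or_keep and sample_env are hypothetical names used only for illustration. The application, not the interpreter, chooses the encoding per value.

```python
import os


def read_env_bytes(name, environ=None):
    """Return the raw bytes stored under `name` (a bytes key), or None if unset."""
    if environ is None:
        # Assumption: os.environb exists only in later Python 3 releases,
        # and only on systems whose native environment is bytes.
        environ = os.environb if os.supports_bytes_environ else {}
    return environ.get(name)


def decode_or_keep(raw, encoding="utf-8"):
    """Decode with an application-chosen encoding; fall back to the raw bytes."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw


# Hypothetical environment, reusing the shift-jis bytes quoted in this thread.
sample_env = {b"PATH": b"/bin:/mnt/\x83v\x83\x8d\x83O\x83\x89\x83\x80"}
raw_path = read_env_bytes(b"PATH", sample_env)
for entry in raw_path.split(b":"):
    print(decode_or_keep(entry))
```

Entries that are not valid in the chosen encoding stay as bytes, so they can still be passed unchanged to byte-accepting calls such as os.listdir().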
nd From a.badger at gmail.com Thu Dec 4 23:51:25 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 04 Dec 2008 14:51:25 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> Message-ID: <49385EED.9040004@gmail.com> Terry Reedy wrote: > Toshio Kuratomi wrote: >> I opened up bug http://bugs.python.org/issue4006 a while ago and it was >> suggested in the report that it's not a bug but a feature and so I >> should come here to see about getting the feature changed :-) > > It does you no good and (and will irritate others) to conflate 'design > decision I do not agree with' with 'mistaken documentation or > implementation of a design decision'. The former is opinion, the latter > is usually fact (with occasional border cases). The latter is what core > developers mean by 'bug'. > Noted. However, there's also a difference between "Prevents us from doing useful things" and "Allows doing a useful thing in a non-trivial manner". The latter I would call a difference in design decision and the former I would call a bug in the design. >> Currently in python3 there's no way to get at environment variables that >> are not encoded in the system default encoding. My understanding is >> that this isn't a problem on Windows systems but on *nix this is a huge >> problem. environment variables on *nix are a sequence of non-null >> bytes. These bytes are almost always "characters" but they do not have >> to be. Further, there is nothing that requires that the characters be >> in the same encoding; some of the characters could be in the UTF-8 >> character set while others are in latin-1, shift-jis, or big-5. > > To me, mixing encodings within a string is at least slightly insane. If > by design, maybe even a 'design bug' ;-). 
>
As an application level developer I echo your sentiment :-)  I recognize,
though, that *nix filesystem semantics were designed many years before
unicode and the decision to treat filenames, environment variables, and so
much else as bytes follows naturally from the C definition of a char.
It's up to a higher level than the OS to decide how to display the bytes.

[shell server and fileserver result in this insane PATH]
>> PATH=/bin:/usr/bin:/usr/local/bin:/mnt/\xe3\x83\x8d\xe3\x83\x83\xe3\x83\x88\xe3\x83\xaf\xe3\x83\xbc\xe3\x82\xaf/\x83v\x83\x8d\x83O\x83\x89\x83\x80
>>
>
> I would think life would be ultimately easier if either the file server
> or the shell server automatically translated file names from jis and
> utf8 and back, so that the PATH on the *nix shell server is entirely
> utf8.

This is not possible because no part of the computer knows what the
encoding is.  To the computer, it's just a sequence of bytes.  Unlike xml
or the windows filesystem (winfs? ntfs?) where the encoding is specified
as part of the document/filesystem there's nothing to tell what encoding
the filenames are in.

> How would you ever display a mixture to users?

This is up to the application.  My recommendation would be to keep the raw
bytes (to access the file on the filesystem) and display the results of
str(filename, errors='replace') to the user.

> What if there
> were an ambiguous component that could be legally decoded more than one
> way?
>
The ambiguity is the reason that the fileserver and shell server can't
automatically translate the filename (many encodings merely use all of the
2^8 byte combinations available in a C char type.  This makes the byte
decodable in any one of those encodings).  In the application, only using
the raw bytes to access the file also prevents ambiguity because the raw
bytes only reference one file.

>> Now comes the problematic part.
>> One of the users on the system wants
>> to write a python3 program that needs to determine if a needed program
>> is in the user's PATH.  He tries to code it like this::
>>
>>   #!/usr/bin/python3.0
>>
>>   import os
>>
>>   # PATH is one string; split it into its directory entries
>>   for directory in os.environ['PATH'].split(os.pathsep):
>>       programs = os.listdir(directory)
>>
>> That code raises a KeyError because python3 has silently discarded the
>> PATH due to the shift-jis encoded path elements.  Much more importantly,
>> there's no way the programmer can handle the KeyError and actually get
>> the PATH from within python.
>
> Have you tried os.system or os.popen or the subprocess module to use and
> get a response from a native *nix command?  On Windows
>
Sure, you can subprocess your way out of a lot of sticky situations since
you're essentially delegating the task to a C routine.  But there are
drawbacks:

* You become dependent on an external program being available.  What
  happens if your code is run in a chroot, for instance?
* Do we want anyone writing programs that access the environment on *NIX
  to have to discover this pattern themselves and implement it?

As for wrapping this up in os.*, that isn't necessary -- the python3
interpreter already knows about the byte-oriented environment; it just
isn't making it available to people programming in python.

-Toshio

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From p.f.moore at gmail.com Thu Dec 4 23:52:41 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Dec 2008 22:52:41 +0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> Message-ID: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> 2008/12/4 Raymond Hettinger : > Also, we don't know the timing of the third-party updates. > Some may never get converted. Some may convert quickly > and easily. Someone (perhaps me) may organize a series of > funded sprints to get many of the major packages converted. One piece of encouraging news I heard today is that mod_wsgi apparently works with 3.0 already - which may well mean that more web software than I'd originally anticipated will work sooner rather than later. But it's certainly true that Python (all versions, not just 3.0) is more of an ecosystem than just the CPython core. "Batteries included" notwithstanding. And it'll take longer for the 3.0 ecosystem to grow than the 2.6 one. Paul. From rhamph at gmail.com Fri Dec 5 00:15:47 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 16:15:47 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812042347.52388.nd@perlig.de> References: <4938374B.8000006@gmail.com> <200812042209.34814.nd@perlig.de> <200812042347.52388.nd@perlig.de> Message-ID: On Thu, Dec 4, 2008 at 3:47 PM, André Malo wrote: > * Adam Olsen wrote: > >> On Thu, Dec 4, 2008 at 2:09 PM, André Malo wrote: > >> > Here's an example which will become popular soon, I guess: CGI scripts >> > and, of course WSGI applications. All those get their environment in an >> > unknown encoding.
In the worst case one can blow up the application by >> > simply sending strange header lines over the wire. But there's more: >> > consider running the server in C locale, then probably even a single 8 >> > bit char might break something (?). >> >> I think that's an argument that the framework should reencode all >> input text into the correct system encoding before passing it on to >> the CGI script or WSGI app. If the framework doesn't have a clear way >> to determine the client's encoding then it's all just gibberish >> anyway. A HTTP 400 or 500 range error code is appropriate here. > > Duh. > See, you're already mixing different encodings and creating issues here! > You're talking about client encoding (whatever that is) with correct system > encoding (whatever that is, too) in the same paragraph and assume they are > the same or compatible. Mixing can work so long as the encoding is clearly specified and unambiguous. It limits your character set to a common subset of both encodings, but that's a lesser problem. > There are several points here: > > - there is no clear way to get a single client encoding for the whole HTTP > transaction (headers + body), because *there is none*. If the whole > header set matches the same encoding, it's more or less luck. If there is no way, via official standards or defacto standards, you should assume ascii and blow up if anything else is given. At that point it's meaningless garbage anyway. > - there is no correct system encoding either. As said, I prefer running my > servers in C locale, so it's all ascii. In fact, it shouldn't matter. The > locale should not have anything to do with an application called over the > network. I half agree: the network should be unaffected by the C locale. However, using a C locale should limit you to ascii file names and environment variables. > - A 400 or 500 response for a header containing something like my name is > not appropriate. > > - Octets in HTTP headers are allowed. 
And they are what they are - > octets. The interpretation has to be left to the application, not the > framework. If there is no clear interpretation then they're garbage. If there is a clear interpretation it could be done just as well in the framework, which also lets all the apps benefit from a single implementation, rather than trying to reimplement it for each one. >> >> However, some pragmatism is also possible. Many uses of PATH may >> >> allow it to be treated as black-box bytes, rather than text. The >> >> minimal solution I see is to make os.getenv() and os.putenv() switch >> >> to byte modes when given byte arguments, as os.listdir() does. This >> >> use case doesn't require the ability to iterate over all environment >> >> variables, as os.environb would allow. >> >> >> >> I do wonder if controlling the environment given to a subprocess >> >> requires os.environb, but it may be too obscure to really matter. >> > >> > IMHO, environment variables are no text. They are bytes by definition >> > and should be treated as such. >> > I know, there's windows having unicode enabled env vars on demand, but >> > there's only trouble with those over there in apache's httpd (when >> > passing them to CGI scripts, oh well...). >> >> Environment variables have textual names, are set via text, frequently > > Well, think about my example again. The friendly way to maintain them is not > the issue. The problems arise at least when the variables are set by an > attacker. Maintaining them *IS* the issue. The whole reason they're text in the first place is to display them to and receive them back from the user. Otherwise we'd use meaningless serial numbers for directories or something. It may not seem to matter in this use case, but that's only because they're communicated to/from the user on another system. >> contain textual file names or paths, and my shell (bash in >> gnome-terminal on ubuntu) lets me put unicode text in just fine. 
The >> underlying APIs may use bytes, but they're *intended* to be encoded >> text. > > Yes, encoded text == bytes. No, they're intended to be c-strings. And well, > even if we assume that they should contain text (as in encoded unicode), > their meaning is application specific and so is the encoding (even if it's > mixed). > > What I'm saying is: I don't see much use for unicode APIs for the > environment at all, because I don't know what's in there before inspecting > them. And apparently the only reliable way to inspect them is via a byte > oriented API. If you don't think your paths should contain text then please alter your other systems to stop using japanese names. Standardize on ascii serial numbers or something equally meaningless. Treating it as bytes is a bodge. It's worth getting your use case to "just work", but in the end it is text, and the *only* broad solution to text is unicode. -- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Fri Dec 5 00:16:27 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 04 Dec 2008 15:16:27 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49384957.3030102@gmail.com> Message-ID: <493864CB.1040604@gmail.com> Adam Olsen wrote: > On Thu, Dec 4, 2008 at 2:19 PM, Nick Coghlan wrote: >> Toshio Kuratomi wrote: >>> The bug report I opened suggests creating a PEP to address this issue. >>> I think that's a good idea for whether os.listdir() and friends should >>> be changed to raise an exception but not having any way to get at some >>> environment variables seems like it's just a bug that needs to be >>> addressed. What do other people think on both these issues? 
>> I'm pretty sure the discussion on this topic a while back decided that >> where necessary Python 3 would grow parallel bytes versions of APIs >> affected by environmental encoding issues (such as os.environb, >> os.listdirb, os.getcwdb), but that we were OK with the idea of deferring >> addition of those APIs until 3.1. > > It looks like most of them got into 3.0. > http://docs.python.org/3.0/library/os.html says "All functions > accepting path or file names accept both bytes and string objects, and > result in an object of the same type, if a path or file name is > returned." > I'm very glad this is coming along. Just want to make sure the environment is also handled in 3.1. > >> That is, this was an acknowledged limitation with a fairly >> straightforward agreed solution, but it wasn't considered a common >> enough issue to delay the release of 3.0 until all of those parallel >> APIs had been implemented > > Aye. IMO it's fairly clear that os.getenv()/os.putenv() should follow > suit in 3.1. I'm not so sure about adding os.environb (and making > subprocess use it), unless the OP can demonstrate they really need it. > Note: subprocess currently uses the "real" environment (the raw environment as given to the python interpreter) when it is started without the `env` parameter. So the question would be what people overriding the env parameter on their own need to do. To be non-surprising I'd think they'd want to have a way to override just a few variables from the raw environment. Otherwise you have to know which variables the program you're calling relies on and make sure that those are set or call os.getenvb() to retrieve the byte version and add it to your copy of os.environ before passing that to subprocess. One example of something that would be even harder to implement without access to the os.environb dictionary would be writing a program that wraps make. 
Since make takes all the variables from the environment and transforms them into make variables you need to pass everything from the environment that you are not modifying into the command. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From martin at v.loewis.de Fri Dec 5 00:21:50 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 00:21:50 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com> Message-ID: <4938660E.9080809@v.loewis.de> >> I can't find any docs built for Python 3.0 (not 3.1a0). > > The Windows installation has new 3.0 doc dated Dec 3, so it was built, > just not posted correctly. That doesn't mean very much. I built it on my local machine. Anybody with subversion and python could do that; the documentation is in subversion. Whether or not it appears on the web site as part of the release process is an entirely different matter. It used to be that the doc maintainer (Fred Drake) was part of the release team and release process. I think Georg is complaining that he is release maintainer, but not part of the release process. Regards, Martin From martin at v.loewis.de Fri Dec 5 00:24:26 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 00:24:26 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> Message-ID: <493866AA.7000806@v.loewis.de> > ISTM, 3.0 is in pretty good shape. There is nothing intrinsically wrong > with it. 
I think it has many bugs, some known before the release, but many more yet to show up. I agree that the design is good; the implementation will certainly improve (I deliberately didn't say "could have been better", because it could not have been better, given the time available to the contributors). Regards, Martin From barry at python.org Fri Dec 5 00:25:35 2008 From: barry at python.org (Barry Warsaw) Date: Thu, 4 Dec 2008 18:25:35 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <4938660E.9080809@v.loewis.de> References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com> <4938660E.9080809@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 4, 2008, at 6:21 PM, Martin v. Löwis wrote: >>> I can't find any docs built for Python 3.0 (not 3.1a0). >> >> The Windows installation has new 3.0 doc dated Dec 3, so it was >> built, >> just not posted correctly. > > That doesn't mean very much. I built it on my local machine. Anybody > with subversion and python could do that; the documentation is in > subversion. > > Whether or not it appears on the web site as part of the release > process is an entirely different matter. It used to be that the > doc maintainer (Fred Drake) was part of the release team and release > process. I think Georg is complaining that he is release maintainer, > but not part of the release process. I've asked Georg to update PEP 101 to make his role as Documentation Expert explicit. Unfortunately we only debug major releases once (or twice) every 18 months. But next time, we'll get that part right for sure! In the meantime, I'll make sure Georg is involved in point releases moving forward.
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSThm8HEjvBPtnXfVAQJgGgP/eiAUroHbxvpJLT8JRpW5H+nmyU5yGGCY NZYrU/Vm2vRPFyfDevOFErQX9Jr1LqO0x4Qgxm4PpIj3OVwM16INz98as6nONEhC MfTjf8Pp7f5BrF7HYh1XfPqTy5qpVhAkzKrCcjUk2VT/JHzJ4wrAl+29VhDTjvrz 3SXphnxWi6w= =dfm7 -----END PGP SIGNATURE----- From amauryfa at gmail.com Fri Dec 5 00:29:43 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Fri, 5 Dec 2008 00:29:43 +0100 Subject: [Python-Dev] Taint Mode in Python 3.0 In-Reply-To: <200812041836.48146.nicole@cats-muvva.net> References: <200812041836.48146.nicole@cats-muvva.net> Message-ID: Hello, On Thu, Dec 4, 2008 at 19:36, Nicole King wrote: > Dear All, > > I have published the diff for my implementation of tainted mode in Python for > R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the > bottom the page. I apologise for past problems accessing this web site: I > hope to have resolved all the issues with it. The patch is indeed huge! it seems that every function that returns a PyObject must be modified. And it seems very difficult to check for its correctness. Did you look at the Pypy project? The C code of the interpreter is generated, and it already proposes a "Taint" option at translation time. http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html#taint With only 300 lines of elegant python code... -- Amaury Forgeot d'Arc From martin at v.loewis.de Fri Dec 5 00:31:59 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 00:31:59 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <4938686F.1090508@v.loewis.de> >> trunk ---> release26-maint >> \-> py3k ---> release30-maint >> >> > > As a side-note: this merging flow means that bugfix and feature commits > may never be merged from trunk to py3k in one svnmerge batch. Else, > they cannot be separated when merging from py3k to 30-maint. True. 
However, the same would be true for the merge flow 26 -> trunk -> 3.0 -> 3k In fact, that merge flow wouldn't support merging features *at all*: a feature added to trunk would need to flow through 3.0, which can't accept new features. Regards, Martin From fijall at gmail.com Fri Dec 5 00:38:25 2008 From: fijall at gmail.com (Maciej Fijalkowski) Date: Fri, 5 Dec 2008 00:38:25 +0100 Subject: [Python-Dev] Taint Mode in Python 3.0 In-Reply-To: References: <200812041836.48146.nicole@cats-muvva.net> Message-ID: <693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com> Hello, The thing is pypy's taint code is broken. Basically you don't only need to patch all places that return pyobject, but also all places that might modify anything. (All side effects) For example innocently looking call to addition might end up calling arbitrary python code (and have arbitrary side effects). There is a question how do you approach such things? Cheers, fijal On Fri, Dec 5, 2008 at 12:29 AM, Amaury Forgeot d'Arc wrote: > Hello, > > On Thu, Dec 4, 2008 at 19:36, Nicole King wrote: >> Dear All, >> >> I have published the diff for my implementation of tainted mode in Python for >> R3.0 (released version) at http://www.cats-muvva.net/software/. Look at the >> bottom the page. I apologise for past problems accessing this web site: I >> hope to have resolved all the issues with it. > > The patch is indeed huge! it seems that every function that returns a > PyObject must be modified. > And it seems very difficult to check for its correctness. > > Did you look at the Pypy project? The C code of the interpreter is > generated, and it already proposes a "Taint" option at translation > time. > http://codespeak.net/pypy/dist/pypy/doc/objspace-proxies.html#taint > With only 300 lines of elegant python code... 
> > -- > Amaury Forgeot d'Arc > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fijall%40gmail.com > From martin at v.loewis.de Fri Dec 5 00:39:24 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 00:39:24 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4938374B.8000006@gmail.com> References: <4938374B.8000006@gmail.com> Message-ID: <49386A2C.60208@v.loewis.de> > In the bug report I opened, I listed four ways to fix this along with > the pros and cons: I'm in favour of a different, fifth solution: 5) represent all environment variables in Unicode strings, including the ones that currently fail to decode. (then do the same to file names, then drop the byte-oriented file operations again) Regards, Martin From janzert at janzert.com Fri Dec 5 01:20:57 2008 From: janzert at janzert.com (Janzert) Date: Thu, 04 Dec 2008 19:20:57 -0500 Subject: [Python-Dev] Merging mailing lists In-Reply-To: <4937886B.4000002@v.loewis.de> References: <4937886B.4000002@v.loewis.de> Message-ID: Martin v. Löwis wrote: > I would like to merge mailing lists, now that the design and first > implementation of Python 3000 is complete. In particular, I would > like to merge the python-3000 mailing list back into python-dev, > and the python-3000-checkins mailing list back into python-checkins. > The rationale is to simplify usage of the lists, and to avoid > cross-postings. > > To implement this, all subscribers of the 3000 mailing lists would > be added to the trunk mailing lists (avoiding duplicates, of course), > and all automated messages going to python-3000-checkins would then > be directed to the trunk lists. The 3000 mailing lists would change > into read-only mode (i.e. primarily leaving the archives behind). > > Any objections?
> > Regards, > Martin I like the general sentiment, but I think it may be a bad idea to automatically bring all the subscribers from the -3000 lists over to the more general lists. For instance if someone has an address subscribed specifically to archive the -3000 list suddenly it will begin archiving the other. I would rather just see a final announcement to switch to the other list and then close the list to further submissions. Let people join the new appropriate list manually if needed. Otherwise +1 on getting the discussion and checkins back into combined lists. Janzert From fwierzbicki at gmail.com Fri Dec 5 02:02:59 2008 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Thu, 4 Dec 2008 20:02:59 -0500 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> Message-ID: <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon wrote: > On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki wrote: >> On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling wrote: >>> 14:00 - 15:30 >>> ============= >>> >>> Two tracks: >>> >>> Cross-implementation issues: >>> >>> What do the various VMs want/need from CPython to help with their >>> implementations? >>> >>> * Marking CPython-specific tests in the test suite? >>> * Getting an implementation agnostic test suite for the Python language? >>> * Separating the language tests and the pure Python part of the stdlib into >>> a separate project? (Or publish them as a separate package.) >>> * Transition plans for 3.0? >>> >>> Champion needed. >> I would like to champion this one. >> > > I told AMK this a while back, but might as well make it more public; I > am up for chairing as well. Brett, Are you saying you've already called the cross-implementation champion role? If so I'm happy to defer or co-chair. 
-Frank From tjreedy at udel.edu Fri Dec 5 02:16:55 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 20:16:55 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <49385EED.9040004@gmail.com> References: <4938374B.8000006@gmail.com> <49385EED.9040004@gmail.com> Message-ID: Toshio Kuratomi wrote: > >> I would think life would be ultimately easier if either the file server >> or the shell server automatically translated file names from jis and >> utf8 and back, so that the PATH on the *nix shell server is entirely >> utf8. > > This is not possible because no part of the computer knows what the > encoding is. To the computer, it's just a sequence of bytes. Unlike > xml or the windows filesystem (winfs? ntfs?) where the encoding is > specified as part of the document/filesystem there's nothing to tell > what encoding the filenames are in. I thought you said that the file server keep all filenames in shift-jis, and the shell server all in utf-8. If so, then the shell server could know if it were told so. From python at rcn.com Fri Dec 5 02:29:31 2008 From: python at rcn.com (Raymond Hettinger) Date: Thu, 4 Dec 2008 17:29:31 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> Message-ID: > 2008/12/4 Raymond Hettinger : >> Also, we don't know the timing of the third-party updates. >> Some may never get converted. Some may convert quickly >> and easily. Someone (perhaps me) may organize a series of >> funded sprints to get many of the major packages converted. 
From: "Paul Moore" > One piece of encouraging news I heard today is that mod_wsgi > apparently works with 3.0 already - which may well mean that more web > software than I'd originally anticipated will work sooner rather than > later. > > But it's certainly true that Python (all versions, not just 3.0) is > more of an ecosystem than just the CPython core. "Batteries included" > notwithstanding. And it'll take longer for the 3.0 ecosystem to grow > than the 2.6 one. Here's a bright idea. On the 3.0 release page, include a box listing which major third-party apps have been converted. Update it once every couple of weeks. That way, we're not explicitly discouraging adoption of 3.0, we're just listing what support is then currently available (if you need twisted and it's not on the list, then that would be your guide). Raymond From tjreedy at udel.edu Fri Dec 5 02:33:56 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 20:33:56 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local><6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1><79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> Message-ID: Raymond Hettinger wrote: > From: "A.M. Kuchling" >> Perhaps the statement could say something like "we do not expect >> most Python packages will be ported to the 3.x series until around the >> time 3.1 is released in X months." (where X=12? 6?) > > I would leave out any discussion of 3.1. Its content and release date > have nothing to do with when third party modules get updated. > > Also, we don't know the timing of the third-party updates. > Some may never get converted. Some may convert quickly > and easily. Someone (perhaps me) may organize a series of > funded sprints to get many of the major packages converted. > > It would be better to simply say that at the present time, > most important third-party modules do not yet support 3.0.
> > FWIW, my father is a Python newbie and I'm pointing him > to 3.0 because it will be easier to learn than 2.6's hodgepodge > of new and old features. The 3.0 environment is much cleaner. I agree with all 4 points, especially the last. I think newcomers should be informed of the +/- of different versions and then choose for themselves. For full battery availability, 2.5 is it and will be for some months. For a fresh start without need of extras, 3.0 wins in my experience so far. tjr From tjreedy at udel.edu Fri Dec 5 02:36:41 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 20:36:41 -0500 Subject: [Python-Dev] [Python-3000] RELEASED Python 3.0 final In-Reply-To: <4938467B.40806@gmail.com> References: <4938467B.40806@gmail.com> Message-ID: Nick Coghlan wrote: > Terry Reedy wrote: >> and this could give some people a mis-impression, most likely negative, >> as to the magnitude and nature of the change. Most of the code I am now >> writing would, I believe, run with 2.5 except for print(..., file=xxx). >> And I know that there was concern for backward compatibility to the >> point that some changes were rejected (renaming builtins) or delayed >> (deleting duplicate test asserts) for that reason. So I would soften >> the statements to "... version of the language that is partially >> incompatible with... " and "were made without being bound by backward >> compatibility," > > I would agree with Terry - while there are backwards incompatibilities, > they aren't gratuitous. > > Then again, Guido does seem to want to discourage people from trying to > target the common subset of the two languages instead of using 2to3 as a > compilation step from the python3 version. I am not restricting myself to that subset. There simply is an unchanged core that happens to include what I currently need (except print, which is isolated in one module). I might need 'except x as y:' someday and will use it if I do, but so far 'except x:' is enough (and back compatible).
tjr From foom at fuhm.net Fri Dec 5 02:14:40 2008 From: foom at fuhm.net (James Y Knight) Date: Thu, 4 Dec 2008 20:14:40 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <49386A2C.60208@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> Message-ID: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: > I'm in favour of a different, fifth solution: > > 5) represent all environment variables in Unicode strings, > including the ones that currently fail to decode. > (then do the same to file names, then drop the byte-oriented > file operations again) Yay, maybe we can have this whole discussion all over again! Let's bring out all the same arguments, come to no conclusion, and let it taper off unresolved, yet again! :) FWIW, I still agree with Martin that that's the most reasonable solution. James From tjreedy at udel.edu Fri Dec 5 03:08:03 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 04 Dec 2008 21:08:03 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: James Y Knight wrote: > On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: >> I'm in favour of a different, fifth solution: >> >> 5) represent all environment variables in Unicode strings, >> including the ones that currently fail to decode. >> (then do the same to file names, then drop the byte-oriented >> file operations again) > > Yay, maybe we can have this whole discussion all over again! > > Let's bring out all the same arguments, come to no conclusion, and let > it taper off unresolved, yet again! :) My impression was that there was not enough time to do something like that for the soon-to-be-released 3.0, so it was deferred. Now or soon is the time to reconsider. 
> FWIW, I still agree with Martin that that's the most reasonable solution. FWIW2, I have much the same feeling. From rhamph at gmail.com Fri Dec 5 03:32:22 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 19:32:22 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight wrote: > On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: >> >> I'm in favour of a different, fifth solution: >> >> 5) represent all environment variables in Unicode strings, >> including the ones that currently fail to decode. >> (then do the same to file names, then drop the byte-oriented >> file operations again) > > Yay, maybe we can have this whole discussion all over again! > > Let's bring out all the same arguments, come to no conclusion, and let it > taper off unresolved, yet again! :) > > FWIW, I still agree with Martin that that's the most reasonable solution. It died because nobody presented a viable solution, and I maintain no solution is possible. All suggestions involve arbitrary transformations that fail to round trip correctly at some point or another. They're simply about shuffling the failure around to somewhere the poster happens to like. Please, if you have a *new* idea that doesn't have a failure mode, by all means post it. But don't resurrect a pointless bikeshed. -- Adam Olsen, aka Rhamphoryncus From amk at amk.ca Fri Dec 5 03:35:14 2008 From: amk at amk.ca (A.M. 
Kuchling) Date: Thu, 4 Dec 2008 21:35:14 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> Message-ID: <20081205023514.GA1723@amk.local> On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote: > Here's a bright idea. On the 3.0 release page, include a box listing > which major third-party apps have been converted. Update it > once every couple of weeks. That way, we're not explicitly That's an excellent idea. We could have a webpage, or start a topic-specific weblog for posting announcements. I've started a draft of a 3.0 FAQ in the wiki at . Once it's finished we can move it into the 3.0 release pages. Everyone please edit and improve it! --amk From dinov at microsoft.com Fri Dec 5 04:24:08 2008 From: dinov at microsoft.com (Dino Viehland) Date: Thu, 4 Dec 2008 19:24:08 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com> Does anyone know what Mono does here? Presumably they have the exact same problem as all strings in .NET are Unicode, and filenames/env vars/etc... are always strings. Maybe if it's gotta be broken at least it can be broken in a manner that's consistent with others :) > -----Original Message----- > From: python-dev-bounces+dinov=microsoft.com at python.org [mailto:python- > dev-bounces+dinov=microsoft.com at python.org] On Behalf Of Adam Olsen > Sent: Thursday, December 04, 2008 6:32 PM > To: James Y Knight > Cc: "Martin v. 
Löwis"; python-dev List > Subject: Re: [Python-Dev] Python-3.0, unicode, and os.environ > > On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight wrote: > > On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: > >> > >> I'm in favour of a different, fifth solution: > >> > >> 5) represent all environment variables in Unicode strings, > >> including the ones that currently fail to decode. > >> (then do the same to file names, then drop the byte-oriented > >> file operations again) > > > > Yay, maybe we can have this whole discussion all over again! > > > > Let's bring out all the same arguments, come to no conclusion, and > let it > > taper off unresolved, yet again! :) > > > > FWIW, I still agree with Martin that that's the most reasonable > solution. > > It died because nobody presented a viable solution, and I maintain no > solution is possible. All suggestions involve arbitrary > transformations that fail to round trip correctly at some point or > another. They're simply about shuffling the failure around to > somewhere the poster happens to like. > > Please, if you have a *new* idea that doesn't have a failure mode, by > all means post it. But don't resurrect a pointless bikeshed. 
> > > -- > Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Fri Dec 5 04:47:22 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 20:47:22 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com> Message-ID: On Thu, Dec 4, 2008 at 8:24 PM, Dino Viehland wrote: > Does anyone know what Mono does here? Presumably they have the exact same > problem as all strings in .NET are Unicode, and filenames/env vars/etc... > are always strings. > > Maybe if it's gotta be broken at least it can be broken in a manner > that's consistent with others :) Many of the windows APIs use UTF-16 without validating it. They'll pass through invalid strings until they hit something that does validate, at which point it'll blow up. I suspect that it doesn't happen very often in practice, as having only one encoding makes it quite clear that it's a broken file name, not a mixed encoding environment. 
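[Editorial illustration, not part of the original message.] The "pass through until something validates" behaviour described above can be sketched in Python 3 itself: a str may carry a lone surrogate, which is not valid UTF-16 text, and nothing complains until a strict encoder sees it. The file name below is made up for the example, and `try_encode` is a hypothetical helper, not a standard-library function.

```python
# Hypothetical file name containing a lone (unpaired) surrogate code unit.
name = "report\ud800.txt"

# Non-validating operations simply pass the string through.
suffix = name[-4:]


def try_encode(text):
    """Return the UTF-8 bytes, or None if a strict encoder rejects the text."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        # This is the point where the invalid string finally "blows up".
        return None


ok = try_encode("report.txt")
broken = try_encode(name)
```

The failure shows up only at the encoding step, which may be far from the code that first produced the string.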
-- Adam Olsen, aka Rhamphoryncus From glyph at divmod.com Fri Dec 5 04:52:36 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 05 Dec 2008 03:52:36 -0000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com> On 02:08 am, tjreedy at udel.edu wrote: >James Y Knight wrote: >>On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: >>>I'm in favour of a different, fifth solution: >>> >>>5) represent all environment variables in Unicode strings, >>> including the ones that currently fail to decode. >>> (then do the same to file names, then drop the byte-oriented >>> file operations again) >>FWIW, I still agree with Martin that that's the most reasonable >>solution. > >FWIW2, I have much the same feeling. And I still disagree, but I re-read the old thread and didn't see much of a clear argument there, so at least I'm not re-treading old ground :). The only strategy that would allow us to encode all inputs as unicode (including the invalid ones) is to abuse NUL to mean "ha ha, this isn't actually a unicode string, it's something I couldn't decode". This is nice because it allows the type of the returned value to be the same, so a Python program that expects a unicode object will be able to manipulate this object (as long as it doesn't split it up too close to a NUL). It seems to me that this convenient, but clever-clever type distinction will inevitably be a bug magnet. For the most basic example, see the caveat above. But more realistically - not too much code splits filenames on anything but "." or os.sep, after all - if you pass this to an extension module which then wants to invoke a C library function which passes the file name to open() and friends, what is the right thing for the extension module to do? 
There would need to be a new API which could get the "right" bytes out of a unicode string which potentially has NULs in it. This can't just be an encoding, either, because you might need to get the Shift-JIS bytes (some foreign system's encoding) for some got-NULs-in-it filename even though your locale says the encoding is UTF-8. And what if those bytes happen to be valid Shift-JIS? Decoding bytes makes a lot more sense to me than transcoding strings. Filenames and environment variables would all need to be encoded or decoded according to this magic encoding. And what happens if you get some garbage data from elsewhere and pass it to a function that *generates* a filename? Now, you get a pleasant error message, "TypeError: file() argument 1 must be (encoded string without NULL bytes), not str". In the future, I can only assume (if you're lucky) that you'll get some weird thing out of the guts of an encoding module; or, more likely, some crazy mojibake filename containing PUA code points or whatever will silently get opened. You can make this less likely (and harder to debug in the odd cases where it does happen) by making the encoding more clever, but eventually your luck will run out: most likely on somebody's computer who doesn't speak english well enough to report the problem clearly. The scenario gets progressively more nightmarish as you start putting more libraries into the mix. You pass some environment variable into some library which knows all about unicode and happily handles it correctly, but a second library which doesn't know about this proposed tricky NUL convention gets the unicode filename and transcodes it literally, causing an error return from open(). This puts the apparent error very far away from the responsible code. 
Ultimately it makes sense to expose the underlying bytes as bytes without forcing everyone to pretend that they make sense as anything but bytes, and allow different applications to make appropriately educated guesses about their character format. In any case, programmers who don't know about these kinds of issues are going to make mistakes in handling invalid filenames on UNIXy systems, and some users won't be able to open some files. If there is an easy and straightforward way to get the bytes out, it's more likely that programmers who know what they are doing will be able to get the correct behavior. Of course, the NUL-encoding trick will make it *possible* to do the right thing, but our hypothetically savvy programmer now needs to learn about the bytes/unicode distinction between windows/mac+linux+everythingelse, and Python's special convention for invalid data, and how to mix it with encoding/decoding/transcoding, rather than just Python's distinct API for the distinct types that may represent a filename. I think this is significantly harder to document than just having two parallel APIs (environ, environb, open(str), open(bytes), listdir(str), listdir(bytes)) to reflect the very subtle, but nevertheless very real, distinction between the Windows and UNIX worlds. This distinct API can still provide the same illusion of "it usually works" portability that the encoding convention can: for Windows, environb can be the representation of the environment in a particular encoding; for UNIX, environ(u) can be all of the variables which correctly decode. And so on for each other API. At least this time I think I've encapsulated pretty much my entire argument here, so if you don't buy it, we can probably just agree to disagree :). 
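[Editorial illustration, not part of the original message.] The "two parallel APIs" idea above can be sketched roughly as follows: a bytes view that always contains everything, and a text view that contains only the entries that decode cleanly. The function name `split_environment`, the sample variables and the choice of UTF-8 are assumptions made for this sketch; it does not describe any actual os module behaviour.

```python
def split_environment(raw_env, encoding="utf-8"):
    """Return (text_view, bytes_view) for a mapping of bytes -> bytes.

    Hypothetical helper: entries whose name or value cannot be decoded
    are left out of the text view but stay available as bytes.
    """
    text_view = {}
    for key, value in raw_env.items():
        try:
            text_view[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Undecodable entry: visible only through the bytes view.
            continue
    return text_view, dict(raw_env)


# Made-up environment: LEGACY holds Latin-1 bytes that are not valid UTF-8.
raw = {b"HOME": b"/home/alice", b"LEGACY": b"caf\xe9"}
environ, environb = split_environment(raw)
```

Code that only ever touches `environ` never sees the odd entry, while code that needs to be robust can read the exact bytes from `environb` and choose its own decoding.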
From glyph at divmod.com Fri Dec 5 04:55:50 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 05 Dec 2008 03:55:50 -0000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: <20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com> On 02:32 am, rhamph at gmail.com wrote: >On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight wrote: >>FWIW, I still agree with Martin that that's the most reasonable >>solution. > >It died because nobody presented a viable solution, and I maintain no >solution is possible. All suggestions involve arbitrary >transformations that fail to round trip correctly at some point or >another. They're simply about shuffling the failure around to >somewhere the poster happens to like. > >Please, if you have a *new* idea that doesn't have a failure mode, by >all means post it. But don't resurrect a pointless bikeshed. Despite my objection to the funny-encoding strategy (which I've documented thoroughly in my other message to this thread) this isn't accurate. The PUA solution doesn't work, but using NUL does. This was proposed last time, as a copy of what Mono does. You can't get a NUL in os.environ or a filename; it's not valid. So, it works fine as an escape character. It can round-trip perfectly. 
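[Editorial illustration, not part of the original message.] One way a NUL-escape scheme could round-trip is sketched below. The thread does not spell out the exact encoding (nor what Mono really does), so the leading-NUL marker, the Latin-1 carrier and the names `decode_name` / `encode_name` are all assumptions for illustration. The sketch relies on the premise stated above that a real file name or environment value never contains a NUL byte.

```python
def decode_name(raw, encoding="utf-8"):
    """Decode bytes to str; mark undecodable input with a leading NUL."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to one code point, so nothing is lost.
        return "\0" + raw.decode("latin-1")


def encode_name(text, encoding="utf-8"):
    """Inverse of decode_name for strings it produced."""
    if text.startswith("\0"):
        return text[1:].encode("latin-1")
    return text.encode(encoding)


valid = b"caf\xc3\xa9"   # well-formed UTF-8
invalid = b"caf\xe9"     # Latin-1 bytes, not valid UTF-8

escaped = decode_name(invalid)
```

The round trip holds only while the marker stays attached: slicing `escaped[1:]` yields an ordinary-looking string that encodes to different bytes, which is the kind of hazard the other message in this thread warns about.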
From glyph at divmod.com Fri Dec 5 04:59:42 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 05 Dec 2008 03:59:42 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205023514.GA1723@amk.local> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> Message-ID: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> On 02:35 am, amk at amk.ca wrote: >On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote: >>Here's a bright idea. On the 3.0 release page, include a box listing >>which major third-party apps have been converted. Update it >>once every couple of weeks. That way, we're not explicitly > >That's an excellent idea. We could have a webpage, or start a >topic-specific weblog for posting announcements. > >I've started a draft of a 3.0 FAQ in the wiki at >. Once it's finished we >can move it into the 3.0 release pages. Everyone please edit and >improve it! It occurs to me that this specific idea (the box with the list of supported applications / libraries) should be implementable as a simple query against PyPI. I don't know if it actually is :), but it should be. In general it would be nice to know whether one's favorite tools were available for *any* new Python version. 
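[Editorial illustration, not part of the original message.] If packages declared which Python versions they support, for example through trove-style classifiers, the "box" could be generated by a simple filter. The sketch below works on an in-memory dictionary; the package names and their classifier lists are invented, and no real PyPI query API is used or implied.

```python
PY3_PREFIX = "Programming Language :: Python :: 3"


def packages_supporting(packages, classifier_prefix):
    """Return sorted names of packages with a classifier starting with the prefix."""
    return sorted(
        name
        for name, classifiers in packages.items()
        if any(c.startswith(classifier_prefix) for c in classifiers)
    )


# Invented metadata, purely for illustration.
index = {
    "examplelib": [
        "Programming Language :: Python :: 2.5",
        "Programming Language :: Python :: 3.0",
    ],
    "legacytool": ["Programming Language :: Python :: 2.5"],
    "newthing": ["Programming Language :: Python :: 3"],
}

converted = packages_supporting(index, PY3_PREFIX)
```

The result is only as good as the metadata: a package that works on 3.0 but never declares it would be missing from the list.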
From fdrake at acm.org Fri Dec 5 05:15:06 2008 From: fdrake at acm.org (Fred Drake) Date: Thu, 04 Dec 2008 23:15:06 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> Message-ID: On Dec 4, 2008, at 10:59 PM, glyph at divmod.com wrote: > It occurs to me that this specific idea (the box with the list of > supported applications / libraries) should be implementable as a > simple query against PyPI. I don't know if it actually is :), but > it should be. In general it would be nice to know whether one's > favorite tools were available for *any* new Python version. I agree, this would be ideal. I'm not sure the metadata is there to support it, though. Each (version of each) package would need to register metadata recording which versions of Python it's known to be compatible with ("has been tested with"). I'd love for this to be available, and would be more proactive about testing software I've been involved in releasing against more Python versions. 
-Fred -- Fred Drake From guido at python.org Fri Dec 5 05:16:45 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Dec 2008 20:16:45 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> References: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> Message-ID: I hear some folks are considering advertising 3.0 as experimental or not ready for serious use yet. I think that's too negative -- we should encourage people to use it, period. They'll have to decide for themselves whether they can live with the lack of ported 3rd party libraries -- which may resolve itself soon enough. We should make it clear that it's perfectly fine to stick with 2.6, but at the same time encourage people to try 3.0 and see for themselves -- IMO it's as solid as 2.6. (2.6.1 being more solid, of course, as will be 3.0.1). Especially from the education front I've heard a lot of positive noises about 3.0. See e.g. an early review, posted 8 months ago: http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html. I also want to remind folks that I've promised left and right that post-3.0 we'll stick to the same backwards compatibility strategy that we used for the 2.x series. No new incompatibilities. No new features in 3.0.1 etc.; those go in 3.1, 3.2, etc. The only compromise I'd be willing to make is that 3.1 can be rather sooner than the typical 18-24 months cycle. But any API that exists in 3.0 will have to take the regular deprecation route, and if we start having releases close together we should be careful to measure the deprecation time in years, not releases. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Fri Dec 5 05:42:10 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 21:42:10 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> <20081205035550.12555.1158502921.divmod.xquotient.958@weber.divmod.com> Message-ID: On Thu, Dec 4, 2008 at 8:55 PM, wrote: > > On 02:32 am, rhamph at gmail.com wrote: >> >> On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight wrote: > >>> FWIW, I still agree with Martin that that's the most reasonable solution. >> >> It died because nobody presented a viable solution, and I maintain no >> solution is possible. All suggestions involve arbitrary >> transformations that fail to round trip correctly at some point or >> another. They're simply about shuffling the failure around to >> somewhere the poster happens to like. >> >> Please, if you have a *new* idea that doesn't have a failure mode, by >> all means post it. But don't resurrect a pointless bikeshed. > > Despite my objection to the funny-encoding strategy (which I've documented > thoroughly in my other message to this thread) this isn't accurate. The PUA > solution doesn't work, but using NUL does. This was proposed last time, as > a copy of what Mono does. You can't get a NUL in os.environ or a filename; > it's not valid. So, it works fine as an escape character. It can > round-trip perfectly. The failure is more subtle, in that a path from the filesystem cannot round trip via a different return path. i.e. list the dir via python, pass it to an external lib to open. If you don't need that to work it's quite easy to explicitly encode byte strings into text for storage or whatever. 
-- Adam Olsen, aka Rhamphoryncus From barry at python.org Fri Dec 5 06:07:53 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 5 Dec 2008 00:07:53 -0500 Subject: [Python-Dev] RELEASED Python 2.6.1 Message-ID: <6898A62C-3BA0-4EF1-BDB5-07B2961BF026@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hot on the heels of Python 3.0 comes the Python 2.6.1 bug-fix release. This is the latest production-ready version in the Python 2.6 family. Dozens of issues have been fixed since Python 2.6 final was released in October. Please see the NEWS file for details: http://www.python.org/download/releases/2.6.1/NEWS.txt For more information on Python 2.6 please see http://docs.python.org/dev/whatsnew/2.6.html Source tarballs and Windows installers can be downloaded from the Python 2.6.1 page: http://www.python.org/download/releases/2.6.1/ Bugs can be reported in the Python bug tracker: http://bugs.python.org Enjoy, - -Barry Barry Warsaw barry at python.org Python 2.6/3.0 Release Manager (on behalf of the entire python-dev team) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTi3KnEjvBPtnXfVAQLhQAP7BR8eqlVLDlu/bp2tGaRRQS8GW5X8KQQk h0RwCcAKK19WH6YS6zH+VoIpD8LnD37YqZL3m5MQZ/rDf0o3e6152CZ6GJvWE+0i 6w0cSvDqdWuOpfUfpYR21eQnoFuC6x/yfI//yWCnu8bZCypjmJCLKZAvu4pMjYgD ceChg4lLE68= =u/iW -----END PGP SIGNATURE----- From guido at python.org Fri Dec 5 06:14:39 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Dec 2008 21:14:39 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: >> On Dec 4, 2008, at 6:39 PM, Martin v. Löwis wrote: >>> I'm in favour of a different, fifth solution: >>> >>> 5) represent all environment variables in Unicode strings, >>> including the ones that currently fail to decode. 
>>> (then do the same to file names, then drop the byte-oriented >>> file operations again) > On Thu, Dec 4, 2008 at 6:14 PM, James Y Knight wrote: [...] >> FWIW, I still agree with Martin that that's the most reasonable solution. On Thu, Dec 4, 2008 at 6:32 PM, Adam Olsen wrote: > It died because nobody presented a viable solution, and I maintain no > solution is possible. All suggestions involve arbitrary > transformations that fail to round trip correctly at some point or > another. They're simply about shuffling the failure around to > somewhere the poster happens to like. > > Please, if you have a *new* idea that doesn't have a failure mode, by > all means post it. But don't resurrect a pointless bikeshed. I don't like Martin's solution at all. Glyph's message nails the problem -- the "funny encoding" solution breaks as soon as filenames get passed to other components, and as that's what Python is often all about, it's likely to happen all the time. The simplest example I can think of is a program that prints a directory listing to stdout -- printing the "funny" encoding to stdout isn't going to be what users expect. So the program has to be aware of the possibility of "funny" encoded filenames, and the roundtripping isn't useful at all. At the risk of bringing up something that was already rejected, let me propose something that follows the path taken in 3.0 for filenames, rather than doubling back: For os.environ, os.getenv() and os.putenv(), I think a similar approach as used for os.listdir() and os.getcwd() makes sense: let os.environ skip variables whose name or value is undecodable, and have a separate os.environb() which contains bytes; let os.getenv() and os.putenv() do the right thing when the arguments passed in are bytes. For sys.argv, because it's positional, you can't skip undecodable values, so I propose to use error=replace for the decoding; again, we can add sys.argvb that contains the raw bytes values. 
The various os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() and the subprocess module) should all accept bytes as well as strings. On Windows, the bytes APIs should probably not exist. I predict that most developers can get away with not using the bytes APIs at all. The small minority that needs to be robust if not all filenames use the system encoding can use the bytes APIs. This would be developers on various Unix systems except OSX (which uses UTF8 for its filesystems), and perhaps the occasional developer on OSX whose app needs to work with files on mounted filesystems that use a different encoding. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Fri Dec 5 06:40:46 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 05 Dec 2008 05:40:46 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> Message-ID: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> On 4 Dec, 07:12 pm, python at rcn.com wrote: >The latter statement worries me. It seems to unnecessarily undermine >adoption of 3.0. It essentially says, "don't use this". Is that what >we want? I think so. The default case, the case of the user without the wherewithal to understand the nuances of the distinction between 2.x and 3.x, is a user who should use 2.x. If the user understands what's going on, they're not going to pay attention to such a notice anyway. I think Barry did a great job phrasing this; the language in this comment has to be strong enough to counter the prevailing wisdom that "higher version number = better". I think it did that without being overly negative. For most users, especially new users who have yet to be impressed with Python's power, 2.x is much better. 
It's not like "library support" is one small check-box on the language's feature sheet: most of the attractive things about Python are libraries. Of course I am not free from bias, being the author of many libraries myself, but it was other libraries that drew me to Python in the first place. If you're writing an application with 2.x, you get GTK, Qt, PyGame, PIL, NumPy, and of course the wonderful Twisted. With 3.0, you get... Tkinter, and ... pywin32, I guess, although I can't find the download on sourceforge? A fork of django that "just barely works"? A "half broken" email module in the stdlib? All things which you can *also* get on 2.x, modulo the "barely works" and "half broken". If you're writing a library, even if you intend to support py3 as a platform on day one, you could reach a much wider audience by simply writing in 2to3-friendly style and releasing 2.x source. Writing a 3.x-only library will artificially limit your audience and make it much harder to combine your library with *other* useful Python libraries which have not yet been ported. There's no 3to2 yet, and maybe there never will be. ("py3to2" looks like an interesting project, but seems to be misleadingly named, since I don't think it will help you run your 3.x-source programs on a stock 2.x VM). The third (albeit much less likely) option is that you're learning Python to learn to interact with a system that's scriptable in embedded Python, like Blender or Gimp. I don't think there's a single system of that variety which uses 3.0 yet, and these will likely be even slower to move than libraries. So if the user downloads Python 3 and the accompanying tutorial they're likely to be confused when they try to use their newly-acquired knowledge to script the tool in question. Of course, in the long term, maintenance for 2.x is going away and we are all being gently herded to 3.x. Aren't the things I just talked about the reason for the continued maintenance of 2.x, though? 
It makes sense to talk about 3.1 and beyond, because that points to some continuity with 3.0. It doesn't make sense to say "don't use it", but it does make sense to say "use it to get ready for the eventual direction of the language". For example, my experience so far suggests that the only motion on Twisted towards 3.x during the 3.0.x/2.6.x cycle will be us reporting bugs in 2to3 and in the new version of the stdlib. 3.1 is likely to be the first version we could realistically target. I am sure that many other libraries are in a similar situation, since 2to3 has not yet been exposed to a wide variety of ugly, real-world code, and nobody's maintaining an #ifdef'd up C extension module yet. By the time 3.1 rolls around, we will all know how this migration strategy is really working out, and will be able to predict the likely migration timetable for various libraries with some degree of accuracy. From rhamph at gmail.com Fri Dec 5 06:46:14 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 4 Dec 2008 22:46:14 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: On Thu, Dec 4, 2008 at 10:14 PM, Guido van Rossum wrote: > At the risk of bringing up something that was already rejected, let me > propose something that follows the path taken in 3.0 for filenames, > rather than doubling back: > > For os.environ, os.getenv() and os.putenv(), I think a similar > approach as used for os.listdir() and os.getcwd() makes sense: let > os.environ skip variables whose name or value is undecodable, and have > a separate os.environb() which contains bytes; let os.getenv() and > os.putenv() do the right thing when the arguments passed in are bytes. 
+1 (as that's what I suggested) > For sys.argv, because it's positional, you can't skip undecodable > values, so I propose to use error=replace for the decoding; again, we > can add sys.argvb that contains the raw bytes values. The various > os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() > and the subprocess module) should all accept bytes as well as strings. +1. I wish there was a better solution to sys.argv. > On Windows, the bytes APIs should probably not exist. -0. I'd prefer byte APIs return UTF-16 bytes and the unicode APIs become validating. > I predict that most developers can get away with not using the bytes > APIs at all. The small minority that needs to be robust if not all > filenames use the system encoding can use the bytes APIs. This would > be developers on various Unix systems except OSX (which uses UTF8 for > its filesystems), and perhaps the occasional developer on OSX whose > app needs to work with files on mounted filesystems that use a > different encoding. -- Adam Olsen, aka Rhamphoryncus From guido at python.org Fri Dec 5 07:05:05 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Dec 2008 22:05:05 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> Message-ID: On Thu, Dec 4, 2008 at 9:40 PM, wrote: > The default case, the case of the user without the wherewithal > to understand the nuances of the distinction between 2.x and 3.x, is a user > who should use 2.x. Not at all clear. If they're not sensitive to those nuances it's just as likely that they're a casual developer (e.g. a student just learning to program). 
Such users are unlikely to start using major 3rd party packages like Twisted or Django, which would be completely overwhelming to someone just learning. As shown in http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html, Python 3.0 removes quite a few warts that are likely to trip up learners. Once they are ready (probably under the wings of some guru) to dive deeper, they may have to learn about 2.6 and how it differs -- that's a useful exercise by itself, but if I'm right, most learners won't have to go there because by the time they get to that point, the 3.0 ecosystem has matured enough to support their needs. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Fri Dec 5 07:06:17 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Dec 2008 22:06:17 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: On Thu, Dec 4, 2008 at 9:46 PM, Adam Olsen wrote: > On Thu, Dec 4, 2008 at 10:14 PM, Guido van Rossum wrote: >> On Windows, the bytes APIs should probably not exist. > > -0. I'd prefer byte APIs return UTF-16 bytes and the unicode APIs > become validating. -1 on UTF-16 bytes, as this seems extremely useless and confusing to me. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Fri Dec 5 07:55:29 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 07:55:29 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: <4938D061.2020105@v.loewis.de> > Let's bring out all the same arguments, come to no conclusion, and let > it taper off unresolved, yet again! 
:)

This time, it will be different. I will write a PEP, and will request that anybody proposing an alternative solution also write a PEP (and no change is made to the code before the PEPs have been fully specified, discussed, and a BDFL pronouncement has been made).

Regards,
Martin

From martin at v.loewis.de Fri Dec 5 08:00:35 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 08:00:35 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To:
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net>
Message-ID: <4938D193.4080608@v.loewis.de>

> Please, if you have a *new* idea that doesn't have a failure mode, by
> all means post it. But don't resurrect a pointless bikeshed.

While I completely agree that it is pointless to reiterate the same arguments over and over, I disagree that the bikeshed metaphor applies. This metaphor (IIUC) describes a trivial design issue that is merely a matter of taste, rather than having deep technical implications. Using Unicode or bytes for strings is not of that kind.

Regards,
Martin

From glyph at divmod.com Fri Dec 5 08:27:05 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 05 Dec 2008 07:27:05 -0000
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To:
References: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
Message-ID: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com>

On 04:16 am, guido at python.org wrote:
>I hear some folks are considering advertising 3.0 as experimental or
>not ready for serious use yet.
With all due respect, for me, "library support" and "serious use" are synonymous. When prompted I would say that 2.5 is probably the version that a new Python user should use. It's what's already installed on their Mac or their Ubuntu box, and it's easiest to get libraries for. I've already said in my other note why I think the python website should say the same. Speaking of respect, out of respect for all of you folks I have refrained from shouting this opinion from the rooftops. I have avoided blogging about it, I've kept all my public feedback on this list, and I plan to continue saying nothing (elsewhere) until I have something nice to say. (The occasional snide comment on IRC notwithstanding.) That doesn't mean I'm going to tell people who have real problems to solve to mess around trying out 3.0, just to see if it has the library support that they need, when I already know that it doesn't. Sorry, but community spirit only goes so far: when people ask for my recommendation, I'm going to tell the truth. For example, I recently helped my sister do some work that involved running a Fourier transform over a large amount of data. Doing this with python 2.5 took only a few minutes (numpy apparently preinstalled on leopard!); much faster than trying to debug the obscure errors she was getting out of Fortran. Doing it with Python 3.0 would have been an exercise in frustration (no numpy yet at all), and even 2.6 would have been a pain (download, compile, install, get numpy, compile, install, etc etc). If python 3.0 had for some reason *been* the preinstalled version, we would have needed to download 2.6 or 2.5. For this reason I don't want to encourage the upstream, in this case Apple, to consider 3.0 "ready" yet either. 2.x is still a necessity, even if they want to start shipping 3.0 soon. In my experience this is an entirely typical usage of Python. 
I know very few people who have learned the language for its own sake (and in fact, the two I can think of right now have long since switched to Haskell); it's almost always for this or that library. In the cases where it is for the language itself, the conversation almost always begins, "Hey, I've been thinking about learning Python. Can it do $TASK?". If the answer is (as it often is) "Sure, just use Py$TASK" then they're immediately sold. If not, "learn python" remains one of their never-done back-burner projects like "clean out the garage". Even in my own case, I learned Python because it was easier to write GTK+ programs in than C; Java's GUI libraries having been demonstrated deficient, I wanted something better. The networking stuff was a side- effect. Given that this is my typical experience of Python introductions (of which I have done quite a few), until a majority of Py$TASK for $TASKs that I'm interested in have been ported to py3, then even in the abstract, py3 remains "experimental" and "not ready for serious use". That's not the same thing as "bad": >IMO it's as solid as 2.6. (2.6.1 being more solid, of course, as will >be 3.0.1). I have not heard anyone saying that 3.0 is flaky, broken, or "beta". I certainly haven't said that, or even thought it. Library support is _the_ problem. >Especially from the education front I've heard a lot of positive >noises about 3.0. See e.g. an early review, posted 8 months ago: >http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html. To be fair, if someone asked me specifically about educating non- programmer adults about programming, I would probably at least *mention* py3, if not recommend it outright. The improved consistency is worth a lot in an educational setting. (But, if one is educating children and interested in soliciting their genuine enthusiasm, whiz-bang graphics are really a must-have, not a negotiable extra.) 
Note, however, that even this paper specifically mentions several libraries which must be available, or they will have to "abandon these examples entirely or (reluctantly) delay adoption of version 3.0". I hope for Mr. Efford's sake that these libraries will all become available shortly. They have all taken steps to produce 3.0-compatible versions. However, none are available today, making it still a difficult choice to use 3 rather than 2. From martin at v.loewis.de Fri Dec 5 08:26:03 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 05 Dec 2008 08:26:03 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> Message-ID: <4938D78B.6010406@v.loewis.de> > Here's a bright idea. On the 3.0 release page, include a box listing > which major third-party apps have been converted. Update it > once every couple of weeks. That way, we're not explicitly > discouraging adoption of 3.0, we're just listing what support is > then currently available (if you need twisted and its not on the list, > then that would be your guide). As a slight variation: that should be a wiki page (or, as AMK suggests, a weblog). The release page should link to it. If maintenance of this list was in the hands of a single person (the release manager), or a few (the pydotorg editors), it would always be outdated. 
FWIW, there is also the py3 category in PyPI:

http://pypi.python.org/pypi?:action=browse&c=533

Regards,
Martin

From martin at v.loewis.de Fri Dec 5 08:27:53 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Fri, 05 Dec 2008 08:27:53 +0100
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To:
References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com>
Message-ID: <4938D7F9.80908@v.loewis.de>

> I agree, this would be ideal. I'm not sure the metadata is there to
> support it, though.

There is. There have been the following trove classifiers defined for a few weeks now:

Programming Language :: Python :: 2
Programming Language :: Python :: 2.3
Programming Language :: Python :: 2.4
Programming Language :: Python :: 2.5
Programming Language :: Python :: 2.6
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3
Programming Language :: Python :: 3.0
Programming Language :: Python :: 3.1

Regards,
Martin

From fdrake at acm.org Fri Dec 5 08:40:20 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 05 Dec 2008 02:40:20 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4938D7F9.80908@v.loewis.de>
References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <4938D7F9.80908@v.loewis.de>
Message-ID:

On Dec 5, 2008, at 2:27 AM, Martin v. Löwis wrote:
> There is.
There have been the following trove classifiers defined for > a few weeks now: Wonderful! Thanks for clueing me in. I'll update my projects to use those in future releases. -Fred -- Fred Drake From glyph at divmod.com Fri Dec 5 08:58:30 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 05 Dec 2008 07:58:30 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> Message-ID: <20081205075830.12555.1834157056.divmod.xquotient.1370@weber.divmod.com> On 06:05 am, guido at python.org wrote: >On Thu, Dec 4, 2008 at 9:40 PM, wrote: >>The default case, the case of the user without the wherewithal >>to understand the nuances of the distinction between 2.x and 3.x, is a >>user >>who should use 2.x. > >Not at all clear. If they're not sensitive to those nuances it's just >as likely that they're a casual developer (e.g. a student just >learning to program). Such users are unlikely to start using major 3rd >party packages like Twisted or Django, which would be completely >overwhelming to someone just learning. As shown in >http://www.comp.leeds.ac.uk/nde/papers/teachpy3.html, Python 3.0 >removes quite a few warts that are likely to trip up learners. > >Once they are ready (probably under the wings of some guru) to dive >deeper, they may have to learn about 2.6 and how it differs -- that's >a useful exercise by itself, but if I'm right, most learners won't >have to go there because by the time they get to that point, the 3.0 >ecosystem has matured enough to support their needs. Well, ultimately the way you want to position this is your decision, but you haven't convinced me. My experience of casual developers suggests that they are _extremely_ sensitive to such nuances. 
Library support is a big one, but even bigger than that is the reporting of errors when mismatched versions don't work together. Are they going to understand that 3.0 and 2.6 are actually different languages, or are they just going to think that something's broken when they double-click on a .pyw file they got from some random python 2.x tutorial, with python 3 for windows installed? My interest is not hypothetical. I am trying to avoid hearing someone say this to me: "Oh yeah, Python, I tried that, but it didn't work. I use Visual Basic now and it's pretty good. It has good graphics." This type of confusion will persist for years. It will probably be worst at the point where both versions are enjoying equal popularity, but at least by then all the tutorials and tools will loudly say "python TWO" or "python THREE" on them. At least now, at the outset, it is pretty clear what direction the confusatron's going to tilt in. From steve at holdenweb.com Fri Dec 5 09:06:03 2008 From: steve at holdenweb.com (Steve Holden) Date: Fri, 05 Dec 2008 03:06:03 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4938D193.4080608@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> <4938D193.4080608@v.loewis.de> Message-ID: Martin v. L?wis wrote: >> Please, if you have a *new* idea that doesn't have a failure mode, by >> all means post it. But don't resurrect a pointless bikeshed. > > While I completely agree that it is pointless to reiterate the same > arguments over and over, I disagree that the bikeshed metapher applies. > This metapher (IIUC) describes a trivial design issue that is merely > a matter of taste, rather than having deep technical implications. > Using Unicode or bytes for strings is not of that kind. > +1 These issues are very important because they affect everyone. Even though very few people actually understand them. 
Including me, which is why I've been so quiet on this thread. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From g.brandl at gmx.net Fri Dec 5 09:21:04 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 05 Dec 2008 09:21:04 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <880dece00812031813t78ec560cy69dd3710fbd4c2a9@mail.gmail.com> <46FC4EDF-A0A6-4310-A854-4CB5F7A791EE@python.org> <85b5c3130812040142p9e5ba8cx616604d56add0c19@mail.gmail.com> <4937B80D.9070309@gmail.com> <4938660E.9080809@v.loewis.de> Message-ID: Barry Warsaw schrieb: > On Dec 4, 2008, at 6:21 PM, Martin v. L?wis wrote: > >>>> I can't find any docs built for Python 3.0 (not 3.1a0). >>> >>> The Windows installation has new 3.0 doc dated Dec 3, so it was >>> built, >>> just not posted correctly. > >> That doesn't mean very much. I built it on my local machine. Anybody >> with subversion and python could do that; the documentation is in >> subversion. > >> Whether or not it appears on the web site as part of the release >> process is an entirely different matter. It used to be that the >> doc maintainer (Fred Drake) was part of the release team and release >> process. I think Georg is complaining that he is release maintainer, >> but not part of the release process. > > I've asked Georg to update PEP 101 to make his role as Documentation > Expert explicit. Unfortunately we only debug major releases once (or > twice) every 18 months. But next time, we'll get that part right for > sure! Done that now. Since release.py builds the docs all right, there's not much left for me to do except check that everything is ok. > In the meantime, I'll make sure Georg is involved in point releases > moving forward. That's good. Thanks! Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. 
Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

From rhamph at gmail.com Fri Dec 5 09:23:13 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 5 Dec 2008 01:23:13 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938D193.4080608@v.loewis.de>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> <4938D193.4080608@v.loewis.de>
Message-ID:

On Fri, Dec 5, 2008 at 12:00 AM, "Martin v. Löwis" wrote:
>> Please, if you have a *new* idea that doesn't have a failure mode, by
>> all means post it. But don't resurrect a pointless bikeshed.
>
> While I completely agree that it is pointless to reiterate the same
> arguments over and over, I disagree that the bikeshed metaphor applies.
> This metaphor (IIUC) describes a trivial design issue that is merely
> a matter of taste, rather than having deep technical implications.
> Using Unicode or bytes for strings is not of that kind.

That we need to support both unicode and bytes is important, but already seems to have consensus. However, they present two distinct usage patterns:

* unicode text, presentable to the user, interacts with all manner of standardized APIs
* bytes, limited to local, internal use. Only approximated forms can be presented to the user, only custom formats can be saved externally

None of the proposals have turned these into a single use case. All they do is trade off various forms of subtly switching back and forth, which leads to failure. Debating which subtle failure is better is a bikeshed.

Not only that, but we already have a solution that makes the choice explicit, avoiding the subtle failure. This is the solution already in use for os file & path functions. It's the solution Guido supports.
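For readers following along, the "explicit choice" made by the os file and path functions can be sketched roughly as below. This is only an illustration, not part of the proposal: the helper name list_both_ways and the file name example.txt are made up, and os.fsencode() is assumed to be available (it was added to Python after this thread). How undecodable names are treated has varied between Python versions, so the sketch only looks at the result types.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return (str_names, bytes_names) for the same directory.

    Hypothetical helper: the type of the argument selects the type
    of the result, so the caller states the choice explicitly.
    """
    str_names = sorted(os.listdir(directory))                  # str in, str out
    bytes_names = sorted(os.listdir(os.fsencode(directory)))   # bytes in, bytes out
    return str_names, bytes_names


with tempfile.TemporaryDirectory() as tmp:
    # Create one file with a plain ASCII name so both views agree.
    open(os.path.join(tmp, "example.txt"), "w").close()
    str_names, bytes_names = list_both_ways(tmp)
    print(str_names)
    print(bytes_names)
```

The point of the design is that nothing switches type behind the caller's back: code that asked with str never sees bytes, and the reverse.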
--
Adam Olsen, aka Rhamphoryncus

From ncoghlan at gmail.com Fri Dec 5 10:01:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 19:01:08 +1000
Subject: [Python-Dev] Taint Mode in Python 3.0
In-Reply-To: <693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com>
References: <200812041836.48146.nicole@cats-muvva.net> <693bc9ab0812041538u714e4e18y6f9aa9a656ba9460@mail.gmail.com>
Message-ID: <4938EDD4.5000001@gmail.com>

Maciej Fijalkowski wrote:
> Hello,
>
> The thing is pypy's taint code is broken. Basically you don't only
> need to patch all places that return pyobject, but also all places
> that might modify anything. (All side effects) For example innocently
> looking call to addition might end up calling arbitrary python code
> (and have arbitrary side effects). There is a question how do you
> approach such things?

Taint isn't an easy problem, but PyPy is still a *much* better platform for that kind of experimentation than CPython. RPython, object spaces, the code generation, etc. all give you much more powerful tools to play with than the raw C code of the reference interpreter.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com Fri Dec 5 10:21:32 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 05 Dec 2008 19:21:32 +1000
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com>
Message-ID: <4938F29C.4050706@gmail.com>

glyph at divmod.com wrote:
> At least this time I think I've encapsulated pretty much my entire
> argument here, so if you don't buy it, we can probably just agree to
> disagree :).
Glyph, the only point I would add to your message is this one: Adding a "blessed" way to encode arbitrary binary data into a Python 3.0 str object strikes me as giving up on one of the key advances in the new version of the language. 8-bit strings were a problem in Python 2.x because they blurred the boundary between arbitrary binary data and ASCII or latin-1 character data. One of the most interesting aspects of Python 3.0 is its attempt to get developers to be explicit about this distinction (both in the code and in their own minds) by enforcing separation between arbitrary binary data (held in bytes and bytearray instances) and character data (held in str instances). I don't understand how tunneling arbitrary binary data through str instances (*regardless* of encoding mechanism) can possibly fail to recreate exactly the same "is it text or binary data?" ambiguity problems that the str/bytes split is intended to eliminate. And if that happens, then what exactly was the point in moving to an all Unicode string model for Py3k? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From victor.stinner at haypocalc.com Fri Dec 5 10:43:10 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 5 Dec 2008 10:43:10 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <49386A2C.60208@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> Message-ID: <200812051043.10938.victor.stinner@haypocalc.com> Le Friday 05 December 2008 00:39:24 Martin v. L?wis, vous avez ?crit?: > 5) represent all environment variables in Unicode strings, > including the ones that currently fail to decode. > (then do the same to file names, then drop the byte-oriented > file operations again) Please, don't do that! Bytes are not characters! 
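The str/bytes separation being defended in this part of the thread can be seen in a few lines of Python 3. This is a small sketch for illustration only; the variable names and the sample value are made up.

```python
# Latin-1 bytes for "cafe" with an accented final letter; not valid UTF-8.
raw = b"caf\xe9"

# Python 3 refuses to mix bytes and str implicitly.
try:
    raw + "!"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding is an explicit step, and the caller has to name the encoding.
as_latin1 = raw.decode("latin-1")

# The same bytes are simply not text under a different encoding.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(mixed_ok, utf8_ok, len(as_latin1))
```

Whether a given byte sequence counts as text depends entirely on the encoding chosen, which is why storing undecodable bytes in a str would hide a decision that the caller needs to make.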
-- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From eckhardt at satorlaser.com Fri Dec 5 10:35:50 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 5 Dec 2008 10:35:50 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com> References: <4938374B.8000006@gmail.com> <20081205035236.12555.235022312.divmod.xquotient.954@weber.divmod.com> Message-ID: <200812051035.50493.eckhardt@satorlaser.com> On Friday 05 December 2008, glyph at divmod.com wrote: > Filenames and environment variables would all need to be encoded or > decoded according to this magic encoding. Those, and commandline arguments, too. Uli -- Sator Laser GmbH Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Visit our website at ************************************************************************************** Diese E-Mail einschlie?lich s?mtlicher Anh?nge ist nur f?r den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empf?nger sein sollten. Die E-Mail ist in diesem Fall zu l?schen und darf weder gelesen, weitergeleitet, ver?ffentlicht oder anderweitig benutzt werden. E-Mails k?nnen durch Dritte gelesen werden und Viren sowie nichtautorisierte ?nderungen enthalten. Sator Laser GmbH ist f?r diese Folgen nicht verantwortlich. 
**************************************************************************************

From eckhardt at satorlaser.com Fri Dec 5 10:41:05 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Fri, 5 Dec 2008 10:41:05 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To:
References: <4938374B.8000006@gmail.com> <350E7D38B6D819428718949920EC2355564A7FB096@NA-EXMSG-C102.redmond.corp.microsoft.com>
Message-ID: <200812051041.05992.eckhardt@satorlaser.com>

On Friday 05 December 2008, Adam Olsen wrote:
> Many of the windows APIs use UTF-16 without validating it. They'll
> pass through invalid strings until they hit something that does
> validate, at which point it'll blow up.
>
> I suspect that it doesn't happen very often in practice, as having
> only one encoding makes it quite clear that it's a broken file name,
> not a mixed encoding environment.

Actually, I wouldn't say that's a problem at all. The point is that stuff that is blissfully unaware of encodings typically uses some ASCII-de(p)rived text. Those char-strings are translated according to the current locale, which then does the filtering and validation. The result may be gibberish (GIGO principle) but at least it's UTF-16 gibberish. ;)

Uli

From victor.stinner at haypocalc.com Fri Dec 5 11:18:48 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Fri, 5 Dec 2008 11:18:48 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <4938374B.8000006@gmail.com>
References: <4938374B.8000006@gmail.com>
Message-ID: <200812051118.48096.victor.stinner@haypocalc.com>

Hi,

On Thursday 04 December 2008 21:02:19, Toshio Kuratomi wrote:
> I opened up bug http://bugs.python.org/issue4006 a while ago and it was
> suggested in the report that it's not a bug but a feature and so I
> should come here to see about getting the feature changed :-)

Yeah, I prefer to discuss such changes on the mailing list.

> These mixed encodings can occur for a variety of reasons. Here's an
> example that isn't too contrived :-)
> (...)
> Furthermore, they don't want to suffer from the space loss of using
> utf-8 to encode Japanese so they use shift-jis everywhere.

"space loss"? Really? If you configure your server correctly, you should get UTF-8 even if the file system is Shift-JIS. But it would be much easier to use UTF-8 everywhere.

Hum... I don't think that the discussion is about one specific server, but the lack of bytes environment variables in Python3 :-)

> 1) return mixed unicode and byte types in ...

NO!

> 2) return only byte types in os.environ

Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and Python3 already uses Unicode everywhere (input(), open(), filenames, ...).

> 3) silently ignore non-decodable value when accessing os.environ['PATH']
> as we do now but allow access to the full information via
> os.environ[b'PATH'] and os.getenvb()

I don't like os.environ[b'PATH']. I prefer to always get the same result type...
But os.listdir() doesn't respect that :-(

   os.listdir(str) -> list of str
   os.listdir(bytes) -> list of bytes

I would prefer a similar API for easier migration from Python2 to Python3 (unicode). os.environb sounds like the best choice for me.

But there are open questions (already asked in the bug tracker):

(a) Should os.environ be updated if os.environb is changed? If yes, how?
   os.environb['PATH'] = b'\xff' (or any byte string that is invalid in the system default encoding)
   => os.environ['PATH'] = ???

(b) Should os.environb be updated if os.environ is changed? If yes, how? The problem comes with non-Unicode locales (eg. latin-1 or ASCII): most charsets are unable to encode the whole Unicode charset (eg. code points above 65535).
   os.environ['PATH'] = chr(0x10000)
   => os.environb['PATH'] = ???

(c) Same question when a key is deleted (del os.environ['PATH']).

If Python 3.1 has both os.environ and os.environb, I'm quite sure that some modules will use os.environ and others will prefer os.environb. If the two environments differ, the two sets of modules will behave differently :-/

It would maybe be easier if os.environ supported both bytes and unicode keys. But we have to keep these assertions:
   os.environ[bytes] -> bytes
   os.environ[str] -> str

> 4) raise an exception when non-decodable values are *accessed* and
> continue as in #3.

I like os.listdir() behaviour: just *ignore* non-decodable files. If you really want to access these files, use a bytes directory name ;-)

> I think that the ease of debugging is lost when we silently ignore an error.

Guido gave a good example. Suppose your directory contains a non-decodable filename (eg. "???.txt"). If listing raised an error, glob('*.py') would fail because of the evil filename. With the current behaviour, you're unable to list all files, but glob('*.py') will list all Python scripts!

And Python3 is released, so it's maybe a bad idea to change the behaviour (of os.environ) in Python 3.1 :-/

> The bug report I opened suggests creating a PEP to address this issue.
Please, try to answer to my questions about os.environ and os.environb consistency. I also like bytes environment variables. I need them for my fuzzing program. The lack of bytes variables is a regression from Python2 (for my program). On UNIX, filenames are bytes and the environment variables are bytes. For the best interoperability, Python3 should support bytes. But the default choice should always be characters (unicode) and to never mix the bytes and str types ;-) --- As usual, it goes faster if someone writes a patch :-) I could try to work on it. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From eckhardt at satorlaser.com Fri Dec 5 11:27:35 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 5 Dec 2008 11:27:35 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> Message-ID: <200812051127.35880.eckhardt@satorlaser.com> On Friday 05 December 2008, Guido van Rossum wrote: > At the risk of bringing up something that was already rejected, let me > propose something that follows the path taken in 3.0 for filenames, > rather than doubling back: > > For os.environ, os.getenv() and os.putenv(), I think a similar > approach as used for os.listdir() and os.getcwd() makes sense: let > os.environ skip variables whose name or value is undecodable, and have > a separate os.environb() which contains bytes; let os.getenv() and > os.putenv() do the right thing when the arguments passed in are bytes. > > For sys.argv, because it's positional, you can't skip undecodable > values, so I propose to use error=replace for the decoding; again, we > can add sys.argvb that contains the raw bytes values. The various > os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() > and the subprocess module) should all accept bytes as well as strings. > > On Windows, the bytes APIs should probably not exist. 
> > I predict that most developers can get away with not using the bytes > APIs at all. The small minority that needs to be robust if not all > filenames use the system encoding can use the bytes APIs. I know some of those developers, you can contact them via python-dev at python.org. Seriously, what would you suggest to someone that wants to handle paths in a portable way? Using the Unicode variants of functions is fubar, because encoding/decoding is not universally possible. Using the byte variant is equally fubar, because e.g. on MS Windows it is not supported, except through a very lossy roundtrip through the locale's codepage, limiting your functionality. I actually think it is about time to give up on trying to think about a path as a string. Dito for data received from os.environ or sys.argv. There are only very few things that are universal to them and a reliable encoding is none of them. Then, once you have let that idea go, meditate a bit over the Zen. What I propose is that paths must be treated as OS-specific, with the only common reliable operations being joining them, concatenating them and splitting them into segments divided by the (again, OS-specific) separator. Other operations, like e.g. appending a string or converting it to a string in order to display it can fail. And if they fail, they should fail noisily. In 99% of all cases, using the default encoding will work and do what people expect, which is why I would make this conversion automatic. In all other cases, it will at least not fail silently (which would lead to garbage and data loss) and allow more sophisticated applications to handle it. 
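A rough sketch of what such an OS-specific path object could look like, purely as an illustration of the proposal above. The class name OSPath, its methods, and the fixed "/" separator are assumptions made for this example, not an existing API. The path stays as raw bytes, joining and splitting always work, and conversion to text fails noisily when the bytes cannot be decoded.

```python
class OSPath:
    """Hypothetical wrapper that keeps a path as raw bytes (POSIX-style sketch)."""

    SEP = b"/"  # assumption: POSIX separator, chosen to keep the example simple

    def __init__(self, raw):
        if not isinstance(raw, bytes):
            raise TypeError("OSPath wraps bytes, got %r" % type(raw).__name__)
        self.raw = raw

    def join(self, other):
        # Joining never needs to know the encoding.
        return OSPath(self.raw.rstrip(self.SEP) + self.SEP + other.raw)

    def split(self):
        # Split into segments on the separator, dropping empty segments.
        return [OSPath(part) for part in self.raw.split(self.SEP) if part]

    def to_str(self, encoding="utf-8"):
        # Strict decoding: an undecodable path raises UnicodeDecodeError
        # instead of being silently skipped or replaced.
        return self.raw.decode(encoding, "strict")


path = OSPath(b"/home").join(OSPath(b"caf\xe9.txt"))
print(path.raw)                 # joining worked on the raw bytes
print(path.to_str("latin-1"))   # decodes under the right encoding
try:
    path.to_str("utf-8")
except UnicodeDecodeError as exc:
    print("conversion failed noisily:", exc.reason)
```

The point of the design is that only the display step depends on an encoding, so that is the only step that can fail.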
Uli

From Fabien.Bouleau at ses-engineering.com Fri Dec 5 11:42:08 2008
From: Fabien.Bouleau at ses-engineering.com (Fabien.Bouleau at ses-engineering.com)
Date: Fri, 5 Dec 2008 11:42:08 +0100
Subject: [Python-Dev] Fix for frame_setlineno() in frameobject.c function
Message-ID:

Hello,

This concerns a known bug in the frame_setlineno() function for Python 2.5.x and 2.6.x (maybe in earlier versions too). It is not possible to use this function when the address or line offset is greater than 127. The problem comes from the lnotab variable, which is typed char* and therefore implicitly signed char*. Any value above 127 becomes a negative number.
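To illustrate the signedness problem described above, here is a minimal Python sketch. It is not CPython's code: the helper name walk_lnotab and the sample table are made up for the example. It walks an lnotab-style byte string twice, once reading each byte as unsigned and once reinterpreting it as a signed char would be read in C.

```python
def walk_lnotab(lnotab, first_line, signed):
    """Accumulate (addr, line) pairs from an lnotab-style byte string.

    Hypothetical helper for illustration only; it mimics the shape of the
    loop in frame_setlineno() but is not CPython's implementation.
    """
    addr, line = 0, first_line
    pairs = []
    for offset in range(0, len(lnotab), 2):
        addr_incr = lnotab[offset]
        line_incr = lnotab[offset + 1]
        if signed:
            # Reinterpret each byte the way a signed char would be read in C.
            addr_incr = addr_incr - 256 if addr_incr > 127 else addr_incr
            line_incr = line_incr - 256 if line_incr > 127 else line_incr
        addr += addr_incr
        line += line_incr
        pairs.append((addr, line))
    return pairs


# Made-up table: the second entry has a line increment of 200 (> 127).
sample = bytes([6, 1, 10, 200, 4, 2])

print(walk_lnotab(sample, first_line=1, signed=False))
print(walk_lnotab(sample, first_line=1, signed=True))
```

With the signed reading, the increment of 200 is taken as -56, so the computed line number goes backwards and every later entry is wrong as well.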
The fix is very simple (applied on the Python 2.6.1 version of the source code):

--- frameobject.c       Thu Oct 02 19:39:50 2008
+++ frameobject_fixed.c Fri Dec 05 11:27:42 2008
@@ -119,8 +119,8 @@
        line = f->f_code->co_firstlineno;
        new_lasti = -1;
        for (offset = 0; offset < lnotab_len; offset += 2) {
-               addr += lnotab[offset];
-               line += lnotab[offset+1];
+               addr += ((unsigned char*)lnotab)[offset];
+               line += ((unsigned char*)lnotab)[offset+1];
                if (line >= new_lineno) {
                        new_lasti = addr;
                        new_lineno = line;

It would be nice to fix it for Python 2.5 and above, in order to have a proper MSI installer for Windows.

Best regards,
Fabien Bouleau

From LambertDW at Corning.com Fri Dec 5 10:40:16 2008
From: LambertDW at Corning.com (Lambert, David W (S&T))
Date: Fri, 05 Dec 2008 04:40:16 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final FFT
Message-ID: <84B204FFB016BA4984227335D8257FBA5A3860@CVCV0XI05.na.corning.com>

http://code.activestate.com/recipes/576550/

This recipe shows how to use gsl FFT with python 3. ctypes is really good!

From exarkun at divmod.com Fri Dec 5 13:30:59 2008
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 5 Dec 2008 07:30:59 -0500
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To:
Message-ID: <20081205123059.20272.808184471.divmod.quotient.16127@ohm>

On Thu, 4 Dec 2008 22:05:05 -0800, Guido van Rossum wrote:
>On Thu, Dec 4, 2008 at 9:40 PM, wrote:
>> The default case, the case of the user without the wherewithal
>> to understand the nuances of the distinction between 2.x and 3.x, is a user
>> who should use 2.x.
>
>Not at all clear.
If they're not sensitive to those nuances it's just >as likely that they're a casual developer (e.g. a student just >learning to program). Such users are unlikely to start using major 3rd >party packages like Twisted or Django, which would be completely >overwhelming to someone just learning. That seems like it would be right to me, but two or three times a month someone shows up in the Twisted IRC channel who is learning both Python and Twisted at the same time. So apparently there are a lot of people for whom this isn't overwhelming. Jean-Paul From eduardo.padoan at gmail.com Fri Dec 5 13:38:36 2008 From: eduardo.padoan at gmail.com (Eduardo O. Padoan) Date: Fri, 5 Dec 2008 10:38:36 -0200 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205023514.GA1723@amk.local> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> Message-ID: On Fri, Dec 5, 2008 at 12:35 AM, A.M. Kuchling wrote: > On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote: >> Here's a bright idea. On the 3.0 release page, include a box listing >> which major third-party apps have been converted. Update it >> once every couple of weeks. That way, we're not explicitly > > That's an excellent idea. We could have a webpage, or start a > topic-specific weblog for posting announcements. > > I've started a draft of a 3.0 FAQ in the wiki at > . Once it's finished we > can move it into the 3.0 release pages. Everyone please edit and > improve it! Sometime ago I started a page on the wiki to collect reports of early migrations by the community: http://wiki.python.org/moin/Early2to3Migrations Maybe this would be relevant to point on the FAQ. 
> --amk

--
Eduardo de Oliveira Padoan
http://djangopeople.net/edcrypt/
"Distrust those in whom the desire to punish is strong." -- Goethe, Nietzsche, Dostoevsky

From musiccomposition at gmail.com Fri Dec 5 13:51:33 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 5 Dec 2008 06:51:33 -0600
Subject: [Python-Dev] Fix for frame_setlineno() in frameobject.c function
In-Reply-To:
References:
Message-ID: <1afaf6160812050451l286b5f6bw9332bc3ade886926@mail.gmail.com>

Hi,

Please post this on the issue tracker. http://bugs.python.org

On Fri, Dec 5, 2008 at 4:42 AM, wrote:
> Hello,
>
> This concerns a known bug in the frame_setlineno() function for Python
> 2.5.x and 2.6.x (maybe in earlier versions too). It is not possible to use
> this function when the address or line offset is greater than 127. The
> problem comes from the lnotab variable, which is typed char* and therefore
> implicitly signed char*. Any value above 127 becomes a negative number.
>
> The fix is very simple (applied on the Python 2.6.1 version of the source
> code):
>
> --- frameobject.c       Thu Oct 02 19:39:50 2008
> +++ frameobject_fixed.c Fri Dec 05 11:27:42 2008
> @@ -119,8 +119,8 @@
>         line = f->f_code->co_firstlineno;
>         new_lasti = -1;
>         for (offset = 0; offset < lnotab_len; offset += 2) {
> -               addr += lnotab[offset];
> -               line += lnotab[offset+1];
> +               addr += ((unsigned char*)lnotab)[offset];
> +               line += ((unsigned char*)lnotab)[offset+1];
>                 if (line >= new_lineno) {
>                         new_lasti = addr;
>                         new_lineno = line;
>

--
Cheers,
Benjamin Peterson
"There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner."
From foom at fuhm.net Fri Dec 5 15:27:37 2008 From: foom at fuhm.net (James Y Knight) Date: Fri, 5 Dec 2008 09:27:37 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> Message-ID: <0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net> On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote: > Using the byte variant is equally fubar, because e.g. on MS Windows > it is not > supported, except through a very lossy roundtrip through the locale's > codepage, limiting your functionality. Yeah, IMO whole mess could have been avoided by keeping the filename/ args/environ simply *bytes*, like it really is, on unix. Then, make the Windows version of python use (always! not dependent upon locale!) utf-8 to decode the utf-8 bytestring to the UTF-16 that the Windows platform APIs expect (and vice versa). And never use the ASCII variant of the windows APIs. This would mean that all *inputs* would succeed, but some *outputs* would not, on Windows. But that's not a new kind of failure: NUL has never been allowed in argv/environ, and filenames have all sorts of platform-dependent restrictions. But unfortunately, it's too late for that solution... James From a.badger at gmail.com Fri Dec 5 16:06:06 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 07:06:06 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49385EED.9040004@gmail.com> Message-ID: <4939435E.3020103@gmail.com> Terry Reedy wrote: > Toshio Kuratomi wrote: >> >>> I would think life would be ultimately easier if either the file server >>> or the shell server automatically translated file names from jis and >>> utf8 and back, so that the PATH on the *nix shell server is entirely >>> utf8. >> >> This is not possible because no part of the computer knows what the >> encoding is. 
To the computer, it's just a sequence of bytes. Unlike >> xml or the windows filesystem (winfs? ntfs?) where the encoding is >> specified as part of the document/filesystem there's nothing to tell >> what encoding the filenames are in. > > I thought you said that the file server keep all filenames in shift-jis, > and the shell server all in utf-8. Yes. But this is part of the setup of the example to keep things simple. The fileserver or shell server could themselves be of mixed encodings (for instance, if it was serving home directories to users all over the world each user might be using a different encoding.) > If so, then the shell server could > know if it were told so. > Where are you going to store that information? In order for python to run without errors, will it have to be configured on each system it's installed on to know the encoding of each filename? Or are we going to try to talk each *NIX vendor into creating new filesystems that record that information and after a five year span of time declare that python will not run on other filesystems in corner cases? I think that this way does not hold a reasonable expectation of keeping python a portable language. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From victor.stinner at haypocalc.com Fri Dec 5 16:09:19 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 5 Dec 2008 16:09:19 +0100 Subject: [Python-Dev] Python security: draft article on the wiki Message-ID: <200812051609.19822.victor.stinner@haypocalc.com> Hi, I started to write a short article about Python security on the wiki: http://wiki.python.org/moin/Security Nothing useful yet. 
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From skip at pobox.com Fri Dec 5 16:25:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 5 Dec 2008 09:25:01 -0600
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <4938D7F9.80908@v.loewis.de>
References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <4938D7F9.80908@v.loewis.de>
Message-ID: <18745.18381.364105.121084@montanaro-dyndns-org.local>

    Martin> There is. There have been the following trove classifiers
    Martin> defined for a few weeks now:

    Martin> Programming Language :: Python :: 2
    Martin> Programming Language :: Python :: 2.3
    Martin> Programming Language :: Python :: 2.4
    Martin> Programming Language :: Python :: 2.5
    Martin> Programming Language :: Python :: 2.6
    Martin> Programming Language :: Python :: 2.7
    Martin> Programming Language :: Python :: 3
    Martin> Programming Language :: Python :: 3.0
    Martin> Programming Language :: Python :: 3.1

Good. Now we just need to populate them. I take it the classifiers without minor numbers imply any known minor version (e.g., 2 ==> 2.3 and greater)?

Skip

From amk at amk.ca Fri Dec 5 17:40:53 2008
From: amk at amk.ca (A.M.
Kuchling) Date: Fri, 5 Dec 2008 11:40:53 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> Message-ID: <20081205164053.GA10632@amk-desktop.matrixgroup.net> On Fri, Dec 05, 2008 at 05:40:46AM -0000, glyph at divmod.com wrote: > For most users, especially new users who have yet to be impressed with > Python's power, 2.x is much better. It's not like "library support" is > one small check-box on the language's feature sheet: most of the > attractive things about Python are libraries. Of course I am not free Here I agree, sort of. Newbies may not understand what they're giving up in terms of libraries. (The 'sort of' is because, having learned 3.0, learning the changes for 2.6 is certainly much easier than learning a first programming language is.) > The third (albeit much less likely) option is that you're learning > Python to learn to interact with a system that's scriptable in embedded > Python, like Blender or Gimp. I don't think there's a single system of > that variety which uses 3.0 yet, and these will likely be even slower to > move than libraries. Let me note that if some application embeds Python for a specialized purpose, where the only modules imported are either user-written or part of the application, it seems much *easier* to move to Python 3 because the scripts don't use arbitrary third-party libraries. Python embedded in an e-mail MTA might use libraries for DNS or file I/O or databases and has to be cautious about versions; Python in Gimp probably doesn't, in practice. 
--amk From janssen at parc.com Fri Dec 5 17:39:57 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 5 Dec 2008 08:39:57 PST Subject: [Python-Dev] Python + Java Integration In-Reply-To: References: Message-ID: <8291.1228495197@parc.com> > One thing that would help Python in this "debate" (or, perhaps simply > put it in the running, at least as a "next Java" candidate) would be > if Python had an easier migration path for Java developers that > currently rely upon various third-party libraries. The wealth of > third-party libraries available for Java has always been one of its > great strengths. Ergo, if Python had an easy-to-use, recommended way > to use those libraries within the Python environment, that would be a > significant advantage to present to Java developers and those who > would choose Ruby over Java. Platform compatibility is always a huge > motivator for those looking to migrate or upgrade. Personally, I'm using Andi Vajda's JCC for this purpose. Recommended. The nice thing about it is that it turns jar files into Python modules; you don't need the source. http://pypi.python.org/pypi/JCC Bill From status at bugs.python.org Fri Dec 5 18:06:58 2008 From: status at bugs.python.org (Python tracker) Date: Fri, 5 Dec 2008 18:06:58 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20081205170658.63FCD780B1@psf.upfronthosting.co.za> ACTIVITY SUMMARY (11/28/08 - 12/05/08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2233 open (+55) / 14139 closed (+41) / 16372 total (+96) Open issues with patches: 753 Average duration of open issues: 705 days. Median duration of open issues: 2193 days. 
Open Issues Breakdown open 2214 (+54) pending 19 ( +1) Issues Created Or Reopened (96) _______________________________ Coding cookie crashes IDLE 11/28/08 CLOSED http://bugs.python.org/issue4454 created tjreedy No Windows List in IDLE if several windows have the same title 11/28/08 CLOSED http://bugs.python.org/issue4455 created amaury.forgeotdarc patch xmlrpc is broken 11/28/08 CLOSED http://bugs.python.org/issue4456 created benjamin.peterson __import__ documentation obsolete 11/29/08 http://bugs.python.org/issue4457 created stevenjd getopt.gnu_getopt() loses dash argument 11/29/08 CLOSED http://bugs.python.org/issue4458 created muntyan bdist_rpm assumes python 11/29/08 http://bugs.python.org/issue4459 created John5342 The parameter of PyInt_AsSsize_t() is not checked to see if it i 11/29/08 CLOSED http://bugs.python.org/issue4460 created CWRU_Researcher1 parameters of PyLong_FromString() are not checked for NULL 11/29/08 http://bugs.python.org/issue4461 created CWRU_Researcher1 patch result of PyList_GetItem() not validated 11/29/08 CLOSED http://bugs.python.org/issue4462 created CWRU_Researcher1 Parameters and result of PyList_GetItem() are not validated 11/29/08 CLOSED http://bugs.python.org/issue4463 created CWRU_Researcher1 PyList_GetItem() result and parameters not fully validated 11/29/08 CLOSED http://bugs.python.org/issue4464 created CWRU_Researcher1 The result of set_copy() is not checked for NULL 11/29/08 CLOSED http://bugs.python.org/issue4465 created CWRU_Researcher1 The return value of PyFile_FromFile is not checked for NULL 11/29/08 CLOSED http://bugs.python.org/issue4466 created CWRU_Researcher1 return value of PyUnicode_AsEncodedString() is not checked for N 11/29/08 CLOSED http://bugs.python.org/issue4467 created CWRU_Researcher1 Restore chapter enumeration in Python docs 11/30/08 CLOSED http://bugs.python.org/issue4468 created schluehk CVE-2008-5031 multiple integer overflows 11/30/08 http://bugs.python.org/issue4469 created doko smtplib 
SMTP_SSL not working. 11/30/08 http://bugs.python.org/issue4470 created lcatucci patch IMAP4 missing support for starttls 11/30/08 http://bugs.python.org/issue4471 created lcatucci patch Is shared lib building broken on trunk? 11/30/08 http://bugs.python.org/issue4472 created skip.montanaro POP3 missing support for starttls 11/30/08 http://bugs.python.org/issue4473 created lcatucci patch PyUnicode_FromWideChar incorrect for characters outside the BMP 11/30/08 http://bugs.python.org/issue4474 created marketdickinson More verbose error message for Py_FindMethod 11/30/08 http://bugs.python.org/issue4475 created gpolo patch compileall.py fails if current dir has a "types" subdir with 3. 12/01/08 http://bugs.python.org/issue4476 created aivazis Speed up PyEval_EvalFrameEx when tracing is off. 12/01/08 http://bugs.python.org/issue4477 created jyasskin patch, needs review shutil.copyfile documentation 12/01/08 CLOSED http://bugs.python.org/issue4478 created steve21 True division is not smart -> proposing smart True division 12/01/08 CLOSED http://bugs.python.org/issue4479 created nassrat bdist_msi and bdist_wininst are missing an uninstaller icon 12/01/08 http://bugs.python.org/issue4480 created lemburg Windows installer crash 12/01/08 http://bugs.python.org/issue4481 created Konam 10e667.__format__('+') should return 'inf' 12/01/08 http://bugs.python.org/issue4482 created DinoV Error to build _dbm module during make 12/01/08 http://bugs.python.org/issue4483 created legerf patch, easy struct: per item endianess specification 12/02/08 http://bugs.python.org/issue4484 created da4an1qu1 fast swap of "default" Windows python versions 12/02/08 http://bugs.python.org/issue4485 created v+python Exception traceback is incorrect for strange exception handling 12/02/08 http://bugs.python.org/issue4486 created ncoghlan Add utf8 alias for email charsets 12/02/08 http://bugs.python.org/issue4487 created maxua patch Python Documentation not Newb Friendly 12/02/08 
http://bugs.python.org/issue4488 created mez shutil.rmtree is vulnerable to a symlink attack 12/02/08 http://bugs.python.org/issue4489 created mrts xml/sax/expatreader.py raises AttributeError when run 12/02/08 http://bugs.python.org/issue4490 created exarkun email.Header.decode_header() doesn't work if encoded-word was se 12/02/08 http://bugs.python.org/issue4491 created ishimoto patch httplib code thinks it closes connection, but does not 12/02/08 http://bugs.python.org/issue4492 created jjlee urllib2 doesn't always supply / where URI path component is empt 12/02/08 http://bugs.python.org/issue4493 created jjlee Python 2.6 fails to build with Py_NO_ENABLE_SHARED 12/02/08 http://bugs.python.org/issue4494 created snaury patch Fix signed/unsigned warning 12/02/08 CLOSED http://bugs.python.org/issue4495 created rhettinger misleading comment in urllib2 12/02/08 http://bugs.python.org/issue4496 created jjlee Compiler warnings in longobject.c 12/02/08 http://bugs.python.org/issue4497 created rhettinger patch Compiler warning "signed/unsigned comparion in mmapmodule" 12/02/08 http://bugs.python.org/issue4498 created rhettinger redefinition of TILDE macro on AIX platform 12/02/08 http://bugs.python.org/issue4499 created apaprocki Compiler warnings when compiling Python 3.0 with a C89 compiler 12/03/08 http://bugs.python.org/issue4500 created christian.heimes asyncore's urgent data management and connection closed events 12/03/08 http://bugs.python.org/issue4501 created giampaolo.rodola patch Allowing get_pre_input_hook from Readline 12/03/08 http://bugs.python.org/issue4502 created Conrad.Irwin patch exception traceback sometimes slow 12/03/08 http://bugs.python.org/issue4503 created ocean-city Doc/includes out of date 12/03/08 CLOSED http://bugs.python.org/issue4504 created exe ob_size not removed from docs 12/03/08 CLOSED http://bugs.python.org/issue4505 created exe 3.0 make test failures on Solaris 10 12/03/08 http://bugs.python.org/issue4506 created skip.montanaro 
64bit 3.0 test failure on Mac OS X 10.5.5 12/03/08 http://bugs.python.org/issue4507 created skip.montanaro distutils compiler not handling spaces in path to output/src fil 12/03/08 http://bugs.python.org/issue4508 created Thorney patch possible memoryview bug 12/03/08 http://bugs.python.org/issue4509 created gumpy ValueError for list.remove() not very helpful 12/03/08 http://bugs.python.org/issue4510 created brett.cannon easy Decorators should have an index entry 12/04/08 CLOSED http://bugs.python.org/issue4511 created dvusboy Add get_filename method to zipimport 12/04/08 http://bugs.python.org/issue4512 created belopolsky patch Finish updating zip docstring 12/04/08 CLOSED http://bugs.python.org/issue4513 created tjreedy test_binascii is failing 12/04/08 CLOSED http://bugs.python.org/issue4514 created rhettinger Formatting error in "What's New in Python 3.0" 12/04/08 CLOSED http://bugs.python.org/issue4515 created pwang Another formatting error in "What's New in Python 3.0" 12/04/08 CLOSED http://bugs.python.org/issue4516 created pwang improve __getattribute__ documentation 12/04/08 CLOSED http://bugs.python.org/issue4517 created LambertDW broken link to python 3 doc on main doc page 12/04/08 CLOSED http://bugs.python.org/issue4518 created cleary .pyc files included in 2.6 and 3.0 release tarballs 12/04/08 CLOSED http://bugs.python.org/issue4519 created doko Online 3.0 documentation says it's for 3.1a0 12/04/08 CLOSED http://bugs.python.org/issue4520 created paulmelis "What's New in Python 3.0" mentions "getcwdu" instead of "getcwd 12/04/08 CLOSED http://bugs.python.org/issue4521 created hagen patch Module wsgiref is not python3000 ready (unicode issues) 12/04/08 http://bugs.python.org/issue4522 created tordmor patch logging module __init__ uses has_key 12/04/08 http://bugs.python.org/issue4523 created bitdancer patch Build fails at running build_scripts 12/04/08 http://bugs.python.org/issue4524 created chaz6 patch, needs review metaclass fixer fails with 
AttributeError, causing 2to3 to exit 12/04/08 CLOSED http://bugs.python.org/issue4525 created exarkun Clarify documentation for binary literals 12/04/08 CLOSED http://bugs.python.org/issue4526 created nneonneo Obsolete 'string or unicode' in fractions doc 12/04/08 CLOSED http://bugs.python.org/issue4527 created tjreedy easy test_httpservers consistently fails on OS X 12/04/08 http://bugs.python.org/issue4528 created mwdiers parser module failure on valid try/except/finally blocks 12/04/08 CLOSED http://bugs.python.org/issue4529 created kaiw IDLE crashes with Japanese text on print command 12/04/08 CLOSED http://bugs.python.org/issue4530 created Vultaire Deprecation warnings in lib\compiler\ast.py 12/04/08 CLOSED http://bugs.python.org/issue4531 created edreamleo Fails to build on QNX 6.3.2 12/04/08 http://bugs.python.org/issue4532 created kraai 3.0 file.read dreadfully slow 12/04/08 http://bugs.python.org/issue4533 created tjreedy patch problem with str.join - should work with list input, error says 12/04/08 CLOSED http://bugs.python.org/issue4534 created lopgok Build / Test Py3K failed on Ubuntu 8.10 12/04/08 http://bugs.python.org/issue4535 created lbhudda SystemError if invalid arguments passed to range() and step=-1 12/04/08 http://bugs.python.org/issue4536 created laszlo patch, needs review webbrowser.UnixBrowser should use builtins.open 12/05/08 http://bugs.python.org/issue4537 reopened amaury.forgeotdarc ctypes could include data type limits 12/04/08 http://bugs.python.org/issue4538 created roysmith askdirectory() in tkinter.filedialog is broken 12/04/08 http://bugs.python.org/issue4539 created dogtato typo in a module describes utf-8 as uft-8 12/04/08 http://bugs.python.org/issue4540 created john.weldon patch, needs review Add str method for removing leading or trailing substrings 12/05/08 CLOSED http://bugs.python.org/issue4541 created zhirsch patch test_binascii fails on windows 12/05/08 CLOSED http://bugs.python.org/issue4542 created amaury.forgeotdarc 
patch, easy container constructors destroy argument 12/05/08 CLOSED http://bugs.python.org/issue4543 created kjwcode textwrap: __all__ atribute missing 'dedent' function 12/05/08 CLOSED http://bugs.python.org/issue4544 created wolfdown doctest seems to always fail on numpy.array2string 12/05/08 CLOSED http://bugs.python.org/issue4545 created ekorn Small thingy in "What's New in Python 3.0" 12/05/08 CLOSED http://bugs.python.org/issue4546 created paulmelis Long jumps with frame_setlineno 12/05/08 http://bugs.python.org/issue4547 created fboule patch, needs review OptionParser : Weird comportement in args processing 12/05/08 CLOSED http://bugs.python.org/issue4548 created ohervieu A defect in - proposing smart True division 0 days http://bugs.python.org/issue4479 nassrat Fix signed/unsigned warning 0 days http://bugs.python.org/issue4495 rhettinger Doc/includes out of date 2 days http://bugs.python.org/issue4504 georg.brandl ob_size not removed from docs 2 days http://bugs.python.org/issue4505 georg.brandl Decorators should have an index entry 2 days http://bugs.python.org/issue4511 dvusboy Finish updating zip docstring 1 days http://bugs.python.org/issue4513 georg.brandl test_binascii is failing 1 days http://bugs.python.org/issue4514 amaury.forgeotdarc Formatting error in "What's New in Python 3.0" 1 days http://bugs.python.org/issue4515 georg.brandl Another formatting error in "What's New in Python 3.0" 1 days http://bugs.python.org/issue4516 georg.brandl improve __getattribute__ documentation 1 days http://bugs.python.org/issue4517 georg.brandl broken link to python 3 doc on main doc page 1 days http://bugs.python.org/issue4518 georg.brandl .pyc files included in 2.6 and 3.0 release tarballs 1 days http://bugs.python.org/issue4519 barry Online 3.0 documentation says it's for 3.1a0 0 days http://bugs.python.org/issue4520 georg.brandl "What's New in Python 3.0" mentions "getcwdu" instead of "getcwd 0 days http://bugs.python.org/issue4521 georg.brandl patch 
metaclass fixer fails with AttributeError, causing 2to3 to exit 0 days http://bugs.python.org/issue4525 benjamin.peterson Clarify documentation for binary literals 0 days http://bugs.python.org/issue4526 georg.brandl Obsolete 'string or unicode' in fractions doc 0 days http://bugs.python.org/issue4527 georg.brandl easy parser module failure on valid try/except/finally blocks 1 days http://bugs.python.org/issue4529 georg.brandl IDLE crashes with Japanese text on print command 0 days http://bugs.python.org/issue4530 amaury.forgeotdarc Deprecation warnings in lib\compiler\ast.py 0 days http://bugs.python.org/issue4531 edreamleo problem with str.join - should work with list input, error says 0 days http://bugs.python.org/issue4534 amaury.forgeotdarc Add str method for removing leading or trailing substrings 0 days http://bugs.python.org/issue4541 rhettinger patch test_binascii fails on windows 0 days http://bugs.python.org/issue4542 amaury.forgeotdarc patch, easy container constructors destroy argument 0 days http://bugs.python.org/issue4543 rhettinger textwrap: __all__ atribute missing 'dedent' function 0 days http://bugs.python.org/issue4544 georg.brandl doctest seems to always fail on numpy.array2string 0 days http://bugs.python.org/issue4545 amaury.forgeotdarc Small thingy in "What's New in Python 3.0" 0 days http://bugs.python.org/issue4546 georg.brandl OptionParser : Weird comportement in args processing 0 days http://bugs.python.org/issue4548 georg.brandl registry functions don't handle null characters 2144 days http://bugs.python.org/issue672132 amaury.forgeotdarc SIGSEGV in _sre.c (IRIX 6.5.20) 1948 days http://bugs.python.org/issue783789 amaury.forgeotdarc pickletools support for multiple pickles in a string 1792 days http://bugs.python.org/issue873150 fdrake catch invalid chunk length in httplib read routine 1748 days http://bugs.python.org/issue900744 jjlee patch cgi.py does not correctly handle fields with ';' 1499 days http://bugs.python.org/issue1055234 
fdrake patch attempting to use urllib2 on some URLs fails starting on 2.4 1385 days http://bugs.python.org/issue1123695 amaury.forgeotdarc httplib patch to make _read_chunked() more robust 1047 days http://bugs.python.org/issue1411097 jjlee patch pdb find_function does not find Class methods. 681 days http://bugs.python.org/issue1643369 georg.brandl Draft implementation for PEP 364 640 days http://bugs.python.org/issue1675334 barry patch 'exec' does not accept what 'open' returns 495 days http://bugs.python.org/issue1762972 georg.brandl urllib2 hangs with some documents. 479 days http://bugs.python.org/issue1772481 amaury.forgeotdarc glob doesn't return unicode with unicode parameter 473 days http://bugs.python.org/issue1777458 georg.brandl Top Issues Most Discussed (10) ______________________________ 17 Error to build _dbm module during make 4 days open http://bugs.python.org/issue4483 15 AssertionError in Doc/includes/mp_benchmarks.py 7 days open http://bugs.python.org/issue4449 10 3.0 make test failures on Solaris 10 2 days open http://bugs.python.org/issue4506 10 BaseHTTPRequestHandler depends on GC to close connections 87 days open http://bugs.python.org/issue3826 9 Update What's new in 3.0 262 days closed http://bugs.python.org/issue2306 8 POP3 missing support for starttls 5 days open http://bugs.python.org/issue4473 7 typo in a module describes utf-8 as uft-8 1 days open http://bugs.python.org/issue4540 7 3.0 file.read dreadfully slow 1 days open http://bugs.python.org/issue4533 7 bdist_msi and bdist_wininst are missing an uninstaller icon 4 days open http://bugs.python.org/issue4480 7 Speed up PyEval_EvalFrameEx when tracing is off. 
5 days open http://bugs.python.org/issue4477 From a.badger at gmail.com Fri Dec 5 18:37:14 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 09:37:14 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051118.48096.victor.stinner@haypocalc.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> Message-ID: <493966CA.2010801@gmail.com> Victor Stinner wrote: > Hi, > > Le Thursday 04 December 2008 21:02:19 Toshio Kuratomi, vous avez ?crit : > >> These mixed encodings can occur for a variety of reasons. Here's an >> example that isn't too contrived :-) >> (...) >> Furthermore, they don't want to suffer from the space loss of using >> utf-8 to encode Japanese so they use shift-jis everywhere. > > "space loss"? Really? If you configure your server correctly, you should get > UTF-8 even if the file system is Shift-JIS. But it would be much easier to > use UTF-8 everywhere. > > Hum... I don't think that the discussion is about one specific server, but the > lack of bytes environment variables in Python3 :-) > Yep. I can't change the logicalness of the policies of a different organization, only code my application to deal with it :-) >> 1) return mixed unicode and byte types in ... > > NO! > It's nice that we agree... but I would prefer if you leave enough context so that others can see that we agree as well :-) >> 2) return only byte types in os.environ > > Hum... Most users have UTF-8 everywhere (eg. all Windows users ;-)), and > Python3 already use Unicode everywhere (input(), open(), filenames, ...). > We're also in agreement here. >> 3) silently ignore non-decodable value when accessing os.environ['PATH'] >> as we do now but allow access to the full information via >> os.environ[b'PATH'] and os.getenvb() > > I don't like os.environ[b'PATH']. I prefer to always get the same result > type... 
But os.listdir() doesn't respect that :-( > > os.listdir(str) -> list of str > os.listdir(bytes) -> list of bytes > > I would prefer a similar API for easier migration from Python2/Python3 > (unicode). os.environb sounds like the best choice for me. > . After thinking about how it would be used in subprocess calls I agree. os.environb would allow us to retrieve the full dict as bytes. os.environ[b''] only works on individual keys. Also os.getenv serves the same purpose as os.environ[b''] would whereas os.environb would have its own uses. > > But they are open questions (already asked in the bug tracker): > I answered these in the bug tracker. Here are the answers for the mailing list: > (a) Should os.environ be updated if os.environb is changed? If yes, how? > os.environb['PATH'] = '\xff' (or any invalid string in the system > default encoding) > => os.environ['PATH'] = ??? > The underlying environment that both variables reflect should be updated but what is displayed by os.environ should continue to follow the same rules. So if we follow option #3:: os.environb['PATH'] = b'\xff' os.environ['PATH'] => raises KeyError because PATH is not a key in the unicode decoded environment. (option #4 would issue a UnicodeDecodeError instead of a KeyError) Similarly, if you start with a variable in os.environb that can only be represented as bytes and your program transforms it into something that is decodable it should then show up in os.environ. > (b) Should os.environb be updated if os.environ is changed? If yes, how? > > The problem comes with non-Unicode locale (eg. latin-1 or ASCII): most charset > are unable to encode the whole Unicode charset (eg. codes >= 65535). > > os.environ['PATH'] = chr(0x10000) > => os.environb['PATH'] = ??? > Ah, this is a good question. I misunderstood what you were getting at when you posted this to the bug report. I see several options but the one that seems the most sane is to raise UnicodeEncodeError when setting the value. 
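
The answers to (a) and (b) can be modelled in a few lines. This is only a toy sketch of the option #3 semantics, not a patch to the os module: the class name EnvModel and its methods are hypothetical, and the locale encoding is passed in explicitly instead of being read from the environment.

```python
class EnvModel:
    """Toy model (hypothetical, not the os module): one bytes store, two views."""

    def __init__(self, encoding):
        self.encoding = encoding   # stands in for the locale's encoding
        self._raw = {}             # the underlying environment: bytes -> bytes

    # --- bytes view, playing the role of the proposed os.environb ---
    def setb(self, key, value):
        self._raw[key] = value

    def getb(self, key):
        return self._raw[key]

    # --- text view, playing the role of os.environ ---
    def get(self, key):
        raw = self._raw[key.encode(self.encoding)]
        try:
            return raw.decode(self.encoding)
        except UnicodeDecodeError:
            # Option #3: an undecodable value is simply not visible as text.
            raise KeyError(key) from None

    def set(self, key, value):
        # Encode first, so a UnicodeEncodeError leaves the store untouched.
        raw_key = key.encode(self.encoding)
        raw_value = value.encode(self.encoding)
        self._raw[raw_key] = raw_value


env = EnvModel("ascii")            # roughly what LANG=C would imply
env.setb(b"PATH", b"\xff")
try:
    env.get("PATH")
except KeyError:
    print("PATH is hidden from the text view")
print(env.getb(b"PATH"))

try:
    env.set("MYVAR", chr(0x10000))
except UnicodeEncodeError:
    print("MYVAR rejected by the text view")
```

Under option #4 the only change would be to let the UnicodeDecodeError propagate from get() instead of turning it into a KeyError.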
With that, proper code to set an environment variable might look like this::

    LANG=C python3.0
    >>> import os
    >>> variable = chr(0x10000)
    >>> try:
    ...     # Unicode aware locales
    ...     os.environ['MYVAR'] = variable
    ... except UnicodeEncodeError:
    ...     # Non-Unicode locales
    ...     os.environb['MYVAR'] = bytes(variable, encoding='utf8')

> (c) Same question when a key is deleted (del os.environ['PATH']).
>
Update the underlying env so both os.environ and os.environb reflect the
change.  Deleting should not pose the problems that updating does.

> If Python 3.1 will have os.environ and os.environb, I'm quite sure that some
> modules will use os.environ and others will prefer os.environb. If both
> environments are different, the two sets of modules will work differently :-/
>
Exactly.  So making sure they hold the same information is a priority.

> It would be maybe easier if os.environ supports bytes and unicode keys. But we
> have to keep these assertions:
> os.environ[bytes] -> bytes
> os.environ[str] -> str
>
I think the same choices have to be made here.  If LANG=C, we still have
to decide what to do when os.environ[str] is set to a non-ASCII string.
Additionally, the subprocess question makes using the key value
undesirable compared with having a separate os.environb that accesses
the same underlying data.

>> 4) raise an exception when non-decodable values are *accessed* and
>> continue as in #3.
>
> I like os.listdir() behaviour: just *ignore* non-decodable files. If you
> really want to access these files, use a bytes directory name ;-)
>
Since you wrote the code for that I would hope so ;-)

Here's my problem with it, though.  With these semantics any program
that works on arbitrary files and runs on *NIX has to check
os.listdir(b'') and do the conversion manually.  The only code that
doesn't have to care is code that is working on files that the program
created and thus controls.
Since it is not obvious that this has to be done most programs won't do this by default, there will be subtle bugs in a lot of code that individual application authors will have to discover and change when a user realizes something is wrong. Since there's no traceback being issued, the process of discovery and debugging will be longer. >> I think that the ease of debugging is lost when we silently ignore an error. > > Guido gave a good example. If your directory contains an non decodable > filename (eg. "???.txt"): glob('*.py') will fail because of the evil > filename. With the current behaviour, you're unable to list all files but > glob('*.py') will list all Python scripts! > Current behaviour is this: os.listdir('.') => Only decodable filenames glob.glob('*') => Only decodable filenames os.listdir(b'.') => All filenames as bytes glob.glob(b'*') => All filenames as bytes I think the desired behaviour assuming the existence of anondecodable file is this: os.listdir('.') => traceback glob.glob('*') => traceback os.listdir(b'.') => All filenames as bytes glob.glob(b'*') => All filenames as bytes Both of these approaches are internally consistent. Why do you think that glob.glob('*.py') is special and should not traceback? > And Python3 is released, it's maybe a bad idea to change the behaviour (of > os.environ) in Python 3.1 :-/ > As you've pointed out, os.environ will have to change slightly. But others have already said that this is on the agenda to fix in 3.1. The current state is just broken as the environment is currently only partially readable from python. >> The bug report I opened suggests creating a PEP to address this issue. > > Please, try to answer to my questions about os.environ and os.environb > consistency. > I have. Twice now :-) > I also like bytes environment variables. I need them for my fuzzing program. > The lack of bytes variables is a regression from Python2 (for my program). 
On > UNIX, filenames are bytes and the environment variables are bytes. For the > best interoperability, Python3 should support bytes. But the default choice > should always be characters (unicode) and to never mix the bytes and str > types ;-) > I agree 100%. * Never mixing bytes and str is a *huge* benefit of python3 over python2. * Unicode str everywhere possible is a python3 benefit that helps to get conversion done at the border. I just differ in that I think lack of tracebacks when UnicodeDecodeErrors are encountered is a wart in python3 that did not exist in python2. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From guido at python.org Fri Dec 5 18:59:52 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Dec 2008 09:59:52 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> Message-ID: On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt wrote: > Seriously, what would you suggest to someone that > wants to handle paths in a portable way? Using the Unicode variants of > functions is fubar, because encoding/decoding is not universally possible. > Using the byte variant is equally fubar, because e.g. on MS Windows it is not > supported, except through a very lossy roundtrip through the locale's > codepage, limiting your functionality. Write a lightweight abstraction layer that uses Unicode when possible and bytes otherwise. You'd need to write a few functions for the path handling code you need, with a platform check or two sprinkled in. Writing such an abstraction for the purpose of one specific application is usually simple enough. However, writing a similar abstraction that serves all apps and all use cases is hard. 
I hope that eventually someone will come up with one though -- the failure of earlier path object proposals notwithstanding. > I actually think it is about time to give up on trying to think about a path > as a string. Dito for data received from os.environ or sys.argv. There are > only very few things that are universal to them and a reliable encoding is > none of them. Then, once you have let that idea go, meditate a bit over the > Zen. This sounds too pessimistic to me. I expect that in five years it will be universally accepted that these variables must be encoded in a standard encoding. People are never going to give up thinking about filenames etc. as strings, because that's what they are conceptually. The problem is purely one of encoding, and that's where Unix/Linux are behind the curve, since (so far) they haven't taken the plunge and picked a universal standard encoding, the way Windows and Mac OS X have done. > What I propose is that paths must be treated as OS-specific, with the only > common reliable operations being joining them, concatenating them and > splitting them into segments divided by the (again, OS-specific) separator. > Other operations, like e.g. appending a string or converting it to a string > in order to display it can fail. And if they fail, they should fail noisily. That's bad though, since filenames are being displayed all the time (e.g. in error messages). > In 99% of all cases, using the default encoding will work and do what people > expect, which is why I would make this conversion automatic. In all other > cases, it will at least not fail silently (which would lead to garbage and > data loss) and allow more sophisticated applications to handle it. I think the "always fail noisily" approach isn't the best approach. E.g. if I am globbing for *.py, and there's an undecodable .txt file in a directory, its presence shouldn't cause the glob to fail. 
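
A minimal sketch of that globbing case, under stated assumptions: the helper name glob_decodable is hypothetical, the directory listing is passed in as a list of bytes names (as os.listdir(b'.') would return them) so the example does not depend on any particular filesystem, and UTF-8 stands in for the default encoding. Undecodable names are skipped for matching but reported separately, so nothing is lost silently.

```python
import fnmatch


def glob_decodable(raw_names, pattern, encoding="utf-8"):
    """Match decodable names against pattern; collect undecodable ones separately."""
    matched = []
    skipped = []
    for raw in raw_names:
        try:
            name = raw.decode(encoding)
        except UnicodeDecodeError:
            # The undecodable entry does not abort the whole glob.
            skipped.append(raw)
            continue
        if fnmatch.fnmatchcase(name, pattern):
            matched.append(name)
    return matched, skipped


# A listing with one name that is not valid UTF-8.
names = [b"setup.py", b"\xff\xfe.txt", b"README", b"test_glob.py"]
matched, skipped = glob_decodable(names, "*.py")
print(matched)   # the Python scripts, as str
print(skipped)   # the raw bytes names that could not be decoded
```

A caller that does care about every file can inspect the second return value, or work with the bytes names directly.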
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From Ted.Leung at Sun.COM Fri Dec 5 18:48:07 2008 From: Ted.Leung at Sun.COM (Ted Leung) Date: Fri, 05 Dec 2008 09:48:07 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> Message-ID: <30F5311C-8857-4486-99ED-7380BAC51B29@sun.com> On Dec 4, 2008, at 7:59 PM, glyph at divmod.com wrote: > > On 02:35 am, amk at amk.ca wrote: >> On Thu, Dec 04, 2008 at 05:29:31PM -0800, Raymond Hettinger wrote: >>> Here's a bright idea. On the 3.0 release page, include a box >>> listing >>> which major third-party apps have been converted. Update it >>> once every couple of weeks. That way, we're not explicitly >> >> That's an excellent idea. We could have a webpage, or start a >> topic-specific weblog for posting announcements. >> >> I've started a draft of a 3.0 FAQ in the wiki at >> . Once it's finished we >> can move it into the 3.0 release pages. Everyone please edit and >> improve it! > > It occurs to me that this specific idea (the box with the list of > supported applications / libraries) should be implementable as a > simple query against PyPI. I don't know if it actually is :), but > it should be. In general it would be nice to know whether one's > favorite tools were available for *any* new Python version. I agree with this. Plus it might act as an incentive for people to port libraries faster... Ted -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Fri Dec 5 19:10:03 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Dec 2008 10:10:03 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> References: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> Message-ID: On Thu, Dec 4, 2008 at 11:27 PM, wrote: > With all due respect, for me, "library support" and "serious use" are > synonymous. Glyph, I cannot have a discussion with you if every single post of yours is longer than my combined daily output. Please spend some time writing shorter posts. I'm sure I'm not the only one here with a short attention span. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Fri Dec 5 19:16:35 2008 From: fdrake at acm.org (Fred Drake) Date: Fri, 05 Dec 2008 13:16:35 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <18745.18381.364105.121084@montanaro-dyndns-org.local> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <4938D7F9.80908@v.loewis.de> <18745.18381.364105.121084@montanaro-dyndns-org.local> Message-ID: <5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org> On Dec 5, 2008, at 10:25 AM, skip at pobox.com wrote: > Good. Now we just need to populate them. 
I take it the classifiers > without > minor numbers imply any known minor version (e.g., 2 ==> 2.3 and > greater)? This is an excellent question, Skip. There was already "Programming Language :: Python", provided by many packages. I think version compatibility relationships meant by each of these classifiers should be made explicit, wherever it is that documentation for classifiers is provided. I don't recall having seen any such documentation; hopefully I just need to be hit by another clue. -Fred -- Fred Drake From g.brandl at gmx.net Fri Dec 5 19:24:27 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 05 Dec 2008 19:24:27 +0100 Subject: [Python-Dev] __import__ docs follow-up Message-ID: Hi, as a follow-up to the thread a few days ago, and the bug report, I've rewritten most of the __import__ docs. I've attached the suggested patch to the issue . I'd be glad for reviews. Also, I'd like to ask about opinions if this "winning idiom" (as a bug comment states) should be in it, instead of the getattr() helper function: >>> import sys >>> __import__('x.y.z') >>> mod = sys.modules['x.y.z'] Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Fri Dec 5 19:36:24 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 05 Dec 2008 19:36:24 +0100 Subject: [Python-Dev] ANN: new python-porting mailing list Message-ID: Hi all, to facilitate discussion about porting Python code between different versions (mainly of course from 2.x to 3.x), we've created a new mailing list python-porting at python.org It is a public mailing list open to everyone. 
We expect active participation of many people porting their libraries/programs, and hope that the list can be a help to all wanting to go this (not always smooth :-) way. @python-dev: it would of course be nice to have more than a few developers on that list ;-) regards, Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From dickinsm at gmail.com Fri Dec 5 20:20:56 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 5 Dec 2008 19:20:56 +0000 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes wrote: > Flow diagram > ------------ > > trunk ---> release26-maint > \-> py3k ---> release30-maint > I'm running into problems making this work, with a trivial change: I committed r67590 (which adds a single assert to ast.c) to the trunk, then merged to 2.6 and py3k in r67592 and r67595 respectively. Then I tried: ../svnmerge.py merge -r67595 from the root directory of a clean copy of the release30-maint branch (svn status gives no output), and got conflicts on '.': property 'svnmerge-integrated' set on '.' property 'svnmerge-blocked' set on '.' --- Merging r67595 into '.': U Python/ast.c C . property 'svnmerge-integrated' set on '.' property 'svnmerge-blocked' deleted from '.'. I now have a new file dir_conflicts.prej that looks something like: Trying to change property 'svnmerge-integrated' from '/python/trunk:1-61437,...,67528,67590', but property has been locally changed from '/python/branches/py3k:1-67498,67522-67524,67539,67541,67559,67588' to '/python/trunk:1-61437,...,67467,67484,67528'. (where the ... abbreviates a big long list of revision numbers). 
Did I mess up somewhere, or does svnmerge not work on a revision that was itself the result of an svnmerge? Mark From brett at python.org Fri Dec 5 20:21:28 2008 From: brett at python.org (Brett Cannon) Date: Fri, 5 Dec 2008 11:21:28 -0800 Subject: [Python-Dev] ANN: new python-porting mailing list In-Reply-To: References: Message-ID: On Fri, Dec 5, 2008 at 10:36, Georg Brandl wrote: > Hi all, > > to facilitate discussion about porting Python code between different versions > (mainly of course from 2.x to 3.x), we've created a new mailing list > > python-porting at python.org > > It is a public mailing list open to everyone. We expect active participation > of many people porting their libraries/programs, and hope that the list can > be a help to all wanting to go this (not always smooth :-) way. > The mailing list URL is http://mail.python.org/mailman/listinfo/python-porting for those who don't want to search on the mail.python.org home page (which looks really dated at this point). -Brett From brett at python.org Fri Dec 5 20:36:19 2008 From: brett at python.org (Brett Cannon) Date: Fri, 5 Dec 2008 11:36:19 -0800 Subject: [Python-Dev] Merging flow In-Reply-To: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> References: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> Message-ID: On Fri, Dec 5, 2008 at 11:20, Mark Dickinson wrote: > On Thu, Dec 4, 2008 at 3:12 PM, Christian Heimes wrote: >> Flow diagram >> ------------ >> >> trunk ---> release26-maint >> \-> py3k ---> release30-maint >> > > I'm running into problems making this work, with a trivial change: > I committed r67590 (which adds a single assert to ast.c) to the > trunk, then merged to 2.6 and py3k in r67592 and r67595 respectively. > Then I tried: > > ../svnmerge.py merge -r67595 > > from the root directory of a clean copy of the release30-maint > branch (svn status gives no output), and got conflicts on '.': > > property 'svnmerge-integrated' set on '.' 
> > property 'svnmerge-blocked' set on '.' > > --- Merging r67595 into '.': > U Python/ast.c > C . > > property 'svnmerge-integrated' set on '.' > > property 'svnmerge-blocked' deleted from '.'. > > I now have a new file dir_conflicts.prej that looks something like: > > Trying to change property 'svnmerge-integrated' from > '/python/trunk:1-61437,...,67528,67590', but property has been locally > changed from > '/python/branches/py3k:1-67498,67522-67524,67539,67541,67559,67588' to > '/python/trunk:1-61437,...,67467,67484,67528'. > > (where the ... abbreviates a big long list of revision numbers). > > Did I mess up somewhere, or does svnmerge not work on > a revision that was itself the result of an svnmerge? Someone might know better than me, but I am willing to bet you can't svnmerge a svnmerge revision. Since the svnmerge revision contains changes to the metadata on . that will conflict with the new svnmerge values that the svnmerge you are trying to do causes. But if I am right about this then won't that require blocking the svnmerge revision on release30-maint the svnmerge revision on py3k? Ugh. Is this getting to the point that we can only svnmerge between trunk and py3k and the maintenance branches just have to be managed the old-fashion way? And I have pinged the people helping me with the DVCS PEP in hopes of getting us moved off of svn sooner rather than later. 
-Brett From gregor.lingl at aon.at Fri Dec 5 20:36:26 2008 From: gregor.lingl at aon.at (Gregor Lingl) Date: Fri, 05 Dec 2008 20:36:26 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> Message-ID: <493982BA.6090604@aon.at> Guido van Rossum schrieb: > I hear some folks are considering advertising 3.0 as experimental or > not ready for serious use yet. > > I think that's too negative -- we should encourage people to use it, > period. They'll have to decide for themselves whether they can live > with the lack of ported 3rd party libraries -- which may resolve > itself soon enough. I'd find it useful to have a special regularly updated index of libraries already ported to 3.0 somewhere on python.org Gregor From fdrake at acm.org Fri Dec 5 20:38:45 2008 From: fdrake at acm.org (Fred Drake) Date: Fri, 05 Dec 2008 14:38:45 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> References: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> Message-ID: <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org> On Dec 5, 2008, at 2:20 PM, Mark Dickinson wrote: > Did I mess up somewhere, or does svnmerge not work on > a revision that was itself the result of an svnmerge? I ran into this yesterday as well with my patch to the cgi module. The work-around was to revert the change to that property and edit it manually. I think this is a significant issue, since editing that property is about as error-prone as it can be. I've not really looked at the code in svnmerge.py, so I'm not sure how hard it would be to fix. 
-Fred -- Fred Drake From gregor.lingl at aon.at Fri Dec 5 20:44:09 2008 From: gregor.lingl at aon.at (Gregor Lingl) Date: Fri, 05 Dec 2008 20:44:09 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> References: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> Message-ID: <49398489.3060907@aon.at> glyph at divmod.com schrieb: > > To be fair, if someone asked me specifically about educating non- > programmer adults about programming, I would probably at least > *mention* py3, if not recommend it outright. The improved consistency > is worth a lot in an educational setting. (But, if one is educating > children and interested in soliciting their genuine enthusiasm, > whiz-bang graphics are really a must-have, not a negotiable extra.) As a non native English speaker I'm not sure if I understand correctly, what you mean with whiz-bang graphics. Nevertheless I'd like to point you to the new turtle graphics module (which is part of the standard librarys since 2.6). At least it was designed especially for use in the educational domain. Moreover the source-distribution also contains a bunch of some ten example scripts. Regards, Gregor From skip at pobox.com Fri Dec 5 20:53:42 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 5 Dec 2008 13:53:42 -0600 Subject: [Python-Dev] ANN: new python-porting mailing list In-Reply-To: References: Message-ID: <18745.34502.329661.301314@montanaro-dyndns-org.local> Georg> python-porting at python.org Georg> It is a public mailing list open to everyone. 
We expect active Georg> participation of many people porting their libraries/programs, Georg> and hope that the list can be a help to all wanting to go this Georg> (not always smooth :-) way. I trust you will announce this in python-list and python-announce-list if you haven't already? Skip From g.brandl at gmx.net Fri Dec 5 20:57:43 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 05 Dec 2008 20:57:43 +0100 Subject: [Python-Dev] ANN: new python-porting mailing list In-Reply-To: <18745.34502.329661.301314@montanaro-dyndns-org.local> References: <18745.34502.329661.301314@montanaro-dyndns-org.local> Message-ID: skip at pobox.com schrieb: > Georg> python-porting at python.org > > Georg> It is a public mailing list open to everyone. We expect active > Georg> participation of many people porting their libraries/programs, > Georg> and hope that the list can be a help to all wanting to go this > Georg> (not always smooth :-) way. > > I trust you will announce this in python-list and python-announce-list if > you haven't already? I've sent it to python-announce, it's in the moderator queue. I'm not on python-list so I can't answer followups. If you'd like to do an announcement there, I'd be happy :) Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. 
From mike.klaas at gmail.com Fri Dec 5 21:01:35 2008 From: mike.klaas at gmail.com (Mike Klaas) Date: Fri, 5 Dec 2008 12:01:35 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081205164053.GA10632@amk-desktop.matrixgroup.net> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <20081205054046.12555.1291084461.divmod.xquotient.1132@weber.divmod.com> <20081205164053.GA10632@amk-desktop.matrixgroup.net> Message-ID: On 5-Dec-08, at 8:40 AM, A.M. Kuchling wrote: > On Fri, Dec 05, 2008 at 05:40:46AM -0000, glyph at divmod.com wrote: >> For most users, especially new users who have yet to be impressed >> with >> Python's power, 2.x is much better. It's not like "library >> support" is >> one small check-box on the language's feature sheet: most of the >> attractive things about Python are libraries. Of course I am not >> free > > Here I agree, sort of. Newbies may not understand what they're giving > up in terms of libraries. (The 'sort of' is because, having learned > 3.0, learning the changes for 2.6 is certainly much easier than > learning a first programming language is.) For possible insight, here is a current discussion on the topic: http://www.reddit.com/r/programming/comments/7hlra/ask_progit_ive_got_the_itch_to_learn_python_since/ (note that these would be programmers interested in learning python, not people trying to learn programming) -Mike From a.badger at gmail.com Fri Dec 5 21:05:20 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 12:05:20 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> Message-ID: <49398980.7050209@gmail.com> Guido van Rossum wrote: > On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt wrote: >> In 99% of all cases, using the default encoding will work and do what people >> expect, which is why I would make this conversion automatic. 
In all other
>> cases, it will at least not fail silently (which would lead to garbage and
>> data loss) and allow more sophisticated applications to handle it.
>
> I think the "always fail noisily" approach isn't the best approach.
> E.g. if I am globbing for *.py, and there's an undecodable .txt file
> in a directory, its presence shouldn't cause the glob to fail.
>
But why should it make glob() fail?  This sounds like an implementation
detail of glob.  Here's some pseudo-code::

    def glob(pattern):
        string = False
        if isinstance(pattern, str):
            string = True
            if platform == 'POSIX':
                pattern = bytes(pattern, encoding=defaultencoding)
        rawfiles = os.listdir(os.path.dirname(pattern) or pattern)
        if string and platform == 'POSIX':
            # decoding raises UnicodeDecodeError for an undecodable name
            return [str(f, encoding=defaultencoding)
                    for f in rawfiles if match(f, pattern)]
        else:
            return rawfiles

This way the traceback occurs if anything in the result set is undecodable.

What am I missing?

-Toshio
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: OpenPGP digital signature
URL:

From guido at python.org Fri Dec 5 21:11:28 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 5 Dec 2008 12:11:28 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <49398980.7050209@gmail.com>
References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com>
Message-ID:

On Fri, Dec 5, 2008 at 12:05 PM, Toshio Kuratomi wrote:
> Guido van Rossum wrote:
>> On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt wrote:
>>> In 99% of all cases, using the default encoding will work and do what people
>>> expect, which is why I would make this conversion automatic. In all other
>>> cases, it will at least not fail silently (which would lead to garbage and
>>> data loss) and allow more sophisticated applications to handle it.
>>
>> I think the "always fail noisily" approach isn't the best approach.
>> E.g.
if I am globbing for *.py, and there's an undecodable .txt file >> in a directory, its presence shouldn't cause the glob to fail. >> > But why should it make glob() fail? This sounds like an implementation > detail of glob. Glob was just an example. Many use cases for directory traversal couldn't care less if they see *all* files. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From a.badger at gmail.com Fri Dec 5 21:40:51 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 12:40:51 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> Message-ID: <493991D3.9030003@gmail.com> Guido van Rossum wrote: > Glob was just an example. Many use cases for directory traversal > couldn't care less if they see *all* files. > Okay. Makes it harder to prove correct or not if I don't know what the use case is :-) I can't think of a single use case off-hand. Even your example of a ??.txt file making retrieval of *.py files fail is a little broken. If there was a ??.py file that was undecodable the program would most likely want to know that file existed. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From tseaver at palladion.com Fri Dec 5 21:49:44 2008 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 05 Dec 2008 15:49:44 -0500 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <49398489.3060907@aon.at> References: <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <49398489.3060907@aon.at> Message-ID: <493993E8.5000807@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Gregor Lingl wrote: > > glyph at divmod.com wrote: >> To be fair, if someone asked me specifically about educating non- >> programmer adults about programming, I would probably at least >> *mention* py3, if not recommend it outright. The improved consistency >> is worth a lot in an educational setting. (But, if one is educating >> children and interested in soliciting their genuine enthusiasm, >> whiz-bang graphics are really a must-have, not a negotiable extra.) > As a non native English speaker I'm not sure if I understand correctly, > what you mean with whiz-bang graphics. Nevertheless I'd like to point > you to the new turtle graphics module (which is part of the standard > library since 2.6). At least it was designed especially for use in the > educational domain. Moreover the source-distribution also contains a > bunch of some ten example scripts. I'm pretty sure he means that turtle graphics are not "whiz-bang" (in this century, at least). Being able to do pygame-style OpenGL stuff would be "whiz bang"[1] in my book. [1] http://www.merriam-webster.com/dictionary/whizbang Tres.
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJOZPn+gerLs4ltQ4RAnE1AKCl+Z51tACSJLBmAOcp5q534Mx+2ACg1I28 re6gaV7AFEU0WS1yvUIiZS0= =4Pda -----END PGP SIGNATURE----- From a.badger at gmail.com Fri Dec 5 21:57:35 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 12:57:35 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <25AD8D27-C315-4F16-8FEB-3FA13E4BF77E@fuhm.net> Message-ID: <493995BF.3000705@gmail.com> Guido van Rossum wrote: > At the risk of bringing up something that was already rejected, let me > propose something that follows the path taken in 3.0 for filenames, > rather than doubling back: > > For os.environ, os.getenv() and os.putenv(), I think a similar > approach as used for os.listdir() and os.getcwd() makes sense: let > os.environ skip variables whose name or value is undecodable, and have > a separate os.environb() which contains bytes; let os.getenv() and > os.putenv() do the right thing when the arguments passed in are bytes. > I prefer the method used by file.read() where an error is thrown when accessing undecodable data. I think in time python programmers will consider not throwing an exception a wart in python3. However, this is enough to allow programmers to do the right thing once an error is reported by users and the cause has been tracked down so it doesn't block fixing errors as the current code does. 
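A minimal sketch of the "skip undecodable variables" idea quoted above, assuming the environment is held as a raw bytes mapping and the text view is derived from it. The names `text_view` and `raw_env` are hypothetical and only illustrate the semantics under discussion; they are not part of any Python API, and the thread had not settled on a design at this point.

```python
def text_view(raw_env, encoding="utf-8"):
    """Build a str -> str dict from a bytes -> bytes mapping.

    Entries whose name or value cannot be decoded are skipped,
    which is the behaviour proposed for the text environment.
    """
    decoded = {}
    for name, value in raw_env.items():
        try:
            decoded[name.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # The entry stays reachable through the bytes mapping only.
            continue
    return decoded


# Hypothetical raw environment: PATH holds a byte that is not valid UTF-8.
raw_env = {b"HOME": b"/home/user", b"PATH": b"\xff"}
print(text_view(raw_env))
```

The alternative argued for in the reply would raise at the point of access instead of skipping, so the caller finds out that the variable exists.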
And it's not like anyone expected python3 to be wart-free just because the python2 warts were fixed ;-) > For sys.argv, because it's positional, you can't skip undecodable > values, so I propose to use error=replace for the decoding; again, we > can add sys.argvb that contains the raw bytes values. The various > os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() > and the subprocess module) should all accept bytes as well as strings. > This also seems sane with the same comment about throwing errors. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From victor.stinner at haypocalc.com Fri Dec 5 19:20:59 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 5 Dec 2008 19:20:59 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493966CA.2010801@gmail.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> Message-ID: <200812051920.59463.victor.stinner@haypocalc.com> Hi, > > But they are open questions (already asked in the bug tracker): > > I answered these in the bug tracker. Here are the answers for the > mailing list: Oh, sorry. I didn't follow the end of the discussion on the bug tracker. > > os.environb['PATH'] = '\xff' > > => os.environ['PATH'] = ??? > > os.environ['PATH'] => raises KeyError because PATH is not a key in > the unicode decoded environment. Ok, good answer :-) > > os.environ['PATH'] = chr(0x10000) > > => os.environb['PATH'] = ??? > > raise UnicodeEncodeError when setting the value. Ok, it's consistent the current behaviour. 

$ LANG=C ./python
Python 3.0rc3+ (py3k:67498M, Dec 4 2008, 17:45:54)
>>> import os
>>> os.environ['x'] = '\xff'
>>> os.environ['x']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/haypo/prog/py3k/Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "/home/haypo/prog/py3k/Lib/encodings/ascii.py", line 22, in encode
    return codecs.ascii_encode(input, self.errors)[0]
UnicodeEncodeError: 'ascii' codec can't encode character '\xff' in position 1: ordinal not in range(128)

Oh, that's strange :-p The error is delayed when we read the value. > > It would be maybe easier if os.environ supports bytes and unicode keys. > > But we have to keep these assertions: > > os.environ[bytes] -> bytes > > os.environ[str] -> str > > I think the same choices have to be made here. If LANG=C, we still have > to decide what to do when os.environ[str] is set to a non-ASCii string. If the charset is US-ASCII, os.environ will drop non-ASCII values. But most variables are ASCII only. Examples with my shell: $ env XCURSOR_THEME=kubuntu LANG=fr_FR.UTF-8 EDITOR=vim HOME=/home/haypo ... > Additionally, the subprocess question makes using the key value > undesirable compared with having a separate os.environb that accesses > the same underlying data. The user should be able to choose bytes or unicode. Examples: - subprocess.Popen('ls') => use unicode environment (os.environ) - subprocess.Popen(b'ls') => use bytes environment (os.environb) > Here's my problem with it, though. With these semantics any program > that works on arbitrary files and runs on *NIX has to check > os.listdir(b'') and do the conversion manually. Only programs that have to support strange environment like yours (mixing Shift-JIS and UTF-8) :-) Most programs don't have to support these charset mixture. We can imagine an higher library working on UNIX and Windows (bytes or Unicode). But that would be later.
> I think the desired behaviour assuming the existence of a nondecodable > file is this: I prefer the current behaviour :-) > Why do you think that glob.glob('*.py') is special and should not traceback? It's not special. glob() reuses listdir(), and it was an example to show that "it just works". > I just differ in that I think lack of tracebacks when > UnicodeDecodeErrors are encountered is a wart in python3 that did not > exist in python2. Right. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From ncoghlan at gmail.com Fri Dec 5 23:18:47 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 08:18:47 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493991D3.9030003@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> Message-ID: <4939A8C7.6050209@gmail.com> Toshio Kuratomi wrote: > Guido van Rossum wrote: >> Glob was just an example. Many use cases for directory traversal >> couldn't care less if they see *all* files. >> > Okay. Makes it harder to prove correct or not if I don't know what the > use case is :-) I can't think of a single use case off-hand. > > Even your example of a ??.txt file making retrieval of *.py files fail > is a little broken. If there was a ??.py file that was undecodable the > program would most likely want to know that file existed. Why? Most programs won't be able to do anything with it. And if the program *can* do something with it... that's what the bytes version of the APIs are for. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Dec 5 23:21:55 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 08:21:55 +1000 Subject: [Python-Dev] __import__ docs follow-up In-Reply-To: References: Message-ID: <4939A983.2060400@gmail.com> Georg Brandl wrote: > Hi, > > as a follow-up to the thread a few days ago, and the bug report, I've > rewritten most of the __import__ docs. I've attached the suggested patch > to the issue . > > I'd be glad for reviews. Also, I'd like to ask about opinions if this > "winning idiom" (as a bug comment states) should be in it, instead of > the getattr() helper function: > >>>> import sys >>>> __import__('x.y.z') >>>> mod = sys.modules['x.y.z'] That way is a lot cleaner than other mechanisms I've seen (including the current mechanism in the docs). Making that the recommended way of doing a dynamic import seems like a good idea to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From a.badger at gmail.com Fri Dec 5 23:21:50 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 14:21:50 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051920.59463.victor.stinner@haypocalc.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> <200812051920.59463.victor.stinner@haypocalc.com> Message-ID: <4939A97E.9030609@gmail.com> Victor Stinner wrote: >>> It would be maybe easier if os.environ supports bytes and unicode keys. >>> But we have to keep these assertions: >>> os.environ[bytes] -> bytes >>> os.environ[str] -> str >> I think the same choices have to be made here. If LANG=C, we still have >> to decide what to do when os.environ[str] is set to a non-ASCii string. 
> > If the charset is US-ASCII, os.environ will drop non-ASCII values. But most > variables are ASCII only. Examples with my shell: > Yes. But you still have the question of what to do when: os.environ[str] = chr(0x10000) So I don't think it makes things simpler than having separate os.environ and os.environb that update the same data behind the scenes. >> Additionally, the subprocess question makes using the key value >> undesirable compared with having a separate os.environb that accesses >> the same underlying data. > > The user should be able to choose bytes or unicode. Examples: the subprocess question was posed further up the thread as basically -- does the user need to access os.environb in order to override things in the environment when calling subprocess? I think the answer to that is yes since you might want to start with your environment and modify it slightly when you call programs via subprocess. If you just try to copy os.environ and os.environ only iterates through the decodable env vars, that doesn't work. If you have an os.environb to copy it becomes possible. > - subprocess.Popen('ls') => use unicode environment (os.environ) > - subprocess.Popen(b'ls') => use bytes environment (os.environb) > That's... not expected to me :-( If I never touch os.environ and invoke subprocess the normal way, I'd still expect the whole environment to be passed on to the program being called. This is how invoking programs manually, shell scripting, invoking programs from perl, python2, etc work. Also, it's not really a good fit with the other things that key off of the initial argument. os.listdir(b'.') changes the output to bytes. subprocess.Popen(b'ls') would change what environment gets input into the call. >> Here's my problem with it, though. With these semantics any program >> that works on arbitrary files and runs on *NIX has to check >> os.listdir(b'') and do the conversion manually. 
> > Only programs that have to support strange environment like yours (mixing > Shift-JIS and UTF-8) :-) Most programs don't have to support these charset > mixture. > Any program that is intended to be distributed, accesses arbitrary files, and works on *nix platforms needs to take this into account. Just because the environment inside of my organization is sane doesn't mean that when we release the code to customers, clients, or the free software community that the places it runs will be as strict about these things. Are most programs specific to one organization or are they distributed to other people? I can't answer that... everything I work on (except passwords:-) is distributed -- from sys admin cronjobs to web applications since I'm lucky that my whole job is devoted to working on free software. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From ncoghlan at gmail.com Fri Dec 5 23:31:27 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 08:31:27 +1000 Subject: [Python-Dev] Merging flow In-Reply-To: <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org> References: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org> Message-ID: <4939ABBF.90400@gmail.com> Fred Drake wrote: > On Dec 5, 2008, at 2:20 PM, Mark Dickinson wrote: >> Did I mess up somewhere, or does svnmerge not work on >> a revision that was itself the result of an svnmerge? > > I ran into this yesterday as well with my patch to the cgi module. The > work-around was to revert the change to that property and edit it manually. > > I think this is a significant issue, since editing that property is > about as error-prone as it can be. I've not really looked at the code > in svnmerge.py, so I'm not sure how hard it would be to fix. 
I think we're discovering the real reasons why people generally prefer to use a DVCS when trying to manage multiple branches :P For now it looks like we might have to maintain 3.0 manually, with svnmerge only helping out for trunk->2.6 and trunk->py3k... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Dec 5 23:34:25 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 08:34:25 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939A97E.9030609@gmail.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> <200812051920.59463.victor.stinner@haypocalc.com> <4939A97E.9030609@gmail.com> Message-ID: <4939AC71.7010702@gmail.com> Toshio Kuratomi wrote: > Are most programs specific to one organization or are they distributed > to other people? The former. That's pretty well documented in assorted IT literature ('shrink-wrap' and open source commodity software are still relatively new players on the scene that started to shift the balance the other way, but now the server side elements of web services are shifting it back again). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From lists at cheimes.de Fri Dec 5 23:47:49 2008 From: lists at cheimes.de (Christian Heimes) Date: Fri, 05 Dec 2008 23:47:49 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: <4939ABBF.90400@gmail.com> References: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org> <4939ABBF.90400@gmail.com> Message-ID: <4939AF95.3050506@cheimes.de> Nick Coghlan wrote: > I think we're discovering the real reasons why people generally prefer > to use a DVCS when trying to manage multiple branches :P > > For now it looks like we might have to maintain 3.0 manually, with > svnmerge only helping out for trunk->2.6 and trunk->py3k... The problem seems to be trunk -> py3k -> 3.0. I had no issues with py3k -> 3.0. Christian From a.badger at gmail.com Fri Dec 5 23:47:54 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 14:47:54 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939AC71.7010702@gmail.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> <200812051920.59463.victor.stinner@haypocalc.com> <4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com> Message-ID: <4939AF9A.50809@gmail.com> Nick Coghlan wrote: > Toshio Kuratomi wrote: >> Are most programs specific to one organization or are they distributed >> to other people? > > The former. That's pretty well documented in assorted IT literature > ('shrink-wrap' and open source commodity software are still relatively > new players on the scene that started to shift the balance the other > way, but now the server side elements of web services are shifting it > back again). > Cool. So it's only people writing code to be shared with the larger community or written for multiple customers that are affected by bugs like this. 
:-/ -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From a.badger at gmail.com Fri Dec 5 23:48:38 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 05 Dec 2008 14:48:38 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939A8C7.6050209@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> Message-ID: <4939AFC6.7000106@gmail.com> Nick Coghlan wrote: > Toshio Kuratomi wrote: >> Guido van Rossum wrote: >>> Glob was just an example. Many use cases for directory traversal >>> couldn't care less if they see *all* files. >>> >> Okay. Makes it harder to prove correct or not if I don't know what the >> use case is :-) I can't think of a single use case off-hand. >> >> Even your example of a ??.txt file making retrieval of *.py files fail >> is a little broken. If there was a ??.py file that was undecodable the >> program would most likely want to know that file existed. > > Why? Most programs won't be able to do anything with it. And if the > program *can* do something with it... that's what the bytes version of > the APIs are for. > Nonsense. A program can do tons of things with a non-decodable filename. Where it's limited is non-decodable filedata. For instance, if you have a graphical text editor, you need to let the user select files to load. To do that you need to list all the files in a directory, even the ones that aren't decodable. The ones that aren't decodable need to substitute something like: str(filename, errors='replace') + '(Filename not encoded in UTF8)' in the file listing that the user sees. When the file is loaded, it needs to access the actual raw filename. 
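The file-chooser case just described can be sketched roughly as follows, assuming the names arrive as bytes (for example from os.listdir() called with a bytes path). The helpers `display_label` and `build_listing` are hypothetical names used only for illustration: each entry pairs a label that is safe to show with the raw name that is later used to open the file.

```python
def display_label(raw_name):
    """Return text that is safe to show for a raw (bytes) filename."""
    try:
        return raw_name.decode("utf-8")
    except UnicodeDecodeError:
        # Substitute replacement characters and flag the entry for the user.
        shown = raw_name.decode("utf-8", errors="replace")
        return shown + " (Filename not encoded in UTF8)"


def build_listing(raw_names):
    """Pair each label with the raw name needed to open the file later."""
    return [(display_label(raw), raw) for raw in raw_names]


for label, raw in build_listing([b"notes.txt", b"\xff.txt"]):
    print(label, raw)
```

Only the label is lossy here; the second element of each pair is still the exact byte string the filesystem returned.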
The file can then be loaded and operated upon and even saved back to disk using the raw, undecodable filename. If you have a file manager, you need to code something that lets the user move the file around. Once again, the program loads the raw filenames. It transforms the name into something representable to the user. It displays that. The user selects it and asks that it be moved to another location. Then the program uses the raw filename to move from one location to another. If you have a backup program, you need to list all the files in a directory. Then you need to copy those files to another location. Once again you have to retrieve the byte version of any non-decodable filenames. -Toshio From fdrake at acm.org Sat Dec 6 00:09:53 2008 From: fdrake at acm.org (Fred Drake) Date: Fri, 05 Dec 2008 18:09:53 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: <4939ABBF.90400@gmail.com> References: <5c6f2a5d0812051120q3333c124mdb68fdf0710b42c9@mail.gmail.com> <9B7B5B67-0634-4ED4-B6F7-9A484D50A8CC@acm.org> <4939ABBF.90400@gmail.com> Message-ID: On Dec 5, 2008, at 5:31 PM, Nick Coghlan wrote: > I think we're discovering the real reasons why people generally prefer > to use a DVCS when trying to manage multiple branches :P Really? I don't. The issue has nothing to do with someone maintaining private change sets, or wanting to do development with local commits without having access to commit to the project. I expect (and someone from work has said they do as well) that Subversion 1.5's merge tracking would have handled this situation. > For now it looks like we might have to maintain 3.0 manually, with > svnmerge only helping out for trunk->2.6 and trunk->py3k...
I don't know if I'll have time to look at svnmerge this weekend (with house guests and all), but I really don't expect it's a difficult problem to solve in the tool. The behavior suggests that this tiered set of branch relationships wasn't expected. -Fred -- Fred Drake From jimjjewett at gmail.com Sat Dec 6 00:12:05 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 5 Dec 2008 18:12:05 -0500 Subject: [Python-Dev] Merging flow Message-ID: Nick Coghlan wrote: > For now it looks like we might have to maintain 3.0 manually, with > svnmerge only helping out for trunk->2.6 and trunk->py3k Does it make the bookkeeping horrible if you merge from trunk straight to 3.0, and then blocked svnmerged changes from propagating? -jJ From martin at v.loewis.de Sat Dec 6 00:46:22 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 06 Dec 2008 00:46:22 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <18745.18381.364105.121084@montanaro-dyndns-org.local> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <4938D7F9.80908@v.loewis.de> <18745.18381.364105.121084@montanaro-dyndns-org.local> Message-ID: <4939BD4E.5020004@v.loewis.de> > Good. Now we just need to populate them. I take it the classifiers without > minor numbers imply any known minor version (e.g., 2 ==> 2.3 and greater)? Perhaps. As usual, they mean what people use them for. I intended them to mean 2.x and 3.x, respectively, with no constraint on x (i.e. including possibly 2.0 and 2.1). In particular, presence of "2" and absence of "3" is meant to indicate "I know that it won't work on Python 3". 
Regards, Martin From rdmurray at bitdance.com Sat Dec 6 01:04:01 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Fri, 5 Dec 2008 19:04:01 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> Message-ID: On Fri, 5 Dec 2008 at 12:11, Guido van Rossum wrote: > On Fri, Dec 5, 2008 at 12:05 PM, Toshio Kuratomi wrote: >> Guido van Rossum wrote: >>> On Fri, Dec 5, 2008 at 2:27 AM, Ulrich Eckhardt wrote: >>>> In 99% of all cases, using the default encoding will work and do what people >>>> expect, which is why I would make this conversion automatic. In all other >>>> cases, it will at least not fail silently (which would lead to garbage and >>>> data loss) and allow more sophisticated applications to handle it. >>> >>> I think the "always fail noisily" approach isn't the best approach. >>> E.g. if I am globbing for *.py, and there's an undecodable .txt file >>> in a directory, its presence shouldn't cause the glob to fail. >>> >> But why should it make glob() fail? This sounds like an implementation >> detail of glob. > > Glob was just an example. Many use cases for directory traversal > couldn't care less if they see *all* files. I agree with Toshio. The only use case I can think of for not seeing all files is when selecting a subset, and if the thing that does the selecting only generates a traceback if a file that falls into the subset is undecodable, then I don't see a problem. That is, if I'm selecting a subset of the files in a directory, and one of that subset is undecodable, I _want_ a traceback, because I'll be wanting _all_ of the files that match my selection criteria.(*) So I'm curious to hear your use cases where undecodable files are "don't care". 
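The behaviour argued for here (fail only when an undecodable name falls inside the selected subset) can be sketched as below. This is an illustration under stated assumptions, not how glob is implemented: `select_names` is a hypothetical helper, and the names are assumed to arrive as bytes. Matching is done on the raw bytes first, and only the matches are decoded.

```python
import fnmatch


def select_names(raw_names, pattern, encoding="utf-8"):
    """Match on raw bytes, then decode only the names that matched.

    An undecodable name outside the selection is ignored; one inside
    the selection raises UnicodeDecodeError instead of being dropped.
    """
    raw_pattern = pattern.encode(encoding)
    matches = [raw for raw in raw_names
               if fnmatch.fnmatchcase(raw, raw_pattern)]
    return [raw.decode(encoding) for raw in matches]


# The undecodable .txt name does not disturb a search for *.py.
print(select_names([b"setup.py", b"\xff.txt"], "*.py"))
```

With a name such as b"\xff.py" in the input, the same call raises rather than silently losing the file.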
(*) More specifically, I want the program of a developer who didn't think about the fact that users might have files with undecodable filenames in their directory to generate a traceback rather than silently losing those files. (This is spoken to both by the principle of least surprise and the zen rule that errors should never pass silently :) --RDM From ncoghlan at gmail.com Sat Dec 6 01:48:27 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 10:48:27 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939AFC6.7000106@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> Message-ID: <4939CBDB.30305@gmail.com> Toshio Kuratomi wrote: > Nick Coghlan wrote: >> Toshio Kuratomi wrote: >>> Guido van Rossum wrote: >>>> Glob was just an example. Many use cases for directory traversal >>>> couldn't care less if they see *all* files. >>>> >>> Okay. Makes it harder to prove correct or not if I don't know what the >>> use case is :-) I can't think of a single use case off-hand. >>> >>> Even your example of a ??.txt file making retrieval of *.py files fail >>> is a little broken. If there was a ??.py file that was undecodable the >>> program would most likely want to know that file existed. >> Why? Most programs won't be able to do anything with it. And if the >> program *can* do something with it... that's what the bytes version of >> the APIs are for. >> > Nonsense. A program can do tons of things with a non-decodable > filename. Where it's limited is non-decodable filedata. You can't display a non-decodable filename to the user, hence the user will have no idea what they're working on. Non-filesystem related apps have no business trying to deal with insane filenames. 
Linux is moving towards a standard of UTF-8 for filenames, and once we get to the point where the idea of encoding filenames and environment variables any other way is seen as crazy, then the Python 3 approach will work seamlessly. In the meantime, raw bytes APIs will provide an alternative for those that disagree with that philosophy. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From thomas at python.org Sat Dec 6 01:49:08 2008 From: thomas at python.org (Thomas Wouters) Date: Sat, 6 Dec 2008 01:49:08 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> Message-ID: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> On Fri, Dec 5, 2008 at 19:10, Guido van Rossum wrote: > On Thu, Dec 4, 2008 at 11:27 PM, wrote: > > With all due respect, for me, "library support" and "serious use" are > > synonymous. > > Glyph, I cannot have a discussion with you if every single post of > yours is longer than my combined daily output. Please spend some time > writing shorter posts. I'm sure I'm not the only one here with a short > attention span. :-) Allow me to paraphrase glyph (with whom I'm in complete agreement, for what it's worth): many newbies will be disappointed by Python if they start with Python 3.0 and discover that most of the cool possibilities they had heard about are 'being worked on' and not quite ready. I don't doubt that 3.0 will be easier for the new programmer to learn, but I do not believe the average "Oh, I heard about Python, let's learn it" person should be pointed to 3.0 right now. They should be encouraged to learn 2.6 -- or even 2.5. 
In spite of Python being a programming language, there is a difference between 'casual user of the language' and 'library developer'; 3.0 is certainly a must for all actual library developers, and I'm sure most of them know about 3.0 by now. We're talking about first impressions for people without that knowledge. -- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! -------------- next part -------------- An HTML attachment was scrubbed... URL: From murman at gmail.com Sat Dec 6 02:00:45 2008 From: murman at gmail.com (Michael Urman) Date: Fri, 5 Dec 2008 19:00:45 -0600 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CBDB.30305@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: On Fri, Dec 5, 2008 at 18:48, Nick Coghlan wrote: > Toshio Kuratomi wrote: >> Nick Coghlan wrote: >>> Toshio Kuratomi wrote: >>>> Guido van Rossum wrote: >>>>> Glob was just an example. Many use cases for directory traversal >>>>> couldn't care less if they see *all* files. >>>>> >>>> Okay. Makes it harder to prove correct or not if I don't know what the >>>> use case is :-) I can't think of a single use case off-hand. >>>> >>>> Even your example of a ??.txt file making retrieval of *.py files fail >>>> is a little broken. If there was a ??.py file that was undecodable the >>>> program would most likely want to know that file existed. >>> Why? Most programs won't be able to do anything with it. And if the >>> program *can* do something with it... that's what the bytes version of >>> the APIs are for. >>> >> Nonsense. A program can do tons of things with a non-decodable >> filename. Where it's limited is non-decodable filedata. 
> > You can't display a non-decodable filename to the user, hence the user > will have no idea what they're working on. Non-filesystem related apps > have no business trying to deal with insane filenames. And what of python's batteries---does a library that takes filenames or directories from a controlling program and processes the contents of the file need to care whether the file can be encoded properly? Is said library filesystem related or not? Won't it be awful when it's the directory name, and processing the file works if you change into its directory, but not if you're outside of it? And if there's an error during processing and the library reports a full filename using os.abspath("file.ext"), but cannot get the results? > Linux is moving towards a standard of UTF-8 for filenames, and once we > get to the point where the idea of encoding filenames and environment > variables any other way is seen as crazy, then the Python 3 approach > will work seamlessly. > > In the meantime, raw bytes APIs will provide an alternative for those > that disagree with that philosophy. And until that time, it's agony for the library writers who didn't think they needed to care, but find that their users (other developers) do. -- Michael Urman From steve at pearwood.info Sat Dec 6 02:03:55 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 6 Dec 2008 12:03:55 +1100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939A8C7.6050209@gmail.com> References: <4938374B.8000006@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> Message-ID: <200812061203.55624.steve@pearwood.info> On Sat, 6 Dec 2008 09:18:47 am Nick Coghlan wrote: > Toshio Kuratomi wrote: > > Guido van Rossum wrote: > >> Glob was just an example. Many use cases for directory traversal > >> couldn't care less if they see *all* files. > > > > Okay. 
Makes it harder to prove correct or not if I don't know what > > the use case is :-) I can't think of a single use case off-hand. > > > > Even your example of a ??.txt file making retrieval of *.py files > > fail is a little broken. If there was a ??.py file that was > > undecodable the program would most likely want to know that file > > existed. > > Why? Most programs won't be able to do anything with it. But the program can report a sensible error message, so the user can fix the problem. I'd rather have the Python API report errors than silence them, at least by default. I don't suppose it's on the table for functions to grow an extra argument that tells them to skip broken file names and environment variables? What I have in mind is something like: os.listdir(path, silence_errors=False) -> list_of_strings By default, if a filename in path is not a valid string, an exception is raised, with the guilty file name given in bytes as an attribute of the exception. If silence_errors is true, the invalid file names are silently skipped. -- Steven From ncoghlan at gmail.com Sat Dec 6 02:05:24 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 11:05:24 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939AF9A.50809@gmail.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> <200812051920.59463.victor.stinner@haypocalc.com> <4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com> <4939AF9A.50809@gmail.com> Message-ID: <4939CFD4.1050203@gmail.com> Toshio Kuratomi wrote: > Nick Coghlan wrote: >> Toshio Kuratomi wrote: >>> Are most programs specific to one organization or are they distributed >>> to other people? >> The former.
That's pretty well documented in assorted IT literature >> ('shrink-wrap' and open source commodity software are still relatively >> new players on the scene that started to shift the balance the other >> way, but now the server side elements of web services are shifting it >> back again). >> > Cool. So it's only people writing code to be shared with the larger > community or written for multiple customers that are affected by bugs > like this. :-/ True, but it's still a fairly important problem to have a solution to. Even internally in large organisations there can be some pretty insane environments as cruft accumulates over the years. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Sat Dec 6 02:19:24 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 06 Dec 2008 02:19:24 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org> References: <20081204123750.GA890@amk.local> <6A8A7B58F5164C879B66D9A8DAF16C42@RaymondLaptop1> <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <4938D7F9.80908@v.loewis.de> <18745.18381.364105.121084@montanaro-dyndns-org.local> <5EB84A2F-93A9-450D-A98C-0267031CAB88@acm.org> Message-ID: <4939D31C.4010101@v.loewis.de> > There was already "Programming Language :: Python", provided by many > packages. I think version compatibility relationships meant by each of > these classifiers should be made explicit, wherever it is that > documentation for classifiers is provided. > > I don't recall having seen any such documentation; hopefully I just need > to be hit by another clue. 
There is no documentation for classifiers whatsoever. I don't think nuances matter much, anyway. Regards, Martin From martin at v.loewis.de Sat Dec 6 02:22:29 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 06 Dec 2008 02:22:29 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051043.10938.victor.stinner@haypocalc.com> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <200812051043.10938.victor.stinner@haypocalc.com> Message-ID: <4939D3D5.1030403@v.loewis.de> >> 5) represent all environment variables in Unicode strings, >> including the ones that currently fail to decode. >> (then do the same to file names, then drop the byte-oriented >> file operations again) > > Please, don't do that! Bytes are not characters! And environment variables, command line arguments, and file names are not bytes, but characters. Regards, Martin From foom at fuhm.net Sat Dec 6 02:37:45 2008 From: foom at fuhm.net (James Y Knight) Date: Fri, 5 Dec 2008 20:37:45 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CBDB.30305@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote: > You can't display a non-decodable filename to the user, hence the user > will have no idea what they're working on. Non-filesystem related apps > have no business trying to deal with insane filenames. Sigh, same arguments, all over again. Again, *both* KDE and Gnome apps display non-decodable filenames to the user, and let the user work with the files. They display as good a rendition as they can, using a replacement character as appropriate. In some earlier versions, KDE did not work at all on poorly-encoded files, and, users submitted bug reports. 
People do care, it does happen in real life, and it is a bug in your software if you cannot deal with the users' files. They just want the software to work. If it shows something weird in the window titlebar, that's a bit irritating but at least it doesn't get in the way of working. > Linux is moving towards a standard of UTF-8 for filenames, and once we > get to the point where the idea of encoding filenames and environment > variables any other way is seen as crazy, then the Python 3 approach > will work seamlessly. I seriously doubt that would ever enforce utf-8 filenames/env vars/ command arguments. Oddly encoded strings will always be with us in some form or another. Now, perhaps you use crontab? At least on the systems I have, programs run by cron don't have any locale environment variables set, and so default to the "C" locale. So utf-8 encoded filenames/etc will fail, by default, for any python3 program run under cron. I'd like to make an analogy: what if Python3 couldn't deal with filenames with spaces in them on unix? Most filenames don't have spaces in them, so it should be okay, right? And those people who really need to deal with space-containing filenames can use this other API variant, instead of the recommended and most obvious one. That'd be okay, right? No, of course it wouldn't be okay! 
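The cron scenario above can be reproduced without cron. What follows is a minimal sketch, assuming that the "C" locale implies an ASCII filesystem encoding; the filename and the helpers `strict_decodes` and `display_name` are made up for illustration. A strict decode of a UTF-8 name fails under ASCII, while a decode with replacement characters still yields something that can be shown to the user, which is roughly the KDE and Gnome behaviour described above.

```python
# Bytes of a UTF-8 encoded filename, as they would be stored on disk.
raw_name = "caf\u00e9.txt".encode("utf-8")


def strict_decodes(raw: bytes, encoding: str) -> bool:
    """Return True if the raw name decodes cleanly in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def display_name(raw: bytes, encoding: str) -> str:
    """Best-effort rendition for display only: undecodable bytes become U+FFFD."""
    return raw.decode(encoding, errors="replace")


# Under an ASCII-only locale the strict decode fails...
ascii_ok = strict_decodes(raw_name, "ascii")
# ...but a lossy rendition is still available for a titlebar or a listing.
shown = display_name(raw_name, "ascii")
```

The lossy string is only good for display. To open the file, the program still has to keep `raw_name` around, since the replacement characters cannot be turned back into the original bytes.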
James From guido at python.org Sat Dec 6 02:47:45 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Dec 2008 17:47:45 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> Message-ID: On Fri, Dec 5, 2008 at 4:49 PM, Thomas Wouters wrote: > On Fri, Dec 5, 2008 at 19:10, Guido van Rossum wrote: >> >> On Thu, Dec 4, 2008 at 11:27 PM, wrote: >> > With all due respect, for me, "library support" and "serious use" are >> > synonymous. >> >> Glyph, I cannot have a discussion with you if every single post of >> yours is longer than my combined daily output. Please spend some time >> writing shorter posts. I'm sure I'm not the only one here with a short >> attention span. :-) > > Allow me to paraphrase glyph (with whom I'm in complete agreement, for what > it's worth): many newbies will be disappointed by Python if they start with > Python 3.0 and discover that most of the cool possibilities they had heard > about are 'being worked on' and not quite ready. I don't doubt that 3.0 will > be easier for the new programmer to learn, but I do not believe the average > "Oh, I heard about Python, let's learn it" person should be pointed to 3.0 > right now. They should be encouraged to learn 2.6 -- or even 2.5. Thanks for the summary! Maybe Glyph should just pipe his email through you. :-) Without more context it's impossible to make a good recommendation. 
Most people probably want to learn Python because they want to access some system for which Python is required -- whether that's Blender, Google App Engine, their Nokia cell phone, or something that some of their colleagues have written (most Googlers learning Python fall in that category :-). In that case they don't have a choice -- they should learn the version that is used by the system they want to use. Obviously that's going to be 2.x in most cases, at least for a while. But I disagree that "most of the cool possibilities they have heard about" are necessarily third party libraries. Python's standard library has lots of stuff to offer. > In spite of Python being a programming language, there is a difference > between 'casual user of the language' and 'library developer'; 3.0 is > certainly a must for all actual library developers, and I'm sure most of > them know about 3.0 by now. We're talking about first impressions for people > without that knowledge. Well, if most library developers already know 3.0 by now, I would hope they aren't going to sit on their hands, but will solve the issues at hand! In the meantime, I don't mind if people learn 3.0 first and 2.6 second. It's probably easier that way than the other way around. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From murman at gmail.com Sat Dec 6 02:55:44 2008 From: murman at gmail.com (Michael Urman) Date: Fri, 5 Dec 2008 19:55:44 -0600 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939D3D5.1030403@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <200812051043.10938.victor.stinner@haypocalc.com> <4939D3D5.1030403@v.loewis.de> Message-ID: On Fri, Dec 5, 2008 at 19:22, "Martin v. Löwis" wrote: >> Please, don't do that! Bytes are not characters! > > And environment variables, command line arguments, and file names > are not bytes, but characters. On Windows NT, sure.
On Unix they're still bytes no matter how much we want them to be characters. This difference, and secondarily the way python 3 tries to sweep it under the rug, seem to be the roots of the problem. -- Michael Urman From steve at pearwood.info Sat Dec 6 02:58:27 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 6 Dec 2008 12:58:27 +1100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> Message-ID: <200812061258.27507.steve@pearwood.info> On Sat, 6 Dec 2008 12:47:45 pm Guido van Rossum wrote: > But I disagree that "most of the cool possibilities they have heard > about" are necessarily third party libraries. Python's standard > library has lots of stuff to offer. +1 on that. I've been using Python for a decade now, and the first third party library I've downloaded and used was Pyparsing a month or two ago. I'll be the first to admit that my programs tend to be on the small size, but they're useful to me. The lack of third party libraries to Python 3 is not necessarily a show-stopper. -- Steven From martin at v.loewis.de Sat Dec 6 03:02:49 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 06 Dec 2008 03:02:49 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <200812051043.10938.victor.stinner@haypocalc.com> <4939D3D5.1030403@v.loewis.de> Message-ID: <4939DD49.7030600@v.loewis.de> >> And environment variables, command line arguments, and file names >> are not bytes, but characters. > > On Windows NT, sure. On Unix they're still bytes no matter how much we > want them to be characters. Only in the API of the OS itself. Treating them as bytes in the application is a mistake. The bytes are intended to represent characters, so Python should treat them as what they are. 
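The two positions in this exchange (bytes at the OS boundary, characters inside the application) can be reconciled if the decode is lossless. A minimal sketch of that idea, assuming the `surrogateescape` error handler that later Python 3 releases provide; it is not something proposed in these messages, and the helper names and the sample filename are invented for illustration.

```python
def decode_os_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode OS-supplied bytes; undecodable bytes become lone surrogates."""
    return raw.decode(encoding, errors="surrogateescape")


def encode_os_text(text: str, encoding: str = "utf-8") -> bytes:
    """Inverse of decode_os_bytes: lone surrogates turn back into raw bytes."""
    return text.encode(encoding, errors="surrogateescape")


def is_displayable(text: str, encoding: str = "utf-8") -> bool:
    """A strict encode fails if the text carries smuggled, undecodable bytes."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


raw = b"report-\xff.txt"            # 0xFF can never appear in valid UTF-8
name = decode_os_bytes(raw)         # a str the application can pass around
round_tripped = encode_os_text(name)
```

The application sees a `str` everywhere, yet the original bytes survive the round trip, so the file can still be opened. The cost is that such a string may not be printable, which `is_displayable` makes explicit.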
Regards, Martin From steve at pearwood.info Sat Dec 6 03:06:40 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 6 Dec 2008 13:06:40 +1100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CBDB.30305@gmail.com> References: <4938374B.8000006@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: <200812061306.40613.steve@pearwood.info> On Sat, 6 Dec 2008 11:48:27 am Nick Coghlan wrote: > Toshio Kuratomi wrote: > > Nick Coghlan wrote: ... > >> Why? Most programs won't be able to do anything with it. And if > >> the program *can* do something with it... that's what the bytes > >> version of the APIs are for. > > > > Nonsense. A program can do tons of things with a non-decodable > > filename. Where it's limited is non-decodable filedata. > > You can't display a non-decodable filename to the user, hence the > user will have no idea what they're working on. Non-filesystem > related apps have no business trying to deal with insane filenames. I don't agree. Putting my user's hat on, I know what I would expect: the app should display *some* name, it doesn't matter exactly what, so long as: * it's as close as possible to the "real" name; * it is unique in that directory (doesn't shadow another file); and * it's enough to identify the file so I can read/save/delete/rename the file. I think there are analogous situations: long-time Windows users will be used to seeing files listed as "longfilename.txt" in some applications and "longfi~1.txt" in another. Under POSIX, file names can contain unprintable ctrl characters, and the shell will print them at least three ways, depending on context. E.g. for a file containing a formfeed, I get one of ? \f or ^L in bash. Applications can deal with such weird file names. KDE's file manager (konqueror) and file selection dialog both show the character as a small square, presumably the font's missing character glyph, and KDE apps can open and save the file. 
Still speaking as a user, I think it is quite reasonable to expect applications to deal with undisplayable filenames: displaying the name and opening the file are orthogonal concepts, although I accept that command-line interfaces will have difficulty with file names that can't be typed by the user! I appreciate that broken unicode is more difficult to deal with than unprintable control characters, but the basic principle is the same. -- Steven From janssen at parc.com Sat Dec 6 04:22:18 2008 From: janssen at parc.com (Bill Janssen) Date: Fri, 5 Dec 2008 19:22:18 PST Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> References: <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> Message-ID: <27924.1228533738@parc.com> Thomas Wouters wrote: > Allow me to paraphrase glyph (with whom I'm in complete agreement, for what > it's worth): many newbies will be disappointed by Python if they start with > Python 3.0 and discover that most of the cool possibilities they had heard > about are 'being worked on' and not quite ready. I don't doubt that 3.0 will > be easier for the new programmer to learn, but I do not believe the average > "Oh, I heard about Python, let's learn it" person should be pointed to 3.0 > right now. They should be encouraged to learn 2.6 -- or even 2.5. I think that's right. I was asked this question today, and it comes up (to me) fairly often at PARC. I usually suggest using the Python version that's standard for the user's platform, if they use OS X or Linux (and most do), which is typically 2.5 (for OS X Leopard), and 2.4 (for Linux -- may be out of date). 
For Windows users, I suggest the latest release (2.6). Bill From tseaver at palladion.com Sat Dec 6 05:57:01 2008 From: tseaver at palladion.com (Tres Seaver) Date: Fri, 05 Dec 2008 23:57:01 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812051127.35880.eckhardt@satorlaser.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> Message-ID: <493A061D.1060406@palladion.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ulrich Eckhardt wrote: > On Friday 05 December 2008, Guido van Rossum wrote: >> At the risk of bringing up something that was already rejected, let me >> propose something that follows the path taken in 3.0 for filenames, >> rather than doubling back: >> >> For os.environ, os.getenv() and os.putenv(), I think a similar >> approach as used for os.listdir() and os.getcwd() makes sense: let >> os.environ skip variables whose name or value is undecodable, and have >> a separate os.environb() which contains bytes; let os.getenv() and >> os.putenv() do the right thing when the arguments passed in are bytes. >> >> For sys.argv, because it's positional, you can't skip undecodable >> values, so I propose to use error=replace for the decoding; again, we >> can add sys.argvb that contains the raw bytes values. The various >> os.exec*() and os.spawn*() calls (as well as os.system(), os.popen() >> and the subprocess module) should all accept bytes as well as strings. >> >> On Windows, the bytes APIs should probably not exist. >> >> I predict that most developers can get away with not using the bytes >> APIs at all. The small minority that needs to be robust if not all >> filenames use the system encoding can use the bytes APIs. > > I know some of those developers, you can contact them via > python-dev at python.org. Seriously, what would you suggest to someone that > wants to handle paths in a portable way? 
Using the Unicode variants of > functions is fubar, because encoding/decoding is not universally possible. > Using the byte variant is equally fubar, because e.g. on MS Windows it is not > supported, except through a very lossy roundtrip through the locale's > codepage, limiting your functionality. > > I actually think it is about time to give up on trying to think about a path > as a string. Ditto for data received from os.environ or sys.argv. There are > only very few things that are universal to them and a reliable encoding is > none of them. Then, once you have let that idea go, meditate a bit over the > Zen. > > What I propose is that paths must be treated as OS-specific, with the only > common reliable operations being joining them, concatenating them and > splitting them into segments divided by the (again, OS-specific) separator. > Other operations, like e.g. appending a string or converting it to a string > in order to display it can fail. And if they fail, they should fail noisily. > In 99% of all cases, using the default encoding will work and do what people > expect, which is why I would make this conversion automatic. In all other > cases, it will at least not fail silently (which would lead to garbage and > data loss) and allow more sophisticated applications to handle it. Amen! The idea that paths, environment variables, and stuff pulled off of sockets can be treated as text rather than byte strings is just wishful thinking. Tres.
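The proposal quoted above (paths as opaque, OS-specific values that support joining and splitting, with conversion to text allowed to fail noisily) might look roughly like the following. This is only a sketch under stated assumptions: the class `RawPath`, its method names, and the choice of UTF-8 as default encoding are all hypothetical and do not correspond to any API in the standard library.

```python
import os

# The OS-specific separator, as bytes (b"/" on POSIX).
SEP = os.sep.encode("ascii")


class RawPath:
    """Hypothetical path type: the bytes are authoritative, text is derived."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def join(self, other: "RawPath") -> "RawPath":
        # Joining never needs to know the encoding of either part.
        return RawPath(self.raw.rstrip(SEP) + SEP + other.raw)

    def segments(self) -> list:
        # Splitting works on the separator byte alone.
        return [RawPath(part) for part in self.raw.split(SEP) if part]

    def to_text(self, encoding: str = "utf-8") -> str:
        # Conversion for display may fail, and fails noisily with
        # UnicodeDecodeError instead of dropping or mangling the name.
        return self.raw.decode(encoding)


path = RawPath(b"home").join(RawPath(b"caf\xe9.txt"))  # Latin-1 byte, not UTF-8
parts = [segment.raw for segment in path.segments()]
```

Calling `path.to_text()` here raises `UnicodeDecodeError`, while `RawPath(b"ok.txt").to_text()` returns ordinary text. That matches the quoted idea that the common case converts automatically and the uncommon case is reported rather than hidden.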
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJOgYd+gerLs4ltQ4RArQFAKDUZLXjwsIvNfNji4hbqM/aOZ0lMQCfRBq/ DHdYt2GGA1CrYA4a5pj+AZ4= =4CcT -----END PGP SIGNATURE-----
From rdmurray at bitdance.com Sat Dec 6 06:15:44 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Sat, 6 Dec 2008 00:15:44 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812061306.40613.steve@pearwood.info> References: <4938374B.8000006@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <200812061306.40613.steve@pearwood.info> Message-ID: On Sat, 6 Dec 2008 at 13:06, Steven D'Aprano wrote: > Applications can deal with such weird file names. KDE's file manager > (konqueror) and file selection dialog both show the character as a > small square, presumably the font's missing character glyph, and KDE > apps can open and save the file. Still speaking as a user, I think it > is quite reasonable to expect applications to deal with undisplayable > filenames: displaying the name and opening the file are orthogonal Agreed. I would file a bug report if an application couldn't handle a file that validly exists in my file system, no matter how broken the filename might appear to be. > concepts, although I accept that command-line interfaces will have > difficulty with file names that can't be typed by the user! Difficult, but not impossible: tab completion in the shell can allow the user to submit otherwise difficult to type filenames to a program.
Which means python should be able to handle such things in argument strings, so that my python utilities can manipulate such files when specified as command line arguments....and a sensible error should be generated by default if the program hasn't been written in such a way that it can handle such input. It would be wonderful if all Unix variants would switch to all UTF-8 (I have done so on my own machines...I think :). But it is a slow process. --RDM From glyph at divmod.com Sat Dec 6 06:28:44 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 05:28:44 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <79990c6b0812041220x4352b715pb83b0bf95d868ec9@mail.gmail.com> <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> Message-ID: <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> On 5 Dec, 06:10 pm, guido at python.org wrote: >On Thu, Dec 4, 2008 at 11:27 PM, wrote: >>With all due respect, for me, "library support" and "serious use" are >>synonymous. > >Glyph, I cannot have a discussion with you if every single post of >yours is longer than my combined daily output. Please spend some time >writing shorter posts. I'm sure I'm not the only one here with a short >attention span. :-) I already spend a lot of time trying to remove extraneous details. The drafts of these messages are usually 3x as long :). So, trying to keep it short: Thomas paraphrased my point pretty well. The importance of libraries cannot be overemphasized. Maybe you're right and the stdlib is enough for a large audience, but I don't know that audience. Everyone I know who uses Python, uses it because of a library. 
In some cases, an equivalent library exists for another language, and Python wins because it has a nicer syntax. But, in no case does Python win where it *doesn't* have the library. I think that the marketing for py3 needs to target library vendors before targeting novices. If the novices are targeted first, they are going to have a bad experience when "python" libraries don't work with py3, and library maintainers are going to have a bad experience when clueless newbies harass them to update their software without understanding the magnitude of the work to do so. I've been predicting this for years, but two days into Python 3's release, I've already seen real-world examples of this pattern in #twisted. I can tell these people to "downgrade" to py2 when they come ask me for help, but I don't think most of them ask for help. They just get angry and learn Java instead. From stephen at xemacs.org Sat Dec 6 06:31:51 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 06 Dec 2008 14:31:51 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CFD4.1050203@gmail.com> References: <4938374B.8000006@gmail.com> <200812051118.48096.victor.stinner@haypocalc.com> <493966CA.2010801@gmail.com> <200812051920.59463.victor.stinner@haypocalc.com> <4939A97E.9030609@gmail.com> <4939AC71.7010702@gmail.com> <4939AF9A.50809@gmail.com> <4939CFD4.1050203@gmail.com> Message-ID: <874p1hq37c.fsf@xemacs.org> Nick Coghlan writes: > True, but it's still a fairly important problem to have a solution to. > Even internally in large organisations there can be some pretty insane > environments as cruft accumulates over the years. M&A and globalization makes it inevitable. Toshio will remember the Mizuho April Fool's Day fiasco (a couple of large banks merged, and when they reopened as a merged entity called "Mizuho", the ATM system immediately crashed). Japan being a country that doesn't believe in GAAP, such mergers are a very difficult problem. 
I don't know the details, but I wouldn't even be surprised if encodings played a role in that mess because Japanese companies often have their own internal variants of the national standard JIS encoding. From stephen at xemacs.org Sat Dec 6 06:39:39 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 06 Dec 2008 14:39:39 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939D3D5.1030403@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <200812051043.10938.victor.stinner@haypocalc.com> <4939D3D5.1030403@v.loewis.de> Message-ID: <873ah1q2uc.fsf@xemacs.org> "Martin v. Löwis" writes: > >> 5) represent all environment variables in Unicode strings, > >> including the ones that currently fail to decode. > >> (then do the same to file names, then drop the byte-oriented > >> file operations again) > > > > Please, don't do that! Bytes are not characters! > > And environment variables, command line arguments, and file names > are not bytes, but characters. Unfortunately, both POSIX and OS implementation practice (including, for example, VFAT file systems: NT-derived OSes are not safe!) say otherwise, and that makes your line of argument extremely dangerous. Remember, in a fight between human custom and machine programming, the machine can always win by crashing. For that reason, bytes must be the underlying representation, always available, although I think it's essential to make a text representation easily accessible, and even the default. Humans who would rather kvetch about the machine's breakage than get a useful answer can (and should---problems will be rare for most usage patterns) use the text representation. Humans who want reliability or debuggability, on the other hand, should have something that cannot be mistaken for text immediately available. From stephen at xemacs.org Sat Dec 6 07:04:22 2008 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Sat, 06 Dec 2008 15:04:22 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> Message-ID: <871vwlq1p5.fsf@xemacs.org> Guido van Rossum writes: > This sounds too pessimistic to me. I expect that in five years it will > be universally accepted that these variables must be encoded in a > standard encoding. Archival material will not catch up until the plastic rots. And I bet it takes ten years before the Japanese accept the same standard encoding as the rest of the world (the Japanese cellphone system and iMode still speak Shift JIS). Five years should be plenty of time, but big Japanese companies are very sensitive (and resistant to) anything that might tend to open their turf to invaders. > People are never going to give up thinking about filenames etc. as > strings, because that's what they are conceptually. People can't win this one 100%, they have to choose between convenience with occasional fatal errors, or reliability. Python should not make it hard to achieve either. The default should be convenience, of course, but there should be a layer where "decodable per standard" values and "not decoded" values are different types. This is why Martin's proposal (or any other proposal to use strings with invalid values) is nearly unacceptable, really. What those who want reliability would have to do is to immediately decode all strings from the system into something like what Toshio proposes. This would be a lot more reliable if done by Python rather than an explicitly imported library, though, and would be available for debugging of cases where the default "values are text" representation falls down. The same "text on the surface, bytes in the background" type could be used by the email module (which already implements something like this). > The problem is purely one of encoding, No, it's not. 
It's that strings (as understood by people) and system "text" are different types (even on Mac: VFAT and NFS filesystem filenames for example), and Python is not type-safe in this sense. There ought to be a "you think this is text but I'm keeping an accurate backup just in case" type for this. From glyph at divmod.com Sat Dec 6 07:03:55 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 06:03:55 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> Message-ID: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> On 01:47 am, guido at python.org wrote: >>In spite of Python being a programming language, there is a difference >>between 'casual user of the language' and 'library developer'; 3.0 is >>certainly a must for all actual library developers, and I'm sure most >>of >>them know about 3.0 by now. We're talking about first impressions for >>people >>without that knowledge. > >Well if most library developers already know 3.0 by now, I would hope >they aren't going to sit on their hands, and solve the issues at hand! The best thing for 3.0 adoption would be a 3.0 "welcoming committee". A group of hackers wandering from one popular open source library to another, writing patches for 3.x compatibility issues. There must be lots of people who care about 3.x adoption, and this is probably the most effective way they can reach that goal. Each time I am going to fix a 3.0 compatibility issue, I have a choice: I can either make Twisted itself better (add features, fix bugs), or I can keep Twisted exactly the same but do lots of work so it will work on 3.0. 
It seems pretty clear to me that, to the extent that I have time for Twisted, fixing bugs in the HTTP implementation would be a better deal than puzzling through a megabyte of diffs generated by 2to3, trying to understand where it went wrong, and how. This doesn't mean I'm "sitting on my hands". It just means I have better things to be doing with my hands. (To be precise, 1054 better things to do, re: Twisted. Add in the Divmod projects and it's more like 3000.) Of course the distant threat of an unmaintained 2.x series is enough to motivate me to push a *little* in this direction, but it doesn't make me happy about it. I think this is exactly what the marketing effort around 3.0 needs to be doing: making a positive case for library and application authors to spend time to update to 3.x. This is a lot of work, and many (I might even say most) of us need a lot of cajoling. Free patches are a good incentive :). From larry.bugbee at boeing.com Sat Dec 6 07:18:37 2008 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Fri, 5 Dec 2008 22:18:37 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: Message-ID: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> There has been some discussion here that users should use the str or byte function variant based on what is relevant to their system, for example when getting a list of file names or opening a file. That thought process really doesn't do much for those of us that write code that needs to run on any platform type, without alteration or the addition of complex if-statements and/or exceptions. Whatever the resolution here, and those of you addressing this thorny issue have my admiration, the solution should be such that it gives consistent behavior regardless of platform type and doesn't require the programmer to know of all the minute details of each possible target platform. 
That may not be possible for a while, so interim solutions should be such that it minimizes later pain. If that means hiding "implementation details" behind a new function, so be it. Then, at least, the body of one's app is not burdened with this problem later when conditions change. I'm glad I'm not the only one with hard problems. ;-) Larry From ncoghlan at gmail.com Sat Dec 6 09:10:05 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 06 Dec 2008 18:10:05 +1000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <27924.1228533738@parc.com> References: <20081204213104.GA24509@amk-desktop.matrixgroup.net> <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <27924.1228533738@parc.com> Message-ID: <493A335D.7000007@gmail.com> Bill Janssen wrote: > Thomas Wouters wrote: > >> Allow me to paraphrase glyph (with whom I'm in complete agreement, for what >> it's worth): many newbies will be disappointed by Python if they start with >> Python 3.0 and discover that most of the cool possibilities they had heard >> about are 'being worked on' and not quite ready. I don't doubt that 3.0 will >> be easier for the new programmer to learn, but I do not believe the average >> "Oh, I heard about Python, let's learn it" person should be pointed to 3.0 >> right now. They should be encouraged to learn 2.6 -- or even 2.5. > > I think that's right. > > I was asked this question today, and it comes up (to me) fairly often at > PARC. I usually suggest using the Python version that's standard for > the user's platform, if they use OS X or Linux (and most do), which is > typically 2.5 (for OS X Leopard), and 2.4 (for Linux -- may be out of date). For Linux, it depends on the distro. 
I think Ubuntu has been on 2.5 since 7.04 or so. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From thomas at python.org Sat Dec 6 11:12:22 2008 From: thomas at python.org (Thomas Wouters) Date: Sat, 6 Dec 2008 11:12:22 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> Message-ID: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> On Sat, Dec 6, 2008 at 02:47, Guido van Rossum wrote: > In the mean time, I don't mind if people learn 3.0 first and 2.6 > second. It's probably easier that way than the other way around. :-) It may be easier in a vacuum -- although I don't think it is. 3.0 is more logical than 2.x, and I don't think it's easier to learn about the better way first, and then realize that you have to use some archaic form later. In fact, we had someone on #python just last week who had learned Python from a 2.6 tutorial, then found out he had to use 2.5, and he was actually tripping over some 2.6-only features he'd been taught. When he learned he had to go back and relearn and fix them by hand, his actual words were "if thats the case, I'm gonna be forced to use another language". I hope that isn't a typical example of such a case, but I can partly understand the sentiment. But even if it's true, people don't learn in a vacuum. Almost everybody else will be thinking of 3.0 in terms of 'changes since 2.x', tools such as 2to3 are oriented that way, and explanations on bits and pieces of Python available to be googled are by and large about 2.x, not 3.0. Right now, it's just much easier to go from 2.x to 3.0 than the other way 'round. 
-- Thomas Wouters Hi! I'm a .signature virus! copy me into your .signature file to help me spread! From phd at phd.pp.ru Sat Dec 6 15:34:54 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 6 Dec 2008 17:34:54 +0300 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: <20081206143454.GA15293@phd.pp.ru> On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote: > On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote: >> You can't display a non-decodable filename to the user, hence the user >> will have no idea what they're working on. Non-filesystem related apps >> have no business trying to deal with insane filenames. > > Sigh, same arguments, all over again. > > Again, *both* KDE and Gnome apps display non-decodable filenames to the > user, and let the user work with the files. They display as good a > rendition as they can, using a replacement character as appropriate. In > some earlier versions, KDE did not work at all on poorly-encoded files, > and, users submitted bug reports. People do care, it does happen in real > life, and it is a bug in your software if you cannot deal with the users' > files. They just want the software to work. If it shows something weird > in the window titlebar, that's a bit irritating but at least it doesn't > get in the way of working. I agree 100%. Russian Unix users use at least 5 different encodings (koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and iso-8859-5 are less frequent). I have an FTP server with some filenames in koi8 encoding - these filenames are for unix clients, - and some filenames in cp1251 for w32 clients. 
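To make the mixed-encoding situation concrete, here is a minimal Python 3 sketch. It is not from the original message, and the sample name is made up for illustration. It only shows that the same filename bytes read as different text, or fail to decode at all, depending on which encoding is assumed:

```python
# Hypothetical sample name; only the stdlib codecs koi8-r, cp1251 and utf-8 are used.
name = "файл"
raw = name.encode("koi8-r")        # bytes as a koi8-r client would store them

as_koi8 = raw.decode("koi8-r")     # round-trips to the original text
as_cp1251 = raw.decode("cp1251")   # decodes without error, but to different letters

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False                # these bytes are not valid UTF-8

print(repr(raw), as_koi8, as_cp1251, utf8_ok)
```

The bytes themselves carry no record of which of the three readings was intended, which is why a directory holding names from both kinds of client cannot be decoded with a single setting.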
Sometimes I run utf-8 xterm (I am a commandline/console unixhead) for my needs (read email, write files in utf-8 with characters beyond koi8-r, which is my primary encoding) - and I can still work with filenames in koi8/cp1251 encodings. My filemanager (Midnight Commander, for that matter) shows these files and directories as "?????.???", but I can chdir to such directories, and I can open such files. It would be a big bad blow for me if filemanagers (or other programs) started to filter these filenames. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From phd at phd.pp.ru Sat Dec 6 15:37:47 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 6 Dec 2008 17:37:47 +0300 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812061203.55624.steve@pearwood.info> References: <4938374B.8000006@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <200812061203.55624.steve@pearwood.info> Message-ID: <20081206143747.GB15293@phd.pp.ru> On Sat, Dec 06, 2008 at 12:03:55PM +1100, Steven D'Aprano wrote: > I'd rather have the Python API report errors than silence them, at least > by default. +1 for encoding errors by default. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From phd at phd.pp.ru Sat Dec 6 15:43:12 2008 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 6 Dec 2008 17:43:12 +0300 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939D3D5.1030403@v.loewis.de> References: <4938374B.8000006@gmail.com> <49386A2C.60208@v.loewis.de> <200812051043.10938.victor.stinner@haypocalc.com> <4939D3D5.1030403@v.loewis.de> Message-ID: <20081206144312.GC15293@phd.pp.ru> On Sat, Dec 06, 2008 at 02:22:29AM +0100, "Martin v. Löwis" wrote: > And environment variables, command line arguments, and file names > are not bytes, but characters. "There is no such thing as plain text!" 
If you say "these are characters" you must also name the encoding for them. LANG/LC_ALL/LC_CTYPE provide a sensible default, but if a program has problems decoding bytes to characters there must be a way for the user to override the default. But the user must be notified about the error, so programs must not silently filter out non-decodable characters. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From skip at pobox.com Sat Dec 6 16:17:43 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 6 Dec 2008 09:17:43 -0600 Subject: [Python-Dev] Where/how should I check this in? Message-ID: <18746.38807.422664.986710@montanaro-dyndns-org.local> I have a change to check in from this issue: http://bugs.python.org/issue4483 It is a build error for _dbmmodule.c which was reported against Python 3.0 involving a change to the layout of symbols in libgdbm. (There is now a libgdbm_compat in some systems which holds the dbm_* symbols.) With one tweak, I'm certain it needs to be applied to both 2.6 and trunk. Do I just check it in on all three branches and run svnmerge block to keep it from being considered again? Thanks, Skip From a.badger at gmail.com Sat Dec 6 16:52:38 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Sat, 06 Dec 2008 07:52:38 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CBDB.30305@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: <493A9FC6.8090201@gmail.com> Nick Coghlan wrote: > Toshio Kuratomi wrote: >>> >> Nonsense. A program can do tons of things with a non-decodable >> filename. Where it's limited is non-decodable filedata. > > You can't display a non-decodable filename to the user, hence the user > will have no idea what they're working on. 
Non-filesystem related apps > have no business trying to deal with insane filenames. > This is where we disagree. There are many ways to display the non-decodable filename to the user because the user is not a machine. The computer must know the unique sequence of bytes in order to access a file. The user, OTOH, usually only needs to know that the file exists. In most GUI-based end-user oriented desktop apps, it's enough to do str(filename, errors='replace'). For instance, the GNOME file manager displays: "? (Invalid encoding)" and Konqueror, the KDE file manager just displays: "?" The file can still be displayed this way, accessed via the raw bytes that the program keeps internally, and operated upon by applications. For applications in which the user needs more information to differentiate the files the program has the option to display the raw byte sequences as if they were the filename. The *NIX shell and command line tools have this ability.

$ LANG=en_US.utf8 ls -b
á  í
$ LANG=C ls -b
.  ..  \303\241  \303\255
$ mv $'\303\241' $'\303\263'
$ LANG=C ls -b
\303\255  \303\263
$ LANG=en_US.utf8 ls -b
í  ó

> Linux is moving towards a standard of UTF-8 for filenames, and once we > get to the point where the idea of encoding filenames and environment > variables any other way is seen as crazy, then the Python 3 approach > will work seamlessly. > With the caveat that I haven't seen movement by Linux and other Unix variants to enforce UTF-8. What I have seen are statements by kernel programmers that having the filesystem use bytes and not know about encoding is the correct thing to do. This means that utf-8 will be a convention rather than a necessity for a very long time and consequently programs will need to worry about the problems of mixed encoding systems for an equally long time. (Remember, encoding is something that can be changed per user and per file. 
So on a multiuser OS, mixed encodings can be out of the control of the system administrator for perfectly valid reasons.) > In the meantime, raw bytes APIs will provide an alternative for those > that disagree with that philosophy. > Oh I agree with the UTF-8 everywhere philosophy. I just know that there's tons of real-world systems out there that don't conform to my expectations for sanity and my code has to account for those :-) -Toshio From guido at python.org Sat Dec 6 18:00:58 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 09:00:58 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: On Fri, Dec 5, 2008 at 10:18 PM, Bugbee, Larry wrote: > There has been some discussion here that users should use the str or > byte function variant based on what is relevant to their system, for > example when getting a list of file names or opening a file. That > thought process really doesn't do much for those of us that write code > that needs to run on any platform type, without alteration or the > addition of complex if-statements and/or exceptions. > > Whatever the resolution here, and those of you addressing this thorny > issue have my admiration, the solution should be such that it gives > consistent behavior regardless of platform type and doesn't require the > programmer to know of all the minute details of each possible target > platform. My prediction is that it won't ever be possible to completely hide this difference between platforms. The platforms differ fundamentally in how they see filenames. 
An elaborate abstraction can certainly be created that smooths out most of the differences, but at some point useful functionality will have to be lost in order to maintain strict platform independence. This is the fate of most platform-independence abstractions by the way. For example, there are many elaborate packages for platform-independent I/O, but they generally don't provide access to all functionality that is available on a platform. Where they do, the application is once again placed in the position of having to use complex if-statements and/or exceptions. Consider just this example. Many programs have a need to ask their user for a filename to be created by the program. On systems where filenames are raw byte strings, do you want to provide the user with a way to specify an arbitrary byte string? (That is, in addition to the normal case of entering a text string that will be transformed into a filename using some encoding.) Your choices are either not to support the case of bytes that aren't a valid encoding in the current encoding, or add a UI element to select an encoding, or add a UI element to enter raw bytes. An abstraction package is likely to only support the first option (this is what Java does BTW), but this is not acceptable to all applications. > That may not be possible for a while, so interim solutions should be > such that it minimizes later pain. If that means hiding "implementation > details" behind a new function, so be it. Then, at least, the body of > one's app is not burdened with this problem later when conditions > change. I believe the problem's severity is actually overstated. The interim solution with the least amount of pain that will work for almost all apps is to treat filenames as text strings encoded in some default encoding, and ignore filenames that aren't valid encodings of any text string. Yes, it is possible that you'll find that you can't completely remove or traverse certain directory trees. 
But that's a fact of life anyway (filesystems have many hidden failure modes), so you're better off dealing with *that* possibility than worrying over the issue of undecodable filenames. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Dec 6 18:05:23 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 09:05:23 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493A061D.1060406@palladion.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <493A061D.1060406@palladion.com> Message-ID: On Fri, Dec 5, 2008 at 8:57 PM, Tres Seaver wrote: > Amen! the idea that paths, environment variables, and stuff pulled off > of sockets can be treated as text rather than strings is just wishful > thinking. Unfortunately most of the programmers of the world *do* think that way(*), and it's not easy to wean them off the idea. It's a powerful meme that you can use your own name as a file name, even if you happen to be Czech or Vietnamese -- and it's promoted by the two most popular consumer operating systems. (*) With the exception of sockets. Sockets are typically dealt with through protocols and APIs that provide guidance about how to convert between bytes and strings, and whether that is even a meaningful operation. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From a.badger at gmail.com Sat Dec 6 18:18:30 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Sat, 06 Dec 2008 09:18:30 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> References: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> Message-ID: <493AB3E6.7070806@gmail.com> Bugbee, Larry wrote: > There has been some discussion here that users should use the str or > byte function variant based on what is relevant to their system, for > example when getting a list of file names or opening a file. That > thought process really doesn't do much for those of us that write code > that needs to run on any platform type, without alteration or the > addition of complex if-statements and/or exceptions. > > Whatever the resolution here, and those of you addressing this thorny > issue have my admiration, the solution should be such that it gives > consistent behavior regardless of platform type and doesn't require the > programmer to know of all the minute details of each possible target > platform. > I've been thinking about this and I can only see one option. I don't think that it really makes less work for the programmer, though -- it just shifts the problem and makes it more apparent what your code is doing. To avoid exceptions and if-then's in program code when accessing filenames, environment variables, etc, you would need to access each of these resources via the byte API. Then, to avoid having to keep track of what's a string and what's a byte in your other code, you probably want to convert those bytes to strings. This is where the burden gets shifted. You'll have your own routine(s) to do the conversion and have to have exception handling code to deal with undecodable filenames. 
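A minimal sketch of what such a conversion routine might look like in Python 3, assuming the names are fetched through the bytes API (os.listdir called with a bytes argument) and that a visibly lossy rendition is acceptable for display. The helper names decode_name and list_names are hypothetical, not part of any library:

```python
import os
import sys


def decode_name(raw, encoding=None):
    """Return (text, ok); ok is False when raw could not be decoded cleanly."""
    encoding = encoding or sys.getfilesystemencoding()
    try:
        return raw.decode(encoding), True
    except UnicodeDecodeError:
        # Lossy fallback for display only: undecodable bytes become U+FFFD.
        return raw.decode(encoding, errors="replace"), False


def list_names(directory=b".", encoding=None):
    """List a directory via the bytes API, pairing raw names with display text."""
    return [(raw, *decode_name(raw, encoding)) for raw in os.listdir(directory)]


if __name__ == "__main__":
    for raw, text, ok in list_names():
        print(text if ok else "%s (undecodable: %r)" % (text, raw))
```

The raw bytes stay alongside the text so that the file can still be opened or renamed by its exact name; the decoded form is only ever used for display.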
Note 1: your particular app might be able to get away without doing the conversion from bytes to string -- it depends on what you're planning on doing with the filename/environment data. Note 2: If there isn't a parallel API on all platforms, for instance, Guido's proposal to not have os.environb on Windows, then you'll still have to have a platform specific check. (Likely you should try to access os.environb in this instance and if it doesn't exist, use os.environ instead... and remember that you need to either change os.environ's data into str type or change os.environb's data into byte type.) -Toshio From guido at python.org Sat Dec 6 18:54:18 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 09:54:18 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> Message-ID: On Fri, Dec 5, 2008 at 9:28 PM, wrote: > On 5 Dec, 06:10 pm, guido at python.org wrote: >> On Thu, Dec 4, 2008 at 11:27 PM, wrote: >>> With all due respect, for me, "library support" and "serious use" are >>> synonymous. >> >> Glyph, I cannot have a discussion with you if every single post of >> yours is longer than my combined daily output. Please spend some time >> writing shorter posts. I'm sure I'm not the only one here with a short >> attention span. :-) > > I already spend a lot of time trying to remove extraneous details. The > drafts of these messages are usually 3x as long :). 
So, trying to keep it > short: Thanks! > Thomas paraphrased my point pretty well. The importance of libraries cannot > be overemphasized. Maybe you're right and the stdlib is enough for a large > audience, but I don't know that audience. Everyone I know who uses Python, > uses it because of a library. In some cases, an equivalent library exists > for another language, and Python wins because it has a nicer syntax. But, > in no case does Python win where it *doesn't* have the library. Clearly you're not reading the edu-sig list. :-) > I think that the marketing for py3 needs to target library vendors before > targeting novices. If the novices are targeted first, they are going to > have a bad experience when "python" libraries don't work with py3, and > library maintainers are going to have a bad experience when clueless newbies > harass them to update their software without understanding the magnitude of > the work to do so. I think it's great to have specific marketing targeted towards library developers. I know we haven't done enough -- for example I promised a C extension porting guide which didn't materialize. :-( But I do *not* think it is a good idea to emphasize elsewhere that most people shouldn't use Python 3.0. Py3k will have a hard enough time gaining mindshare without the very developers who created it discouraging its use. If you can't find it in your heart to recommend 3.0, can you at least keep that within your circle of library-producing friends? Whenever someone asks me which version to use, I always respond with a question -- what do you want to use it for? And then I'll give them an answer based on what's available for their needs. Sometimes I have to recommend Python 2.2. It's been a while since I had to recommend 1.5.2 but a few years ago that was still common. (A large company I know still has servers where 1.5.2 is the default, although 2.4 is also installed.) 
> I've been predicting this for years, but two days into Python 3's release, > I've already seen real-world examples of this pattern in #twisted. I can > tell these people to "downgrade" to py2 when they come ask me for help, but > I don't think most of them ask for help. They just get angry and learn Java > instead. If they're that easily convinced that Java is better they probably were a lost cause anyway, so I won't mourn their departure too much. The one thing I would warn against is replacing a default Python 2.x with Python 3.0 -- if you find 2.x pre-installed, it's likely that other parts of the OS infrastructure depend on it, and *any* upgrade except to 2.x.n is likely to cause trouble. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Dec 6 19:09:28 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 10:09:28 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> Message-ID: On Sat, Dec 6, 2008 at 2:12 AM, Thomas Wouters wrote: > On Sat, Dec 6, 2008 at 02:47, Guido van Rossum wrote: >> In the mean time, I don't mind if people learn 3.0 first and 2.6 >> second. It's probably easier that way than the other way around. :-) > > It may be easier in a vacuum -- although I don't think it is. 3.0 is more > logical than 2.x, and I don't think it's easier to learn about the better > way first, and then realize that you have to use some archaic form later. True, though (at least when writing new 2.x code) it's often not needed to use the archaic forms. E.g. you don't have to use backticks or __cmp__ or string exceptions. 
And if you can live with 2.6 it gets even better (e.g. relative import, "except ... as ..."). > In > fact, we had someone on #python just last week who had learned Python from a > 2.6 tutorial, then found out he had to use 2.5, and he was actually tripping > over some 2.6-only features he'd been taught. When he learned he had to go > back and relearn and fix them by hand, his actual words were "if thats the > case, I'm gonna be forced to use another language". I hope that isn't a > typical example of such a case, but I can partly understand the sentiment. You can't prevent this kind of thing happening occasionally. I don't generally lie awake over it -- I don't expect a massive exodus. I think some people like to say this kind of thing (especially publicly) because they expect us to be insecure about Python adoption and worried about the competition. Don't fall for the troll bait! When they go home they'll realize that learning Ruby or Java is a lot more work than learning the differences between Python 2.5 and 2.6. Or they'll learn one of those and find that it's not all roses there either. (Ruby is also going through a language transition, and the choice of which version of Java to learn isn't that easy either, despite the strict backwards compatibility -- you can choose to use a somewhat awkward older version, or use the latest and find it's not supported on the next platform you're porting to.) > But even if it's true, people don't learn in a vacuum. Almost everybody else > will be thinking of 3.0 in terms of 'changes since 2.x', tools such as 2to3 > are oriented that way, and explanations on bits and pieces of Python > available to be googled are by and large about 2.x, not 3.0. Right now, it's > just much easier to go from 2.x to 3.0 than the other way 'round. True, but we should work on fixing this rather than giving up. What happened to the 3to2 project? Wasn't someone planning to write a 3.0 to 2.6 (or 2.5?) converter using the same technology in 2to3? 
We probably need two different marketing/PR streams: one aimed at *existing* Python users (reaffirming we will be supporting 2.x fully for many years to come), another at *new* users (suggesting that now is a better time than ever to learn Python, with 3.0 available and new packages being ported to it all the time). -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Dec 6 19:16:21 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 10:16:21 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> Message-ID: On Fri, Dec 5, 2008 at 10:03 PM, wrote: > The best thing for 3.0 adoption would be a 3.0 "welcoming committee". A > group of hackers wandering from one popular open source library to another, > writing patches for 3.x compatibility issues. There must be lots of people > who care about 3.x adoption, and this is probably the most effective way > they can reach that goal. > > Each time I am going to fix a 3.0 compatibility issue, I have a choice: I > can either make Twisted itself better (add features, fix bugs), or I can > keep Twisted exactly the same but do lots of work so it will work on 3.0. > It seems pretty clear to me that, to the extent that I have time for > Twisted, fixing bugs in the HTTP implementation would be a better deal than > puzzling through a megabyte of diffs generated by 2to3, trying to understand > where it went wrong, and how. > > This doesn't mean I'm "sitting on my hands". It just means I have better > things to be doing with my hands. 
(To be precise, 1054 better things to do, > re: Twisted. Add in the Divmod projects and it's more like 3000.) > > Of course the distant threat of an unmaintained 2.x series is enough to > motivate me to push a *little* in this direction, but it doesn't make me > happy about it. > > I think this is exactly what the marketing effort around 3.0 needs to be > doing: making a positive case for library and application authors to spend > time to update to 3.x. This is a lot of work, and many (I might even say > most) of us need a lot of cajoling. Free patches are a good incentive :). This is a really good idea. I hope and expect that the information and tools available for porting to 3.0 will dramatically improve over the next half year or so (hopefully the situation is a lot less gloomy already by the time we meet again at PyCon). The porting list that was just created also sounds like a step in the right direction. I do think that in many cases *some* support from the regular maintainers of a library would be needed -- for example if you (in particular) were to express a negative attitude towards porting Twisted to 3.0 (I'm not saying that you do, it's just a hypothetical that would apply to any "BDFL" for any sizable library) then this would discourage others from trying to contribute. OTOH if you made a branch available where you check in the results of running 2to3 over Twisted, with instructions for people to contribute fixes, that would be great -- at almost no cost to you! (Assuming you can get someone else to work on merging trunk improvements into that branch.) Remember the open source mantra -- reap the benefit of all those eyeballs! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Sat Dec 6 19:48:04 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 18:48:04 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> Message-ID: <20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com> On 10:12 am, thomas at python.org wrote: >When he learned he had to go >back and relearn and fix them by hand, his actual words were "if thats >the >case, I'm gonna be forced to use another language". I hope that isn't a >typical example of such a case, but I can partly understand the >sentiment. This is an overreaction, but it's a very typical overreaction. It's difficult to recover from a negative first impression even if you have lots of opportunities; in the case of an anonymous user trying out Python, the user will often stop using it, without telling anyone, and never come back. There's no opportunity to recover. 
From glyph at divmod.com Sat Dec 6 19:53:19 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 18:53:19 -0000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081206143454.GA15293@phd.pp.ru> References: <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> Message-ID: <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> On 02:34 pm, phd at phd.pp.ru wrote: >On Fri, Dec 05, 2008 at 08:37:45PM -0500, James Y Knight wrote: >>On Dec 5, 2008, at 7:48 PM, Nick Coghlan wrote: >>>You can't display a non-decodable filename to the user, hence the >>>user >>>will have no idea what they're working on. Non-filesystem related >>>apps >>>have no business trying to deal with insane filenames. >>Sigh, same arguments, all over again. >>People do care, it does happen in real >>life, and it is a bug in your software if you cannot deal with the >>users' >>files. They just want the software to work. If it shows something >>weird >>in the window titlebar, that's a bit irritating but at least it >>doesn't >>get in the way of working. > I agree 100%. Russian Unix users use at least 5 different encodings >(koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and >iso-8859-5 are less frequent). I have an FTP server with some filenames >in >koi8 encoding - these filenames are for unix clients, - and some >filenames >in cp1251 for w32 clients. Sometimes I run utf-8 xterm (I am >a commandline/console unixhead) for my needs (read email, write files >in >utf-8 with characters beyond koi8-r, which is my primary encoding) - >and >I still can work with filenames in koi8/cp1251 encodings. My >filemanager >(Midnight Commander, for the matter) shows these files and directories >as >"?????.???", but I can chdir to such directories, and I can open such >files. 
It would be a big bad blow for me if filemanagers (or other >programs) start to filter these filenames. I find it interesting to note that the only users in this discussion who actually have these problems in real life all have this attitude. It is expected that in an imperfect world we will have imperfect encodings, but it is super important that software which can open files can deal with not understanding the character translation of the filename. From guido at python.org Sat Dec 6 20:04:44 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 11:04:44 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <9e804ac0812060212y341f3c1cya25aab01a6e92f78@mail.gmail.com> <20081206184804.12555.1413861742.divmod.xquotient.1538@weber.divmod.com> Message-ID: On Sat, Dec 6, 2008 at 10:48 AM, wrote: > On 10:12 am, thomas at python.org wrote: >> When he learned he had to go >> back and relearn and fix them by hand, his actual words were "if thats the >> case, I'm gonna be forced to use another language". I hope that isn't a >> typical example of such a case, but I can partly understand the sentiment. > > This is an overreaction, but it's a very typical overreaction. It's > difficult to recover from a negative first impression even if you have lots > of opportunities; in the case of an anonymous user trying out Python, the > user will often stop using it, without telling anyone, and never come back. > There's no opportunity to recover. Sorry, but I really don't see it that dark. 
Either they weren't ready to learn a new language anyway, or they'll try something else, and find that the grass isn't actually that green on the other side of the fence either. In general I don't worry about losing one individual potential user; there are plenty of others. I'd be more worried if someone wrote a nasty blog rant or a Slashdot article after such an experience -- but there will always be lots of people pointing out the other side, so the negative effect of such blogs is usually neutralized quite well. The one overreaction that would really worry me is if influential people inside the Python developer community were to start dissing Python 3.0 based on the response of someone in #python. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Dec 6 20:13:38 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 11:13:38 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> Message-ID: On Sat, Dec 6, 2008 at 10:53 AM, wrote: > On 02:34 pm, phd at phd.pp.ru wrote: >> I agree 100%. Russian Unix users use at least 5 different encodings >> (koi8-r, cp1251 and utf-8 are the most frequent in use, cp866 and >> iso-8859-5 are less frequent). I have an FTP server with some filenames in >> koi8 encoding - these filenames are for unix clients, - and some filenames >> in cp1251 for w32 clients. Sometimes I run utf-8 xterm (I am >> a commandline/console unixhead) for my needs (read email, write files in >> utf-8 with characters beyond koi8-r, which is my primary encoding) - and >> I still can work with filenames in koi8/cp1251 encodings.
My filemanager >> (Midnight Commander, for the matter) shows these files and directories as >> "?????.???", but I can chdir to such directories, and I can open such >> files. It would be a big bad blow for me if filemanagers (or other >> programs) start to filter these filenames. > > I find it interesting to note that the only users in this discussion who > actually have these problems in real life all have this attitude. It is > expected that in an imperfect world we will have imperfect encodings, but it > is super important that software which can open files can deal with not > understanding the character translation of the filename. For file managers and similar tools I am absolutely 100% in agreement -- that's why the binary APIs are there. Most apps aren't file managers or ftp clients though. The sky is not falling. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ctb at msu.edu Sat Dec 6 20:43:42 2008 From: ctb at msu.edu (C. Titus Brown) Date: Sat, 6 Dec 2008 11:43:42 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> Message-ID: <20081206194342.GB26208@idyll.org> On Sat, Dec 06, 2008 at 06:03:55AM -0000, glyph at divmod.com wrote: -> On 01:47 am, guido at python.org wrote: -> >>In spite of Python being a programming language, there is a difference -> >>between 'casual user of the language' and 'library developer'; 3.0 is -> >>certainly a must for all actual library developers, and I'm sure most -> >>of -> >>them know about 3.0 by now. 
We're talking about first impressions for -> >>people -> >>without that knowledge. -> > -> >Well if most library developers already know 3.0 by now, I would hope -> >they aren't going to sit on their hands, and solve the issues at hand! -> -> The best thing for 3.0 adoption would be a 3.0 "welcoming committee". A -> group of hackers wandering from one popular open source library to -> another, writing patches for 3.x compatibility issues. There must be -> lots of people who care about 3.x adoption, and this is probably the -> most effective way they can reach that goal. Does anyone smell a few GSoC projects? (And maybe GHOP if Google decides to run it again; no word yet.) --titus -- C. Titus Brown, ctb at msu.edu From warren at delsci.com Sat Dec 6 20:38:51 2008 From: warren at delsci.com (Warren DeLano) Date: Sat, 6 Dec 2008 11:38:51 -0800 Subject: [Python-Dev] "as" keyword woes Message-ID: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> > Date: Fri, 05 Dec 2008 22:22:38 -0800 > From: Dennis Lee Bieber > Subject: Re: "as" keyword woes > To: python-list at python.org > Message-ID: > > I'm still in the dark as to what type of data could > even inspire the > use of "as" as an object name... A collection of "a" objects? In which > case, what are the "a"s? Please let me clarify. It is not "as" as a standalone object that we specifically miss in 2.6/3, but rather, the ability to use ".as" used as a method or attribute name. In other words we have lost the ability to refer to "as" as the generalized OOP-compliant/syntax-independent method name for casting:

new_object = old_object.as(class_hint)

# For example:
float_obj = int_obj.as("float")
# or
float_obj = int_obj.as(float_class)
# as opposed to something like
float_obj = int_obj.asFloat()
# which requires a separate method for each cast, or
float_obj = (float)int_obj
# which required syntax-dependent casting [language-based rather than object-based].
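A minimal sketch of the generalized casting method described above, for anyone who wants to try the idea on Python 2.6 or 3.x. Since `as` can no longer be used as a method name there, the sketch uses the hypothetical spelling `as_`; the `Castable` class and its `value` attribute are likewise illustrative assumptions, not an existing API.

```python
class Castable(object):
    """Hypothetical wrapper illustrating a single, generalized cast method."""

    def __init__(self, value):
        self.value = value

    def as_(self, target):
        # 'target' is any callable that converts a value, e.g. float or str.
        # Spelled 'as_' because 'as' is a reserved word in Python 2.6 and 3.x.
        return target(self.value)


int_obj = Castable(3)
float_obj = int_obj.as_(float)  # one method instead of asFloat(), asStr(), ...
str_obj = int_obj.as_(str)
```

Passing the target type as an argument keeps the object's interface to one method, which is the property the `object.as(class_reference)` form is after; only the spelling of the method name differs.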
Of course, use of explicit casting syntax "(float)" is fine if you're restricting yourself to Python and other languages which support casting, but that solution is unavailable inside of a pure OOP message-passing paradigm where object.method(argument) invocations are all you have to work with. Please note that use of object.asClassname(...) is a ubiquitous convention for casting objects to specific classes (seen in ObjectiveC, Java, SmallTalk, etc.). There, I assert that 'object.as(class_reference)' is the simplest and most elegant generalization of this widely-used convention. Indeed, it is the only obvious concise answer, if you are limited to using methods for casting. Although there are other valid domain-specific uses for "as" as either a local variable or attribute names (e.g. systematic naming: as, bs, cs), those aren't nearly as important compared to "as" being available as the name of a generalized casting method -- one that is now strictly denied to users of Python 2.6 and 3. As someone somewhat knowledgeable of how parsers work, I do not understand why a method/attribute name "object_name.as(...)" must necessarily conflict with a standalone keyword " as ". It seems to me that it should be possible to unambiguously separate the two without ambiguity or undue complication of the parser. So, assuming I now wish to propose a corrective PEP to remedy this situation for Python 3.1 and beyond, what is the best way to get started on such a proposal?
Cheers, Warren From scott+python-dev at scottdial.com Sat Dec 6 21:06:42 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sat, 06 Dec 2008 15:06:42 -0500 Subject: [Python-Dev] "as" keyword woes In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> Message-ID: <493ADB52.7090608@scottdial.com> Warren DeLano wrote: > There, I assert that 'object.as(class_reference)' is the simplest and > most elegant generalization of this widely-used convention. Indeed, it > is the only obvious concise answer, if you are limited to using methods > for casting. How about "to"? Almost every language I have ever used uses "to" and not "as". Python predominately uses "to" already, so why would you fight that? And moreover, I have never seen a language or library that preferred "as", so I remain to be convinced that "as" is a good choice. > As someone somewhat knowledgable of how parsers work, I do not > understand why a method/attribute name "object_name.as(...)" must > necessarily conflict with a standalone keyword " as ". It seems to me > that it should be possible to unambiguously separate the two without > ambiguity or undue complication of the parser. It's not a matter of whether it is possible. It's a matter of simplicity and a lack of a worthy use-case for allowing it. In general, the trend has been to not allow any keywords as identifiers in the Python language. If there were such a worthy use-case, then what is really important is that it increases the complexity of /the language/ a human programmer needs to parse. > So, assuming I now wish to propose a corrective PEP to remedy this > situation for Python 3.1 and beyond, what is the best way to get started > on such a proposal? I think you will need to work on making a convincing argument as to why the keyword "as" is any more special than, say, "for", or any other keywords for that matter.
Unless you plan on proposing a reversal of the current keyword/identifier ideology, which is likely to be rejected outright. -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From glyph at divmod.com Sat Dec 6 21:19:15 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 20:19:15 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> Message-ID: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> As far as the original point of this thread, I started off just defending the cautionary text already present in the announcements and on the website. Since I'm not advocating any changes to that (the brief caveat on the "download" page is fine), we'll just have to agree to disagree on the abstractly appropriate audience for 3.0. I'll respond to some other points though: On 05:54 pm, guido at python.org wrote: >On Fri, Dec 5, 2008 at 9:28 PM, wrote: >>On 5 Dec, 06:10 pm, guido at python.org wrote: >>>On Thu, Dec 4, 2008 at 11:27 PM, wrote: >I think it's great to have specific marketing targeted towards library >developers. I know we haven't done enough -- for example I promised a >C extension porting guide which didn't materialize. :-( Well, get cracking, then! :) >If you can't find it in your heart to recommend >3.0, can you at least keep that within your circle of >library-producing friends? In another (longer) message, I already said this is what I'm doing. Assuming that we are all my "library-producing friends" here :). I am deliberately refraining from blogging about 3.0 until I have something nice to say.
But still, you can't honestly expect me to recommend 3.0 until someone has gotten at least a basic skeleton of Twisted up and running under it :). My own attempts to do so have failed miserably, to the point where I can't even produce a useful bug report without a lot more work. Would you recommend a C compiler that couldn't build Python, or link with it? >Whenever someone asks me which version to use, I always respond with >a question -- what do you want to use it for? In the longer term, I think that you should look at this as a symptom of a problem. If you learn Java, you learn the most recent version. If you need your software to work with an older version, you just pass a special option to the compiler. If you want your *old* software to work with a *new* version, it basically just does (at least, 99% of the time). I don't think there's anything about the 3.0 language which couldn't be supported in a VM that understood both 2 and 3. "py3to2" seems at least a rough proof of concept of that idea, although it still has some issues. Library availability should be a separate concern from a clean source language. I also don't think 3.0 is perfect, and five years on, there will be a temptation to make more "just this once" incompatible changes. Of course, you've promised these changes won't be made, and *this* set of design mistakes will be with us forever. It would be nice if there were a way for evolution to continue without another reboot of the world. >If they're that easily convinced that Java is better they probably >were a lost cause anyway, so I won't mourn their departure too much. I really believe that *all* new users are fickle, if they don't have a mandate as to what they need to be learning. Personally, I learned Python because of a memory leak in Swing.
From guido at python.org Sat Dec 6 21:29:09 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 12:29:09 -0800 Subject: [Python-Dev] "as" keyword woes In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> Message-ID: On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano wrote: [...] > There, I assert that 'object.as(class_reference)' is the simplest and > most elegant generalization of this widely-used convention. Indeed, it > is the only obvious concise answer, if you are limited to using methods > for casting. Well, that's too bad, as 'as' is now a reserved word. > Although there are other valid domain-specific uses for "as" as either a > local variable or attribute names (e.g. systematic naming: as, bs, cs), > those aren't nearly as important compared to "as" being available as the > name of a generalized casting method -- one that is now strictly denied > to users of Python 2.6 and 3. If you had brought this up 5-10 years ago when we first introduced 'as' as a semi-keyword (in the import statement) we might have been able to avert this disaster. As it was, nobody ever brought this up AFAICR, so I don't think it's *that* obvious. > As someone somewhat knowledgable of how parsers work, I do not > understand why a method/attribute name "object_name.as(...)" must > necessarily conflict with a standalone keyword " as ". It seems to me > that it should be possible to unambiguously separate the two without > ambiguity or undue complication of the parser. That's possible with sufficiently powerful parser technology, but that's not how the Python parser (and most parsers, in my experience) treat reserved words. Reserved words are reserved in all contexts, regardless of whether ambiguity could arise. Otherwise *every* reserved word would have to be allowed right after a '.', and many keywords would have to be allowed as identifiers in other contexts. That way lies PL/1...
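The rule that reserved words are reserved in all contexts can be checked from Python itself. The sketch below, assuming Python 2.6 or later, uses only the standard `keyword` module and the built-ins `compile`, `setattr` and `getattr`; the helper name `parses` and the class `Box` are made up for illustration.

```python
import keyword


def parses(source):
    """Return True if 'source' compiles as an expression, False on SyntaxError."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False


# 'as' is a full keyword, so it is rejected even right after a '.'.
as_is_keyword = keyword.iskeyword("as")
as_after_dot = parses("obj.as")

# The same holds for every other reserved word, not just 'as'.
keywords_allowed_after_dot = [kw for kw in keyword.kwlist if parses("obj." + kw)]

# A trailing underscore is the usual way to sidestep the clash.
underscore_ok = parses("obj.as_")


class Box(object):
    pass


# The restriction is purely syntactic: the attribute name itself is just a string.
box = Box()
setattr(box, "as", 42)
value = getattr(box, "as")
```

Existing objects that already expose an attribute literally named "as" therefore stay reachable through `getattr`, even though the dotted spelling no longer parses.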
Furthermore, how would you define the 'as' method? Would you also want to be allowed to write def as(self, target): ... ??? Trust me, it's a slippery slope, and you don't want to start going down there. > So, assuming I now wish to propose a corrective PEP to remedy this > situation for Python 3.1 and beyond, what is the best way to get started > on such a proposal? Don't bother writing a PEP to make 'as' available as an attribute again. It has no chance of being accepted. Instead, think of a different word you could use. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Sat Dec 6 21:37:25 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 20:37:25 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> Message-ID: <20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com> On 06:16 pm, guido at python.org wrote: >On Fri, Dec 5, 2008 at 10:03 PM, wrote: >I do think that in many cases *some* support from the regular >maintainers of a library would be needed -- for example if you (in >particular) were to express a negative attitude towards porting >Twisted to 3.0 (I'm not saying that you do, it's just a hypothetical >that would apply to any "BDFL" for any sizable library) then this >would discourage others from trying to contribute. Of course. Grumpy as we are, we're preparing for the 3.0 migration, and have been for a while. 
There are tickets open in the tracker, a buildslave reporting 2.6's -3 warnings, and soon, apparently, a buildslave that will attempt to run the tests with 3.0, although getting anything but a traceback bootstrapping the testing tool is a ways off. My attitude in every public statement I've ever made about 3.0 has been that there is too much migration work for our tiny team to do, but we are very open to getting help from the community. >OTOH if you made a >branch available where you check in the results of running 2to3 over >Twisted, with instructions for people to contribute fixes, that would >be great -- at almost no cost to you! (Assuming you can get someone >else to work on merging trunk improvements into that branch.) Remember >the open source mantra -- reap the benefit of all those eyeballs! This isn't really the way our development process works on Twisted - we don't have enough developers to support more than one line of development. Modules and subsystems can be patched individually, and the whole idea with 2to3 is that source changes should remain compatible with 2.6 (an appropriate level of swaddling can paper over library changes back to 2.3) so those fixes can just go into trunk, right? Nevertheless the sentiment is the same. If someone is desperately interested in getting Twisted to work on 3.0, there would be lots of work for them to do and a clear place for them to go do it.
From guido at python.org Sat Dec 6 21:51:55 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 12:51:55 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> Message-ID: On Sat, Dec 6, 2008 at 12:19 PM, wrote: > I also don't think 3.0 is perfect, and five years on, there will be a > temptation to make more "just this once" incompatible changes. Of course, > you've promised these changes won't be made, and *this* set of design > mistakes will be with us forever. It would be nice if there were a way for > evolution to continue without another reboot of the world. It would be nice indeed. But we (and any other language that's alive) will need to walk a careful line between evolving too slow and too fast. Hopefully we'll be able to evolve mostly through deprecation and eventual removal of misfeatures rather than through a series of hiccups like 3.0. But it will still be too slow for some and too fast for others. Since one of your favorite themes is that your team is too small, I would like to reuse that idea. If we had as many Python core developers as Sun and IBM have working on Java, we could most likely have introduced all Python 3.0 features gradually, with compiler flags and __future__ imports to support different versions. But despite being a bit bigger than Twisted, we're still severely constrained by resources. My estimation when we started was that it would be easier for the core team to maintain two separate versions over a long time, than to try and produce a single binary capable of running both versions of the language. 
(Maybe Jython and/or IronPython provide a better platform for doing that though.) Hopefully by the time Python 4000 rolls along, technology will be available to make the transition more smoothly. But we'll still have to break some eggs... -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sat Dec 6 21:53:16 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 12:53:16 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com> References: <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> <20081206203725.12555.893422998.divmod.xquotient.1717@weber.divmod.com> Message-ID: On Sat, Dec 6, 2008 at 12:37 PM, wrote: > Of course. Grumpy as we are, we're preparing for the 3.0 migration, and > have been for a while. There are tickets open in the tracker, a buildslave > reporting 2.6's -3 warnings, and soon, apparently, a buildslave that will > attempt to run the tests with 3.0, although getting anything but a traceback > bootstrapping the testing tool is a ways off. Thank you very much for this. > My attitude in every public statement I've ever made about 3.0 has been that > there is too much migration work for our tiny team to do, but we are very > open to getting help from the community. If I were a Twisted user I wouldn't hesitate to help. Open source to the rescue! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Sat Dec 6 22:26:49 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sat, 06 Dec 2008 21:26:49 -0000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> Message-ID: <20081206212649.12555.749860363.divmod.xquotient.1720@weber.divmod.com> On 08:51 pm, guido at python.org wrote: >On Sat, Dec 6, 2008 at 12:19 PM, wrote: >>I also don't think 3.0 is perfect, and five years on, there will be a >>temptation to make more "just this once" incompatible changes. Of >>course, >>you've promised these changes won't be made, and *this* set of design >>mistakes will be with us forever. It would be nice if there were a >>way for >>evolution to continue without another reboot of the world. >Since one of your favorite themes is that your team is too small, I >would like to reuse that idea. If we had as many Python core >developers as Sun and IBM have working on Java, we could most likely >have introduced all Python 3.0 features gradually, with compiler flags >and __future__ imports to support different versions. But despite >being a bit bigger than Twisted, we're still severely constrained by >resources. Ah, the dangers of over-editing. I originally had a whole paragraph about how I understood that the Python dev team was also resource constrained, but I deleted it for brevity. Now you see why my posts are so long! 
:) From brett at python.org Sat Dec 6 23:36:11 2008 From: brett at python.org (Brett Cannon) Date: Sat, 6 Dec 2008 14:36:11 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <9e804ac0812051649u427f2088h21c7d86d7c83b118@mail.gmail.com> <20081206060355.12555.1553839479.divmod.xquotient.1516@weber.divmod.com> Message-ID: On Fri, Dec 5, 2008 at 22:03, wrote: > On 01:47 am, guido at python.org wrote: >>> >>> In spite of Python being a programming language, there is a difference >>> between 'casual user of the language' and 'library developer'; 3.0 is >>> certainly a must for all actual library developers, and I'm sure most of >>> them know about 3.0 by now. We're talking about first impressions for >>> people >>> without that knowledge. >> >> Well if most library developers already know 3.0 by now, I would hope >> they aren't going to sit on their hands, and solve the issues at hand! > > The best thing for 3.0 adoption would be a 3.0 "welcoming committee". A > group of hackers wandering from one popular open source library to another, > writing patches for 3.x compatibility issues. There must be lots of people > who care about 3.x adoption, and this is probably the most effective way > they can reach that goal. > The welcoming committee has somewhat already started. Martin announced on python-porting that he ported psycopg2 himself and submitted the patch. Martin also mostly ported Django at the last PyCon. 
-Brett From brett at python.org Sat Dec 6 23:42:38 2008 From: brett at python.org (Brett Cannon) Date: Sat, 6 Dec 2008 14:42:38 -0800 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> Message-ID: On Thu, Dec 4, 2008 at 17:02, Frank Wierzbicki wrote: > On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon wrote: >> On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki wrote: >>> On Wed, Dec 3, 2008 at 10:31 AM, A.M. Kuchling wrote: >>>> 14:00 - 15:30 >>>> ============= >>>> >>>> Two tracks: >>>> >>>> Cross-implementation issues: >>>> >>>> What do the various VMs want/need from CPython to help with their >>>> implementations? >>>> >>>> * Marking CPython-specific tests in the test suite? >>>> * Getting an implementation agnostic test suite for the Python language? >>>> * Separating the language tests and the pure Python part of the stdlib into >>>> a separate project? (Or publish them as a separate package.) >>>> * Transition plans for 3.0? >>>> >>>> Champion needed. >>> I would like to champion this one. >>> >> >> I told AMK this a while back, but might as well make it more public; I >> am up for chairing as well. > Brett, > > Are you saying you've already called the cross-implementation champion > role? No, I am saying I had told AMK I was interested in championing the session. He chose you, and that's that. One less thing for me to worry about. =) > If so I'm happy to defer or co-chair. Your call. I will definitely be there representing CPython as best as I can, so I will be making noise regardless of whether I am standing in front of the room or not. 
-Brett From jnoller at gmail.com Sat Dec 6 23:48:58 2008 From: jnoller at gmail.com (jnoller at gmail.com) Date: Sat, 06 Dec 2008 22:48:58 +0000 Subject: [Python-Dev] Holding a Python Language Summit at PyCon Message-ID: <0016361e89b66d8dd8045d689975@google.com> On Dec 6, 2008 5:42pm, Brett Cannon wrote: > On Thu, Dec 4, 2008 at 17:02, Frank Wierzbicki wrote: > > > On Thu, Dec 4, 2008 at 3:16 PM, Brett Cannon wrote: > > >> On Thu, Dec 4, 2008 at 12:05, Frank Wierzbicki wrote: > > >>> On Wed, Dec 3, 2008 at 10:31 AM, AM Kuchling wrote: > > >>>> 14:00 - 15:30 > > >>>> ============= > > >>>> > > >>>> Two tracks: > > >>>> > > >>>> Cross-implementation issues: > > >>>> > > >>>> What do the various VMs want/need from CPython to help with their > > >>>> implementations? > > >>>> > > >>>> * Marking CPython-specific tests in the test suite? > > >>>> * Getting an implementation agnostic test suite for the Python language? > > >>>> * Separating the language tests and the pure Python part of the stdlib into > > >>>> a separate project? (Or publish them as a separate package.) > > >>>> * Transition plans for 3.0? > > >>>> > > >>>> Champion needed. > > >>> I would like to champion this one. > > >>> > > >> > > >> I told AMK this a while back, but might as well make it more public; I > > >> am up for chairing as well. > > > Brett, > > > > > > Are you saying you've already called the cross-implementation champion > > > role? > > > > No, I am saying I had told AMK I was interested in championing the > > session. He chose you, and that's that. One less thing for me to worry > > about. =) > > > > > If so I'm happy to defer or co-chair. > > > > Your call. I will definitely be there representing CPython as best as > > I can, so I will be making noise regardless of whether I am standing > > in front of the room or not. > > > > -Brett > Is heckling covered as an official obligation? :) -jesse
From hsoft at hardcoded.net Sun Dec 7 00:01:39 2008 From: hsoft at hardcoded.net (Virgil Dupras) Date: Sun, 7 Dec 2008 00:01:39 +0100 Subject: [Python-Dev] "as" keyword woes In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> Message-ID: <1BA80D7C-44A6-4DE0-AC43-A99B50DF3F5E@hardcoded.net> On 06 Dec 2008, at 20:38, Warren DeLano wrote: > >> Date: Fri, 05 Dec 2008 22:22:38 -0800 >> From: Dennis Lee Bieber >> Subject: Re: "as" keyword woes >> To: python-list at python.org >> Message-ID: >> >> I'm still in the dark as to what type of data could >> even inspire the >> use of "as" as an object name... A collection of "a" objects? In >> which >> case, what are the "a"s? > > Please let me clarify. It is not "as" as a standalone object that we > specifically miss in 2.6/3, but rather, the ability to use ".as" > used as > a method or attribute name. > > In other words we have lost the ability to refer to "as" as the > generalized OOP-compliant/syntax-independent method name for casting: > > new_object = old_object.as(class_hint) > > # For example: > > float_obj = int_obj.as("float") > > # or > > float_obj = int_obj.as(float_class) > > # as opposed to something like > > float_obj = int_obj.asFloat() > > # which requires a separate method for each cast, or > > float_obj = (float)int_obj > > # which required syntax-dependent casting [language-based rather than > object-based]. > > Of course, use of explicit casting syntax "(float)" is fine if you're > restricting yourself to Python and other languages which support > casting, but that solution is unavailable inside of a pure OOP > message-passing paradigm where object.method(argument) invocations are > all you have to work with. > > Please note that use of object.asClassname(...) is a ubiquitous > convention for casting objects to specific classes (seen in > ObjectiveC, > Java, SmallTalk, etc.).
> > There, I assert that 'object.as(class_reference)' is the simplest and > most elegant generalization of this widely-used convention. Indeed, > it > is the only obvious concise answer, if you are limited to using > methods > for casting. > > Although there are other valid domain-specific uses for "as" as > either a > local variable or attribute names (e.g. systematic naming: as, bs, > cs), > those aren't nearly as important compared to "as" being available as > the > name of a generalized casting method -- one that is now strictly > denied > to users of Python 2.6 and 3. > > As someone somewhat knowledgeable of how parsers work, I do not > understand why a method/attribute name "object_name.as(...)" must > necessarily conflict with a standalone keyword " as ". It seems to me > that it should be possible to unambiguously separate the two without > ambiguity or undue complication of the parser. > > So, assuming I now wish to propose a corrective PEP to remedy this > situation for Python 3.1 and beyond, what is the best way to get > started > on such a proposal? > > Cheers, > Warren > As long as "as" is widely known as a keyword, I don't see the problem. Every Python developer knows that the convention is to add a trailing underscore when you want to use a reserved word in your code. Besides, your examples are quite abstract. I'm sure it's possible to find good examples for "while", "with", "import", "from" (I often use "from_") or "try" as well. Or perhaps the Python keyword should be "as_" so we leave "as" free for eventual methods? As for the implicit proposition of allowing keywords only for methods, I agree with Guido about it being a slippery slope. So we would end up with a language where it is allowed to name methods after keywords, but not functions (they can be declared in the local scope)? Yikes!
Oh well, maybe it's possible for an intelligent parser to distinguish between keywords and function references, but think of the poor grammar highlighters in all source editors! What a nightmare it will be for them. Anyway, is there any language that does this, allowing keywords as method names? I don't know, but if not, there's probably a reason for it. Your views on code elegance are also rather Javaish. I'd go for "class_reference(object)" (and why the heck would you "be limited to using method for casting"?). Ciao, Virgil From solipsis at pitrou.net Sun Dec 7 00:15:12 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Dec 2008 23:15:12 +0000 (UTC) Subject: [Python-Dev] Buildbots for 2.6 and 3.0 Message-ID: Hello people, Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots for the release26-maint and release30-maint branches. Is someone working on that? Regards Antoine. From musiccomposition at gmail.com Sun Dec 7 00:18:05 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sat, 6 Dec 2008 17:18:05 -0600 Subject: [Python-Dev] 3.0.1 possibilities Message-ID: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Since the release of 3.0, several critical issues have come to our attention. Namely, the builtin cmp function wasn't removed [1] and the new IO library proved to be (as expected) abysmally slow [2][3][4]. Christian proposed that we release 3.0.1 within the next week to patch up this critical issues. Thoughts? [1] http://bugs.python.org/1717 [2] http://bugs.python.org/4533 [3] http://bugs.python.org/4561 [4] http://bugs.python.org/4565 -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." 
From python at rcn.com Sun Dec 7 00:19:39 2008 From: python at rcn.com (Raymond Hettinger) Date: Sat, 6 Dec 2008 15:19:39 -0800 Subject: [Python-Dev] Buildbots for 2.6 and 3.0 References: Message-ID: <8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1> BTW, 3.0 went out the door with test_binascii failing on windows. Was surprised that some buildbot wasn't complaining. ----- Original Message ----- From: "Antoine Pitrou" To: Sent: Saturday, December 06, 2008 3:15 PM Subject: [Python-Dev] Buildbots for 2.6 and 3.0 > > Hello people, > > Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots > for the release26-maint and release30-maint branches. Is someone working on that? > > Regards > > Antoine. From python at rcn.com Sun Dec 7 00:25:06 2008 From: python at rcn.com (Raymond Hettinger) Date: Sat, 6 Dec 2008 15:25:06 -0800 Subject: [Python-Dev] 3.0.1 possibilities References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: Strong +1 Are the RMs on board? ----- Original Message ----- From: "Benjamin Peterson" To: Sent: Saturday, December 06, 2008 3:18 PM Subject: [Python-Dev] 3.0.1 possibilities > Since the release of 3.0, several critical issues have come to our > attention. Namely, the builtin cmp function wasn't removed [1] and the > new IO library proved to be (as expected) abysmally slow [2][3][4]. > Christian proposed that we release 3.0.1 within the next week to patch > up this critical issues. Thoughts? > > > [1] http://bugs.python.org/1717 > [2] http://bugs.python.org/4533 > [3] http://bugs.python.org/4561 > [4] http://bugs.python.org/4565 > > -- > Cheers, > Benjamin Peterson > "There's nothing quite as beautiful as an oboe... except a chicken > stuck in a vacuum cleaner."
From guido at python.org Sun Dec 7 00:25:41 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 15:25:41 -0800 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: +1 On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson wrote: > Since the release of 3.0, several critical issues have come to our > attention. Namely, the builtin cmp function wasn't removed [1] and the > new IO library proved to be (as expected) abysmally slow [2][3][4]. > Christian proposed that we release 3.0.1 within the next week to patch > up this critical issues. Thoughts? > > > [1] http://bugs.python.org/1717 > [2] http://bugs.python.org/4533 > [3] http://bugs.python.org/4561 > [4] http://bugs.python.org/4565 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From solipsis at pitrou.net Sun Dec 7 00:39:07 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Dec 2008 23:39:07 +0000 (UTC) Subject: [Python-Dev] 3.0.1 possibilities References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: Benjamin Peterson gmail.com> writes: > > Since the release of 3.0, several critical issues have come to our > attention. Namely, the builtin cmp function wasn't removed [1] and the > new IO library proved to be (as expected) abysmally slow [2][3][4]. > Christian proposed that we release 3.0.1 within the next week to patch > up this critical issues. The IO library needs a lot of work to make it as fast as in 2.6, one week isn't enough. I'm not sure an emergency release with the linked patches is very useful honestly.
From barry at python.org Sun Dec 7 00:41:41 2008 From: barry at python.org (Barry Warsaw) Date: Sat, 6 Dec 2008 18:41:41 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote: > On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson > wrote: >> Since the release of 3.0, several critical issues have come to our >> attention. Namely, the builtin cmp function wasn't removed [1] and >> the >> new IO library proved to be (as expected) abysmally slow [2][3][4]. >> Christian proposed that we release 3.0.1 within the next week to >> patch >> up this critical issues. Thoughts? >> >> >> [1] http://bugs.python.org/1717 >> [2] http://bugs.python.org/4533 >> [3] http://bugs.python.org/4561 >> [4] http://bugs.python.org/4565 I've set the priority on all these to release blockers, but I have my reservations about 4561 and 4565. Resolution of those seem like more than a week or so away. If we want to do a bug fix release for 3.0.1, I'd like to do it no later than the 19th. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSTsNtXEjvBPtnXfVAQKI4AP8CNQEEb2KuN8cvd+t6YK39jFPxEo8j/YV 022zAWX3nNgj/R88C7OwoP6nYLx+zz4D3USj65OZN4NS9W9tJYKs+Lv6CnjIJi2X cVceihcJHVYbyx8r14mYt6VjSmpTuNBD8uPZGv23WLZJZ5pNpWeuEMqI6XR27bY2 NYxbwSEUQpw= =3wZN -----END PGP SIGNATURE----- From aahz at pythoncraft.com Sun Dec 7 01:20:32 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 6 Dec 2008 16:20:32 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> Message-ID: <20081207002032.GA13190@panix.com> On Sat, Dec 06, 2008, Guido van Rossum wrote: > > But I do *not* think it is a good idea to emphasize elsewhere that > most people shouldn't use Python 3.0. Py3k will have a hard enough > time gaining mindshare without the very developers who created > it discouraging its use. If you can't find it in your heart to > recommend 3.0, can you at least keep that within your circle of > library-producing friends? Sorry, I don't think I can do that. It's difficult-to-impossible to leap straight from Python 2.2 or 2.3 to 3.0, and I think that most released Python software still ought to support versions going back that far. Unless someone plans to use Python only on machines where they can guarantee availability of 3.0, I think that sticking with 2.x is the prudent course. Then again, until the release of 3.0, I was still advocating the use of classic classes in the 2.x series, and I haven't yet decided whether I should change that stance now that there is a released version of Python where new-style classes are the default. 
I believe that it would be a shame and a disservice to Python if there were a large proportion of the Python community that discouraged the use of 3.0; I also believe it would be a shame and a disservice to Python if you (and other people) tell conservatives like me that we should keep our mouths shut. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From aahz at pythoncraft.com Sun Dec 7 01:23:57 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 6 Dec 2008 16:23:57 -0800 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: <20081207002357.GB13190@panix.com> On Sat, Dec 06, 2008, Benjamin Peterson wrote: > > Since the release of 3.0, several critical issues have come to our > attention. Namely, the builtin cmp function wasn't removed [1] and the > new IO library proved to be (as expected) abysmally slow [2][3][4]. > Christian proposed that we release 3.0.1 within the next week to patch > up this critical issues. Thoughts? Seems overly aggressive to me. These prohibit use of 3.0 in production environments; they do not prohibit development in 3.0. I think we should target early January and make it public that we are doing so. That will give more time for any additional similar bugs to get fixed at once. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." 
--Bill Harlan From amauryfa at gmail.com Sun Dec 7 01:32:00 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sun, 7 Dec 2008 01:32:00 +0100 Subject: [Python-Dev] Buildbots for 2.6 and 3.0 In-Reply-To: <8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1> References: <8AADF944CB714CE5B31E1FC495735901@RaymondLaptop1> Message-ID: Hello, On Sun, Dec 7, 2008 at 00:19, Raymond Hettinger wrote: > BTW, 3.0 went out the door with test_binascii failing on windows. > Was surprised that some buildbot wasn't complaining. They were complaining. But not loud enough to stop the release. (see bottom of http://www.python.org/dev/buildbot/3.0/x86%20W2k8%203.0/builds/486/step-test/0 ) -- Amaury Forgeot d'Arc From ncoghlan at gmail.com Sun Dec 7 02:02:21 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 11:02:21 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081206143454.GA15293@phd.pp.ru> References: <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> Message-ID: <493B209D.5070306@gmail.com> Oleg Broytmann wrote: > My filemanager > (Midnight Commander, for the matter) shows these files and directories as > "?????.???", but I can chdir to such directories, and I can open such > files. It would be a big bad blow for me if filemanagers (or other > programs) start to filter these filenames. 
Summary for those without the time to read the longer version below: - File managers, backup managers and similar apps should use the binary APIs worldwide - Most apps in countries where encoding problems are common will also need to use the binary APIs to be acceptable to their uses - Many apps in countries where the 'native' encoding is UTF-8, ASCII or latin-1 will be able to use the Unicode APIs without any issues whatsoever - Apps targeting a limited, well-controlled execution environment (e.g. web services) will also be able to use the Unicode APIs - I think the binary and Unicode APIs should be available (and fully functional) on all platforms (including Windows) so that app developers don't create portability problems for themselves when they make the decision as to which API to use ------------- The point about *filesystem* apps (i.e. file managers, backup tools, indexing engines) needing to deal with the imperfect world of dodgy filesystem encodings isn't in dispute at all - that's why the binary alternative APIs were added. The point is that there is a spectrum from providing a completely clean solution that addresses only the ideal case of "file paths and other items such as environment variable names and values retrieved from the OS are always well-formed text in the appropriate default encoding" (which will actually work for large chunks of the planet - those where the locals are native ASCII speakers and those where computers didn't start to enter widespread use until after Unicode was already available) to addressing only the most pessimistic case of "you can't trust the default encoding at all, and need to assume that all strings retrieved from the OS contain arbitrary binary data" (which is actually true for some parts of the planet, but thankfully not for all of it). Hopefully people can at least agree that the first extreme is unacceptable because that ideal world doesn't exist. 
I personally think that the other extreme is *also* unacceptable, because it burdens every single application developer with dealing with a potential problem that quite simply may not be a problem for them because they're in a situation where the naive assumption of a sane operating environment is actually a valid one for their particular application. The idea of parallel Unicode and bytes APIs means that for those with an appropriately limited target environment and/or audience, the Unicode APIs will "just work", while the developers that aren't so lucky can rely on the binary APIs instead. That's actually the one place where I disagree with Guido: I agree with Adam that the binary APIs *should* be available on Windows. The difference would be that whereas on *nix type systems, the bytes APIs are the 'lower level' that more accurately represents the underlying OS, on Windows it would be the other way around, with the Unicode APIs as the lower level ones, and the binary APIs as wrappers around them that automatically decoded the bytes representation to a Unicode one when writing to the OS, and encoded from Unicode to bytes when reading from the OS. If the binary APIs are missing from a major platform (i.e. Windows) then the choice to use them brings with it a major cross-platform portability problem that should really be handled by the standard library. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From brett at python.org Sun Dec 7 02:10:04 2008 From: brett at python.org (Brett Cannon) Date: Sat, 6 Dec 2008 17:10:04 -0800 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: On Sat, Dec 6, 2008 at 15:41, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote: > >> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson >> wrote: >>> >>> Since the release of 3.0, several critical issues have come to our >>> attention. Namely, the builtin cmp function wasn't removed [1] and the >>> new IO library proved to be (as expected) abysmally slow [2][3][4]. >>> Christian proposed that we release 3.0.1 within the next week to patch >>> up this critical issues. Thoughts? >>> >>> >>> [1] http://bugs.python.org/1717 >>> [2] http://bugs.python.org/4533 >>> [3] http://bugs.python.org/4561 >>> [4] http://bugs.python.org/4565 > > I've set the priority on all these to release blockers, but I have my > reservations about 4561 and 4565. Resolution of those seem like more than a > week or so away. > > If we want to do a bug fix release for 3.0.1, I'd like to do it no later > than the 19th. > +1 just to get rid of cmp(). And if io speedups can happen, great, but they can also wait for 3.0.2. 
-Brett From ncoghlan at gmail.com Sun Dec 7 02:12:24 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 11:12:24 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493AB3E6.7070806@gmail.com> References: <9418DB6C0B9D434190E54A78E931C3D1087D7491@XCH-NW-7V1.nw.nos.boeing.com> <493AB3E6.7070806@gmail.com> Message-ID: <493B22F8.8090902@gmail.com> Toshio Kuratomi wrote: > Note 2: If there isn't a parallel API on all platforms, for instance, > Guido's proposal to not have os.environb on Windows, then you'll still > have to have a platform specific check. (Likely you should try to access > os.environb in this instance and if it doesn't exist, use os.environ > instead... and remember that you need to either change os.environ's data > into str type or change os.environb's data into byte type.) Note that this is why I personally think the binary API variants *should* exist on Windows, just with the sense of the system encoding flipped around. That is, on *nix: - underlying OS API uses bytes - binary API just passes values straight through - Unicode API uses the system encoding to encode Unicode names and values to be passed to the OS API and to decode bytes names and values received from the OS API While on Windows: - underlying OS API uses Unicode - Unicode API just passes values straight through - binary API uses the system encoding to decode bytes names and values to be passed to the OS API and to encode Unicode names and values received from the OS API Cheers, Nick.
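The symmetric arrangement described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `TextView` and `BytesView` are hypothetical names, plain dicts stand in for the OS-level environment, and UTF-8 stands in for whatever encoding the platform would actually use. None of this is an existing standard-library API.

```python
class TextView:
    """Hypothetical str view over a store whose native type is bytes (the *nix case)."""

    def __init__(self, native, encoding="utf-8"):
        self._native = native
        self._encoding = encoding

    def __getitem__(self, name):
        # Encode the name for the OS, decode the value coming back.
        raw = self._native[name.encode(self._encoding)]
        return raw.decode(self._encoding)

    def __setitem__(self, name, value):
        self._native[name.encode(self._encoding)] = value.encode(self._encoding)


class BytesView:
    """Hypothetical bytes view over a store whose native type is str (the Windows case)."""

    def __init__(self, native, encoding="utf-8"):
        self._native = native
        self._encoding = encoding

    def __getitem__(self, name):
        # Decode the name for the OS, encode the value coming back.
        text = self._native[name.decode(self._encoding)]
        return text.encode(self._encoding)

    def __setitem__(self, name, value):
        self._native[name.decode(self._encoding)] = value.decode(self._encoding)


# Plain dicts stand in for the two kinds of underlying OS API.
posix_env = {b"HOME": b"/home/nick"}
windows_env = {"TEMP": "C:\\Temp"}

posix_text = TextView(posix_env)
windows_bytes = BytesView(windows_env)
posix_text["LANG"] = "C"
```

Either way, application code sees both a text and a bytes interface; only the direction of the conversion differs per platform.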
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Sun Dec 7 02:12:47 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Dec 2008 01:12:47 +0000 (UTC) Subject: [Python-Dev] Python-3.0, unicode, and os.environ References: <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > If the binary APIs are missing from a major platform (i.e. Windows) then > the choice to use them brings with it a major cross-platform portability > problem that should really be handled by the standard library. +1 I might also add that providing binary APIs does not prevent us to implement some special representation of broken filenames when using the unicode APIs (for example using private Unicode characters - I'm not sure what the right terminology is - as sometimes suggested). Regards Antoine. From ncoghlan at gmail.com Sun Dec 7 02:27:56 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 11:27:56 +1000 Subject: [Python-Dev] "as" keyword woes In-Reply-To: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> References: <896B75251BA19745A529B1B867893FA5DB0F@planet.delsci.local> Message-ID: <493B269C.9020303@gmail.com> Warren DeLano wrote: > In other words we have lost the ability to refer to "as" as the > generalized OOP-compliant/syntax-independent method name for casting: Other possible spellings: # Use the normal Python idiom for avoiding keyword clashes # and append a trailing underscore new_object = old_object.as_(class_hint) float_obj = int_obj.as_("float") float_obj = int_obj.as_(float_class) # Use a different word (such as, oh, "cast" perhaps?) 
new_object = old_object.cast(class_hint) float_obj = int_obj.cast("float") float_obj = int_obj.cast(float_class) You could make a PEP if you really wanted to, but it's going to be rejected. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From nd at perlig.de Sun Dec 7 02:35:41 2008 From: nd at perlig.de (André Malo) Date: Sun, 7 Dec 2008 02:35:41 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B22F8.8090902@gmail.com> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> Message-ID: <200812070235.41321@news.perlig.de> * Nick Coghlan wrote: > Toshio Kuratomi wrote: > > Note 2: If there isn't a parallel API on all platforms, for instance, > > Guido's proposal to not have os.environb on Windows, then you'll still > > have to have a platform specific check. (Likely you should try to > > access os.environb in this instance and if it doesn't exist, use > > os.environ instead... and remember that you need to either change > > os.environ's data into str type or change os.environb's data into byte > > type.) > > Note that this is why I personally think the binary API variants > *should* exist on Windows, just with the sense of the system encoding > flipped around. > > That is, on *nix: > - underlying OS API uses bytes > - binary API just passes values straight through > - Unicode API uses the system encoding to encode Unicode names and > values to be passed to the OS API and to decode bytes names and values > received from the OS API > > While on Windows: > - underlying OS API uses Unicode > - Unicode API just passes values straight through > - binary API uses the system encoding to decode bytes names and values > to be passed to the OS API and to encode Unicode names and values > received from the OS API Now that is somewhat strange.
That way you'll have two unreliable APIs and need to switch depending on the platform again. nd -- +++++[>++++++<-]>++>++++++[>++++++++++++<-]>++.<++++[>++++++++++<-]>+++.--. +.< <.>++++[>----<-]>---.<+++[>++++<-]>+.+. +++++.<+++[>----<-]>.---.<+++[>++++<-]> +.<<.>+++++[>-------<-]>+.<+++++[>++++<-]>+.<+++[>++++<-]>+.------.<<.>++++++ [>------<-]>.<+++++[>+++++<-]>.++.++++++++.------.<+++[>++++<-]>+. From ncoghlan at gmail.com Sun Dec 7 02:45:52 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 11:45:52 +1000 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081207002032.GA13190@panix.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081207002032.GA13190@panix.com> Message-ID: <493B2AD0.30303@gmail.com> Aahz wrote: > I believe that it would be a shame and a disservice to Python if there > were a large proportion of the Python community that discouraged the use > of 3.0; I also believe it would be a shame and a disservice to Python if > you (and other people) tell conservatives like me that we should keep our > mouths shut. I don't think being honest about the situation is going to hurt anything in the long run. There are lots of advantages to 3.0, but also plenty of good reasons to stick with 2.x as well. At this point in time, my own recommendation would be that if someone doesn't have time to do a proper evaluation of the situation (talking production development here, not "learning for fun"), then I would probably still point them at 2.5. That recommendation will probably change to 2.6 in a couple of months (since it usually takes a few months after a release for the rest of the Python ecosystem to catch up with a new 2.x release). 
If they have the time though, my recommendation would be for them to do their *own* evaluation, looking both at things that favour 3.0 like Unicode handling and general developer convenience, as well as the things that currently favour 2.x like IO speed and availability of 3rd party libraries. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Sun Dec 7 02:51:30 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 11:51:30 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812070235.41321@news.perlig.de> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> Message-ID: <493B2C22.5060907@gmail.com> André Malo wrote: >> While on Windows: >> - underlying OS API uses Unicode >> - Unicode API just passes values straight through >> - binary API uses the system encoding to decode bytes names and values >> to be passed to the OS API and to encode Unicode names and values >> received from the OS API > > Now that is somewhat strange. That way you'll have two unreliable APIs and > need to switch depending on the platform again. Sorry, system encoding was probably a poor choice of words there, since that generally means mbcs when talking about Windows (which would indeed be a very poor choice of encoding). For binary wrappers around the Windows Unicode APIs, I was thinking specifically of using UTF-8, since that should be able to encode anything the Unicode APIs can handle. Cheers, Nick.
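The fallback Toshio describes (try os.environb, otherwise derive bytes from os.environ) combined with Nick's suggestion of UTF-8 for the derived view can be sketched roughly as follows. This is only an illustration under stated assumptions: os.environb is taken to be present on POSIX builds of later Python 3 releases and absent on Windows, and the helper names encode_environ and environ_bytes are hypothetical, not part of any proposed API.

```python
import os


def encode_environ(env, encoding="utf-8"):
    """Build a bytes-keyed copy of a str-keyed environment mapping (hypothetical helper)."""
    return {name.encode(encoding): value.encode(encoding)
            for name, value in env.items()}


def environ_bytes():
    """Return the environment as bytes, preferring the native bytes view when it exists."""
    native = getattr(os, "environb", None)  # assumption: only provided on POSIX builds
    if native is not None:
        return dict(native)  # bytes passed straight through from the OS
    # No native bytes view: derive one from the Unicode API via UTF-8.
    return encode_environ(os.environ)
```

Callers then see bytes on every platform; only the helper needs to know which side is native and which side is derived.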
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Sun Dec 7 02:56:44 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 07 Dec 2008 02:56:44 +0100 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081207002032.GA13190@panix.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081207002032.GA13190@panix.com> Message-ID: <493B2D5C.4090505@v.loewis.de> > Sorry, I don't think I can do that. It's difficult-to-impossible to leap > straight from Python 2.2 or 2.3 to 3.0 My experience is different. That is very well possible (of course, I haven't heard in a long time of a project that needs to maintain compatibility with 2.2). Regards, Martin From martin at v.loewis.de Sun Dec 7 02:58:00 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 07 Dec 2008 02:58:00 +0100 Subject: [Python-Dev] Buildbots for 2.6 and 3.0 In-Reply-To: References: Message-ID: <493B2DA8.2000105@v.loewis.de> > Looking at http://www.python.org/dev/buildbot/, we are still missing buildbots > for the release26-maint and release30-maint branches. Is someone working on that? Yes. I won't enable 2.6 build slaves until 2.5.3 is released, but will afterwards.
Regards, Martin From aahz at pythoncraft.com Sun Dec 7 04:31:16 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 6 Dec 2008 19:31:16 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B209D.5070306@gmail.com> References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com> Message-ID: <20081207033116.GB12097@panix.com> On Sun, Dec 07, 2008, Nick Coghlan wrote: > > If the binary APIs are missing from a major platform (i.e. Windows) then > the choice to use them brings with it a major cross-platform portability > problem that should really be handled by the standard library. +1 -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From warren at delsci.com Sun Dec 7 05:19:08 2008 From: warren at delsci.com (Warren DeLano) Date: Sat, 6 Dec 2008 20:19:08 -0800 Subject: [Python-Dev] "as" keyword woes Message-ID: <896B75251BA19745A529B1B867893FA5DB11@planet.delsci.local> > Date: Sat, 6 Dec 2008 12:13:16 -0800 (PST) > From: Carl Banks > Subject: Re: "as" keyword woes > To: python-list at python.org > Message-ID: > > (snip) > > If you write a PEP, I advise you to try to sound less whiny and than > you have in this thread. > > (snip) Ehem, well, such comments notwithstanding, I thank everyone who responded to my latest post on this topic for taking my inquiry seriously, and for providing cogent, focused, well-reasoned feedback while not resorting to name-calling, to false accusations on top of baseless assumptions, or to explicit personal attacks on my competence, sincerity, experience, credibility, or form. To you especially, I am grateful for your input for your years of service to the community and to the noble ideals you embody in the Python project. 
May the rest of us (not just myself) be ashamed of our lesser conduct and learn from your exemplary performance. So to summarize, having assimilated all responses over the past several days (python-list as well as python-dev, for the newcomers), I now accept the following as self-evident: -> "as", as a Python keyword, is here to stay: Love it or leave it. -> Likewise ditto for the GIL: if you truly need Python concurrency within a single process, then use a Python implementation other than CPython. Season's greetings to all! Peace. Cheers, Warren From guido at python.org Sun Dec 7 05:20:07 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Dec 2008 20:20:07 -0800 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <493B2AD0.30303@gmail.com> References: <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081207002032.GA13190@panix.com> <493B2AD0.30303@gmail.com> Message-ID: On Sat, Dec 6, 2008 at 5:45 PM, Nick Coghlan wrote: > Aahz wrote: >> I believe that it would be a shame and a disservice to Python if there >> were a large proportion of the Python community that discouraged the use >> of 3.0; I also believe it would be a shame and a disservice to Python if >> you (and other people) tell conservatives like me that we should keep our >> mouths shut. I hope I am not perceived as telling you to keep your mouth shut. I am merely hoping that you will decide for yourself after having heard me out. > I don't think being honest about the situation is going to hurt anything > in the long run. There are lots of advantages to 3.0, but also plenty of > good reasons to stick with 2.x as well.
> > At this point in time, my own recommendation would be that if someone > doesn't have time to do a proper evaluation of the situation (talking > production development here, not "learning for fun"), then I would > probably still point them at 2.5. That recommendation will probably > change to 2.6 in a couple of months (since it usually takes a few months > after a release for the rest of the Python ecosystem to catch up with a > new 2.x release). > > If they have the time though, my recommendation would be for them to do > their *own* evaluation, looking both at things that favour 3.0 like > Unicode handling and general developer convenience, as well as the > things that currently favour 2.x like IO speed and availability of 3rd > party libraries. That sounds right. I just heard (via Martin) that PEP 3131 (Unicode letters in identifiers) is already a big hit in Japan. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Sun Dec 7 05:53:07 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 6 Dec 2008 21:53:07 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B2C22.5060907@gmail.com> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> Message-ID: On Sat, Dec 6, 2008 at 6:51 PM, Nick Coghlan wrote: > André Malo wrote: >>> While on Windows: >>> - underlying OS API uses Unicode >>> - Unicode API just passes values straight through >>> - binary API uses the system encoding to decode bytes names and values >>> to be passed to the OS API and to encode Unicode names and values >>> received from the OS API >> >> Now that is somewhat strange. That way you'll have two unreliable APIs and >> need to switch depending on the platform again. > > Sorry, system encoding was probably a poor choice of words there, since > that generally means mbcs when talking about Windows (which would indeed > be a very poor choice of encoding).
> > For binary wrappers around the Windows Unicode APIs, I was thinking > specifically of using UTF-8, since that should be able to encode > anything the Unicode APIs can handle. If the Unicode APIs only have correct unicode, sure. If not you'll get errors translating to UTF-8 (and the byte APIs are supposed to pass bad names through unaltered.) Kinda ironic, no? -- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Sun Dec 7 07:07:08 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Sat, 06 Dec 2008 22:07:08 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> Message-ID: <493B680C.6010605@gmail.com> Guido van Rossum wrote: > On Sat, Dec 6, 2008 at 10:53 AM, wrote: >> I find it interesting to note that the only users in this discussion who >> actually have these problems in real life all have this attitude. It is >> expected that in an imperfect world we will have imperfect encodings, but it >> is super important that software which can open files can deal with not >> understanding the character translation of the filename. > > For file managers and similar tools I am absolutely 100% in agreement > -- that's why the binary APIs are there. > > Most apps aren't file managers or ftp clients though. The sky is not falling. > I agree that the sky is not falling (as long as we get a binary API for env vars in 3.1) but I'm still wondering what the use case you see is. Most apps aren't file managers or ftp clients but when they interact with files (for instance, a file selection dialog) they need to be able to show the user all the relevant files. So on an app-by-app basis the need for this is high. 
On a code basis, I'd hope that most file selection dialogs are pulled out into libraries... but that still doesn't help me identify when someone would expect that asking python for a list of all files in a directory or a specific set of files in a directory should, without warning, return only a subset of them. In what situations is this appropriate behaviour? -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From glyph at divmod.com Sun Dec 7 08:05:48 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Sun, 07 Dec 2008 07:05:48 -0000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B680C.6010605@gmail.com> References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> Message-ID: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> On 06:07 am, a.badger at gmail.com wrote: >Guido van Rossum wrote: >>On Sat, Dec 6, 2008 at 10:53 AM, wrote: > >>>I find it interesting to note that the only users in this discussion >>>who >>>actually have these problems in real life all have this attitude. >>For file managers and similar tools I am absolutely 100% in agreement >>-- that's why the binary APIs are there. >>Most apps aren't file managers or ftp clients though. The sky is not >>falling. >Most apps aren't file managers or ftp clients but when they interact >with files (for instance, a file selection dialog) they need to be able >to show the user all the relevant files. So on an app-by-app basis the >need for this is high. While I tend to agree emphatically with this, the *real* solution here is a path-abstraction library. 
In separate discussions, the difficulty of getting such a thing into the standard library has been discussed, due to the wide variety of opinions as to what it should look like (and the shocking level of difficulty involved in making such a thing really work correctly). I'd be very happy to talk to you off-list about my ideas for such a thing, but I'd rather not resurrect yet another tedious discussion here just now :). >On a code basis, I'd hope that most file >selection dialogs are pulled out into libraries... but that still >doesn't help me identify when someone would expect that asking python >for a list of all files in a directory or a specific set of files in a >directory should, without warning, return only a subset of them. In >what situations is this appropriate behaviour? If you say listdir(unicode) on a POSIX OS, your program is saying "I only know how to deal with unicode results from this function, so please only give me those.". If your program is smart enough to deal with bytes, then you would have asked for bytes, no? Returning only filenames which can be properly decoded makes sense. Otherwise everyone needs to learn about this highly confusing issue, even for the simplest scripts. Skipping undecodable values is good enough that it will work 90% of the time. When you need to get to 100%, it won't be impossible - the bytes APIs will be there. In the longer term, hopefully some path abstraction will eventually be there too. We should not wait for a perfectly correct path abstraction to arrive before providing the primitives to do it yourself, though. 
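As a rough sketch of the "ask for bytes" route described above: passing a bytes path to os.listdir() yields bytes names, and the caller decides what to do with names that do not decode. The helper names split_decodable and list_all are hypothetical, and the default of UTF-8 is an assumption rather than whatever encoding a given system uses.

```python
import os


def split_decodable(raw_names, encoding="utf-8"):
    """Partition bytes filenames into (decoded str names, undecodable bytes names)."""
    decoded, undecodable = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            undecodable.append(raw)  # keep the raw bytes instead of dropping the file
    return decoded, undecodable


def list_all(path, encoding="utf-8"):
    """List every entry of `path` through the bytes API and report what cannot be decoded."""
    raw_names = os.listdir(os.fsencode(path))  # bytes path in, bytes names out
    return split_decodable(raw_names, encoding)
```

A script that only wants decodable names can ignore the second list; one that must touch every file can fall back to the raw bytes, or raise, when that list is non-empty.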
From hfuerstenau at gmx.net Sun Dec 7 10:19:52 2008 From: hfuerstenau at gmx.net (Hagen Fürstenau) Date: Sun, 07 Dec 2008 10:19:52 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> Message-ID: <493B9538.9080107@gmx.net> > If the Unicode APIs only have correct unicode, sure. If not you'll > get errors translating to UTF-8 (and the byte APIs are supposed to > pass bad names through unaltered.) Kinda ironic, no? As far as I can see all Python Unicode strings can be encoded to UTF-8, even things like lone surrogates because Python doesn't care about them. So both the Unicode API and the binary API would be fail-safe on Windows. - Hagen From rhamph at gmail.com Sun Dec 7 10:21:01 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 7 Dec 2008 02:21:01 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B923F.6010706@gmx.net> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> Message-ID: On Sun, Dec 7, 2008 at 2:07 AM, Hagen Fürstenau wrote: >> If the Unicode APIs only have correct unicode, sure. If not you'll >> get errors translating to UTF-8 (and the byte APIs are supposed to >> pass bad names through unaltered.) Kinda ironic, no? > > As far as I can see all Python Unicode strings can be encoded to UTF-8, > even things like lone surrogates because Python doesn't care about them. > So both the Unicode API and the binary API would be fail-safe on Windows. Python is broken and needs to be fixed.
http://bugs.python.org/issue3672 http://bugs.python.org/issue3297 -- Adam Olsen, aka Rhamphoryncus From hfuerstenau at gmx.net Sun Dec 7 10:35:15 2008 From: hfuerstenau at gmx.net (Hagen Fürstenau) Date: Sun, 07 Dec 2008 10:35:15 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> Message-ID: <493B98D3.8070405@gmx.net> >> As far as I can see all Python Unicode strings can be encoded to UTF-8, >> even things like lone surrogates because Python doesn't care about them. >> So both the Unicode API and the binary API would be fail-safe on Windows. > > Python is broken and needs to be fixed. > > http://bugs.python.org/issue3672 > http://bugs.python.org/issue3297 But the question of whether Python should care about lone surrogates or not is at best tangential to the issue at hand. If you have lone surrogates in the Unicode API (and didn't raise an exception on the way getting there), then the sensible thing is to encode them into lone UTF-8 surrogates. Even if you wanted to prevent lone surrogates, encoding to UTF-8 for the binary API would not be the place to enforce it. - Hagen From g.brandl at gmx.net Sun Dec 7 12:41:03 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 07 Dec 2008 12:41:03 +0100 Subject: [Python-Dev] Rewrite map for old URLs in place Message-ID: Hi, with a bit of delay I finally got around to creating a mod_rewrite map of the 2.5 URLs. URLs like http://docs.python.org/tut/node3.html will now point permanently to the new URL. Let me know if you find a problem. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four.
Tabs are right out. From ncoghlan at gmail.com Sun Dec 7 13:55:08 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 07 Dec 2008 22:55:08 +1000 Subject: [Python-Dev] Rewrite map for old URLs in place In-Reply-To: References: Message-ID: <493BC7AC.50405@gmail.com> Georg Brandl wrote: > Hi, > > with a bit of delay I finally got around to creating a mod_rewrite map of > the 2.5 URLs. URLs like http://docs.python.org/tut/node3.html will now > point permanently to the new URL. > > Let me know if you find a problem. Excellent news! Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From steve at holdenweb.com Sun Dec 7 14:38:58 2008 From: steve at holdenweb.com (Steve Holden) Date: Sun, 07 Dec 2008 08:38:58 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> Message-ID: <493BD1F2.5080300@holdenweb.com> Brett Cannon wrote: > On Sat, Dec 6, 2008 at 15:41, Barry Warsaw wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> On Dec 6, 2008, at 6:25 PM, Guido van Rossum wrote: >> >>> On Sat, Dec 6, 2008 at 3:18 PM, Benjamin Peterson >>> wrote: >>>> Since the release of 3.0, several critical issues have come to our >>>> attention. Namely, the builtin cmp function wasn't removed [1] and the >>>> new IO library proved to be (as expected) abysmally slow [2][3][4]. >>>> Christian proposed that we release 3.0.1 within the next week to patch >>>> up this critical issues. Thoughts? >>>> >>>> >>>> [1] http://bugs.python.org/1717 >>>> [2] http://bugs.python.org/4533 >>>> [3] http://bugs.python.org/4561 >>>> [4] http://bugs.python.org/4565 >> I've set the priority on all these to release blockers, but I have my >> reservations about 4561 and 4565. Resolution of those seem like more than a >> week or so away. 
>> >> If we want to do a bug fix release for 3.0.1, I'd like to do it no later >> than the 19th. >> > > +1 just to get rid of cmp(). And if io speedups can happen, great, but > they can also wait for 3.0.2. > A point release just to remove a function whose withdrawal has been advertised as a 3.0 change hardly seems worth the substantial effort of cutting a release. If cmp() shouldn't have been in 3.0 and was then there's surely no problem about removing it later as promised: anyone who uses it in 3.0 code shouldn't be. If it doesn't have to wait for a major release then is there any real need to cut the minor release immediately? regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
From ziade.tarek at gmail.com Sun Dec 7 18:27:54 2008 From: ziade.tarek at gmail.com (Tarek Ziadé) Date: Sun, 7 Dec 2008 18:27:54 +0100 Subject: [Python-Dev] distutils patches, request for review Message-ID: <94bdd2610812070927o5154c4edx9114c4a006edb9d@mail.gmail.com> Hi, I am looking for a core developer to review a few patches for distutils. #1 is mandatory (it removes a bad bug) #2 is very nice to have #3 to #5 are test coverage and code beautification In order: 1. #4400 : the default generated .pypirc is broken. This patch fixes it: http://bugs.python.org/issue4400 2. #4394 : no need to store the password in pypirc anymore : using the prompt if not stored. http://bugs.python.org/issue4394 3. #2461 : more test coverage. http://bugs.python.org/issue2461 4. #3992 : removes custom log implementation -> uses logging instead. http://bugs.python.org/issue3992 5. #3985 : more cleanup. http://bugs.python.org/issue3985 6. #3986 : http://bugs.python.org/issue3986 Some of them are a few months old so I can refresh the patch on the current trunk(s) as soon as they are picked. Regards Tarek -- Tarek Ziadé
| Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/ From guido at python.org Sun Dec 7 18:32:51 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Dec 2008 09:32:51 -0800 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493BD1F2.5080300@holdenweb.com> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> Message-ID: On Sun, Dec 7, 2008 at 5:38 AM, Steve Holden wrote: > A point release just to remove a function whose withdrawal has been > advertised as a 3.0 change hardly seems worth the substantial effort of > cutting a release. If cmp() shouldn't have been in 3.0 and was then > there's surely no problem about removing it later as promised: anyone > who uses it in 3.0 code shouldn't be. > > If it doesn't have to wait for a major release then is there any real > need to cut the minor release immediately? Well, since 2to3 doesn't remove cmp, and it actually works, it's likely that people will be accidentally depending on it in code converted from 2.x. In the past, where there was a discrepancy between docs and code, we've often ruled in favor of the code using arguments like "it always worked like this so we'll break working code if we change it now". There's clearly an argument of timeliness there, which is why we'd like to get this fixed ASAP. The alternative, which nobody likes, would be to keep it around, deprecate it in 3.1, and remove it in 3.2 or 3.3. 
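For converted code that still leans on cmp(), a commonly cited stand-in is a one-line expression over the rich comparison operators. The sketch below assumes a Python 3 release that ships functools.cmp_to_key; cmp_compat and sort_with_cmp are hypothetical names used only for illustration, not something proposed in this thread.

```python
from functools import cmp_to_key


def cmp_compat(a, b):
    """Return -1, 0 or 1 depending on whether a sorts before, equal to, or after b."""
    return (a > b) - (a < b)


def sort_with_cmp(items, cmp_func=cmp_compat):
    """Sort with an old-style comparison function by wrapping it as a key."""
    return sorted(items, key=cmp_to_key(cmp_func))
```

Code that calls cmp() directly can switch to the expression; code that passed cmp= to sorted() or list.sort() needs the key wrapper instead.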
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Sun Dec 7 18:35:53 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 7 Dec 2008 10:35:53 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B98D3.8070405@gmx.net> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> Message-ID: On Sun, Dec 7, 2008 at 2:35 AM, Hagen Fürstenau wrote: >>> As far as I can see all Python Unicode strings can be encoded to UTF-8, >>> even things like lone surrogates because Python doesn't care about them. >>> So both the Unicode API and the binary API would be fail-safe on Windows. >> >> Python is broken and needs to be fixed. >> >> http://bugs.python.org/issue3672 >> http://bugs.python.org/issue3297 > > But the question of whether Python should care about lone surrogates or > not is at best tangential to the issue at hand. If you have lone > surrogates in the Unicode API (and didn't raise an exception on the way > getting there), then the sensible thing is to encode them into lone > UTF-8 surrogates. Even if you wanted to prevent lone surrogates, > encoding to UTF-8 for the binary API would not be the place to enforce it. No. Unicode *requires* them to be treated as errors. If you want to pass them through then you're creating a custom encoding... which you might argue for in this case, but it needs to be clearly separate from the real UTF-8.
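The distinction between strict UTF-8 and a pass-through variant can be illustrated with the codec error handlers found in later Python 3 releases. This is a sketch under that assumption: it shows present-day behaviour, not the behaviour of 3.0 being debated here, and the helper names encode_strict and encode_passthrough are hypothetical.

```python
LONE_SURROGATE = "\ud800"  # half of a surrogate pair, not a valid Unicode scalar value


def encode_strict(text):
    """Encode with the standard UTF-8 codec; return None if the codec rejects the text."""
    try:
        return text.encode("utf-8")
    except UnicodeEncodeError:
        return None


def encode_passthrough(text):
    """Encode with the 'surrogatepass' handler, which lets lone surrogates through."""
    return text.encode("utf-8", "surrogatepass")
```

Keeping the pass-through behaviour behind an explicitly named error handler is one way of keeping it "clearly separate from the real UTF-8": the default call stays strict and the lenient path has to be requested.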
-- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Sun Dec 7 19:03:13 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Sun, 07 Dec 2008 10:03:13 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> Message-ID: <493C0FE1.30506@gmail.com> glyph at divmod.com wrote: > > On 06:07 am, a.badger at gmail.com wrote: >> Most apps aren't file managers or ftp clients but when they interact >> with files (for instance, a file selection dialog) they need to be able >> to show the user all the relevant files. So on an app-by-app basis the >> need for this is high. > > While I tend to agree emphatically with this, the *real* solution here > is a path-abstraction library. Why don't you send me some information offlist. I'm not sure I agree that a path-abstraction library can work correctly but if it can it would be nice to have that at a level higher than the file-dialog libraries that I was envisioning. [snip] >> ... but that still >> doesn't help me identify when someone would expect that asking python >> for a list of all files in a directory or a specific set of files in a >> directory should, without warning, return only a subset of them. In >> what situations is this appropriate behaviour? > > If you say listdir(unicode) on a POSIX OS, your program is saying "I > only know how to deal with unicode results from this function, so please > only give me those.". No. (explained below) > If your program is smart enough to deal with > bytes, then you would have asked for bytes, no? 
Yes (explained below) > Returning only > filenames which can be properly decoded makes sense. Otherwise everyone > needs to learn about this highly confusing issue, even for the simplest > scripts. > os.listdir(unicode) (currently) means that the *programmer* is asking that the stdlib return the decodable filenames from this directory. The question is whether the programmer understood that this is what they were asking for and whether it is what they most likely want. I would make the following statements WRT to this: 1) The programmer most likely does not want decodable filenames and only decodable filename. If they were, we'd see a lot of python2.x code that turns pathnames into unicode and discards everything that wasn't decodable. No one has given a use case for finding only the *decodable* subset of files. If I request to see all *.py files in a directory, I want to see all of the *.py files in the directory, decodable or not. If you can show how programmers intend "90%" of their calls to os.listdir()/glob.glob('*.txt') to show only the decodable subset of the results, then the foundation of my arguments is gone. So please, give examples to prove this wrong. - If this is true, a definition of os.listdir() that would better meet programmer expectation would be: "Give me all files in a directory with the output as str type". The definition of os.listdir() would be "Give me all files in a directory with the output as bytes type". Raising an exception when the filenames are undecodable is perfectly reasonable in this situation. 2) For the programmer to understand the difference between os.listdir() and os.listdir() they have to understand the "highly confusing issue" and what it means for their code. So the current method is forcing programmers to understand it even for the simplest scripts if their environment is not uniform with no clue from the interpreter that there is an issue. 
- Similarly, raising an exception on undecodable values means that the programmer can ignore the issue in any scripts in sane environments and will be told that they need to deal with it (via an exception) when their script runs in a non-sane environment. 3) The usage of unicode vs bytes is easy to miss for someone starting with py2.x or windows and moving to a multi-platform or unix project. Even simple testing won't reveal the problem unless the programmer knows that they have to test what happens when encodings are mixed. Once again, this is requiring the programmer to understand the encoding issue without help from the interpreter. > Skipping undecodable values is good enough that it will work 90% of the > time. You and Guido have now made this claim to defend not raising an exception but I still don't have a use case. Here are use cases that I see: * Bill is coding an application for use inside his company. His company only uses utf-8. His code naively uses os.listdir(). - The code does not throw an exception whether we use the current os.listdir() or one that could throw an exception because the system admins have sanitised the environment. Bill did not need to understand the implications of encoding for his code to work in this script whether simple or complex. * Mary is coding an application for use inside her company. It finds all html files on a system and updates her company's copyright, privacy policy, and other legal boilerplate. Her expectation is that after her program runs every file will have been updated. Her environment is a mixture of different filename encodings due to having many legacy documents for users in different locales. Mary's code also naively uses os.listdir(). Her test case checks that the code does the right thing on many languages but unfortunately doesn't check with different encodings because she'd have to already understand the encoding issue to check for that. 
- With the current approach, the code will silently do the wrong thing in production for years, until someone notices and alerts the company that something is wrong with certain files in certain locales. By then, Mary may no longer be involved with the company and there are thousands of users who thought they were operating under the old legal terms instead of the new ones. - With exceptions raised, Mary will be alerted to the problem when she tries to run the code in production for the first time. She can then do a little research and fix it to run correctly. The traceback that's issued can be googled and the line that it points to will show where the error is occurring. * Arthur's company has shipped some of his code in a product. The code uses os.listdir() to find images and movies in a directory before deciding whether they contain pornography. A cron job runs the code and the messages it prints are sent by cron to the system admins to take action on. A customer calls to complain that the code did not detect that a recently fired employee had a 30 minute pornographic movie on his office computer. Arthur has to figure out why. - With the current code, Arthur might start with the algorithm that examines the movies, try to get samples of the pornography from the company, and look in many wrong places before finding out that the code that searches for files is not listing all the files in directories. - With tracebacks raised, the system admins, at least, will have received messages from cron stating that the undecodable filenames are causing errors that need to be addressed. They can call Arthur's company when they notice this and Arthur can fix it quickly because the traceback contains all the necessary information. -Toshio
From murman at gmail.com Sun Dec 7 19:18:19 2008 From: murman at gmail.com (Michael Urman) Date: Sun, 7 Dec 2008 12:18:19 -0600 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> Message-ID: On Sun, Dec 7, 2008 at 11:35, Adam Olsen wrote: >>> http://bugs.python.org/issue3672 >>> http://bugs.python.org/issue3297 > > No. Unicode *requires* them to be treated as errors. If you want to > pass them through then you're creating a custom encoding... which you > might argue for in this case, but it needs to be clearly separate from > the real UTF-8. I suspect it is a common and convenient but (according to what you say) misconceived expectation that using UTF-8 to encode any Unicode string will not raise an exception. This behavior is not something which should be discarded lightly. I see little reason that this couldn't be a new codec or error handler that allowed people to choose between correct pure UTF-8 behavior or the technically incorrect but very practical behavior it currently has. [My apologies, Adam, for sending this only to you the first time] -- Michael Urman From rhamph at gmail.com Sun Dec 7 19:56:35 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 7 Dec 2008 11:56:35 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> Message-ID: On Sun, Dec 7, 2008 at 11:18 AM, Michael Urman wrote: > On Sun, Dec 7, 2008 at 11:35, Adam Olsen wrote: >>>> http://bugs.python.org/issue3672 >>>> http://bugs.python.org/issue3297 >> >> No.
Unicode *requires* them to be treated as errors. If you want to >> pass them through then you're creating a custom encoding... which you >> might argue for in this case, but it needs to be clearly separate from >> the real UTF-8. > > I suspect it is a common and convenient but (according to what you > say) misconceived expectation that using UTF-8 to encode any Unicode > string will not raise an exception. This behavior is not something > which should be discarded lightly. It is *not* a valid Unicode string in the first place. Therein lies the problem. > I see little reason that this couldn't be a new codec or error handler > that allowed people to choose between correct pure UTF-8 behavior or > the technically incorrect but very practical behavior it currently > has. Note that many of the restrictions were added for security reasons. You might receive a UTF-8 encoded file name from a malicious user, check if it contains something dangerous (like "../../../../../etc/password"), then decode it. If your decoder isn't compliant (ie doesn't check for overly long sequences) then a b'\xC0\xAF' gets translated into u'/', bypassing your previous check. However, in this context we only need to allow lone surrogates. CESU-8 comes to mind. (It is a perverse world we live in.) -- Adam Olsen, aka Rhamphoryncus From paul at boddie.org.uk Sun Dec 7 22:06:21 2008 From: paul at boddie.org.uk (Paul Boddie) Date: Sun, 7 Dec 2008 22:06:21 +0100 Subject: [Python-Dev] "as" keyword woes Message-ID: <200812072206.21908.paul@boddie.org.uk> On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote: > > On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano > wrote: > > As someone somewhat knowledgable of how parsers work, I do not > > understand why a method/attribute name "object_name.as(...)" must > > necessarily conflict with a standalone keyword " as ". It seems to me > > that it should be possible to unambiguously separate the two without > > ambiguity or undue complication of the parser. 
> > That's possible with sufficiently powerful parser technology, but > that's not how the Python parser (and most parsers, in my experience) > treat reserved words. Reserved words are reserved in all contexts, > regardless of whether ambiguity could arise. Just a quick aside from someone who merely lurks on this list: in SQL, it's quite possible to use keywords in a fashion similar to that desired by the inquirer, and it's actually possible to double-quote keywords and use them as names for things. I'm not advocating more complicated parsing technology for any Python implementation, but I think it's pertinent to point out that the technology isn't particularly obscure. Apologies for the interruption, Paul From martin at v.loewis.de Sun Dec 7 22:10:18 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 07 Dec 2008 22:10:18 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> Message-ID: <493C3BBA.1040106@v.loewis.de> > There's clearly an argument of timeliness there, which > is why we'd like to get this fixed ASAP. I think it is still timely when fixed in January or February. In fact, releasing it still in December might not be possible, due to the limited time available. 
Regards, Martin From tjreedy at udel.edu Sun Dec 7 22:20:06 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 07 Dec 2008 16:20:06 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493C0FE1.30506@gmail.com> References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: Toshio Kuratomi wrote: > - If this is true, a definition of os.listdir() that would > better meet programmer expectation would be: "Give me all files in a > directory with the output as str type". The definition of > os.listdir() would be "Give me all files in a directory > with the output as bytes type". Raising an exception when the filenames > are undecodable is perfectly reasonable in this situation. Your examples (snipped) pretty well convince me that there is a use case for raising exceptions. We should move beyond arguing over which one way is right. I think there should be a second argument 'ignorebad=False' to ignore undecodable files rather than raise the exception (or 'strict=True' to stop and raise exception on non-decodable names -- then code is 'if strict: raise ...'). I believe other functions have a similar parameter. 
tjr From guido at python.org Sun Dec 7 22:33:57 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Dec 2008 13:33:57 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy wrote: > Toshio Kuratomi wrote: > >> - If this is true, a definition of os.listdir() that would >> better meet programmer expectation would be: "Give me all files in a >> directory with the output as str type". The definition of >> os.listdir() would be "Give me all files in a directory >> with the output as bytes type". Raising an exception when the filenames >> are undecodable is perfectly reasonable in this situation. > > Your examples (snipped) pretty well convince me that there is a use case for > raising exceptions. We should move beyond arguing over which one way is > right. I think there should be a second argument 'ignorebad=False' to > ignore undecodable files rather than raise the exception (or 'strict=True' > to stop and raise exception on non-decodable names -- then code is 'if > strict: raise ...'). I believe other functions have a similar parameter. If you want the exceptions, just use the bytes API and try to decode the byte strings using the system encoding. My problem with raising exceptions *by default* when an undecodable name exists is that it may render an app completely useless in a situation where the developer is no longer around. This happened all the time with the 2.x Unicode API, where the developer hadn't anticipated a particular input potentially containing non-ASCII bytes, and the user fed the application non-ASCII text. 
Making os.listdir raise an exception when a directory contains a single undecodable file means that the entire directory can't be read, and most likely the entire app crashes at that point. Most likely the developer never anticipated this situation (since in most places it is either impossible or very unlikely) -- after all, if they had anticipated it they would have used the bytes API in the first place. (It's worse because the exception being raised would be UnicodeError -- most people expect os.listdir to raise OSError, not other errors.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fabiofz at gmail.com Sun Dec 7 22:46:57 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sun, 7 Dec 2008 19:46:57 -0200 Subject: [Python-Dev] Nonlocal shortcut Message-ID: Hi, I'm currently implementing a parser to handle Python 3.0, and one of the points I found conflicting with the grammar specification is the PEP 3104. It says that a shortcut would be added to Python 3.0 so that "nonlocal x = 0" can be written. However, the latest grammar specification (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) doesn't seem to take that into account... So, can someone enlighten me on what should be the correct treatment for that on a grammar that wants to support Python 3.0? 
Thanks, Fabio From ncoghlan at gmail.com Sun Dec 7 22:49:41 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 08 Dec 2008 07:49:41 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: <493C44F5.80806@gmail.com> Terry Reedy wrote: > Toshio Kuratomi wrote: > >> - If this is true, a definition of os.listdir() that would >> better meet programmer expectation would be: "Give me all files in a >> directory with the output as str type". The definition of >> os.listdir() would be "Give me all files in a directory >> with the output as bytes type". Raising an exception when the filenames >> are undecodable is perfectly reasonable in this situation. > > Your examples (snipped) pretty well convince me that there is a use case > for raising exceptions. We should move beyond arguing over which one > way is right. I think there should be a second argument > 'ignorebad=False' to ignore undecodable files rather than raise the > exception (or 'strict=True' to stop and raise exception on non-decodable > names -- then code is 'if strict: raise ...'). I believe other > functions have a similar parameter. If we were going to do anything like that for os.listdir() and other filesystem APIs (like glob) that return multiple paths, we'd probably be best advised to just have a normal Unicode 'errors' parameter which allowed: 'strict' - raise an Exception for malformed binary data 'replace' - insert '?' 
or some other symbol in place of malformed binary data 'ignore' - simply leave out the malformed binary data 'skip' - run the underlying codec in strict mode, but skip over any items which raise UnicodeDecodeError (default/current Py3k behaviour) Obviously, 'skip' doesn't make any sense for APIs like getcwd() that return a single value - a case could be made for those defaulting to either replace or strict. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From amauryfa at gmail.com Sun Dec 7 23:45:09 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sun, 7 Dec 2008 23:45:09 +0100 Subject: [Python-Dev] Nonlocal shortcut In-Reply-To: References: Message-ID: Hello, Fabio Zadrozny wrote: > Hi, > > I'm currently implementing a parser to handle Python 3.0, and one of > the points I found conflicting with the grammar specification is the > PEP 3104. > > It says that a shortcut would be added to Python 3.0 so that "nonlocal > x = 0" can be written. However, the latest grammar specification > (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) > doesn't seem to take that into account... So, can someone enlighten me > on what should be the correct treatment for that on a grammar that > wants to support Python 3.0? An issue was already filed about this: http://bugs.python.org/issue4199 It should be ready for inclusion in 3.0.1. 
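One way to check what a given interpreter's grammar accepts is to hand both spellings to compile(). This is only a sketch: it assumes an interpreter whose grammar lacks the PEP 3104 shortcut, which is what the testing reported in this thread found for 3.0. The names counter, reset, accepts and run_counter are made up for illustration.

```python
# Two spellings of the same closure, kept as source text for compile().
TWO_STATEMENTS = """
def counter():
    x = 10
    def reset():
        nonlocal x      # declaration only
        x = 0           # separate assignment statement
    reset()
    return x
"""

SHORTCUT = """
def counter():
    x = 10
    def reset():
        nonlocal x = 0  # shortcut form described in PEP 3104
    reset()
    return x
"""


def accepts(source):
    """Return True if the running interpreter's grammar accepts source."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


def run_counter(source):
    """Compile and execute source, then call the counter() it defines."""
    namespace = {}
    exec(compile(source, "<example>", "exec"), namespace)
    return namespace["counter"]()
```

A parser that follows the published grammar would accept only the first spelling, so treating "nonlocal x = 0" as a syntax error matches what the interpreter itself does.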
-- Amaury Forgeot d'Arc From greg.ewing at canterbury.ac.nz Mon Dec 8 00:42:50 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Dec 2008 12:42:50 +1300 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B2C22.5060907@gmail.com> References: <493AB3E6.7070806@gmail.com> <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> Message-ID: <493C5F7A.9070105@canterbury.ac.nz> Nick Coghlan wrote: > For binary wrappers around the Windows Unicode APIs, I was thinking > specifically of using UTF-8, since that should be able to encode > anything the Unicode APIs can handle. Why shouldn't the binary interface just expose the raw utf16 as bytes? -- Greg From tjreedy at udel.edu Mon Dec 8 00:53:37 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 07 Dec 2008 18:53:37 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: Guido van Rossum wrote: > On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy wrote: >> Toshio Kuratomi wrote: >> >>> - If this is true, a definition of os.listdir() that would >>> better meet programmer expectation would be: "Give me all files in a >>> directory with the output as str type". The definition of >>> os.listdir() would be "Give me all files in a directory >>> with the output as bytes type". Raising an exception when the filenames >>> are undecodable is perfectly reasonable in this situation. >> Your examples (snipped) pretty well convince me that there is a use case for >> raising exceptions. We should move beyond arguing over which one way is >> right. 
I think there should be a second argument 'ignorebad=False' to >> ignore undecodable files rather than raise the exception (or 'strict=True' >> to stop and raise exception on non-decodable names -- then code is 'if >> strict: raise ...'). I believe other functions have a similar parameter. I was thinking of the "normal Unicode 'errors' parameter", as described by Nick. > If you want the exceptions, just use the bytes API and try to decode > the byte strings using the system encoding. If it was a matter of adding a new method, I might agree. But: 1. We already have a method that does exactly what you describe. It is only a matter of adding flexibility to the response to problems, for which there is already precedent. 2. Suggesting that people who want strings and not bytes should have to deal with bytes, just to get an error notification, seems to negate that point of moving to 3.0 3. A builtin would probably do so better than most programmers would, with little touches such as the one suggested below. 4. An error parameter would ALERT programmers to the possibility of a PROBLEM, both in the present and future. As you say below, people need to better anticipate the future. > My problem with raising exceptions *by default* when an undecodable > name exists is that it may render an app completely useless in a > situation where the developer is no longer around. This happened all > the time with the 2.x Unicode API, where the developer hadn't > anticipated a particular input potentially containing non-ASCII bytes, > and the user fed the application non-ASCII text. Making os.listdir > raise an exception when a directory contains a single undecodable file > means that the entire directory can't be read, and most likely the > entire app crashes at that point. 
Most likely the developer never > anticipated this situation (since in most places it is either > impossible or very unlikely) -- after all, if they had anticipated it > they would have used the bytes API in the first place. (It's worse > because the exception being raised would be UnicodeError -- most > people expect os.listdir to raise OSError, not other errors.) This to me is an argument for keeping the default the current behavior, but not for rejecting flexibility. The computing world seems to be messier than we would like and worse than I realized until this week. As you say below, people need to better anticipate the future, and an errors parameter would help do that. Is Windows really immune? What about when it reads the directory of possibly old removable media with whatever byte name encodings? Is this a possible source of 'unanticipated' problems? As to your last sentence, os.listdir() with an errors parameter could convert a decoding UnicodeError to "OSError: undecodable file name ", thereby supplying the expected exception as well as an extractable representation of the problematic raw bytes. Here is a possible use case: I want filenames as 3.0 strings and I anticipate no problems at present but, as you say above, something might happen years in the future. I am using 3.0 *because* of the strings == unicode feature. I would like to write

    try:
        files = os.listdir(somedir, errors='strict')
    except OSError as e:
        log()
        files = os.listdir(somedir)

and go on without the problem file but not without logging the problem so a future maintainer can consider what to do about it, but only when there is an actual need to think about it.
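The use case above can be approximated today on top of the bytes API, along the lines Guido suggested. What follows is a minimal sketch under stated assumptions, not an existing os.listdir() feature: decode_names, listdir_text and listdir_logged are hypothetical helpers, the policy names 'strict', 'skip' and 'replace' are borrowed from the proposals in this thread, and os.fsencode() requires a later Python 3 release than 3.0.

```python
import os
import sys


def decode_names(raw_names, encoding, errors="skip"):
    """Decode bytes filenames under one of three policies.

    'strict'  -> raise OSError naming the first undecodable entry
    'skip'    -> leave undecodable entries out of the result
    'replace' -> keep them, with U+FFFD standing in for the bad bytes
    """
    if errors not in ("strict", "skip", "replace"):
        raise ValueError("unknown errors policy: %r" % (errors,))
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))  # strict decode attempt
        except UnicodeDecodeError as exc:
            if errors == "strict":
                # Surface the expected exception type plus the raw bytes.
                raise OSError("undecodable file name %r" % (raw,)) from exc
            if errors == "replace":
                names.append(raw.decode(encoding, "replace"))
            # errors == "skip": drop the entry silently
    return names


def listdir_text(path, errors="skip"):
    """Hypothetical wrapper: list path via the bytes API, then decode."""
    raw_names = os.listdir(os.fsencode(path))
    return decode_names(raw_names, sys.getfilesystemencoding(), errors)


def listdir_logged(path, log):
    """Try the strict policy first; on failure, log and fall back to skip."""
    try:
        return listdir_text(path, errors="strict")
    except OSError as exc:
        log(str(exc))
        return listdir_text(path, errors="skip")
```

Keeping the decoding policy in decode_names, separate from the directory read, means the behaviour can be exercised with plain byte strings and without creating oddly named files on disk.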
Terry Jan Reedy From tjreedy at udel.edu Mon Dec 8 01:02:01 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 07 Dec 2008 19:02:01 -0500 Subject: [Python-Dev] Nonlocal shortcut In-Reply-To: References: Message-ID: Fabio Zadrozny wrote: > Hi, > > I'm currently implementing a parser to handle Python 3.0, and one of > the points I found conflicting with the grammar specification is the > PEP 3104. > > It says that a shortcut would be added to Python 3.0 so that "nonlocal > x = 0" can be written. As near as I can tell from testing, that did not happen. The PEP needs revision to delete that or push it to a later version. > However, the latest grammar specification > (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) > doesn't seem to take that into account... So, can someone enlighten me > on what should be the correct treatment for that on a grammar that > wants to support Python 3.0? From lists at cheimes.de Mon Dec 8 01:05:13 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Dec 2008 01:05:13 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C3BBA.1040106@v.loewis.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> Message-ID: <493C64B9.2040701@cheimes.de> Martin v. Löwis wrote: > I think it is still timely when fixed in January or February. > In fact, releasing it still in December might not be possible, > due to the limited time available. The cmp() / PyObject_Compare() removal patch is almost done. With some help I can finish it by Tuesday evening. We can have another release by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed before people are spending their Xmas holidays with 3.0. The defects include * cmp(), PyObject_Compare() and friends * global/nonlocal shortcuts (global x = 0) aren't working * unnecessary slowdown of read() due to slow buffer resizing.
An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases again. If we release it now we can have a combined release of 2.6.2 and 3.0.2 two months from now. Two months are quite some time to fix the performance issue of the new IO library. If Guido and Barry are fine with a lax policy on performance fixes we can integrate more tweaks. I believe performance patches were considered as features in the past. For this reason they weren't allowed for minor releases. Mark's work on long integer optimizations and json speedup are good candidates. Christian From musiccomposition at gmail.com Mon Dec 8 01:11:45 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sun, 7 Dec 2008 18:11:45 -0600 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C64B9.2040701@cheimes.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> Message-ID: <1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com> On Sun, Dec 7, 2008 at 6:05 PM, Christian Heimes wrote: > Martin v. Löwis wrote: >> >> I think it is still timely when fixed in January or February. >> In fact, releasing it still in December might not be possible, >> due to the limited time available. > > The cmp() / PyObject_Compare() removal patch is almost done. With some help > I can finish it by Tuesday evening. We can have another release by Monday > Dec 15th. Python 3.0.0 has some defects that should be fixed before people > are spending their Xmas holidays with 3.0. The defects include > > * cmp(), PyObject_Compare() and friends > * global/nonlocal shortcuts (global x = 0) aren't working I have a patch for this [1], but I don't think this should be considered a release blocker or even backported to 3.0. It's merely a convenience feature and doesn't inhibit the usefulness of the PEP in any way. > * unnecessary slowdown of read() due to slow buffer resizing.
-- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From lists at cheimes.de Mon Dec 8 01:14:53 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Dec 2008 01:14:53 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <1afaf6160812071611r5808db6ej6a96c17c86ca3986@mail.gmail.com> Message-ID: <493C66FD.2000506@cheimes.de> Benjamin Peterson wrote: > I have a patch for this [1], but I don't think this should be > considered a release blocker or even backported to 3.0. It's merely a > convenience feature and doesn't inhibit the usefulness of the PEP in > any way. Amaury said: An issue was already filed about this: http://bugs.python.org/issue4199 It should be ready for inclusion in 3.0.1. I'm +0 for the patch. Given the nature of Python 3.0 I'm fine with getting it right. Christian From barry at python.org Mon Dec 8 01:52:53 2008 From: barry at python.org (Barry Warsaw) Date: Sun, 7 Dec 2008 19:52:53 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C64B9.2040701@cheimes.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> Message-ID: On Dec 7, 2008, at 7:05 PM, Christian Heimes wrote: > Martin v. Löwis wrote: >> I think it is still timely when fixed in January or February. >> In fact, releasing it still in December might not be possible, >> due to the limited time available. > > The cmp() / PyObject_Compare() removal patch is almost done. With > some help I can finish it by Tuesday evening. We can have another > release by Monday Dec 15th.
Python 3.0.0 has some defects that > should be fixed before people are spending their Xmas holidays with > 3.0. The defects include > > * cmp(), PyObject_Compare() and friends > * global/nonlocal shortcuts (global x = 0) aren't working > * unnecessary slowdown of read() due to slow buffer resizing. > > An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases > again. If we release it now we can have a combined release of 2.6.2 > and 3.0.2 two months from now. Two months are quite some time to > fix the performance issue of the new IO library. > > If Guido and Barry are fine with a lax policy on performance fixes > we can integrate more tweaks. I believe performance patches were > considered as features in the past. For this reason they weren't > allowed for minor releases. Mark's work on long integer > optimizations and json speedup are good candidates. I'm personally okay with performance fixes in point releases, as long as it doesn't change API or add additional features. -Barry From lists at cheimes.de Mon Dec 8 01:56:25 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Dec 2008 01:56:25 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> Message-ID: <493C70B9.2030601@cheimes.de> Barry Warsaw wrote: > I'm personally okay with performance fixes in point releases, as long as it > doesn't change API or add additional features. Does your okay include or exclude new internal APIs like new helper functions or new C modules?
Christian From fabiofz at gmail.com Mon Dec 8 02:06:21 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sun, 7 Dec 2008 23:06:21 -0200 Subject: [Python-Dev] Nonlocal shortcut In-Reply-To: References: Message-ID: >> I'm currently implementing a parser to handle Python 3.0, and one of >> the points I found conflicting with the grammar specification is the >> PEP 3104. >> >> It says that a shortcut would be added to Python 3.0 so that "nonlocal >> x = 0" can be written. However, the latest grammar specification >> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) >> doesn't seem to take that into account... So, can someone enlighten me >> on what should be the correct treatment for that on a grammar that >> wants to support Python 3.0? > > An issue was already filed about this: > http://bugs.python.org/issue4199 > It should be ready for inclusion in 3.0.1. > Thanks for pointing that out. Fabio From v+python at g.nevcal.com Mon Dec 8 03:17:04 2008 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 07 Dec 2008 18:17:04 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> Message-ID: <493C83A0.5020606@g.nevcal.com> On approximately 12/7/2008 10:56 AM, came the following characters from the keyboard of Adam Olsen: > You might receive a UTF-8 encoded file name from a malicious user, > check if it contains something dangerous (like > "../../../../../etc/password"), then decode it. If your decoder isn't > compliant (ie doesn't check for overly long sequences) then a > b'\xC0\xAF' gets translated into u'/', bypassing your previous check. You might indeed. 
But if you are interested in checking for security issues, shouldn't you _first_ decode into some canonical form, specifying what sorts of Unicode strictness (such as overlong sequences) to check for during the decode process, and once the string is in canonical form, _then_ do checks for various attacks, such as the ../ sequence you mention? And with that order of operation, even if you don't reject overlong sequences, you have canonized them, and can recognize the resulting characters as good or bad. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From stephen at xemacs.org Mon Dec 8 03:34:50 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 08 Dec 2008 11:34:50 +0900 Subject: [Python-Dev] RELEASED Python 3.0 final In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com> Message-ID: <87tz9fo0mt.fsf@xemacs.org> glyph at divmod.com writes: > But still, you can't honestly expect me to recommend 3.0 until someone > has gotten at least a basic skeleton of Twisted up and running under it > :). My own attempts to do so have failed miserably, to the point where > I can't even produce a useful bug report without a lot more work. How about an issue in the Python tracker---or the Twisted one, with a xref from the Python tracker to the Twisted tracker where the work will be done---that says "Twisted wants to be ported but we don't have enough developers, please help"? 
Maybe with some encouraging statement about how you can provide X amount of advice. In general, maybe there should be some sort of (semi-)formal process for proposing ports of libraries and coordinating work on them. Even just a focal point for where to make such requests, and a way to search for them so you can find others with similar interests. > I don't think there's anything about the 3.0 language which > couldn't be supported in a VM that understood both 2 and 3. Strings vs. bytes. It can't do both 2-style "bytes are text" and 3-style "no way are bytes text" simultaneously AFAICS. > I also don't think 3.0 is perfect, and five years on, there will be > a temptation to make more "just this once" incompatible changes. > Of course, you've promised these changes won't be made, and *this* > set of design mistakes will be with us forever. For values of "forever" approximating ten years. > It would be nice if there were a way for evolution to continue > without another reboot of the world. Stephen J. Gould says not. I think Java is a very different case from Python. It is the product of a language evolution that goes back to the early 1970s or so, and the standardization effort was carefully shepherded by a powerful company which provided resources to ensure that things went its way. For that reason, I think it's a remarkable compliment to Python and to Python 3 in particular that you consider Java an appropriate standard of comparison for Python. There's also the danger of stasis. I think Lisp will never die, and Common Lisp has done a good job of avoiding reboots. But for precisely that reason there continues to be a lively evolution of seriously incompatible dialects, both Lisp-1 (Scheme) and Lisp-2. I see Python 3 as an attempt to bridle and ride this tiger, without turning the rope into a noose and strangling the beast. > >If they're that easily convinced that Java is better they probably > >were a lost cause anyway, so I won't mourn their departure too much.
> > I really believe that *all* new users are fickle, if they don't have a > mandate as to what they need to be learning. Personally, I learned > Python because of a memory leak in Swing. Sure, but what Guido is saying, I think, is that as long as prominent Python developers don't announce its funeral, the other things we could do to encourage them are going to get lost in the noise of inherent fickleness. Which isn't just random, it depends on things like availability of just the right library for one's app, etc. But there are too many of those to do them all, or even just to list them up and try to prioritize them "objectively"---might as well be random. From stephen at xemacs.org Mon Dec 8 05:13:38 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 08 Dec 2008 13:13:38 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493C83A0.5020606@g.nevcal.com> References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> Message-ID: <87prk3nw25.fsf@xemacs.org> Glenn Linderman writes: > But if you are interested in checking for security issues, shouldn't you > _first_ decode into some canonical form, Yes. That's all that is being asked for: that Python do strict decoding to a canonical form by default. That's a lot to ask, as it turns out, but that is what we (the minority of strict Unicode adherents, that is) want. If you want the convenience and risk, I believe you should ask for it by name (I suggest a name like "own_me" for the relaxed decoding flag). Failing that, it would be nice to have a global flag to change the default. 
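[Editorial illustration, not part of the original thread.] The overlong-sequence attack discussed above can be sketched in a few lines of Python 3. This is a minimal sketch under stated assumptions: `naive_decode` is a hypothetical, deliberately non-validating decoder written only for this example (it handles ASCII and two-byte sequences and nothing else), and the payload is made up. The built-in "utf-8" codec in current Python 3 is used as the strict reference.

```python
def naive_decode(data: bytes) -> str:
    """Hypothetical non-validating decoder: ASCII and 2-byte sequences only.

    It applies the shift/mask arithmetic without checking that the result
    needed two bytes at all, which is exactly the missing overlong check.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b < 0xE0 and i + 1 < len(data):
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("sequence not handled by this sketch")
    return "".join(out)


# Made-up payload: every "/" is written as the overlong pair C0 AF.
payload = b"..\xc0\xaf..\xc0\xafetc\xc0\xafpasswd"

# A byte-level check done before decoding sees nothing suspicious.
passes_byte_check = b"/" not in payload

# The lax decoder then manufactures the slashes the check was looking for.
naive_result = naive_decode(payload)

# The strict codec refuses the input instead.
try:
    payload.decode("utf-8")
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True
```

With a strict decoder the order of "check" and "decode" stops mattering for this particular attack, because the invalid bytes never become a string in the first place.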
From martin at v.loewis.de Mon Dec 8 05:17:43 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Mon, 08 Dec 2008 05:17:43 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C64B9.2040701@cheimes.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> Message-ID: <493C9FE7.7040908@v.loewis.de> >> I think it is still timely when fixed in January or February. >> In fact, releasing it still in December might not be possible, >> due to the limited time available. > > The cmp() / PyObject_Compare() removal patch is almost done. I wasn't (primarily) talking about fixing this particular issue. Time needs to be made available also for the upcoming 2.4.6 and 2.5.3 releases (which should, IMO, get priority over a 3.0 bugfix release at this point). > With some > help I can finish it by Tuesday evening. We can have another release > by Monday Dec 15th. Python 3.0.0 has some defects that should be fixed > before people are spending their Xmas holidays with 3.0. The defects > include > > * cmp(), PyObject_Compare() and friends > * global/nonlocal shortcuts (global x = 0) aren't working > * unnecessary slowdown of read() due to slow buffer resizing. I think 3.0.1 should also address other serious bugs in 3.0, such as - various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410) - various ways to crash Python through the buffer protocol (4583, 4509; also 4580) > An early 3.0.1 release makes it possible to sync 2.6 and 3.0 releases > again. IIUC, you want the bugfix version number to be sync'ed. I don't think that is a useful thing to have. > If Guido and Barry are fine with a lax policy on performance fixes we > can integrate more tweaks. I believe performance patches were > considered as features in the past. For this reason they weren't allowed > for minor releases.
Mark's work on long integer optimizations and json > speedup are good candidates. I don't recall such policy, and I can't see anything wrong with including performance fixes in a bug fix release. Maybe you were confusing this with whether performance fixes can be considered release-critical (which they shouldn't, IMO)? Regards, Martin From v+python at g.nevcal.com Mon Dec 8 05:45:12 2008 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 07 Dec 2008 20:45:12 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <87prk3nw25.fsf@xemacs.org> References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> Message-ID: <493CA658.6030106@g.nevcal.com> On approximately 12/7/2008 8:13 PM, came the following characters from the keyboard of Stephen J. Turnbull: > Glenn Linderman writes: > > > But if you are interested in checking for security issues, shouldn't you > > _first_ decode into some canonical form, > > Yes. That's all that is being asked for: that Python do strict > decoding to a canonical form by default. That's a lot to ask, as it > turns out, but that is what we (the minority of strict Unicode > adherents, that is) want. I have no problem with having strict validation available. But doesn't validation take significantly longer than decoding? So I think it should be logically decoupled... do validation when/where it is needed for security reasons, and allow internal [de]coding to be faster. I'm mostly indifferent about which should be the default... maybe there shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and the "fUTF-8" decoder for the faster, non-validating version. Or something like that. With appropriate documentation. Of course, "UTF-8" already exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... but it could be deprecated. 
You didn't address the issue that if the decoding to a canonical form is done first, many of the insecurities just go away, so why throw errors? -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From rhamph at gmail.com Mon Dec 8 06:11:21 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 7 Dec 2008 22:11:21 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493CA658.6030106@g.nevcal.com> References: <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> Message-ID: On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman wrote: > On approximately 12/7/2008 8:13 PM, came the following characters from the > keyboard of Stephen J. Turnbull: >> >> Glenn Linderman writes: >> >> > But if you are interested in checking for security issues, shouldn't >> you > _first_ decode into some canonical form, >> >> Yes. That's all that is being asked for: that Python do strict >> decoding to a canonical form by default. That's a lot to ask, as it >> turns out, but that is what we (the minority of strict Unicode >> adherents, that is) want. > > > I have no problem with having strict validation available. But doesn't > validation take significantly longer than decoding? So I think it should be > logically decoupled... do validation when/where it is needed for security > reasons, and allow internal [de]coding to be faster. I'd like to see benchmarks of such a claim. > I'm mostly indifferent about which should be the default... maybe there > shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and > the "fUTF-8" decoder for the faster, non-validating version. Or something > like that. With appropriate documentation. Of course, "UTF-8" already > exists... 
as "fUTF-8", so for compatibility, I guess it shouldn't change... > but it could be deprecated. > > > You didn't address the issue that if the decoding to a canonical form is > done first, many of the insecurities just go away, so why throw errors? Unicode is intended to allow interaction between various bits of software. It may be that a library checked it in UTF-8, then passed it to python. It would be nice if the library validated too, but a major advantage of UTF-8 is older libraries (or protocols!) intended for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their security checks continue to work, so long as nobody down stream introduces problems with a non-validating decoder. -- Adam Olsen, aka Rhamphoryncus From v+python at g.nevcal.com Mon Dec 8 07:04:08 2008 From: v+python at g.nevcal.com (Glenn Linderman) Date: Sun, 07 Dec 2008 22:04:08 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> Message-ID: <493CB8D8.604@g.nevcal.com> On approximately 12/7/2008 9:11 PM, came the following characters from the keyboard of Adam Olsen: > On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman wrote: >> On approximately 12/7/2008 8:13 PM, came the following characters from the >> keyboard of Stephen J. Turnbull: >>> Glenn Linderman writes: >>> >>> > But if you are interested in checking for security issues, shouldn't >>> you > _first_ decode into some canonical form, >>> >>> Yes. That's all that is being asked for: that Python do strict >>> decoding to a canonical form by default. That's a lot to ask, as it >>> turns out, but that is what we (the minority of strict Unicode >>> adherents, that is) want. >> >> I have no problem with having strict validation available. But doesn't >> validation take significantly longer than decoding? So I think it should be >> logically decoupled... 
do validation when/where it is needed for security >> reasons, and allow internal [de]coding to be faster. > > I'd like to see benchmarks of such a claim. "significantly" seems to be the only word at question; it seems that there are a fair number of validation checks that could be performed; the numeric part of UTF-8 decoding is just a sequence of shifts, masks, and ORs, so can be coded pretty tightly in C or assembly language. Anything extra would be slower; how much slower is hard to predict prior to the implementation. My "significantly" was just the expectation that the larger code with more conditional branches that is required for validation is less likely to stay in cache, and take longer to load into cache, and take longer to execute. This also seems to be supported by Stephen's comment "That's a lot to ask, as it turns out." Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, I wonder if I could find that code? Can you supply a validated decoder? Then we could run some benchmarks, eh? >> I'm mostly indifferent about which should be the default... maybe there >> shouldn't be a default! Use the "vUTF-8" decoder for strict validation, and >> the "fUTF-8" decoder for the faster, non-validating version. Or something >> like that. With appropriate documentation. Of course, "UTF-8" already >> exists... as "fUTF-8", so for compatibility, I guess it shouldn't change... >> but it could be deprecated. >> >> >> You didn't address the issue that if the decoding to a canonical form is >> done first, many of the insecurities just go away, so why throw errors? > > Unicode is intended to allow interaction between various bits of > software. It may be that a library checked it in UTF-8, then passed > it to python. It would be nice if the library validated too, but a > major advantage of UTF-8 is older libraries (or protocols!) intended > for ASCII need only be 8-bit clean to be repurposed for UTF-8. 
Their > security checks continue to work, so long as nobody down stream > introduces problems with a non-validating decoder. So I don't understand how this is responsive to the "decoding removes many insecurities" issue? Yes, you might use libraries. Either they have insecurities, or not. Either they validate, or not. Either they decode, or not. They may be immune to certain attacks, because of their structure and code, or not. So when you examine a library for potential use, you have documentation or code to help you set your expectations about what it does, and whether or not it may have vulnerabilities, and whether or not those vulnerabilities are likely or unlikely, whether you can reduce the likelihood or prevent the vulnerabilities by wrapping the API, etc. And so you choose to use the library, or not. This whole discussion about libraries seems somewhat irrelevant to the question at hand, although it is certainly true that understanding how a library handles Unicode is an important issue for the potential user of a library. So how does a non-validating decoder introduce problems? I can see that it might not solve all problems, but how does it introduce problems? Wouldn't the problems be introduced by something else, and the use of a non-validating decoder may not catch the problem... but not be the cause of the problem? And then, if you would like to address the original issue, that would be fine too. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From rhamph at gmail.com Mon Dec 8 08:04:15 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 8 Dec 2008 00:04:15 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493CB8D8.604@g.nevcal.com> References: <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> <493CB8D8.604@g.nevcal.com> Message-ID: On Sun, Dec 7, 2008 at 11:04 PM, Glenn Linderman wrote: > On approximately 12/7/2008 9:11 PM, came the following characters from the > keyboard of Adam Olsen: >> On Sun, Dec 7, 2008 at 9:45 PM, Glenn Linderman >> wrote: > > Once upon a time I did write an unvalidated UTF-8 encoder/decoder in C, I > wonder if I could find that code? Can you supply a validated decoder? Then > we could run some benchmarks, eh? There is no point for me, as the behaviour of a real UTF-8 codec is clear. It is you who needs to justify a second non-standard UTF-8-ish codec. See below. >>> You didn't address the issue that if the decoding to a canonical form is >>> done first, many of the insecurities just go away, so why throw errors? >> >> Unicode is intended to allow interaction between various bits of >> software. It may be that a library checked it in UTF-8, then passed >> it to python. It would be nice if the library validated too, but a >> major advantage of UTF-8 is older libraries (or protocols!) intended >> for ASCII need only be 8-bit clean to be repurposed for UTF-8. Their >> security checks continue to work, so long as nobody down stream >> introduces problems with a non-validating decoder. > > > So I don't understand how this is responsive to the "decoding removes many > insecurities" issue? > > Yes, you might use libraries. Either they have insecurities, or not. Either > they validate, or not. Either they decode, or not. They may be immune to > certain attacks, because of their structure and code, or not. 
> > So when you examine a library for potential use, you have documentation or > code to help you set your expectations about what it does, and whether or > not it may have vulnerabilities, and whether or not those vulnerabilities > are likely or unlikely, whether you can reduce the likelihood or prevent the > vulnerabilities by wrapping the API, etc. And so you choose to use the > library, or not. > > This whole discussion about libraries seems somewhat irrelevant to the > question at hand, although it is certainly true that understanding how a > library handles Unicode is an important issue for the potential user of a > library. > > So how does a non-validating decoder introduce problems? I can see that it > might not solve all problems, but how does it introduce problems? Wouldn't > the problems be introduced by something else, and the use of a > non-validating decoder may not catch the problem... but not be the cause of > the problem? > > And then, if you would like to address the original issue, that would be > fine too. Your non-validating decoder is translating an invalid sequence into a valid one, thus you are introducing the problem. A completely naive environment (8-bit clean ASCII) would leave it as an invalid sequence throughout. This is not a theoretical problem. See http://tools.ietf.org/html/rfc3629#section-10 . We MUST reject invalid sequences, or else we are not using UTF-8. There is no wiggle room, no debate. (The absoluteness is why the standard behaviour doesn't need a benchmark. You are essentially arguing that, when logging in as root over the internet, it's a lot faster if you use telnet rather than ssh. One is simply not an option.) -- Adam Olsen, aka Rhamphoryncus From stephen at xemacs.org Mon Dec 8 09:57:19 2008 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Mon, 08 Dec 2008 17:57:19 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493CA658.6030106@g.nevcal.com> References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> Message-ID: <87ljurnixc.fsf@xemacs.org> Glenn Linderman writes: > On approximately 12/7/2008 8:13 PM, came the following characters from > I have no problem with having strict validation available. But > doesn't validation take significantly longer than decoding? I think you're thinking of XML, where validation can take significant resources over and above syntax checking. For Unicode, not unless you're seriously CPU-bound. Unicode validation is a matter of a few range checks and a couple of flags to handle things like lone surrogates. In the case of "excess length" in UTF-8, you can actually often do it in *zero* time if you use a table to analyze the leading byte (eg, 0xC0 and 0xC1 are invalid UTF-8 leading bytes because they would necessarily decode to U+0000 to U+007F, ie, the ASCII range), because you have to make a check for 0xFE and 0xFF anyway, which can't be UTF-8 leading bytes. (I'm not sure this generalizes to longer UTF-8 sequences, but it would reject the use of 0xC0 0xAF to sneak in a "/" in zero time!) > So I think it should be logically decoupled... do validation > when/where it is needed for security reasons, Security is an important application, but the real issue is that naively decoded text is a bomb with a sensitive impact fuse. Pass it around long enough, and it will blow up eventually. 
The whole point of the fairly complex rules about Unicode formats and the *requirement* that broken coding be a fatal error *in a conforming Unicode process* is to ensure that Unicode exceptions[1] only ever occur on input (or memory corruption and the like, which is actually a form of I/O, of course). That's where efficiency comes from. I think Python 3 should aspire to (eventually) be a conforming process by default, with lax behavior an option. > and allow internal [de]coding to be faster. "Internal decoding" is (or should be) an oxymoron. Why would your software be passing around text in any format other than internal? So decoding will happen (a) on I/O, which is itself almost certainly slower than making a few checks for Unicode hygiene, or (b) on receipt of data from other software whose sanitation you shouldn't trust more than you trust the Internet. Encoding isn't a problem, AFAICS. > You didn't address the issue that if the decoding to a canonical > form is done first, many of the insecurities just go away, so why > throw errors? Because as long as you're decoding anyway, it costs no more to do it right, except in rare cases. Why do you think Python should aspire to "quick and dirty" in a context where dirty is known to be unhealthy, and there is no known need for speed? Why impose "doing it right" on the application programmer when there's a well-defined spec for that that we could implement in the standard library? It's the errors themselves that people are objecting to. See Guido's posts for concisely stated arguments for a "don't ask, don't tell" policy toward Unicode breakage. I agree that Python should implement that policy as an option, but I think that the user should have to request it either with a runtime option or (in the case of user == app programmer) by deliberately specifying a lax codec. The default *Unicode* codecs should definitely aspire to full Unicode conformance within their sphere of responsibility.
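[Editorial illustration, not part of the original thread.] The "table to analyze the leading byte" idea mentioned in this message might look roughly like the following. It is only a sketch: the names `LEAD_LENGTH` and `lead_length` are invented here, and the byte ranges are taken from the UTF-8 syntax in RFC 3629. A decoder has to look up the lead byte anyway to learn the sequence length, so marking 0xC0 and 0xC1 as invalid in the same table adds no extra work per byte.

```python
def build_lead_table():
    """Map each possible lead byte to its sequence length, or 0 if invalid."""
    table = [0] * 256                  # default: not a valid lead byte
    for b in range(0x00, 0x80):
        table[b] = 1                   # ASCII
    # 0x80-0xBF are continuation bytes and can never start a sequence.
    # 0xC0 and 0xC1 could only encode U+0000..U+007F (overlong), so they
    # stay at 0 instead of being given length 2.
    for b in range(0xC2, 0xE0):
        table[b] = 2
    for b in range(0xE0, 0xF0):
        table[b] = 3
    for b in range(0xF0, 0xF5):
        table[b] = 4
    # 0xF5-0xFF would exceed U+10FFFF or are never used in UTF-8.
    return table


LEAD_LENGTH = build_lead_table()


def lead_length(byte: int) -> int:
    """Return the expected sequence length for a lead byte, 0 if invalid."""
    return LEAD_LENGTH[byte]
```

As the message itself suspects, this does not fully generalize: overlong three- and four-byte forms (for example 0xE0 followed by 0x80..0x9F) and encoded surrogates need a range check on the second byte, which this table alone does not provide.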
Footnotes: [1] A character outside the repertoire that the app can handle is not a "Unicode exception", unless the reason the app can't handle it is that the Unicode handler blew up. From stephen at xemacs.org Mon Dec 8 10:21:32 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 08 Dec 2008 18:21:32 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493CB8D8.604@g.nevcal.com> References: <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> <493CB8D8.604@g.nevcal.com> Message-ID: <87k5abnhsz.fsf@xemacs.org> Glenn Linderman writes: > "significantly" seems to be the only word at question; it seems that > there are a fair number of validation checks that could be performed; > the numeric part of UTF-8 decoding is just a sequence of shifts, masks, > and ORs, so can be coded pretty tightly in C or assembly language. > > Anything extra would be slower; how much slower is hard to predict prior > to the implementation. Not much, see my previous response. > This also seems to be supported by Stephen's comment "That's a lot > to ask, as it turns out." Not what I meant. Inefficiency is not an objection to checking for validity at the level a codec can handle. The objection is that "we don't want *any* exceptions thrown that we didn't explicitly ask for", and adding validation certainly will violate that. > So I don't understand how this is responsive to the "decoding removes > many insecurities" issue? Because you have to recheck every time the data crosses from Python into your code. To the extent that Python codecs promise validation and keep that promise, internal code *never* has to make those checks. That is a significant savings in programmer effort, because auditing a large body of code for *any* I/O from Python is going to be costly. 
> So when you examine a library for potential use, you have documentation > or code to help you set your expectations about what it does, and > whether or not it may have vulnerabilities, and whether or not those > vulnerabilities are likely or unlikely, whether you can reduce the > likelihood or prevent the vulnerabilities by wrapping the API, etc. And > so you choose to use the library, or not. Python is precisely such a component that people will choose to use, or not, based on whether they can expect that when Python hands them a Unicode object freshly input from the outside world, it won't contain lone surrogates, or invalid UTF-8 characters that got through a 3rd-party spam filter, or whatever. > This whole discussion about libraries seems somewhat irrelevant to the > question at hand, No, it's the *only* point that matters. IMO, speed is not relevant here. The question is whether throwing a Unicode exception on invalid encoding by default generally does more good than harm. Guido seems to think "not!", which gives me pause. I still disagree, though. From eckhardt at satorlaser.com Mon Dec 8 10:20:42 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Mon, 8 Dec 2008 10:20:42 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <0F0D1942-A841-4098-ACE4-479B21D08524@fuhm.net> Message-ID: <200812081020.42448.eckhardt@satorlaser.com> On Friday 05 December 2008, James Y Knight wrote: > On Dec 5, 2008, at 5:27 AM, Ulrich Eckhardt wrote: > > Using the byte variant is equally fubar, because e.g. on MS Windows > > it is not supported, except through a very lossy roundtrip through > > the locale's codepage, limiting your functionality. > > Yeah, IMO whole mess could have been avoided by keeping the filename/ > args/environ simply *bytes*, like it really is, on unix. 
Then, make > the Windows version of python use (always! not dependent upon locale!) > utf-8 to decode the utf-8 bytestring to the UTF-16 that the Windows > platform APIs expect (and vice versa). If possible, I would try to avoid this useless roundtrip from UTF-16 to UTF-8 and back. > And never use the ASCII variant of the windows APIs. That's okay, but I'm afraid it's not possible. The problem is not so much doing it, but finding all those places where it is currently done. Those could be outside of Python itself. So, even to Python code, there could still be APIs that would need the MBCS-encoded strings. Uli -- Sator Laser GmbH Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Visit our website at ************************************************************************************** This e-mail, including all attachments, is intended only for the addressee and may contain confidential information. Please notify the sender immediately if you are not the intended recipient. In that case the e-mail must be deleted and must not be read, forwarded, published, or otherwise used. E-mails can be read by third parties and may contain viruses and unauthorized changes. Sator Laser GmbH is not responsible for these consequences.
************************************************************************************** From v+python at g.nevcal.com Mon Dec 8 10:54:54 2008 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 08 Dec 2008 01:54:54 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <87ljurnixc.fsf@xemacs.org> References: <493B22F8.8090902@gmail.com> <200812070235.41321@news.perlig.de> <493B2C22.5060907@gmail.com> <493B923F.6010706@gmx.net> <493B98D3.8070405@gmx.net> <493C83A0.5020606@g.nevcal.com> <87prk3nw25.fsf@xemacs.org> <493CA658.6030106@g.nevcal.com> <87ljurnixc.fsf@xemacs.org> Message-ID: <493CEEEE.6010308@g.nevcal.com> On approximately 12/8/2008 12:57 AM, came the following characters from the keyboard of Stephen J. Turnbull: > "Internal decoding" is (or should be) an oxymoron. Why would your > software be passing around text in any format other than internal? So > decoding will happen (a) on I/O, which is itself almost certainly > slower than making a few checks for Unicode hygiene, or (b) on receipt > of data from other software that whose sanitation you shouldn't trust > more than you trust the Internet. > > Encoding isn't a problem, AFAICS. So I can see validating user supplied data, which always comes in via I/O. But during manipulation of internal data, including file and database I/O, there is a need for encoding and decoding also. If all the data has already been validated, then there would be no need to revalidate on every conversion. I hear you when you say that clever coding can make the validation nearly free, and I applaud that: the UTF-8 coder that I wrote predated most of the rules that have been created since, so I didn't attempt to be clever in that regard. Thanks to you and Adam for your explanations; I see your points, and if it is nearly free, I withdraw most of my negativity on this topic. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. 
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From ncoghlan at gmail.com Mon Dec 8 11:12:05 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 08 Dec 2008 20:12:05 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: <493CF2F5.9000904@gmail.com> Terry Reedy wrote: > This to me is an argument for keeping the default the current behavior, > but not for rejecting flexibility. The computing world seems to be > messier than we would like and worse than I realized until this week. As > you say below, people need to better anticipate the future, and an > errors parameter would help do that. It just occurred to me that this seems like a perfect situation to address via the warning system. The normal warnings mechanics can then be used to turn it into an exception if so desired, and this can be done once per application rather than having to pass a separate argument every time the affected APIs are called. And the decoding problems don't pass silently either - they just get emitted as a warning by default instead of causing the application to crash. Cheers, Nick.
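[Editorial illustration, not part of the original thread.] The warning-based approach suggested here can be sketched with the standard `warnings` module. Everything specific in this sketch is an assumption made for illustration: `decode_names` is a hypothetical helper, not an existing API, and using `UnicodeWarning` as the category is just one plausible choice.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Hypothetical helper: decode byte names, warning about the bad ones."""
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # By default this is only reported; the caller keeps running.
            warnings.warn("skipping undecodable name %r" % (raw,),
                          UnicodeWarning, stacklevel=2)
    return decoded


names = [b"ok.txt", b"bad\xff.txt"]

# Lenient mode: the problem is recorded as a warning, good names survive.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient_result = decode_names(names)

# Strict mode: one filter, set once per application, turns it into an error.
with warnings.catch_warnings():
    warnings.simplefilter("error", UnicodeWarning)
    try:
        decode_names(names)
        strict_raised = False
    except UnicodeWarning:
        strict_raised = True
```

The point of the design is that the strictness decision lives in one filter setting rather than in an extra argument at every call site.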
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From eckhardt at satorlaser.com Mon Dec 8 11:20:49 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Mon, 8 Dec 2008 11:20:49 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: Message-ID: <200812081120.49409.eckhardt@satorlaser.com> On Sunday 07 December 2008, Guido van Rossum wrote: > My problem with raising exceptions *by default* when an undecodable > name exists is that it may render an app completely useless in a > situation where the developer is no longer around. This happened all > the time with the 2.x Unicode API, where the developer hadn't > anticipated a particular input potentially containing non-ASCII bytes, > and the user fed the application non-ASCII text. Making os.listdir > raise an exception when a directory contains a single undecodable file > means that the entire directory can't be read, and most likely the > entire app crashes at that point. Most likely the developer never > anticipated this situation (since in most places it is either > impossible or very unlikely) -- after all, if they had anticipated it > they would have used the bytes API in the first place. There is another way to handle this that noisily signals errors but doesn't cause programs to suddenly fail. Using os.listdir as example, the problem there is that the OS actually returns a list of strings that can not be reliably decoded, so I would propose to simply not decode them. Now, the idea is what if this function simply returned neither a byte string nor a Unicode string, but e.g. an environment string type (called env_str)? os.listdir would only fail if it really failed to read the dir. If a user wants to display an element from the returned list, they would get something akin to what repr() returns, i.e. a recognisable string that can be written to a logfile. 
However, this thing will also include additional markup that makes it clear that it is not just a piece of text and not suitable to display to the end user. This type distinction is important, because it means that any developer will immediately see that something unexpected is going on here. They will invoke "type(lst[0])" and see the unexpected type env_str, which will (via documentation) redirect them to the issue with different encodings and that all they have to do is 'map( unicode, lst)' in order to get at a list of real text strings, but they will also read that this operation might fail, forcing an informed decision. If they don't care about a textual representation at all but only want to invoke os.popen with arguments received from the commandline, then everything is fine, too, because that function will take the strings as they are and just give them back to the OS. This allows roundtripping from OS over Python and back to the OS without any conversions and thus without any conversions that could fail. In the case of e.g. a backup program, this is exactly what is needed. Now, if you have any hard-coded strings in your program but a function like os.popen needs an env_str object, this string is converted via a default encoding, i.e. the same that is used when converting an env_str object to Unicode. In this case, I would go so far to say that os.popen should accept normal str strings, too, and perform that conversion itself. An alternative way would be to reject the string because it is the wrong type, but since this internal string's encoding is known, there is no reason to force users to convert explicitly, it is just that the conversion might fail. Similarly, when modifying such an env_str object, like e.g. "bak = sys.argv[1]+'.backup'". In this case, the string '.backup' is converted according to the default encoding and then appended to the commandline argument, the result would again be an env_str object. 
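[Editorial sketch, not part of Uli's message.] The behaviour proposed so far can be illustrated in Python 3 terms with a small class. Everything here is an assumption made for illustration: the class name `EnvStr`, the `decode()` method (standing in for the `unicode(...)` conversion mentioned above), and the choice of UTF-8 as the "default encoding" are all hypothetical, and no such type exists in Python.

```python
class EnvStr:
    """Hypothetical sketch of the proposed env_str type (not a real API)."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = bytes(raw)      # the bytes exactly as the OS supplied them
        self._encoding = encoding   # assumed default encoding

    def __bytes__(self):
        # Round trip back to the OS without any conversion.
        return self._raw

    def __repr__(self):
        # Loggable, but visibly marked as "not plain text".
        return "<env_str %r>" % (self._raw,)

    def decode(self):
        """Explicit conversion to text; may raise UnicodeDecodeError."""
        return self._raw.decode(self._encoding)

    def __add__(self, other):
        if isinstance(other, EnvStr):
            tail = other._raw
        elif isinstance(other, str):
            # Hard-coded text is converted via the default encoding.
            tail = other.encode(self._encoding)
        else:
            return NotImplemented
        return EnvStr(self._raw + tail, self._encoding)


arg = EnvStr(b"caf\xe9")    # a Latin-1 name that is invalid as UTF-8
bak = arg + ".backup"       # the result is again an EnvStr

try:
    bak.decode()
    decodable = True
except UnicodeDecodeError:
    decodable = False
```

The point of the sketch is that concatenation and the round trip through `bytes(bak)` never fail, while the one operation that can fail, `decode()`, has to be requested explicitly.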
Note: There is an option in this design, and that is to make the default behaviour in case of non-convertible env_str objects configurable. A file manager would then replace the undecodable bytes by an approximation, a backup program would use strict mode and a music player would perhaps simply skip and ignore such strings. The problem there is that changing this option would possibly affect other library code that one doesn't even know about because it is only used indirectly and its implementation is unknown. For that reason, I would rather not make this policy a configurable element. If you want that, you can easily code it yourself. BTW: there was a PEP that proposed a new path class, which was rejected. This class was actually pretty similar, except that it also included several other features (globbing, path handling, opening files and the kitchen sink) which eventually made it too bloated. Otherwise, the idea of creating a separate type for these strings is the same. Uli -- Sator Laser GmbH Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
From lists at cheimes.de Mon Dec 8 11:53:09 2008 From: lists at cheimes.de (Christian Heimes) Date: Mon, 08 Dec 2008 11:53:09 +0100 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C9FE7.7040908@v.loewis.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C9FE7.7040908@v.loewis.de> Message-ID: <493CFC95.1050306@cheimes.de> Martin v. Löwis wrote: > I wasn't (primarily) talking about fixing this particular issue. > Time needs to be made available also for the upcoming 2.4.6 and 2.5.3 > releases (which should, IMO, get priority over a 3.0 bugfix release > at this point) I've no opinion on the priority of the releases. Since you are responsible for the 2.4 and 2.5 releases as well as the Windows binaries, it's your choice. For the future we should find somebody to assist you with the Windows installers in order to take some pressure off you. > I think 3.0.1 should also address other serious bugs in 3.0, such > as > - various IDLE bugs with non-ASCII characters (2827, 4008, 4323, 4410) > - various ways to crash Python through the buffer protocol > (4583, 4509; also 4580) My list wasn't complete. I'm +1 for your additions. > IIUC, you want the bugfix version number to be sync'ed. I don't > think that is a useful thing to have. Yeah. Barry also said it's a neat thing to have - but just a neat thing. > I don't recall such policy, and I can't see anything wrong with > including performance fixes in a bug fix release. Maybe you were > confusing this with whether performance fixes can be considered > release-critical (which they shouldn't, IMO)? Maybe I'm a confused person?
:] Christian From barry at python.org Mon Dec 8 14:11:10 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Dec 2008 08:11:10 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C70B9.2030601@cheimes.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> Message-ID: <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote: > Barry Warsaw wrote: >> I'm personally okay with performance fixes in point releases, as >> long it doesn't change API or add additional features. > > Does your okay include or exclude new internal APIs like new helper > functions or a new C modules? I /personally/ don't have a problem with that, but we need consensus before that becomes policy. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBST0c7nEjvBPtnXfVAQJvQwQAjrCuivCuLT3HNq6n5VvUKVkxto5wyBzW ka9YuFoBCVRDt7Z7Sn59UeLGVgrsL9Zw2rSra4cXE/1QaUzpxJlaFpafWVJilCPh +hv6/t6ky0Ww0FsEv+56SRHOVRlfqgNMIbmDXemf40Oo/IYxqNL5HP59NeIvk0oa u3Mmc7qsP1k= =ZK8M -----END PGP SIGNATURE----- From barry at python.org Mon Dec 8 14:12:04 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Dec 2008 08:12:04 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <493C9FE7.7040908@v.loewis.de> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C9FE7.7040908@v.loewis.de> Message-ID: <59E06D94-4596-46C4-BFF1-BA10A46C76E0@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 7, 2008, at 11:17 PM, Martin v. Löwis wrote: > I don't recall such policy, and I can't see anything wrong with > including performance fixes in a bug fix release.
Maybe you were > confusing this with whether performance fixes can be considered > release-critical (which they shouldn't, IMO)? I agree with that. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBST0dJHEjvBPtnXfVAQIqhwQAkdJgQs8aq452mQRWGdNKLBw5Fsu1m/uV PGcYbRvfD5nzKPhRvCK42okPaUTWXOAuLHf8gvLT+LwRewmztsMVb0JZKVf1MIuT Msw60Du7jjNgjcbgd55i5mn7swQmGONB7iFfyq5htL3Bp1zQIi+Fhhi4/hZconHl BTnbqfLGz1Q= =u9GH -----END PGP SIGNATURE----- From skip at pobox.com Mon Dec 8 14:13:31 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 8 Dec 2008 07:13:31 -0600 Subject: [Python-Dev] Deciding on dbm API in setup.py Message-ID: <18749.7547.117133.919493@montanaro-dyndns-org.local> Several packages provide a dbm-compatible API. Currently, the code in Python's setup.py hardcodes the order of consideration: ndbm, then gdbm, then Berkeley DB. While the APIs are compatible, the file formats are all different as far as I know. If you have ndbm but want to use Berkeley DB format, you're stuck. Right now editing setup.py is the only way to influence the order. I opened an issue on the bug tracker about this: http://bugs.python.org/issue4587 It includes a patch which adds an optional environment variable (PYDBMLIBORDER) which builders can use to override the order of the default library checks. I'm not sure that's the "correct" way to do this, but I'm at a loss to figure out how else to do it. Is it possible to easily add a flag to setup.py, say --dbm-order=gdbm:bdb:ndbm? If you've got any -- even passing -- interest in this, please read the issue and add a comment if you feel so moved. This grew out of a change to adapt to new gdbm library organization: http://bugs.python.org/issue4487 Unbeknownst to me, I apparently wound up fixing a previously reported issue about the change: http://bugs.python.org/issue1167 Skip From mal at egenix.com Mon Dec 8 15:54:44 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 08 Dec 2008 15:54:44 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4939CBDB.30305@gmail.com> References: <4938374B.8000006@gmail.com> <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> Message-ID: <493D3534.30505@egenix.com> On 2008-12-06 01:48, Nick Coghlan wrote: > You can't display a non-decodable filename to the user, hence the user > will have no idea what they're working on. Non-filesystem related apps > have no business trying to deal with insane filenames. This is not entirely true: OSes, shells, and applications will typically represent the file names using either ?-replacements or some form of hex or decimal escapes for the characters they can't decode. Since humans are usually very good at pattern recognition, this goes a long way. Of course, how the application maps that partially converted file name back to the real thing is another issue and that's something that Python should not make harder than it should be. > Linux is moving towards a standard of UTF-8 for filenames, and once we > get to the point where the idea of encoding filenames and environment > variables any other way is seen as crazy, then the Python 3 approach > will work seamlessly. It's going to take a long time before file names, environment variables and command line parameters are all encoded using UTF-8, so "practicality beats purity" will have to get more attention in this thread. Python APIs should work out of the box most of the time. Currently, if you live in a non-ASCII and non-pure-UTF-8 environment, you have to deal with different and mixed encodings on a regular basis. Whether that's a USB stick, you're trying to read, a ZIP file you're trying to open, a mounted network drive, etc. the problem pops up in many different kinds of areas. 
If I write "do_something.py *" I expect Python to indeed work on all the files in my directory, not just the one that happen to fit a particular encoding. If I hook up a CGI script written in Python with a web server, I expect all data to be received by the script, not just data that happens to be UTF-8 encoded. > In the meantime, raw bytes APIs will provide an alternative for those > that disagree with that philosophy. I think that's a wrong way to put it: The problems are not made up by people who disagree with the one-encoding-for-everything strategy. The problems occur in real-life IT processing all the time - maybe not so much in places where English scripts dominate, but certainly in most other places with non-English scripts. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Mon Dec 8 17:18:01 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 16:18:01 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Allocation_of_shape_and_strides_fields_in_?= =?utf-8?q?Py=5Fbuffer?= Message-ID: Hello, The Py_buffer struct has two pointers named `shape` and `strides`. Each points to an array of Py_ssize_t values whose length is equal to the number of dimensions of the buffer object. 
Unfortunately, the buffer protocol spec doesn't explain how allocation of these arrays should be handled. Right now this is circumvented by either pointing them to an externally-managed piece of memory (e.g. a Py_ssize_t field in the original PyObject), or by pointing them to another field in the Py_buffer (because in the case of a one-dimensional buffer with itemsize == 1, shape[0] is simply equal to the length of the buffer in bytes). Of course this is not flexible, and it makes it difficult to fix the situation for buffers with itemsize larger than 1. For those buffers, we can't simply point the shape array to the byte length, and if we are taking a slice of the memoryview, we can't point it to the size of the original object (for example an array.array) either. This raises the problem of how to allocate the shape array. For the one-dimensional case, I had in mind a simple scheme where the Py_buffer struct has an additional two-member Py_ssize_t array. Then `shape` and `strides` can point to the first and second member of this array, respectively. This wouldn't solve the multi-dimensional case, however. Thanks for any ideas on how to solve this. Regards Antoine. From rdmurray at bitdance.com Mon Dec 8 18:30:39 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Mon, 8 Dec 2008 12:30:39 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: On Sun, 7 Dec 2008 at 13:33, Guido van Rossum wrote: > My problem with raising exceptions *by default* when an undecodable > name exists is that it may render an app completely useless in a > situation where the developer is no longer around.
This happened all I think Nick Coghlan's suggestion of emitting warnings would be an excellent solution that addresses both your concerns and the concerns Toshio has expressed (and with which I agree 100%). The above is the only use case I've heard in this thread for ignoring files with names that can't be decoded: so that a user can use the program on those files whose names can be decoded even when the user does not have the resources to get the program fixed to handle undecodable filenames. I agree that that is a worthwhile goal. If warnings were emitted, then files would not be silently ignored, yet the program could still be used. --RDM PS: I'd like to see a similar warning issued when an access attempt is made through os.environ to a variable that cannot be decoded. From janssen at parc.com Mon Dec 8 18:56:21 2008 From: janssen at parc.com (Bill Janssen) Date: Mon, 8 Dec 2008 09:56:21 PST Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493B209D.5070306@gmail.com> References: <200812051127.35880.eckhardt@satorlaser.com> <49398980.7050209@gmail.com> <493991D3.9030003@gmail.com> <4939A8C7.6050209@gmail.com> <4939AFC6.7000106@gmail.com> <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <493B209D.5070306@gmail.com> Message-ID: <33922.1228758981@parc.com> Nick Coghlan wrote: > - I think the binary and Unicode APIs should be available (and fully > functional) on all platforms (including Windows) so that app developers > don't create portability problems for themselves when they make the > decision as to which API to use +1 I'm perhaps biased here; most of my Python programs don't have user interfaces, because they don't "talk" to people, they talk to other programs. The binary APIs for the OS are essential. I use and deeply appreciate all the string handling features in Python, particularly its firm grip on Unicode issues, but that's *useful* instead of *essential*. 
Bill From tjreedy at udel.edu Mon Dec 8 19:16:03 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Dec 2008 13:16:03 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493CF2F5.9000904@gmail.com> References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: Nick Coghlan wrote: > Terry Reedy wrote: >> This to be is an argument for keeping the default the current behavior, >> but not for rejecting flexibility. The computing world seems to be >> messier than we would like and worse that I realized until this week. As >> you say below, people need to better anticipate the future, and an >> errors parameter would help do that. > > It just occurred to me that this seems like a perfect situation to > address via the warning system. I disagree. > The normal warnings mechanics can then > be used to turn it into an exception if so desired, and this can be done > once per application rather than having to pass a separate argument > every time the affected APIs are called. The warning mechanism, as far as I know, because I have never dealt with it (and do not want to) is for version issues. In any case, the snippet that you clipped try: files = os.listdir(somedir, errors = strict) except OSError as e: log() files = os.listdir(somedir) specifically requires a per call parameter. > And the decoding problems don't pass silently either - they just get > emitted as a warning by default instead of causing the application to crash. Do they get automatically logged? In any case, the errors parameter has an in between option to neither ignore or raise but to replace and give *something* printable. 
This situation seems like an ideal situation for a parameter which gives the application program who uses Python a range of options to working with an un-ideal world. I am really flabbergasted why there is so much opposition to doing so in favor of more difficult or less functional alternatives. Terry Jan Reedy From guido at python.org Mon Dec 8 19:26:46 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Dec 2008 10:26:46 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote: > Guido van Rossum wrote: >> >> On Sun, Dec 7, 2008 at 1:20 PM, Terry Reedy wrote: >>> >>> Toshio Kuratomi wrote: >>> >>>> - If this is true, a definition of os.listdir() that would >>>> better meet programmer expectation would be: "Give me all files in a >>>> directory with the output as str type". The definition of >>>> os.listdir() would be "Give me all files in a directory >>>> with the output as bytes type". Raising an exception when the filenames >>>> are undecodable is perfectly reasonable in this situation. >>> >>> Your examples (snipped) pretty well convince me that there is a use case >>> for >>> raising exceptions. We should move beyond arguing over which one way is >>> right. I think there should be a second argument 'ignorebad=False' to >>> ignore undecodable files rather than raise the exception (or >>> 'strict=True' >>> to stop and raise exception on non-decodable names -- then code is 'if >>> strict: raise ...'). I believe other functions have a similar parameter. > > I was thinking of the "normal Unicode 'errors' parameter", as described by > Nick. 
> >> If you want the exceptions, just use the bytes API and try to decode >> the byte strings using the system encoding. > > If it was a matter of adding a new method, I might agree. But: > > 1. We already have a method that does exactly what you describe. It is only > a matter of adding flexibility to the response to problems, for which there > is already precedent. > > 2. Suggesting that people who want strings and not bytes should have to deal > with bytes, just to get an error notification, seems to negate that point of > moving to 3.0 > > 3. A builtin would probably do so better than most programmers would, with > little touches such as the one suggested below. > > 4. An error parameter would ALERT programmers to the possibility of a > PROBLEM, both in the present and future. As you say below, people need to > better anticipate the future. > >> My problem with raising exceptions *by default* when an undecodable >> name exists is that it may render an app completely useless in a >> situation where the developer is no longer around. This happened all >> the time with the 2.x Unicode API, where the developer hadn't >> anticipated a particular input potentially containing non-ASCII bytes, >> and the user fed the application non-ASCII text. Making os.listdir >> raise an exception when a directory contains a single undecodable file >> means that the entire directory can't be read, and most likely the >> entire app crashes at that point. Most likely the developer never >> anticipated this situation (since in most places it is either >> impossible or very unlikely) -- after all, if they had anticipated it >> they would have used the bytes API in the first place. (It's worse >> because the exception being raised would be UnicodeError -- most >> people expect os.listdir to raise OSError, not other errors.) > > This to be is an argument for keeping the default the current behavior, but > not for rejecting flexibility. 
The computing world seems to be messier than > we would like and worse that I realized until this week. As you say below, > people need to better anticipate the future, and an errors parameter would > help do that. I'm fine with whatever API enhancements you can come up with (assuming others like them too :-) as long as the default remains the current behavior. > Is Windows really immune? What about when it reads the directory of > possibly old removable media with whatever byte name encodings? Is this a > possible source of 'unanticipated' problems? > > As to your last sentence, os.listdir() with an errors parameter could > convert a decoding UnicodeError to "OSError: undecodable file name > ", thereby supplying the expected exception as well as an > extractable representation of problematical the raw bytes > > Here is a possible use case: I want filenames as 3.0 strings and I > anticipate no problems at present but, as you say above, something might > happen years in the future. I am using 3.0 *because* of the strings == > unicode feature. I would like to write > > try: > files = os.listdir(somedir, errors = strict) > except OSError as e: > log() > files = os.listdir(somedir) > > and go one without the problem file but not without logging the problem so a > future maintainer can consider what to do about it, but only when there is > an actual need to think about it. 
> > Terry Jan Reedy > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rdmurray at bitdance.com Mon Dec 8 19:34:37 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Mon, 8 Dec 2008 13:34:37 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote: >> And the decoding problems don't pass silently either - they just get >> emitted as a warning by default instead of causing the application to >> crash. > > Do they get automatically logged? In any case, the errors parameter has an > in between option to neither ignore or raise but to replace and give > *something* printable. > > This situation seems like an ideal situation for a parameter which gives the > application program who uses Python a range of options to working with an > un-ideal world. I am really flabbergasted why there is so much opposition to > doing so in favor of more difficult or less functional alternatives. I'm in favor of an option to control what happens. I just really really don't want the _default_ to be "ignore". Defaulting to a warning is fine with me, as would be defaulting to a traceback. But defaulting to "silently ignore", as we have now, is just asking for user confusion and debugging headaches, as detailed by Toshio. 
A _worse_ user experience, IMO, than having a program fail when undecodable filenames match the selection criteria. --RDM From brett at python.org Mon Dec 8 20:14:24 2008 From: brett at python.org (Brett Cannon) Date: Mon, 8 Dec 2008 11:14:24 -0800 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> Message-ID: On Mon, Dec 8, 2008 at 05:11, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote: > >> Barry Warsaw wrote: >>> >>> I'm personally okay with performance fixes in point releases, as long it >>> doesn't change API or add additional features. >> >> Does your okay include or exclude new internal APIs like new helper >> functions or a new C modules? > > I /personally/ don't have a problem with that, but we need consensus before > that becomes policy. > Internal as in just for us I am fine with, but not nothing publicly available. As for new C modules, I am fine with that as well as long as they add no new build dependencies. -Brett From guido at python.org Mon Dec 8 20:25:18 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Dec 2008 11:25:18 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: On Mon, Dec 8, 2008 at 10:34 AM, wrote: > On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote: >>> >>> And the decoding problems don't pass silently either - they just get >>> emitted as a warning by default instead of causing the application to >>> crash. >> >> Do they get automatically logged? 
In any case, the errors parameter has >> an in between option to neither ignore or raise but to replace and give >> *something* printable. >> >> This situation seems like an ideal situation for a parameter which gives >> the application program who uses Python a range of options to working with >> an un-ideal world. I am really flabbergasted why there is so much >> opposition to doing so in favor of more difficult or less functional >> alternatives. > > I'm in favor of an option to control what happens. > > I just really really don't want the _default_ to be "ignore". Defaulting > to a warning is fine with me, as would be defaulting to a traceback. > > But defaulting to "silently ignore", as we have now, is just asking for user > confusion and debugging headaches, as detailed by Toshio. A _worse_ user > experience, IMO, than having a program fail when undecodable filenames > match the selection criteria. Do you really not care about the risk where apps that weren't written to be prepared to handle this will be rendered completely useless if a single file in a directory has an unencodable name? This is similar to an issue that Python had for a long time where it wouldn't start up if the current directory contained non-ASCII characters. Given that most developers will not have this issue in their own environment, most apps will not be prepared for this issue, and that makes it worse for the app's user! 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From scott+python-dev at scottdial.com Mon Dec 8 20:39:13 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 08 Dec 2008 14:39:13 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: <493D77E1.2000401@scottdial.com> Guido van Rossum wrote: > On Mon, Dec 8, 2008 at 10:34 AM, wrote: >> On Mon, 8 Dec 2008 at 13:16, Terry Reedy wrote: >>>> And the decoding problems don't pass silently either - they just get >>>> emitted as a warning by default instead of causing the application to >>>> crash. >>> Do they get automatically logged? In any case, the errors parameter has >>> an in between option to neither ignore or raise but to replace and give >>> *something* printable. >> >> I just really really don't want the _default_ to be "ignore". Defaulting >> to a warning is fine with me, as would be defaulting to a traceback. > > Do you really not care about the risk where apps that weren't written > to be prepared to handle this will be rendered completely useless if a > single file in a directory has an unencodable name? Since when do warnings cause apps to be rendered completely useless? I think it's easy to agree that defaulting to an exception is not good for the reason you give, but I don't see how that applies to a warning. And, it seems like a warning covers the issues that the other people want as well. If there is a warning, then there is at least a record of the fact that some filenames were ignored. Presumably if I was responsible for the correctness of some piece of code, I would see the warning in a log of some sort and could investigate it further (if I cared), otherwise I could choose to ignore it. 
I don't see os.listdir(name) to be one of those situations that emitting a warning is a nuisance at all. -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From larry.bugbee at boeing.com Mon Dec 8 20:49:17 2008 From: larry.bugbee at boeing.com (Bugbee, Larry) Date: Mon, 8 Dec 2008 11:49:17 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: Message-ID: <9418DB6C0B9D434190E54A78E931C3D1087D7A0B@XCH-NW-7V1.nw.nos.boeing.com> > I'm perhaps biased here; most of my Python programs don't have user > interfaces, because they don't "talk" to people, they talk to other > programs. The binary APIs for the OS are essential. I use and > deeply appreciate all the string handling features in Python, > particularly its firm grip on Unicode issues, but that's *useful* > instead of *essential*. Exactly! Another +1. Larry From rdmurray at bitdance.com Mon Dec 8 21:07:16 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Mon, 8 Dec 2008 15:07:16 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: > On Mon, Dec 8, 2008 at 10:34 AM, wrote: >> I'm in favor of an option to control what happens. >> >> I just really really don't want the _default_ to be "ignore". Defaulting >> to a warning is fine with me, as would be defaulting to a traceback. >> >> But defaulting to "silently ignore", as we have now, is just asking for user >> confusion and debugging headaches, as detailed by Toshio. A _worse_ user >> experience, IMO, than having a program fail when undecodable filenames >> match the selection criteria. 
> > Do you really not care about the risk where apps that weren't written > to be prepared to handle this will be rendered completely useless if a > single file in a directory has an unencodable name? This is similar to > an issue that Python had for a long time where it wouldn't start up if > the current directory contained non-ASCII characters. No, I do care. In another message I agreed with you that having the ap not fail was a reasonable goal. What I'm saying is that having it ignore the undecodable files fail _silently_ is bad. And not picking up a file that matches some selection criteria (ex: *.py) because it is undecodable is a _failure_, in my opinion, that is _worse_ than getting a traceback because there's an undecodable file in the directory. But I'm happy with just issuing a warning by default. That would mean it doesn't fail silently, but neither does it crash. Seems like the best compromise with the broken nature of the real world IT environment. > Given that most developers will not have this issue in their own > environment, most apps will not be prepared for this issue, and that > makes it worse for the app's user! It is exactly because most developers won't have the issue in their own environment that ignoring files silently is a problem. If they did, they'd fix their code before it went out the door. Since they don't, when their code is used by somebody in a mixed encoding environment, the programs _will_ fail by ignoring files that they should process. The question, it seems to me, is do they fail silently and mysteriously by failing to process files they are supposed to, or do they fail with at least a little bit of noise? 
--RDM From guido at python.org Mon Dec 8 21:12:57 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Dec 2008 12:12:57 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: On Mon, Dec 8, 2008 at 12:07 PM, wrote: > On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: >> >> On Mon, Dec 8, 2008 at 10:34 AM, wrote: >>> >>> I'm in favor of an option to control what happens. >>> >>> I just really really don't want the _default_ to be "ignore". Defaulting >>> to a warning is fine with me, as would be defaulting to a traceback. >>> >>> But defaulting to "silently ignore", as we have now, is just asking for >>> user >>> confusion and debugging headaches, as detailed by Toshio. A _worse_ user >>> experience, IMO, than having a program fail when undecodable filenames >>> match the selection criteria. >> >> Do you really not care about the risk where apps that weren't written >> to be prepared to handle this will be rendered completely useless if a >> single file in a directory has an unencodable name? This is similar to >> an issue that Python had for a long time where it wouldn't start up if >> the current directory contained non-ASCII characters. > > No, I do care. In another message I agreed with you that having the > ap not fail was a reasonable goal. What I'm saying is that having it > ignore the undecodable files fail _silently_ is bad. And not picking > up a file that matches some selection criteria (ex: *.py) because it is > undecodable is a _failure_, in my opinion, that is _worse_ than getting > a traceback because there's an undecodable file in the directory. > > But I'm happy with just issuing a warning by default. That would mean > it doesn't fail silently, but neither does it crash. Seems like the > best compromise with the broken nature of the real world IT > environment. OK, I can live with that too. 
>> Given that most developers will not have this issue in their own >> environment, most apps will not be prepared for this issue, and that >> makes it worse for the app's user! > > It is exactly because most developers won't have the issue in their own > environment that ignoring files silently is a problem. If they did, > they'd fix their code before it went out the door. Since they don't, > when their code is used by somebody in a mixed encoding environment, > the programs _will_ fail by ignoring files that they should process. > The question, it seems to me, is do they fail silently and mysteriously > by failing to process files they are supposed to, or do they fail with > at least a little bit of noise? A warning is fine. Whether the app *fails* or *succeeds* when the warning is issued depends on what the app is trying to do and what the user expects. There certainly are valid use cases for both, but I expect that succeeding noisily is going to be at least as common as failing (in the sense of not doing the right thing, not necessarily crashing) noisily. This is an improvement over always crashing. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ncoghlan at gmail.com Mon Dec 8 21:35:35 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 06:35:35 +1000 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> Message-ID: <493D8517.60904@gmail.com> Brett Cannon wrote: > On Mon, Dec 8, 2008 at 05:11, Barry Warsaw wrote: >> On Dec 7, 2008, at 7:56 PM, Christian Heimes wrote: >>> Barry Warsaw wrote: >>>> I'm personally okay with performance fixes in point releases, as long it >>>> doesn't change API or add additional features. 
>>> Does your okay include or exclude new internal APIs like new helper >>> functions or a new C modules? >> I /personally/ don't have a problem with that, but we need consensus before >> that becomes policy. > Internal as in just for us I am fine with, but not nothing publicly available. Where would adding a (undocumented) get_filename() method to ZipImporter objects for the benefit of the -m switch fit then? There are a few things which don't always work properly because runpy doesn't currently know how to set __file__ properly when the module comes a zipfile. Although now that I think about it, I could actually fix that "the right way" (with a documented get_filename() method on ZipImporter) for 2.7 and 3.1, while using a runpy internal workaround specifically for ZipImporter instances in the maintenance branches... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From dato at net.com.org.es Mon Dec 8 19:51:57 2008 From: dato at net.com.org.es (Adeodato =?utf-8?B?U2ltw7M=?=) Date: Mon, 8 Dec 2008 19:51:57 +0100 Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit) Message-ID: <20081208185157.GA19135@chistera.yi.org> Hello, after using 2to3 --write over some scripts, I found it very cumbersome having to run `chmod +x` on each of them afterwards. The attached patch is a possible way to fix this issue. It'd be great if somebody could apply it, or write a more appropriate fix. Many thanks in advance! P.S.: Please CC me on replies. -- Adeodato Sim? dato at net.com.org.es Debian Developer adeodato at debian.org Listening to: Manolo Garc?a - Prend? la flor -------------- next part -------------- A non-text attachment was scrubbed... Name: 2to3_preserve_mode.diff Type: text/x-diff Size: 584 bytes Desc: not available URL: From mal at egenix.com Mon Dec 8 21:37:52 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Mon, 08 Dec 2008 21:37:52 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: <493D85A0.6060601@egenix.com> On 2008-12-08 19:26, Guido van Rossum wrote: > On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote: >> Here is a possible use case: I want filenames as 3.0 strings and I >> anticipate no problems at present but, as you say above, something might >> happen years in the future. I am using 3.0 *because* of the strings == >> unicode feature. I would like to write >> >> try: >> files = os.listdir(somedir, errors = strict) >> except OSError as e: >> log() >> files = os.listdir(somedir) >> >> and go one without the problem file but not without logging the problem so a >> future maintainer can consider what to do about it, but only when there is >> an actual need to think about it. If that error parameter is the same as in unicode(value, errors), then this would be a useful feature: People could then choose among the already existing error handlers ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register their own ones via the codecs module. Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode escapes, private code points, etc. as seen fit by the application. Perhaps we should also add an ''encoding'' parameter that can be set on a per directory basis (if necessary) and defaults to the global file system encoding. If an application hits a directory that is known to cause problems, it could then choose to receive the file names in a different, more suitable encoding. This allows implementing fallback mechanisms with a list of common encodings for a locale.
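To make the "register their own ones via the codecs module" idea a bit more concrete, here is a minimal sketch. codecs.register_error() is the real stdlib hook; everything else is an assumption for illustration: the handler name "hexescape", the helper decode_filename(), and the encoding lists are made up, and none of this is something os.listdir() itself provides.

```python
import codecs


def hexescape(exc):
    """Decode error handler: turn each undecodable byte into a visible \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc  # this sketch only deals with decoding errors
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % byte for byte in bad)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end


# "hexescape" is a made-up handler name used only in this example.
codecs.register_error("hexescape", hexescape)


def decode_filename(raw, encodings=("utf-8",)):
    """Hypothetical helper: try each encoding strictly, then fall back.

    If no encoding in the list decodes the name cleanly, the first
    encoding is used together with the "hexescape" handler, so the
    caller always gets *something* printable back.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding, "strict")
        except UnicodeDecodeError:
            continue
    return raw.decode(encodings[0], "hexescape")
```

With only UTF-8 in the list, a Latin-1 encoded name such as b"Andr\xe9" comes back with a literal backslash escape in place of the bad byte; adding "latin-1" to the list makes the strict pass succeed instead. Note that the escaped form is for display only and is not round-trip safe.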
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From solipsis at pitrou.net Mon Dec 8 21:39:07 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 20:39:07 +0000 (UTC) Subject: [Python-Dev] 3.0.1 possibilities References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> <493D8517.60904@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > Where would adding a (undocumented) get_filename() method to ZipImporter > objects for the benefit of the -m switch fit then? Why not call it _get_filename() in 3.0 and get_filename() in 3.1? From solipsis at pitrou.net Mon Dec 8 21:45:50 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 20:45:50 +0000 (UTC) Subject: [Python-Dev] Python-3.0, unicode, and os.environ References: <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> Message-ID: M.-A. 
Lemburg egenix.com> writes: > > Such application specific error handlers could then also apply > whatever fancy round-trip safe encoding of non-decodable bytes > to Unicode escapes, private code points, etc. as seen fit by the > application. I'd argue that such a fancy round-trip safe error handler should be provided by Python. It's not reasonable to expect application coders to come up with their own codec variation based on subtle details of the unicode spec. Regards Antoine. From ncoghlan at gmail.com Mon Dec 8 21:46:53 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 06:46:53 +1000 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: Message-ID: <493D87BD.90106@gmail.com> Antoine Pitrou wrote: > For the one-dimensional case, I had in mind a simple scheme where the Py_buffer > struct has an additional two-member Py_ssize_t array. Then `shape` and `strides` > can point to the first and second member of this array, respectively. This > wouldn't solve the multi-dimensional case, however. > > Thanks for any ideas on how to solve this. Actually, I think your suggested scheme for the one-dimensional case shows the way forward: ownership of the shape and strides memory belongs to the object issuing the Py_buffer struct, and that object needs to deal with it when the buffer is released. Defining a larger memory chunk with the Py_buffer as the first item and the shape and stride info tacked onto the end and returning that from PyObject_GetBuffer() means that the shape/stride info will be released automatically when the view is released via PyBuffer_Release(). For more complicated cases, the object providing the views may need to do some internal bookkeeping to map from Py_buffer pointers to separately allocated shape/stride information and release those when the views are released. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Mon Dec 8 21:50:47 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 06:50:47 +1000 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> <493D8517.60904@gmail.com> Message-ID: <493D88A7.60701@gmail.com> Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> Where would adding a (undocumented) get_filename() method to ZipImporter >> objects for the benefit of the -m switch fit then? > > Why not call it _get_filename() in 3.0 and get_filename() in 3.1? Actually, since it should only be a fairly trivial couple of lines of code, I think I'm going to put it in the runpy._get_filename() helper function in the maintenance branches and only move it over to ZipImporter on the trunk and the py3k branch. That way it's completely unambiguous that this is just a bug fix for runpy rather than a new feature for ZipImporter. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From dickinsm at gmail.com Mon Dec 8 21:56:25 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Mon, 8 Dec 2008 20:56:25 +0000 Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit) In-Reply-To: <20081208185157.GA19135@chistera.yi.org> References: <20081208185157.GA19135@chistera.yi.org> Message-ID: <5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com> On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Sim? wrote: > > The attached patch is a possible way to fix this issue. It'd be great if > somebody could apply it, or write a more appropriate fix. 
Please could you submit your patch to the bug tracker, at http://bugs.python.org That way it's less likely to get lost. :) Thanks, Mark From barry at python.org Mon Dec 8 22:01:29 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Dec 2008 16:01:29 -0500 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> <493D8517.60904@gmail.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> >> Where would adding a (undocumented) get_filename() method to >> ZipImporter >> objects for the benefit of the -m switch fit then? > > Why not call it _get_filename() in 3.0 and get_filename() in 3.1? +1 - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBST2LKXEjvBPtnXfVAQJZzAP/avX4YgpBSmOAh6Zc2TZEnsllRz6CRa86 bEPCWF1an7H9zzDl6gS5ZjbstXoEPf0Irr+W6BTSLVnRT/G7rFgw5q/QlG2yqvCP dgOCT1Vr3PXgXouNkGaBFI5L/Aw2fuDadWUpGeA3FgH3PxaAH0XAr5LcKP2SidXc v5nDim8lCxc= =k3gW -----END PGP SIGNATURE----- From mal at egenix.com Mon Dec 8 22:01:40 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 08 Dec 2008 22:01:40 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> Message-ID: <493D8B34.1070506@egenix.com> On 2008-12-08 21:45, Antoine Pitrou wrote: > M.-A. 
Lemburg egenix.com> writes: >> Such application specific error handlers could then also apply >> whatever fancy round-trip safe encoding of non-decodable bytes >> to Unicode escapes, private code points, etc. as seen fit by the >> application. > > I'd argue that such fancy round-trip safe error handler should be provided by > Python. It's not reasonable to expect application coders to come up with their > own codec variation based on subtle details of the unicode spec. Fair enough. We could add some e.g. * a round-trip safe escape error handler that uses a Unicode private code point area which we officially reserve for the Python interpreter * a human readable escape error handler that encodes the problem bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1 encoded directory name instead of failing * a warning error handler that replaces the problem cases with a question mark and issues a warning through the warning framework -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Mon Dec 8 22:03:56 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 07:03:56 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: <493D8BBC.10503@gmail.com> Terry Reedy wrote: > Nick Coghlan wrote: >> Terry Reedy wrote: >>> This to be is an argument for keeping the default the current behavior, >>> but not for rejecting flexibility. The computing world seems to be >>> messier than we would like and worse that I realized until this week. As >>> you say below, people need to better anticipate the future, and an >>> errors parameter would help do that. >> >> It just occurred to me that this seems like a perfect situation to >> address via the warning system. > > I disagree. > >> The normal warnings mechanics can then >> be used to turn it into an exception if so desired, and this can be done >> once per application rather than having to pass a separate argument >> every time the affected APIs are called. > > The warning mechanism, as far as I know, because I have never dealt with > it (and do not want to) is for version issues. No, it's just DeprecationWarning in particular that is specific to versioning issues. That's obviously the one that comes up most often for core development, but there are other warnings as well (e.g. the off-by-default ImportWarning when potential packages are skipped because __init__.py is missing). 
For this particular case, I would suggest adding something like EnvironmentWarning (to parallel the EnvironmentError that is the common parent of OSError and IOError). > In any case, the snippet > that you clipped > > try: > files = os.listdir(somedir, errors = strict) > except OSError as e: > log() > files = os.listdir(somedir) > > specifically requires a per call parameter. True, but the decision to have "errors=warn" as the default behaviour is independent of the decision of whether or not to allow the behaviour to be changed on a case-by-case basis. There is nothing stopping us from doing both. >> And the decoding problems don't pass silently either - they just get >> emitted as a warning by default instead of causing the application to >> crash. > > Do they get automatically logged? By default warnings are written to sys.stderr. Whether that gets logged or not will depend on the nature of the application. There are also mechanisms in warnings that allow an application to override the handling of warnings (and for 2.7/3.1, there are mechanisms in logging to make it easy to hook the warning system and the logging system together, so that warnings are automatically logged). > In any case, the errors parameter has > an in between option to neither ignore or raise but to replace and give > *something* printable. That's true, and why I would actually support doing both. Adding the warning is a more pressing need though, since it is what will prevent the errors from passing silently in the default case. > This situation seems like an ideal situation for a parameter which gives > the application program who uses Python a range of options to working > with an un-ideal world. I am really flabbergasted why there is so much > opposition to doing so in favor of more difficult or less functional > alternatives. A warning will stop the failure from passing silently in the default case - that's solving a different problem to the one that the error handling argument will solve.
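A minimal sketch of what the warning-based default could look like from the application side. It assumes a warning category named EnvironmentWarning, which is only the name suggested above and is defined locally here (it is not a built-in), and a hypothetical helper decode_names() standing in for what os.listdir() might do internally. Only the warnings module calls are real stdlib API.

```python
import warnings


class EnvironmentWarning(Warning):
    """Hypothetical category, defined here only for the example."""


def decode_names(raw_names, encoding="utf-8"):
    """Hypothetical stand-in for a listdir() that warns instead of failing.

    Names that cannot be decoded are left out of the result, but each
    omission is reported through the warnings machinery rather than
    passing silently.
    """
    names = []
    for raw in raw_names:
        try:
            names.append(raw.decode(encoding))
        except UnicodeDecodeError:
            warnings.warn(
                "skipping undecodable filename %r" % (raw,),
                EnvironmentWarning,
                stacklevel=2,
            )
    return names
```

Because this goes through the normal warnings machinery, the policy can be set once per application: warnings.simplefilter("error", EnvironmentWarning) turns the omission into an exception, and warnings.simplefilter("ignore", EnvironmentWarning) silences it. With no filter installed, the message goes to sys.stderr.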
I do agree that being able to override the handling on a per-call basis could be a useful feature. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From alexander.belopolsky at gmail.com Mon Dec 8 22:05:08 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 8 Dec 2008 16:05:08 -0500 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493D87BD.90106@gmail.com> References: <493D87BD.90106@gmail.com> Message-ID: I don't have much to add to Nick's reply other than to point you to numpy, , as a reference implementation. You may also get better responses on the numpy list, < numpy-discussion at scipy.org>. On Mon, Dec 8, 2008 at 3:46 PM, Nick Coghlan wrote: > Antoine Pitrou wrote: >> For the one-dimensional case, I had in mind a simple scheme where the Py_buffer >> struct has an additional two-member Py_ssize_t array. Then `shape` and `strides` >> can point to the first and second member of this array, respectively. This >> wouldn't solve the multi-dimensional case, however. >> >> Thanks for any ideas on how to solve this. > > Actually, I think your suggested scheme for the one-dimensional case > shows the way forward: ownership of the shape and strides memory belongs > to the object issuing the Py_buffer struct, and that object needs to > deal with it when the buffer is released. Defining a larger memory chunk > with the Py_buffer as the first item and the shape and stride info > tacked onto the end and returning that from PyObject_GetBuffer() means > that the shape/stride info will be released automatically when the view > is released via PyBuffer_Release(). > > For more complicated cases, the object providing the views may need to > do some internally bookkeeping to map from Py_buffer pointers to > separately allocated shape/stride information and release those when the > views are released. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexander.belopolsky%40gmail.com > From rhamph at gmail.com Mon Dec 8 22:06:28 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 8 Dec 2008 14:06:28 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> Message-ID: On Mon, Dec 8, 2008 at 1:45 PM, Antoine Pitrou wrote: > M.-A. Lemburg egenix.com> writes: >> >> Such application specific error handlers could then also apply >> whatever fancy round-trip safe encoding of non-decodable bytes >> to Unicode escapes, private code points, etc. as seen fit by the >> application. > > I'd argue that such fancy round-trip safe error handler should be provided by > Python. It's not reasonable to expect application coders to come up with their > own codec variation based on subtle details of the unicode spec. Except they're clearly NOT part of the unicode spec. Moreover, whatever tricks you use vary depending on if your garbage input is from UTF-8, UTF-16, or UTF-32 (or any other arbitrary encoding, like CP-1252 or Shift-JIS.) At this point someone suggests we have a type that can store an arbitrary mix of unicode and bytes, so the undecodable portions stay in their original form. 
:P -- Adam Olsen, aka Rhamphoryncus From ncoghlan at gmail.com Mon Dec 8 22:06:49 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 07:06:49 +1000 Subject: [Python-Dev] 3.0.1 possibilities In-Reply-To: References: <1afaf6160812061518m2a6ea910y8de6a4594f2e95b1@mail.gmail.com> <493BD1F2.5080300@holdenweb.com> <493C3BBA.1040106@v.loewis.de> <493C64B9.2040701@cheimes.de> <493C70B9.2030601@cheimes.de> <133FA4E1-5BD2-4EEF-845C-E6F4CB4B330B@python.org> <493D8517.60904@gmail.com> Message-ID: <493D8C69.4010708@gmail.com> Barry Warsaw wrote: > On Dec 8, 2008, at 3:39 PM, Antoine Pitrou wrote: > >> Nick Coghlan gmail.com> writes: >>> >>> Where would adding a (undocumented) get_filename() method to ZipImporter >>> objects for the benefit of the -m switch fit then? > >> Why not call it _get_filename() in 3.0 and get_filename() in 3.1? > > +1 Well, with release manager blessing I'll go with that approach then :) Now, where are those round tuits to actually get it implemented... Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Mon Dec 8 22:12:25 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 21:12:25 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Allocation_of_shape_and_strides_fields_in_?= =?utf-8?q?Py=5Fbuffer?= References: <493D87BD.90106@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > Actually, I think your suggested scheme for the one-dimensional case > shows the way forward: ownership of the shape and strides memory belongs > to the object issuing the Py_buffer struct, and that object needs to > deal with it when the buffer is released. 
Defining a larger memory chunk > with the Py_buffer as the first item and the shape and stride info > tacked onto the end and returning that from PyObject_GetBuffer() means > that the shape/stride info will be released automatically when the view > is released via PyBuffer_Release(). Ok, so another question: given that this will change the Py_buffer layout a bit, can it go into 3.0.1 and 2.6.2? From solipsis at pitrou.net Mon Dec 8 22:14:46 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 21:14:46 +0000 (UTC) Subject: [Python-Dev] Python-3.0, unicode, and os.environ References: <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> Message-ID: Adam Olsen gmail.com> writes: > > Except they're clearly NOT part of the unicode spec. This is always the same discussion going in circles. I know they're not part of the unicode spec, but practicality beats purity and if the said error handler comes with an appropriate warning in the official doc, then why not? In any case, +1 to Marc-André's proposal. From rhamph at gmail.com Mon Dec 8 22:32:00 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 8 Dec 2008 14:32:00 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493D8B34.1070506@egenix.com> References: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> <493D8B34.1070506@egenix.com> Message-ID: On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg wrote: > On 2008-12-08 21:45, Antoine Pitrou wrote: >> M.-A. Lemburg egenix.com> writes: >>> Such application specific error handlers could then also apply >>> whatever fancy round-trip safe encoding of non-decodable bytes >>> to Unicode escapes, private code points, etc. as seen fit by the >>> application. >> >> I'd argue that such fancy round-trip safe error handler should be provided by >> Python.
It's not reasonable to expect application coders to come up with their >> own codec variation based on subtle details of the unicode spec. > > Fair enough. We could add some e.g. > > * a round-trip safe escape error handler that uses a Unicode private > code point area which we officially reserve for the Python > interpreter This would of course alter the behaviour of those private code points, preventing them from round-tripping properly. I don't think round-tripping can be done from an error handler. You need a full codec to do it. A simple option is 8859-1. Or, ya know, bytes. This has long since gotten repetitive.. > * a human readable escape error handler that encodes the problem > bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1 > encoded directory name instead of failing Similar to '?'.encode('ascii', 'backslashreplace')? I'm +1 on making that work. > * a warning error handler that replaces the problem cases with > a question mark and issues a warning through the warning > framework I dub thee errors='warnreplace'. -- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Mon Dec 8 22:36:30 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Mon, 08 Dec 2008 13:36:30 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493C0FE1.30506@gmail.com> <493CF2F5.9000904@gmail.com> Message-ID: <493D935E.9030800@gmail.com> Guido van Rossum wrote: > On Mon, Dec 8, 2008 at 12:07 PM, wrote: >> On Mon, 8 Dec 2008 at 11:25, Guido van Rossum wrote: >> But I'm happy with just issuing a warning by default. That would mean >> it doesn't fail silently, but neither does it crash. Seems like the >> best compromise with the broken nature of the real world IT >> environment. > > OK, I can live with that too. > Same here. This lets the application specify globally what should happen (exception, warning, ignore via the warnings filters) and should give enough context that it doesn't become a mysterious error in the program. 
The per-method addition of an errors argument, so that this is overridable locally as well as globally, is also a nice touch but can be done separately from this step. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From victor.stinner at haypocalc.com Mon Dec 8 22:39:18 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 8 Dec 2008 22:39:18 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493D85A0.6060601@egenix.com> References: <493D85A0.6060601@egenix.com> Message-ID: <200812082239.18802.victor.stinner@haypocalc.com> > ('strict', 'ignore', 'replace', 'xmlcharrefreplace') replace (or xmlcharrefreplace) is just useless because you will not be able to open or rename the file... You just know that there is a strange file in the directory. From ncoghlan at gmail.com Mon Dec 8 22:42:37 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 07:42:37 +1000 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> Message-ID: <493D94CD.5040209@gmail.com> Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> Actually, I think your suggested scheme for the one-dimensional case >> shows the way forward: ownership of the shape and strides memory belongs >> to the object issuing the Py_buffer struct, and that object needs to >> deal with it when the buffer is released. Defining a larger memory chunk >> with the Py_buffer as the first item and the shape and stride info >> tacked onto the end and returning that from PyObject_GetBuffer() means >> that the shape/stride info will be released automatically when the view >> is released via PyBuffer_Release(). > > Ok, so another question: given that this will change the Py_buffer layout a bit, > can it go into 3.0.1 and 2.6.2?
No, you misunderstand what I meant. Py_buffer doesn't need to be changed at all. The *issuing type* would define a new structure with the additional fields, such as:

  struct _my_Py_buffer {
    Py_buffer view;
    SHAPE_TYPE shape;
    STRIDES_TYPE strides;
  }

Internally, the object would use these instead of vanilla Py_buffer objects, and set the shape and strides pointers inside the view field to refer to the shape and strides fields. Clients wouldn't need to know or care that the shape and stride information had been tacked on to the end of the Py_buffer struct. When the buffer was released via PyBuffer_Release, the object would throw away the whole _my_Py_buffer structure (since the pointers are the same). Alexander's suggestion of going and looking at what the numpy folks have done in this area is probably a good idea too. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From mal at egenix.com Mon Dec 8 22:44:30 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 08 Dec 2008 22:44:30 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> <493D8B34.1070506@egenix.com> Message-ID: <493D953E.10107@egenix.com> On 2008-12-08 22:32, Adam Olsen wrote: > On Mon, Dec 8, 2008 at 2:01 PM, M.-A. Lemburg wrote: >> On 2008-12-08 21:45, Antoine Pitrou wrote: >>> M.-A. Lemburg egenix.com> writes: >>>> Such application specific error handlers could then also apply >>>> whatever fancy round-trip safe encoding of non-decodable bytes >>>> to Unicode escapes, private code points, etc. as seen fit by the >>>> application. >>> I'd argue that such fancy round-trip safe error handler should be provided by >>> Python.
It's not reasonable to expect application coders to come up with their >>> own codec variation based on subtle details of the unicode spec. >> Fair enough. We could add some e.g. >> >> * a round-trip safe escape error handler that uses a Unicode private >> code point area which we officially reserve for the Python >> interpreter > > This would of course alter the behaviour of those private code points, > preventing them from round-tripping properly. > > I don't think round-tripping can be done from an error handler. You > need a full codec to do it. A simple option is 8859-1. Or, ya know, > bytes. This has long since gotten repetitive.. The error handler would just map the problem bytes to the private area. The application would then have to decide what to do with them, ie. the error handler only provides one half of the round- tripping. And that's on purpose: I don't believe we can come up with some magic solution for the encodings problem. This is essentially something that applications will have to solve on a case-by-case basis. >> * a human readable escape error handler that encodes the problem >> bytes to say hex escapes, e.g. gives Andr\xe9 for a Latin-1 >> encoded directory name instead of failing > > Similar to '?'.encode('ascii', 'backslashreplace')? I'm +1 on making that work. Yes. >> * a warning error handler that replaces the problem cases with >> a question mark and issues a warning through the warning >> framework > > I dub thee errors='warnreplace'. Yep, something along those lines. Perhaps there are more and better alternatives. These suggestions are just to show how the idea could be put to some real-life use. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 08 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... 
http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From rhamph at gmail.com Mon Dec 8 22:47:01 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 8 Dec 2008 14:47:01 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <493CF2F5.9000904@gmail.com> Message-ID: On Mon, Dec 8, 2008 at 1:12 PM, Guido van Rossum wrote: > On Mon, Dec 8, 2008 at 12:07 PM, wrote: >> But I'm happy with just issuing a warning by default. That would mean >> it doesn't fail silently, but neither does it crash. Seems like the >> best compromise with the broken nature of the real world IT >> environment. > > OK, I can live with that too. +1 -- Adam Olsen, aka Rhamphoryncus From mal at egenix.com Mon Dec 8 22:47:21 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 08 Dec 2008 22:47:21 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812082239.18802.victor.stinner@haypocalc.com> References: <493D85A0.6060601@egenix.com> <200812082239.18802.victor.stinner@haypocalc.com> Message-ID: <493D95E9.4000104@egenix.com> On 2008-12-08 22:39, Victor Stinner wrote: >> ('strict', 'ignore', 'replace', 'xmlcharrefreplace') > > replace (or xmlcharrefreplace) is just useless because you will not be unable > to open or rename the file... You just know that there is a strange file in > the directory. Right, but that's already a lot better than not knowing of the file's existence at all :-) Note that the above are standard error handlers for Unicode conversions. 
The rest of the email you cut away has more useful error handlers for the purpose in question. -- Marc-Andre Lemburg eGenix.com From dato at net.com.org.es Mon Dec 8 22:45:29 2008 From: dato at net.com.org.es (Adeodato Simó) Date: Mon, 8 Dec 2008 22:45:29 +0100 Subject: [Python-Dev] [PATCH] Make 2to3 --write preserve file mode (eg. execution bit) In-Reply-To: <5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com> References: <20081208185157.GA19135@chistera.yi.org> <5c6f2a5d0812081256l7926602cra099ae25e80a11a9@mail.gmail.com> Message-ID: <20081208214529.GA23974@chistera.yi.org> * Mark Dickinson [Mon, 08 Dec 2008 20:56:25 +0000]: > On Mon, Dec 8, 2008 at 6:51 PM, Adeodato Simó wrote: > > The attached patch is a possible way to fix this issue. It'd be great if > > somebody could apply it, or write a more appropriate fix. > Please could you submit your patch to the bug tracker, at > http://bugs.python.org > That way it's less likely to get lost. :) Ok, submitted as #4602. Thanks, -- Adeodato Simó dato at net.com.org.es Debian Developer adeodato at debian.org As scarce as truth is, the supply has always been in excess of the demand.
-- Josh Billings From guido at python.org Mon Dec 8 22:54:41 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Dec 2008 13:54:41 -0800 Subject: [Python-Dev] "as" keyword woes In-Reply-To: <200812072206.21908.paul@boddie.org.uk> References: <200812072206.21908.paul@boddie.org.uk> Message-ID: On Sun, Dec 7, 2008 at 1:06 PM, Paul Boddie wrote: > On Sat Dec 6 21:29:09 CET 2008, Guido van Rossum wrote: >> >> On Sat, Dec 6, 2008 at 11:38 AM, Warren DeLano >> wrote: >> > As someone somewhat knowledgeable of how parsers work, I do not >> > understand why a method/attribute name "object_name.as(...)" must >> > necessarily conflict with a standalone keyword " as ". It seems to me >> > that it should be possible to unambiguously separate the two without >> > ambiguity or undue complication of the parser. >> >> That's possible with sufficiently powerful parser technology, but >> that's not how the Python parser (and most parsers, in my experience) >> treat reserved words. Reserved words are reserved in all contexts, >> regardless of whether ambiguity could arise. > > Just a quick aside from someone who merely lurks on this list: in SQL, it's > quite possible to use keywords in a fashion similar to that desired by the > inquirer, and it's actually possible to double-quote keywords and use them as > names for things. I'm not advocating more complicated parsing technology for > any Python implementation, but I think it's pertinent to point out that the > technology isn't particularly obscure. From my experience with SQL, it's nearly as bad as Python in that every single one of the 200+ reserved words in a typical implementation cannot be used as a name in any context without using double quotes. While the double-quote escape is handy (especially given there are so many obscure reserved words) this is not exactly what the OP wanted -- they would have to say x."as"('float'), except using some other notation instead of double quotes.
Having to escape it completely kills the OP's claim that 'as' is "simplest and most elegant". -- --Guido van Rossum (home page: http://www.python.org/~guido/) From gruszczy at gmail.com Mon Dec 8 22:55:21 2008 From: gruszczy at gmail.com (Filip Gruszczyński) Date: Mon, 8 Dec 2008 22:55:21 +0100 Subject: [Python-Dev] Self in method body Message-ID: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> There is a large discussion on python-list about Guido's article about new self syntax, therefore I would like to use that to raise a similar question: self in the body. Some time ago I was coding in the Magik language (http://en.wikipedia.org/wiki/Magik_(programming_language)), which is dynamically typed and similar to Smalltalk and actually to Python too - although the syntax is far less appalling. As you can see in the examples, defining methods is very similar to what Guido proposed in his blog, though you don't provide the name of the argument, but the name of the class. Then you just precede attributes with a '.', which is 4 letters less than self. And, well, this rocks ;-) It is really not a problem to type 4 letters (well, six with a comma and a space) in the signature, but it takes a lot of time to type all those selfs inside the function's body. So I was thinking, if this issue could be raised too, when new self syntax is proposed. A simple example looks like this:

  class bar:
      def bar.foo():
          .x = 5

This could really save a lot of code, while attributes are still easily distinguishable.
-- Filip Gruszczyński From guido at python.org Mon Dec 8 23:07:49 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Dec 2008 14:07:49 -0800 Subject: [Python-Dev] Nonlocal shortcut In-Reply-To: References: Message-ID: On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc wrote: > Hello, > > Fabio Zadrozny wrote: >> Hi, >> >> I'm currently implementing a parser to handle Python 3.0, and one of >> the points I found conflicting with the grammar specification is the >> PEP 3104. >> >> It says that a shortcut would be added to Python 3.0 so that "nonlocal >> x = 0" can be written. However, the latest grammar specification >> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) >> doesn't seem to take that into account... So, can someone enlighten me >> on what should be the correct treatment for that on a grammar that >> wants to support Python 3.0? > > An issue was already filed about this: > http://bugs.python.org/issue4199 > It should be ready for inclusion in 3.0.1. No it should not. It should be put in 3.1. I strongly object against the addition of features of *any* kind to 3.0.1, no matter whether they were promised or announced in a PEP or in the docs or on the 8 o'clock news. This would make 3.0.0 forever a "loser" release. (I find the removal of 'cmp' hard to swallow too, but in a sense the addition of features is worse, as it makes downgrading a risk. Upgrades, no matter how minimal, always represent risks -- however downgrading shouldn't represent risks, unless you happen to depend on a bugfix that wasn't present in the downgrade -- but we're not talking about a bugfix here no matter how you bend the English language.)
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From paul at boddie.org.uk Mon Dec 8 23:18:55 2008 From: paul at boddie.org.uk (Paul Boddie) Date: Mon, 8 Dec 2008 23:18:55 +0100 Subject: [Python-Dev] "as" keyword woes In-Reply-To: References: <200812072206.21908.paul@boddie.org.uk> Message-ID: <200812082318.55524.paul@boddie.org.uk> On Monday 08 December 2008 22:54:41 Guido van Rossum wrote: > > From my experience with SQL, it's nearly as bad as Python in that > every single one of the 200+ reserved words in a typical > implementation cannot be used as a name in any context without using > double quotes. SQL is a big language; I won't disagree with that! That said, you don't always have to quote names like "end" as I mention below. > While the double-quote escape is handy (especially > given there are so many obscure reserved words) this is not exactly > what the OP wanted -- they would have to say x."as"('float'), except > using some other notation instead of double quotes. Having to escape > it completely kills the OP's claim that 'as' is "simplest and most > elegant". You can do what the OP wants, at least in PostgreSQL, which is fairly conformant. As I wrote on comp.lang.python... create table "create" ( "select" varchar ); select "select" from "create"; select "create".select from "create"; (This from a PostgreSQL 8.2 session.) I don't know whether SQL 1992 actually allows dropping the double-quotes for column names, but this is the kind of thing he has in mind. Paul From rhamph at gmail.com Mon Dec 8 23:25:03 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 8 Dec 2008 15:25:03 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493D953E.10107@egenix.com> References: <493D85A0.6060601@egenix.com> <493D8B34.1070506@egenix.com> <493D953E.10107@egenix.com> Message-ID: On Mon, Dec 8, 2008 at 2:44 PM, M.-A. Lemburg wrote: > On 2008-12-08 22:32, Adam Olsen wrote: >> On Mon, Dec 8, 2008 at 2:01 PM, M.-A. 
Lemburg wrote: >>> On 2008-12-08 21:45, Antoine Pitrou wrote: >>>> M.-A. Lemburg egenix.com> writes: >>>>> Such application specific error handlers could then also apply >>>>> whatever fancy round-trip safe encoding of non-decodable bytes >>>>> to Unicode escapes, private code points, etc. as seen fit by the >>>>> application. >>>> I'd argue that such fancy round-trip safe error handler should be provided by >>>> Python. It's not reasonable to expect application coders to come up with their >>>> own codec variation based on subtle details of the unicode spec. >>> Fair enough. We could add some e.g. >>> >>> * a round-trip safe escape error handler that uses a Unicode private >>> code point area which we officially reserve for the Python >>> interpreter >> >> This would of course alter the behaviour of those private code points, >> preventing them from round-tripping properly. >> >> I don't think round-tripping can be done from an error handler. You >> need a full codec to do it. A simple option is 8859-1. Or, ya know, >> bytes. This has long since gotten repetitive.. > > The error handler would just map the problem bytes to the private > area. The application would then have to decide what to do with > them, ie. the error handler only provides one half of the round- > tripping. By that point it's already too late. You've already conflated garbage PUA with legitimate PUA. To make it work you need to treat those legitimate PUA scalars as errors too, transforming them. A common example is how escaping replaces a single '\' with '\\'. Hrm. nul-escaping should work. Obviously it can't be used outside the filesystem though, as they may introduce a legitimate nul. 
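A minimal sketch of the conflation problem described above, and of the NUL-escaping idea, may make the argument easier to follow. Everything in it is an assumption made for illustration: the handler names "pua_sketch" and "nul_sketch", the choice of U+E000 as the base of the reserved range, the two-hex-digit escape format, and the helper encode_name are hypothetical and were not proposed in this form in the thread. Only codecs.register_error and the error-handler protocol are taken from the standard library.

```python
import codecs
import re

PUA_BASE = 0xE000  # assumed start of a "reserved" private-use range


def pua_handler(exc):
    """Map each undecodable byte b to the private-use scalar U+E000 + b."""
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


def nul_handler(exc):
    """Map each undecodable byte to NUL plus two hex digits (hypothetical format)."""
    bad = exc.object[exc.start:exc.end]
    return "".join("\x00%02x" % b for b in bad), exc.end


codecs.register_error("pua_sketch", pua_handler)
codecs.register_error("nul_sketch", nul_handler)


def encode_name(name):
    """Invert nul_handler: hex escapes become raw bytes, the rest is UTF-8."""
    parts = re.split("\x00([0-9a-f]{2})", name)
    out = bytearray()
    for index, part in enumerate(parts):
        if index % 2:            # odd slots hold the captured hex digits
            out.append(int(part, 16))
        else:                    # even slots hold ordinary text
            out += part.encode("utf-8")
    return bytes(out)


garbage = b"Andr\xe9"                   # Latin-1 bytes, not valid UTF-8
legit = "Andr\ue0e9".encode("utf-8")    # valid UTF-8 with a real U+E0E9

# The PUA mapping turns two different byte strings into the same str.
conflated = (garbage.decode("utf-8", "pua_sketch")
             == legit.decode("utf-8", "pua_sketch"))

# NUL escaping keeps them apart, because a valid file name never contains NUL.
kept_apart = (garbage.decode("utf-8", "nul_sketch")
              != legit.decode("utf-8", "nul_sketch"))
```

With the private-use mapping, the two inputs become indistinguishable after decoding, so the original bytes cannot be recovered. The NUL variant avoids this only because NUL cannot occur in a file name, which is also why it would not carry over to data from other sources.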
-- Adam Olsen, aka Rhamphoryncus From ironfroggy at gmail.com Mon Dec 8 23:52:12 2008 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 8 Dec 2008 17:52:12 -0500 Subject: [Python-Dev] Nonlocal shortcut In-Reply-To: References: Message-ID: <76fd5acf0812081452h26ee56aaxbe736013ca6458b2@mail.gmail.com> Did the original PEP discussion cover debates about the shortcut working for all assignment operators (like += and x[i] =) and the difference between it being one-shot (doesnt affect x for the rest of the function) or simply the unrolling into nonlocal x; x= y as it is? On Mon, Dec 8, 2008 at 5:07 PM, Guido van Rossum wrote: > On Sun, Dec 7, 2008 at 2:45 PM, Amaury Forgeot d'Arc wrote: >> Hello, >> >> Fabio Zadrozny wrote: >>> Hi, >>> >>> I'm currently implementing a parser to handle Python 3.0, and one of >>> the points I found conflicting with the grammar specification is the >>> PEP 3104. >>> >>> It says that a shortcut would be added to Python 3.0 so that "nonlocal >>> x = 0" can be written. However, the latest grammar specification >>> (http://docs.python.org/dev/3.0/reference/grammar.html?highlight=full%20grammar) >>> doesn't seem to take that into account... So, can someone enlighten me >>> on what should be the correct treatment for that on a grammar that >>> wants to support Python 3.0? >> >> An issue was already filed about this: >> http://bugs.python.org/issue4199 >> It should be ready for inclusion in 3.0.1. > > No it should not. It should be put in 3.1. > > I strongly object against the addition of features of *any* kind to > 3.0.1, no matter whether they were promised or announced in a PEP or > in the docs or on the 8 o'clock news. This would make 3.0.0 forever a > "loser" release. > > (I find the removal of 'cmp' hard to swallow too, but in a sense the > addition of features is worse, as it makes downgrading a risk. 
> Upgrades, no matter how minimal, always represent risks -- however > downgrading shouldn't represent risks, unless you happen to depend on > a bugfix that wasn't present in the downgrade -- but we're not talking > about a bugfix here no matter how you bend the English language.) > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From steve at pearwood.info Mon Dec 8 23:52:20 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Dec 2008 09:52:20 +1100 Subject: [Python-Dev] Self in method body In-Reply-To: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> Message-ID: <200812090952.21310.steve@pearwood.info> On Tue, 9 Dec 2008 08:55:21 am Filip Gruszczyński wrote: > There is a large discussion on python-list about Guido's article > about new self syntax, therefore I would like to use that to raise > similar question: self in the body. Some time ago I was coding in > Magik language > (http://en.wikipedia.org/wiki/Magik_(programming_language), which is > dynamically typed and similar to Smalltalk and actually to Python too > - although the syntax is far less appalling. As you can see in the > examples, defining methods is very similar to what Guido proposed in > his blog, though you don't provide the name of the argument, but the > name of the class. Then you just precede attributes with a '.', which > is 4 letters less than self.
And, well, this rocks ;-) > > It is really not a problem to type 4 letters (well, six with a coma > and a space) in the signature, but it takes a lot of time to type all > those selfs inside the function's body. For some definition of "a lot". I've just grabbed a random, heavily OO module from my own code library. It has 60 instances of "self", or 240 characters, out of 18,839 characters in total (including newlines). Removing self will decrease the number of my keystrokes and the amount of pure typing time (excluding thinking time, debugging time) by about 1.2%. I don't call that "a lot" -- it's actually quite small. And it becomes vanishingly trivial when you factor in that most of the time spent programming is not typing but thinking, testing, debugging, etc. Doing the same calculation for BaseHTTPServer.py and SimpleHTTPServer.py in the standard library, I get 1.9% and 2.0% respectively. > This could really save a lot of code, while attributes are still > easily distinguishable. I don't think so. -- Steven From gruszczy at gmail.com Tue Dec 9 00:18:49 2008 From: gruszczy at gmail.com (Filip Gruszczyński) Date: Tue, 9 Dec 2008 00:18:49 +0100 Subject: [Python-Dev] Self in method body In-Reply-To: <200812090952.21310.steve@pearwood.info> References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> <200812090952.21310.steve@pearwood.info> Message-ID: <1be78d220812081518s7e064430v414da0a73399ac5f@mail.gmail.com> > I've just grabbed a random, heavily OO module from my own code library. > It has 60 instances of "self", or 240 characters, out of 18,839 > characters in total (including newlines). Removing self will decrease > the number of my keystrokes and the amount of pure typing time > (excluding thinking time, debugging time) by about 1.2%. I don't call > that "a lot" -- it's actually quite small.
And it becomes vanishingly > trivial when you factor in that most of the time spent programming is > not typing but thinking, testing, debugging, etc. Well, maybe I don't program in Python the "right way" ;-), because it's a bit more in my code. I repeated this test for a random module holding some GUI stuff (built using PyQt), and it's more than 5% (213 selfs out of 16204 characters). With a small app for creating dungeon tiles for role playing games I astonishingly got a very similar value (484 * 4 / 35000) ;-) Maybe it's a feature of programming with a lot of GUI stuff, which I do. But 1 in 20 chars being used for self is quite a lot for me. -- Filip Gruszczyński From solipsis at pitrou.net Tue Dec 9 00:25:20 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Dec 2008 23:25:20 +0000 (UTC) Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: Nick Coghlan gmail.com> writes: > > No, you misunderstand what I meant. Py_buffer doesn't need to be changed > at all. The *issuing type* would define a new structure with the > additional fields, such as: With the current buffer API, this is not possible. It's the caller who allocates the Py_buffer struct (usually on the stack), not the callee. Therefore the callee (e.g. the getbufferproc of the issuing type) cannot choose to allocate a different structure. (of course complex schemes can be devised where the callee maintains its own separate storage for shape and strides, but I don't think we want to go there) > Alexander's suggestion of going and looking at what the numpy folks have > done in this area is probably a good idea too. Well, I'm open to others doing this, but I won't do it myself. My interest is in fixing the most glaring bugs of the buffer API and memoryview object. The numpy folks are welcome to voice their opinions and give advice on python-dev.
Regards Antoine. From tjreedy at udel.edu Tue Dec 9 00:58:09 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Dec 2008 18:58:09 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493D85A0.6060601@egenix.com> References: <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493D85A0.6060601@egenix.com> Message-ID: M.-A. Lemburg wrote: >> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote: >>> try: >>> files = os.listdir(somedir, errors = strict) >>> except OSError as e: >>> log() >>> files = os.listdir(somedir) > If that error parameter is the same as in unicode(value, errors), > then this would be a useful feature: Except that unicode becomes str in 3.0, that is exactly my intention. > People could then choose among the already existing error handlers > ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register > their own ones via the codecs module. These could be passed through from listdir or getenv to str. [Side questions: 1. 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string. Should it be or is 'xmlcharrefreplace' an addition for a later version. 2. A garbage value for errors (such as 'blah') is silently ignored (so I cannot test the above). Intended or a bug?] Someone else proposed a new option 'warn', which Guido has accepted to be the default instead of the current 'ignore'. It could not be passed through (unless str were changed or something registered). I believe the implementation of that would be to call str with 'strict' but catch errors and warn instead. Whether there should be 1 warning for each problematic bytes encountered or 1 for each listdir (or whatever) call, possibly with the number of problems, I leave to others to decide. 
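The "call str with 'strict' but catch errors and warn instead" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: decode_names is a hypothetical helper working on a list of raw byte names (it does not touch os.listdir), UTF-8 stands in for the file system encoding, UnicodeWarning is an assumed choice of category, and the sketch picks the "one warning per problematic name" option that the message leaves open.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode byte names strictly; warn about and skip the ones that fail.

    Hypothetical helper: emits one UnicodeWarning per undecodable name.
    """
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(str(raw, encoding, "strict"))
        except UnicodeDecodeError as exc:
            # stacklevel=2 attributes the warning to the caller of decode_names
            warnings.warn(
                "skipping undecodable name %r: %s" % (raw, exc),
                UnicodeWarning,
                stacklevel=2,
            )
    return decoded
```

Because the problem is reported through the warnings framework, an application could still turn it into an exception or silence it with an ordinary warnings filter, without any change to the helper itself.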
> Such application specific error handlers could then also apply > whatever fancy round-trip safe encoding of non-decodable bytes > to Unicode escapes, private code points, etc. as seen fit by the > application. > > Perhaps we should also add an ''encoding'' parameter that can be > set on a per directory basis (if necessary) and defaults to the > global file system encoding. That could also be passed through, but I will let others make the argument for it. > > If an application hits directory that is known to cause problems, > it could then chose to receive the file names in a different, > more suitable encoding. This allows implementing fallback > mechanisms with a list of common encodings for a locale. Terry Jan Reedy From tjreedy at udel.edu Tue Dec 9 01:18:26 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Dec 2008 19:18:26 -0500 Subject: [Python-Dev] Self in method body In-Reply-To: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> References: <1be78d220812081355p3fdc7652q32988b730c78087f@mail.gmail.com> Message-ID: Filip Gruszczyński wrote: > There is a large discussion on python-list about Guido's article about That discussion should stay there. > new self syntax, therefore I would like to use that to raise similar > question: self in the body. That has also been heavily discussed, many times, there and here. > ... Then you just precede attributes with a '.', Guido has specifically rejected that, more than once, I believe. > which is 4 letters less than self. As has been said *many* times in previous discussions, you can use 1 letter instead of 4 if you really wish, if saving keystrokes is your highest priority. But please don't rehash these discussions, at least not here. Terry Jan Reedy From amk at amk.ca Tue Dec 9 03:53:17 2008 From: amk at amk.ca (A.M.
Kuchling) Date: Mon, 8 Dec 2008 21:53:17 -0500 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> Message-ID: <20081209025317.GA1080@amk.local> On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote: > No, I am saying I had told AMK I was interested in championing the > session. He chose you, and that's that. One less thing for me to worry > about. =) Brett, I actually think you'd be a good champion for the 11AM transition-planning session. As a reminder, the topics came up with were: Transition plan for rest of 2.x series; goals for 2.7/3.1. - New features & future plans? - Is 2.7 last of the 2.x releases? - Unicode issues - Stdlib plans? (Possibly this is too much material for one session, and something will have to be pruned.) --amk From alexander.belopolsky at gmail.com Tue Dec 9 04:01:18 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 8 Dec 2008 22:01:18 -0500 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou wrote: .. >> Alexander's suggestion of going and looking at what the numpy folks have >> done in this area is probably a good idea too. > > Well, I'm open to others doing this, but I won't do it myself. My interest is in > fixing the most glaring bugs of the buffer API and memoryview object. The numpy > folks are welcome to voice their opinions and give advice on python-dev. > I did not follow numpy development for the last year or more, so I won't qualify as "the numpy folks," but my understanding is that numpy does exactly what Nick recommended: the viewed object owns shape and strides just as it owns the data. 
The viewing object increases the reference count of the viewed object and thus assures that data, shape and strides don't go away prematurely. I am copying Travis, the author of the PEP 3118, hoping that he would step in on behalf of "the numpy folks." From brett at python.org Tue Dec 9 04:31:56 2008 From: brett at python.org (Brett Cannon) Date: Mon, 8 Dec 2008 19:31:56 -0800 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: <20081209025317.GA1080@amk.local> References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> <20081209025317.GA1080@amk.local> Message-ID: On Mon, Dec 8, 2008 at 18:53, A.M. Kuchling wrote: > On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote: >> No, I am saying I had told AMK I was interested in championing the >> session. He chose you, and that's that. One less thing for me to worry >> about. =) > > Brett, I actually think you'd be a good champion for the 11AM > transition-planning session. OK, so I guess I do have one more thing to worry about. =) I'd be happy to do that session. > As a reminder, the topics came up with > were: > > Transition plan for rest of 2.x series; goals for 2.7/3.1. > - New features & future plans? > - Is 2.7 last of the 2.x releases? > - Unicode issues > - Stdlib plans? Probably the last two will be wishy-washy in terms of whether they will be reached. 
-Brett From greg.ewing at canterbury.ac.nz Tue Dec 9 06:46:14 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Dec 2008 18:46:14 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: <493E0626.3090301@canterbury.ac.nz> Antoine Pitrou wrote: > (of course complex schemes can be devised where the callee maintains its own > separate storage for shape and strides, but I don't think we want to go there) But that's exactly where you're supposed to be going. If the object providing the buffer has variable-sized shape and strides arrays, it has to manage the memory for them somehow. -- Greg From v+python at g.nevcal.com Tue Dec 9 07:20:15 2008 From: v+python at g.nevcal.com (Glenn Linderman) Date: Mon, 08 Dec 2008 22:20:15 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> Message-ID: <493E0E1F.4090009@g.nevcal.com> On approximately 12/8/2008 9:30 AM, came the following characters from the keyboard of rdmurray at bitdance.com: > If warnings were emitted, then files would not be silently ignored, > yet the program could still be used. Yep, this is sounding useful. > PS: I'd like to see a similar warning issued when an access attempt > is made through os.environ to a variable that cannot be decoded. And argv ? Seems like the warning technique could be useful for _any_ interface that has been traditionally bytes, because that's the kind of characters that were, but now should move to (Unicode) characters. The warnings could be the same, or very similar. 
The question is if one global control should handle all types of bytes problems, or if there should be individual controls for each bytes problem, or both. I tend to believe in both; the paranoid can set exactly the ones they've coded for, the aggressive can set the global one. In this manner, new cases can be added to the global settings over time, if more are discovered -- it should be documented to handle future similar issues in a similar manner. -- Glenn -- http://nevcal.com/ =========================== A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking From ajm at flonidan.dk Tue Dec 9 09:41:09 2008 From: ajm at flonidan.dk (Anders J. Munch) Date: Tue, 9 Dec 2008 09:41:09 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: Message-ID: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net> On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote: >>> try: >>> files = os.listdir(somedir, errors = strict) >>> except OSError as e: >>> log() >>> files = os.listdir(somedir) Instead of a codecs error handler name, how about a callback for converting bytes to str? os.listdir(somedir, decoder=bytes.decode) os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, errors='xmlcharrefreplace')) os.listdir(somedir, decoder=repr) ISTM that would be simpler and more flexible than going over the codecs registry. One caveat though is that there's no obvious way of telling listdir to skip a name. But if the default behaviour for decoder=None is to skip with a warning, then the need to explicitly ask for files to be skipped would be small. 
Terry's example would then be: >>> try: >>> files = os.listdir(somedir, decoder=bytes.decode) >>> except UnicodeDecodeError as e: >>> log() >>> files = os.listdir(somedir) - Anders From ncoghlan at gmail.com Tue Dec 9 10:01:17 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 19:01:17 +1000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493E0E1F.4090009@g.nevcal.com> References: <4939CBDB.30305@gmail.com> <20081206143454.GA15293@phd.pp.ru> <20081206185319.12555.178873533.divmod.xquotient.1547@weber.divmod.com> <493B680C.6010605@gmail.com> <20081207070548.12555.1602587595.divmod.xquotient.1747@weber.divmod.com> <493C0FE1.30506@gmail.com> <493E0E1F.4090009@g.nevcal.com> Message-ID: <493E33DD.5010604@gmail.com> Glenn Linderman wrote: > On approximately 12/8/2008 9:30 AM, came the following characters from > the keyboard of rdmurray at bitdance.com: >> PS: I'd like to see a similar warning issued when an access attempt >> is made through os.environ to a variable that cannot be decoded. > > > And argv ? Seems like the warning technique could be useful for _any_ > interface that has been traditionally bytes, because that's the kind of > characters that were, but now should move to (Unicode) characters. > > The warnings could be the same, or very similar. > > The question is if one global control should handle all types of bytes > problems, or if there should be individual controls for each bytes > problem, or both. I tend to believe in both; the paranoid can set > exactly the ones they've coded for, the aggressive can set the global > one. In this manner, new cases can be added to the global settings over > time, if more are discovered -- it should be documented to handle future > similar issues in a similar manner. The warnings system provides that level of granularity for 'free' (so long as we set the stack level appropriately in the C-API warnings call). Cheers, Nick. 
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From ncoghlan at gmail.com Tue Dec 9 10:07:53 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 09 Dec 2008 19:07:53 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To:
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com>
Message-ID: <493E3569.6010408@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan at gmail.com> writes:
>> No, you misunderstand what I meant. Py_buffer doesn't need to be changed
>> at all. The *issuing type* would define a new structure with the
>> additional fields, such as:
>
> With the current buffer API, this is not possible. It's the caller who
> allocates the Py_buffer struct (usually on the stack), not the callee. Therefore
> the callee (e.g. the getbufferproc of the issuing type) cannot choose to
> allocate a different structure.
>
> (of course complex schemes can be devised where the callee maintains its own
> separate storage for shape and strides, but I don't think we want to go there)

In that case, as Greg noted, this is exactly what the callee should be
doing. Maintaining a PyDict instance to map from view pointers to shapes
and strides info doesn't strike me as a "complex scheme" though.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From mal at egenix.com Tue Dec 9 10:22:59 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 09 Dec 2008 10:22:59 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>
References: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net>
Message-ID: <493E38F3.7020002@egenix.com>

On 2008-12-09 09:41, Anders J.
Munch wrote: > On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote: >>>> try: >>>> files = os.listdir(somedir, errors = strict) >>>> except OSError as e: >>>> log() >>>> files = os.listdir(somedir) > > Instead of a codecs error handler name, how about a callback for > converting bytes to str? > > os.listdir(somedir, decoder=bytes.decode) > os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, errors='xmlcharrefreplace')) > os.listdir(somedir, decoder=repr) > > ISTM that would be simpler and more flexible than going over the > codecs registry. One caveat though is that there's no obvious way of > telling listdir to skip a name. But if the default behaviour for > decoder=None is to skip with a warning, then the need to explicitly > ask for files to be skipped would be small. > > Terry's example would then be: > >>>> try: >>>> files = os.listdir(somedir, decoder=bytes.decode) >>>> except UnicodeDecodeError as e: >>>> log() >>>> files = os.listdir(somedir) Well, this is not too far away from just putting the whole decoding logic into the application directly: files = [filename.decode(filesystemencoding, errors='warnreplace') for filename in os.listdir(dir)] (or os.listdirb() if that's where the discussion is heading) ... and that also tells us something about this discussion: we're trying to come up with some magic to work around writing two lines of Python code. I'd just have all the os APIs return bytes and leave whatever conversion to Unicode might be necessary to a higher level API. Think of it: You really only need the Unicode values if you ever want to output those values in text form somewhere. In those cases, it's usually a human reading a log file or screen output. Most other cases, just care about getting some form of file identifier in order to open the file and don't really care about the encoding of the file name at all. 
It's probably better to have two helper functions in the os module
that take care of the conversion on demand, rather than trying to force
this conversion even in cases where the application never really needs
to write the filename somewhere, e.g. os.decodefilename() and
os.encodefilename().

These should then provide some reasonable default logic, e.g. use a
'warnreplace' error handler. Applications are then free to use these
converters or implement their own.

--
Marc-Andre Lemburg
eGenix.com

From nd at perlig.de Tue Dec 9 10:42:32 2008
From: nd at perlig.de (André Malo)
Date: Tue, 9 Dec 2008 10:42:32 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <493E38F3.7020002@egenix.com>
References: <9B1795C95533CA46A83BA1EAD4B010300320B5@flonidanmail.flonidan.net> <493E38F3.7020002@egenix.com>
Message-ID: <200812091042.32911.nd@perlig.de>

* M.-A. Lemburg wrote:
> On 2008-12-09 09:41, Anders J. Munch wrote:
> > On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy wrote:
> >>>> try:
> >>>>     files = os.listdir(somedir, errors = strict)
> >>>> except OSError as e:
> >>>>     log()
> >>>>     files = os.listdir(somedir)
> >
> > Instead of a codecs error handler name, how about a callback for
> > converting bytes to str?
> > > > os.listdir(somedir, decoder=bytes.decode) > > os.listdir(somedir, decoder=lambda b: b.decode(preferredencoding, > > errors='xmlcharrefreplace')) os.listdir(somedir, decoder=repr) > > > > ISTM that would be simpler and more flexible than going over the > > codecs registry. One caveat though is that there's no obvious way of > > telling listdir to skip a name. But if the default behaviour for > > decoder=None is to skip with a warning, then the need to explicitly > > ask for files to be skipped would be small. > > > > Terry's example would then be: > >>>> try: > >>>> files = os.listdir(somedir, decoder=bytes.decode) > >>>> except UnicodeDecodeError as e: > >>>> log() > >>>> files = os.listdir(somedir) > > Well, this is not too far away from just putting the whole decoding > logic into the application directly: > > files = [filename.decode(filesystemencoding, errors='warnreplace') > for filename in os.listdir(dir)] > > (or os.listdirb() if that's where the discussion is heading) > > ... and that also tells us something about this discussion: we're > trying to come up with some magic to work around writing two > lines of Python code. > > I'd just have all the os APIs return bytes and leave whatever > conversion to Unicode might be necessary to a higher level API. [...] What I'm saying ;-) +1. nd From ajm at flonidan.dk Tue Dec 9 12:04:48 2008 From: ajm at flonidan.dk (Anders J. Munch) Date: Tue, 9 Dec 2008 12:04:48 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <493E38F3.7020002@egenix.com> Message-ID: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net> M.-A. 
Lemburg wrote: > > Well, this is not too far away from just putting the whole decoding > logic into the application directly: > > files = [filename.decode(filesystemencoding, errors='warnreplace') > for filename in os.listdir(dir)] > > (or os.listdirb() if that's where the discussion is heading) I see what you mean, and yes, I think os.listdirb will do just as well. There is no need for any extra parameters to os.listdir. The typical application will just obliviously use os.listdir(dir) and get the default elide-and-warn behaviour for un-decodable names. That rare special application that needs more control can use os.listdirb and handle decoding itself. Using a global registry of error handlers would just get in the way of an application that needs more control. - Anders From solipsis at pitrou.net Tue Dec 9 12:21:43 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Dec 2008 11:21:43 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Allocation_of_shape_and_strides_fields_in_?= =?utf-8?q?Py=5Fbuffer?= References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: Alexander Belopolsky gmail.com> writes: > > I did not follow numpy development for the last year or more, so I > won't qualify as "the numpy folks," but my understanding is that numpy > does exactly what Nick recommended: the viewed object owns shape and > strides just as it owns the data. The viewing object increases the > reference count of the viewed object and thus assures that data, shape > and strides don't go away prematurely. That doesn't work if e.g. you take a slice of a memoryview object, since the shape changes in the process. 
See http://bugs.python.org/issue4580 From ncoghlan at gmail.com Tue Dec 9 13:33:53 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 22:33:53 +1000 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: <493E65B1.5020004@gmail.com> Antoine Pitrou wrote: > Alexander Belopolsky gmail.com> writes: >> I did not follow numpy development for the last year or more, so I >> won't qualify as "the numpy folks," but my understanding is that numpy >> does exactly what Nick recommended: the viewed object owns shape and >> strides just as it owns the data. The viewing object increases the >> reference count of the viewed object and thus assures that data, shape >> and strides don't go away prematurely. > > That doesn't work if e.g. you take a slice of a memoryview object, since the > shape changes in the process. > See http://bugs.python.org/issue4580 I have zero problem whatsoever if slice assignment TO a memoryview object is permitted only if the shape stays the same (i.e. I think that issue should be closed as "not a bug"). The buffer protocol permits you to edit the DATA held by another object. It doesn't let you edit the *structure* of that object (which is what would be implied by changing the shape of the object). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Tue Dec 9 14:37:11 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 09 Dec 2008 23:37:11 +1000 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: <493E7487.3050300@gmail.com> Antoine Pitrou wrote: > Alexander Belopolsky gmail.com> writes: >> I did not follow numpy development for the last year or more, so I >> won't qualify as "the numpy folks," but my understanding is that numpy >> does exactly what Nick recommended: the viewed object owns shape and >> strides just as it owns the data. The viewing object increases the >> reference count of the viewed object and thus assures that data, shape >> and strides don't go away prematurely. > > That doesn't work if e.g. you take a slice of a memoryview object, since the > shape changes in the process. > See http://bugs.python.org/issue4580 Note that the PEP is unambiguous as to who owns the pointers in the view object: "The exporter is responsible for making sure that any memory pointed to by buf, format, shape, strides, and suboffsets is valid until releasebuffer is called. If the exporter wants to be able to change an object's shape, strides, and/or suboffsets before releasebuffer is called then it should allocate those arrays when getbuffer is called (pointing to them in the buffer-info structure provided) and free them when releasebuffer is called." The problem with memoryview appears to be related to the way it calculates its own length (since that is the check that is failing when the view blows up): >>> a = array('i', range(10)) >>> m = memoryview(a) >>> len(m) # This is the length in bytes, which is WRONG! 
40
>>> m2 = memoryview(a)[2:8]
>>> len(m2) # This is correct
6
>>> a2 = array('i', range(6))
>>> m[:] = a # But this works
>>> m2[:] = a2 # and this does not
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: cannot modify size of memoryview object
>>> len(memoryview(a2)) # Ah, 24 != 6 is our problem!
24

Looks to me like there are a couple of bugs here:

The first is that memoryview is treating the len field in the Py_buffer
struct as the number of objects in the view in a few places instead of
as the total number of bytes being exposed (it is actually the latter,
as defined in PEP 3118).

The second is that the getbuf implementation in array.array is broken.
It is ONLY OK for shape to be null when ndim=0 (i.e. a scalar value). An
array is NOT a scalar value, so the array objects should be setting the
shape pointer to point to a single-item array (where shape[0] is the
length of the array). memoryview can then be fixed to use shape[0]
instead of len to get the number of objects in the view.

memoryview also currently gets the shape wrong on slices:

>>> m.shape
(10,)
>>> m2.shape
(10,)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net Tue Dec 9 15:27:56 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 09 Dec 2008 15:27:56 +0100
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <493E65B1.5020004@gmail.com>
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com>
Message-ID: <1228832876.18857.11.camel@localhost>

On Tuesday 09 December 2008 at 22:33 +1000, Nick Coghlan wrote:
> I have zero problem whatsoever if slice assignment TO a memoryview
> object is permitted only if the shape stays the same (i.e. I think that
> issue should be closed as "not a bug").

I'm not even talking about slice /assignment/ here, just read-only slicing.
Slicing a memoryview must produce another memoryview with a different shape but with the same underlying object. That's why I have to modify the shape field /after/ the new Py_buffer is initialized. > The buffer protocol permits you to edit the DATA held by another > object. It doesn't let you edit the *structure* of that object Perhaps, but it's necessary for slicing. > The first is that memoryview is treating the len field in the > Py_buffer struct as the number of objects in the view in a few places > instead of as the total number of bytes being exposed (it is actually > the latter, as defined in PEP 3118). I don't understand the difference between "the number of objects in the view" and "the total number of bytes being exposed". For me it should be the same and the "buf" and "len" fields in the Py_buffer should be usable by any other C function, otherwise they are useless. > memoryview also currently gets the shape wrong on slices: I know, that's what I'm trying to fix... From solipsis at pitrou.net Tue Dec 9 15:56:42 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Dec 2008 14:56:42 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Allocation_of_shape_and_strides_fields_in_?= =?utf-8?q?Py=5Fbuffer?= References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> Message-ID: Antoine Pitrou pitrou.net> writes: > > > The first is that memoryview is treating the len field in the > > Py_buffer struct as the number of objects in the view in a few places > > instead of as the total number of bytes being exposed (it is actually > > the latter, as defined in PEP 3118). > > I don't understand the difference between "the number of objects in the > view" and "the total number of bytes being exposed". For me it should be > the same and the "buf" and "len" fields in the Py_buffer should be > usable by any other C function, otherwise they are useless. Sorry, I had misread your message. 
Yes, indeed "len" should be the number of bytes, not the number of
objects. This is also solved as part of the patch I proposed in the
aforementioned bug entry.

Regards

Antoine.

From rdmurray at bitdance.com Tue Dec 9 17:38:18 2008
From: rdmurray at bitdance.com (rdmurray at bitdance.com)
Date: Tue, 9 Dec 2008 11:38:18 -0500 (EST)
Subject: [Python-Dev] RELEASED Python 3.0 final
In-Reply-To: <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
References: <79990c6b0812041452x1fabd55alb5e76ba34c071f2d@mail.gmail.com> <20081205023514.GA1723@amk.local> <20081205035942.12555.426869079.divmod.xquotient.963@weber.divmod.com> <20081205072705.12555.1807176316.divmod.xquotient.1322@weber.divmod.com> <20081206052844.12555.1264888995.divmod.xquotient.1454@weber.divmod.com> <20081206201915.12555.340762929.divmod.xquotient.1697@weber.divmod.com>
Message-ID:

On Sat, 6 Dec 2008 at 20:19, glyph at divmod.com wrote:
> On 05:54 pm, guido at python.org wrote:
>> On Fri, Dec 5, 2008 at 9:28 PM, wrote:
>> Whenever someone asks me which version to use, I always respond with
>> a question -- what do you want to use it for?
>
> In the longer term, I think that you should look at this as a symptom of a
> problem. If you learn Java, you learn the most recent version. If you need
> your software to work with an older version, you just pass a special option

Sometimes this even works. But it isn't always easy to get it right,
and if you are mixing libraries....well, in my real-world experience
we wound up upgrading the VM.

> to the compiler. If you want your *old* software to work with a *new*
> version, it basically just does (at least, 99% of the time).

If you specify the source option correctly.

It seems to me that 3to2 and 2to3 are the python equivalent to the
javac 'target' and 'source' options. Like Guido said, the python
community just doesn't have the resources to make them perfect :(.
Based on a quick google, the Java community appears to be grappling with these same issues: http://blog.adjective.org/post/2008/02/21/Java-Backwards-Compatability the poster seems intent on maintaining more backward compatibility than we have with python2/3, until you remember that java uses a compile-and-distribute-binaries paradigm and python does not. Once you realize that, the differences in backward compatibility don't seem so large...at least to me. --RDM From foom at fuhm.net Tue Dec 9 18:01:10 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 9 Dec 2008 12:01:10 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net> References: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net> Message-ID: On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote: > The typical application will just obliviously use os.listdir(dir) > and get the default elide-and-warn behaviour for un-decodable names. > That rare special application I guess this is a new definition of rare special application: "an application which deals with user-specified files". This is the problem I see in having two parallel APIs: people keep saying "most applications can just go ahead and use the [broken] unicode string API". If there was a unicode API and a bytes API, but everyone was clear that "always use the bytes API" is the right thing to do, that'd be okay... But, since even python-dev members are saying that only a rare special app needs to care about working with users' existing files, I'm rather worried this API design will cause most programs written in python to be broken. Which seems a shame. > that needs more control can use os.listdirb and handle decoding > itself. 
James From steve at holdenweb.com Tue Dec 9 18:15:53 2008 From: steve at holdenweb.com (Steve Holden) Date: Tue, 09 Dec 2008 12:15:53 -0500 Subject: [Python-Dev] Floating-point implementations Message-ID: Is anyone aware of any implementations that use other than 64-bit floating-point? I'd be particularly interested in any that use greater precision than the usual 56-bit mantissa. Do modern 64-bit systems implement anything wider than the normal double? regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From dickinsm at gmail.com Tue Dec 9 18:24:44 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 9 Dec 2008 17:24:44 +0000 Subject: [Python-Dev] Floating-point implementations In-Reply-To: References: Message-ID: <5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com> On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden wrote: > Is anyone aware of any implementations that use other than 64-bit > floating-point? I'd be particularly interested in any that use greater > precision than the usual 56-bit mantissa. Do modern 64-bit systems > implement anything wider than the normal double? I don't know of any. There are certainly places in the codebase that assume 56 bits are enough. (I seem to recall it's something like 56 bits for IBM, 53 bits for IEEE 754, 48 for Cray, and 52 or 56 for VAX.) Many systems have a "long double" type, which usually seems to be either 80-bit (with a 64-bit mantissa) or 128-bit. The latter is sometimes implemented as a pair of doubles, effectively giving a 106-bit mantissa, and sometimes as an IEEE extended precision type; I don't know how many bits the mantissa would have in that case, but surely not more than 117. 
I asked a related question a while ago: http://mail.python.org/pipermail/python-dev/2008-February/076680.html Mark From dickinsm at gmail.com Tue Dec 9 18:33:14 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 9 Dec 2008 17:33:14 +0000 Subject: [Python-Dev] Floating-point implementations In-Reply-To: References: Message-ID: <5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com> On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden wrote: > precision than the usual 56-bit mantissa. Do modern 64-bit systems > implement anything wider than the normal double? I may have misinterpreted your question. Are you asking simply about what the hardware provides, or about what the C compiler and library support? Or something else entirely? It looks like IEEE-conforming 128-bit floats would have a 113-bit mantissa (including the implicit leading '1' bit). Mark From steve at holdenweb.com Tue Dec 9 18:43:28 2008 From: steve at holdenweb.com (Steve Holden) Date: Tue, 09 Dec 2008 12:43:28 -0500 Subject: [Python-Dev] Floating-point implementations In-Reply-To: <5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com> References: <5c6f2a5d0812090933r6d679e3dk70de5dd129fd86d2@mail.gmail.com> Message-ID: <493EAE40.5060909@holdenweb.com> Mark Dickinson wrote: > On Tue, Dec 9, 2008 at 5:15 PM, Steve Holden wrote: >> precision than the usual 56-bit mantissa. Do modern 64-bit systems >> implement anything wider than the normal double? > > I may have misinterpreted your question. Are you asking simply > about what the hardware provides, or about what the C compiler > and library support? Or something else entirely? > > It looks like IEEE-conforming 128-bit floats would have a 113-bit > mantissa (including the implicit leading '1' bit). > I was actually asking about Python implementations, and read your original answer as meaning "no, there aren't any". 
I had assumed, correctly or otherwise, that the C library would have to
offer well-integrated support to enable its use in Python. In fact I had
assumed it would need to be pretty much a drop-in replacement, but it
sounds as though there are some hard-coded assumptions about float size
that would not allow that.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

From eckhardt at satorlaser.com Tue Dec 9 19:31:29 2008
From: eckhardt at satorlaser.com (Ulrich Eckhardt)
Date: Tue, 9 Dec 2008 19:31:29 +0100
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To:
References:
Message-ID: <200812091931.29905.eckhardt@satorlaser.com>

On Monday 08 December 2008, Adam Olsen wrote:
> At this point someone suggests we have a type that can store an
> arbitrary mix of unicode and bytes, so the undecodable portions stay
> in their original form. :P

Well, not an arbitrary mix, but a type that just stores whatever comes from
the system without further specifying it as either bytes or Unicode:

* If you want a string for displaying it, you first have to extract a string
  from that thing and there you optionally specify the encoding and error
  behaviour.
* If you want to append a string to it, it is automatically encoded in the
  default encoding, which obviously can fail.
* Similarly, e.g. globbing is done on the underlying representation's level,
  so "*.py" will first have to be converted according to the default
  encoding.
* If you just print it, you will get something that you can make out the
  decodable parts from, but it will probably be like "{Unicode:u'abcde'}"
  or "{bytes:b'ab\xf0\x0fcd'}".
* If you don't want to display it, but just want to pass it to the system,
  just use it as is.

Yes, this puts an inconvenience on application programmers that up to now
always assumed that they received a list of strings from os.readdir(), but
that's the way with false assumptions.
In any case, they will be aware (from reading the docs) of what the
problem is and why there is no way to return a text. Further, they will
get tools to convert these paths or environment vars to texts, so it will
be simply replacing "os.readdir()" with "map(to_unicode,os.readdir())".

Uli

From lists at larsko.org Tue Dec 9 20:26:51 2008
From: lists at larsko.org (Lars Kotthoff)
Date: Tue, 9 Dec 2008 19:26:51 +0000
Subject: [Python-Dev] Forking and pipes
Message-ID: <20081209192651.7dfbcf7b@ronin.larsko.net>

Dear list,

I recently noticed a python program which uses forks and pipes for
communication between the processes not behaving as expected.
The minimal example program:

--------------------------------------------------------------------------------
#!/usr/bin/python

import os, sys

r, w = os.pipe()
write = os.fdopen(w, 'w')
print >> write, "foo"
pid = os.fork()
if pid:
    os.waitpid(pid, 0)
else:
    sys.exit(0)
write.close()
read = os.fdopen(r)
print read.read()
read.close()
--------------------------------------------------------------------------------

This prints out "foo" twice although it's only written once to the pipe. It
seems that python doesn't flush file descriptors before copying them to the
child process, thus resulting in the duplicate message. The equivalent C
program behaves as expected,

--------------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int fds[2];
    pid_t pid;
    char* buf = (char*) calloc(4, sizeof(char));

    pipe(fds);
    write(fds[1], "foo", 3);
    pid = fork();
    if(pid) {
        waitpid(pid, NULL, 0);
    } else {
        return EXIT_SUCCESS;
    }
    close(fds[1]);
    read(fds[0], buf, 3);
    printf("%s\n", buf);
    close(fds[0]);
    free(buf);
    return EXIT_SUCCESS;
}
--------------------------------------------------------------------------------

Is this behaviour intentional? I've tested both python and C on Linux, OpenBSD
and Solaris (python versions 2.5.2 and 2.3.3), the behaviour was the same
everywhere.

Thanks,

Lars

From rhamph at gmail.com Tue Dec 9 20:22:35 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 9 Dec 2008 12:22:35 -0700
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To: <200812091931.29905.eckhardt@satorlaser.com>
References: <200812091931.29905.eckhardt@satorlaser.com>
Message-ID:

On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt wrote:
> On Monday 08 December 2008, Adam Olsen wrote:
>> At this point someone suggests we have a type that can store an
>> arbitrary mix of unicode and bytes, so the undecodable portions stay
>> in their original form.
:P > > Well, not an arbitrary mix, but a type that just stores whatever comes from > the system without further specifying it as either bytes or Unicode: > > * If you want a string for displaying it, you first have to extract a string > from that thing and there you optionally specify the encoding and error > behaviour. > * If you want to append a string to it, it is automatically encoded in the > default encoding, which obviously can fail. So the 2.x str, but with a more interesting default encoding than ASCII. It'll work fine on the developer's system, but one day a user will present it with strange input, and boom. You have to be pessimistic here. The default operations should either always work or never work. Using unicode internally and skipping garbage input means the operations always work. Using a bytes API means mixing with unicode never works, unless the programmer explicitly converts, in which case the onus is on them to use proper error handling. The only thing separating this from a bikeshed discussion is that a bikeshed has many equally good solutions, while we have no good solutions. Instead we're trying to find the least-bad one. The unicode/bytes separation is pretty close to that. Adding a warning gets even closer. Adding magic makes it worse. -- Adam Olsen, aka Rhamphoryncus From foom at fuhm.net Tue Dec 9 20:40:11 2008 From: foom at fuhm.net (James Y Knight) Date: Tue, 9 Dec 2008 14:40:11 -0500 Subject: [Python-Dev] Forking and pipes In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net> References: <20081209192651.7dfbcf7b@ronin.larsko.net> Message-ID: <3E4D576A-5E49-4FE0-9AF2-34FFFC3B1594@fuhm.net> On Dec 9, 2008, at 2:26 PM, Lars Kotthoff wrote: > Dear list, > > I recently noticed a python program which uses forks and pipes for > communication between the processes not behaving as expected. The > minimal > example program: > > [snip] > This prints out "foo" twice although it's only written once to the > pipe. 
It
> seems that python doesn't flush file descriptors before copying them
> to the
> child process, thus resulting in the duplicate message. The
> equivalent C
> program behaves as expected,
>
> [snip]
>
> Is this behaviour intentional? I've tested both python and C on
> Linux, OpenBSD
> and Solaris (python versions 2.5.2 and 2.3.3), the behaviour was the
> same
> everywhere.

Yes, it's intentional. And, no, your programs aren't equivalent. Rewrite your C program to use fdopen, and fread/fwrite. *Then* it will be equivalent and have the same behavior as the python program.

Alternatively, you can change your python program to use os.read/os.write instead of fdopen and fileobject.read/fileobject.write, if you want your python program to work like the C program.

James

From shigin at rambler-co.ru Tue Dec 9 20:35:16 2008
From: shigin at rambler-co.ru (Alexander Shigin)
Date: Tue, 09 Dec 2008 22:35:16 +0300
Subject: [Python-Dev] Forking and pipes
In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net>
References: <20081209192651.7dfbcf7b@ronin.larsko.net>
Message-ID: <1228851316.24594.5.camel@jenner>

On Tue, 09/12/2008 at 19:26 +0000, Lars Kotthoff wrote:
> Dear list,
>
> I recently noticed a python program which uses forks and pipes for
> communication between the processes not behaving as expected. The minimal
> example program:

If you write
====
r, w = os.pipe()
os.write(w, 'foo')
pid = os.fork()
====
you'll get the same result as the C program. Or, if you use fdopen in the C program, you'll get the same result as Python.

The problem with the example is libc buffering. If you call write.flush() before the fork, the buffer is already empty when it is copied to the child process and you'll see only one 'foo'.
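The buffering effect described in this thread can be reproduced with a short sketch. It is written for Python 3 on a POSIX system rather than the Python 2.5 of the original report, and the function name read_pipe_after_fork is made up for illustration. The child here leaves via os._exit(), which skips the usual exit-time flush, so it closes its inherited copy of the file object explicitly to stand in for that flush.

```python
import os


def read_pipe_after_fork(flush_before_fork):
    """Write "foo" once to a buffered pipe, fork, and return what the pipe received."""
    r, w = os.pipe()
    writer = os.fdopen(w, "w")  # buffered file object, comparable to C stdio
    writer.write("foo\n")       # data sits in the userspace buffer, not yet in the pipe
    if flush_before_fork:
        writer.flush()          # empty the buffer before fork() duplicates it

    pid = os.fork()
    if pid == 0:
        # Child: it owns a copy of whatever was still buffered. Closing the
        # file object flushes that copy; this stands in for the flush that
        # normally happens when the interpreter exits.
        try:
            writer.close()
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    writer.close()              # parent flushes its own copy of the buffer
    with os.fdopen(r) as reader:
        return reader.read()    # EOF arrives once every write end is closed


if __name__ == "__main__":
    print(repr(read_pipe_after_fork(flush_before_fork=False)))
    print(repr(read_pipe_after_fork(flush_before_fork=True)))
```

Without the flush the function should return 'foo\nfoo\n', with it just 'foo\n'. Writing with os.write(w, b"foo\n") instead of a file object avoids the userspace buffer altogether, which is the other fix suggested in the replies.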
From a.badger at gmail.com Tue Dec 9 21:25:01 2008
From: a.badger at gmail.com (Toshio Kuratomi)
Date: Tue, 09 Dec 2008 12:25:01 -0800
Subject: [Python-Dev] Python-3.0, unicode, and os.environ
In-Reply-To:
References: <9B1795C95533CA46A83BA1EAD4B010300320B6@flonidanmail.flonidan.net>
Message-ID: <493ED41D.3020900@gmail.com>

James Y Knight wrote:
> On Dec 9, 2008, at 6:04 AM, Anders J. Munch wrote:
>> The typical application will just obliviously use os.listdir(dir) and
>> get the default elide-and-warn behaviour for un-decodable names. That
>> rare special application
>
> I guess this is a new definition of rare special application: "an
> application which deals with user-specified files".
>
> This is the problem I see in having two parallel APIs: people keep
> saying "most applications can just go ahead and use the [broken] unicode
> string API". If there was a unicode API and a bytes API, but everyone
> was clear that "always use the bytes API" is the right thing to do,
> that'd be okay... But, since even python-dev members are saying that
> only a rare special app needs to care about working with users' existing
> files, I'm rather worried this API design will cause most programs
> written in python to be broken. Which seems a shame.
>

I agree with you which was part of why I raised this subject but I also think that using the warnings module to issue a warning and ignore the entire problematic entry is a reasonable compromise. Hopefully it will become obvious to people that it's a python3 wart at some point in the future and we'll re-examine the default. But until then, having a printed warning that individual apps can turn into an exception seems like it is less broken than the other alternatives the "rare special application" people can live with :-)

-Toshio

From ncoghlan at gmail.com Tue Dec 9 22:22:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Dec 2008 07:22:47 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To: <1228832876.18857.11.camel@localhost>
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost>
Message-ID: <493EE1A7.6050405@gmail.com>

Antoine Pitrou wrote:
> Le mardi 09 décembre 2008 à 22:33 +1000, Nick Coghlan a écrit :
>> memoryview also currently gets the shape wrong on slices:
>
> I know, that's what I'm trying to fix...

Yes, I was slightly misled by your use of slice assignment to demonstrate the problem. It also turns out that while assignment to memoryviews has issues, and so does slicing, there is a fundamental problem with the length calculation when a memoryview is first created which is further confusing matters.

For the slicing problem in particular, memoryview is currently trying to get away with only one Py_buffer object when it needs TWO.

The first Py_buffer object needs to describe the view the memoryview has of the target object (i.e. it describes the entire data area of the target). The shape/strides/etc pointers in that struct are owned by the target object. The existing self->view tends to fill this role fairly well.

The *second* (currently nonexistent) Py_buffer object needs to describe the memory layout that the memoryview exposes to the rest of the world. The pointers in *this* struct will be owned by the memoryview object and accurately reflect any changes in shape due to slicing operations.

Currently, memoryview is trying to make the first Py_buffer also fill the role of the second one, and that obviously isn't going to work for subviews.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From greg.ewing at canterbury.ac.nz Tue Dec 9 23:31:48 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Dec 2008 11:31:48 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493E3569.6010408@gmail.com> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E3569.6010408@gmail.com> Message-ID: <493EF1D4.5090803@canterbury.ac.nz> Nick Coghlan wrote: > Maintaining a PyDict instance to map from view pointers to shapes > and strides info doesn't strike me as a "complex scheme" though. I don't see why a given buffer provider should ever need more than one set of shape/strides arrays at a time. It can allocate them on creation, reallocate them as needed if the shape of its internal data changes, and deallocate them when it goes away. If you are creating view objects that present slices or some other alternative perspective, then the view object itself is a buffer provider and should maintain shape/stride arrays for its particular view of the underlying object. -- Greg From greg.ewing at canterbury.ac.nz Tue Dec 9 23:45:03 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Dec 2008 11:45:03 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: <493EF4EF.6080600@canterbury.ac.nz> Antoine Pitrou wrote: > That doesn't work if e.g. you take a slice of a memoryview object, since the > shape changes in the process. 
> See http://bugs.python.org/issue4580 I haven't looked in detail at how memoryview is currently implemented, but it seems to me that the way it should work is that whenever you access a slice, it obtains a fresh Py_Buffer from the underlying object, and does the right thing based on the shape/strides from that together with the slice ranges. The only time it should need to allocate its own shape/strides is if you request a Py_Buffer from the memoryview itself, at which time it should obtain a Py_Buffer from the underlying object, update its own shape/strides and pass them to the caller. The underlying Py_Buffer lock should be held until the caller releases the memoryview's Py_Buffer, ensuring that its shape/strides remains valid for as long as they're needed. -- Greg From greg.ewing at canterbury.ac.nz Tue Dec 9 23:54:08 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Dec 2008 11:54:08 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493E7487.3050300@gmail.com> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E7487.3050300@gmail.com> Message-ID: <493EF710.3060509@canterbury.ac.nz> Nick Coghlan wrote: > [from the PEP] "If the exporter wants to be able to change an > object's shape, strides, and/or suboffsets before releasebuffer is > called then it should allocate those arrays when getbuffer is called > (pointing to them in the buffer-info structure provided) and free them > when releasebuffer is called." Even allowing this seems rather dubious to me. I suppose there's no serious danger as long as the block of memory ultimately holding the data doesn't move or change size, but changing the shape could confuse a buffer user that's iterating over the data. 
--
Greg

From solipsis at pitrou.net Wed Dec 10 00:15:58 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 9 Dec 2008 23:15:58 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
Message-ID:

Nick Coghlan <ncoghlan at gmail.com> writes:
>
> For the slicing problem in particular, memoryview is currently trying to
> get away with only one Py_buffer object when it needs TWO.

Why should it need two? Why couldn't the embedded Py_buffer fulfill all the needs of the memoryview object? If the memoryview can't be a relatively thin object-oriented wrapper around a Py_buffer, then this all screams failure to me.

----

In all honesty, I admit I am annoyed by all the problems with the buffer API / memoryview object, many of which are caused by its utterly bizarre design (and the fact that the design team went missing in action after imposing such a bizarre and complex design on us), and I'm reluctant to add yet another level of byzantine complexity in order to solve those problems. It explains why I may sound a bit angry at times :-)

If we really need to change things a lot to make them work, we should re-work the buffer API from the ground up, make the Py_buffer struct a true PyObject (that is, a true variable-length object so as to solve the shape and strides allocation issue) and merge it with the current memoryview implementation. It would make things both simpler and more flexible.

But of course it would destroy C-level compatibility with 2.6 / 3.0.

Regards

Antoine.
From greg.ewing at canterbury.ac.nz Wed Dec 10 00:55:43 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Dec 2008 12:55:43 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> Message-ID: <493F057F.4070806@canterbury.ac.nz> Antoine Pitrou wrote: > Why should it need two? Why couldn't the embedded Py_buffer fullfill all the > needs of the memoryview object? Two things here: 1) The memoryview should *not* be holding onto a Py_buffer in between calls to its getitem and setitem methods. It should request one from the underlying object when needed and release it again as soon as possible. 2) The "second" Py_buffer referred to above only needs to be materialized when someone makes a GetBuffer request on the memoryview itself. It's not needed for Python getitem and setitem calls. (The implementation might choose to implement these by creating a temporary Py_buffer, but again, it would only last as long as the call.) > If the memoryview can't be a relatively thin > object-oriented wrapper around a Py_buffer, then this all screams failure to me. It shouldn't be a wrapper around a Py_buffer, it should be a wrapper around the buffer *interface* of the underlying object. > In all honesty, I admit I am annoyed by all the problems with the buffer API / > memoryview object, many of which are caused by its utterly bizarre design It sounds to me like whoever wrote the memoryview implementation didn't understand how the buffer interface is meant to be used. That doesn't mean there's anything wrong with the buffer interface. I have some doubts myself about whether it needs to be as complicated as it is, but I think the basic idea is sound: that Py_buffer objects are ephemeral, to be obtained when needed and not kept for any longer than necessary. 
--
Greg

From greg.ewing at canterbury.ac.nz Wed Dec 10 01:00:40 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 10 Dec 2008 13:00:40 +1300
Subject: [Python-Dev] Forking and pipes
In-Reply-To: <20081209192651.7dfbcf7b@ronin.larsko.net>
References: <20081209192651.7dfbcf7b@ronin.larsko.net>
Message-ID: <493F06A8.9010100@canterbury.ac.nz>

Lars Kotthoff wrote:

> This prints out "foo" twice although it's only written once to the pipe. It
> seems that python doesn't flush file descriptors before copying them to the
> child process, thus resulting in the duplicate message. The equivalent C
> program behaves as expected,

Your Python and C programs are not equivalent -- the C one is writing directly to the file descriptor, whereas the Python one is effectively using a buffered stdio stream. The unflushed stdio buffer is getting copied by the fork, hence the duplicate output.

Solution: either (a) flush the Python file object before forking or (b) use os.write() directly on the fd to avoid the buffering.

--
Greg

From solipsis at pitrou.net Wed Dec 10 01:21:54 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 10 Dec 2008 00:21:54 +0000 (UTC)
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493F057F.4070806@canterbury.ac.nz>
Message-ID:

Greg Ewing <greg.ewing at canterbury.ac.nz> writes:
>
> 1) The memoryview should *not* be holding onto a Py_buffer
> in between calls to its getitem and setitem methods. It
> should request one from the underlying object when needed
> and release it again as soon as possible.

If the memoryview wasn't holding onto a Py_buffer, one couldn't rely on its length or anything else because the underlying object could be mutated at any moment (even by another thread).
It would make memoryview objects basically unusable for anything except bytes objects (which are immutable). > 2) The "second" Py_buffer referred to above only needs to > be materialized when someone makes a GetBuffer request on > the memoryview itself. It's already what is being done, but that's got nothing to do with the problem at hand. We are talking about slicing the memoryview, not taking a (non-sliced) buffer of it. > It's not needed for Python getitem > and setitem calls. What is needed for Python getitem and setitem calls is proper shape information in the embedded Py_buffer struct, otherwise memoryview slices are buggy. In the case of a memoryview slice, the proper shape information can only be computed *after* the Py_buffer is obtained. > It sounds to me like whoever wrote the memoryview implementation > didn't understand how the buffer interface is meant to be used. Perhaps, perhaps not, but without any concrete suggestion we won't go anywhere. As I said, I don't think it would be foolish to revamp the current spec and/or implementation /if we have a precise plan of how to do better/. The /if/ part is important :-) Regards Antoine. From martin at v.loewis.de Wed Dec 10 07:31:29 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 10 Dec 2008 07:31:29 +0100 Subject: [Python-Dev] Floating-point implementations In-Reply-To: References: Message-ID: <493F6241.3050500@v.loewis.de> > Is anyone aware of any implementations that use other than 64-bit > floating-point? As I understand you are asking about Python implementations: sure, the gmpy package supports arbitrary-precision floating point. > I'd be particularly interested in any that use greater > precision than the usual 56-bit mantissa. Nit-pickingly: it's usual that the mantissa is 53-bit. > Do modern 64-bit systems implement anything wider > than the normal double? As Mark said: sure. x86 systems have supported 80-bit "extended" precision for ages. 
Some architectures have architecture support for 128-bit floats (e.g. Itanium, SPARC v9); it's not clear to me whether they actually implement the long double operations in hardware, or whether they trap and get software-emulated. Regards, Martin From greg.ewing at canterbury.ac.nz Wed Dec 10 11:21:40 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Dec 2008 23:21:40 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493F057F.4070806@canterbury.ac.nz> Message-ID: <493F9834.8030100@canterbury.ac.nz> Antoine Pitrou wrote: > If the memoryview wasn't holding onto a Py_buffer, one couldn't rely on its > length or anything else because the underlying object could be mutated at any > moment Hmm, it seems there are two different approaches that could be taken here to the design of a memoryview object. You seem to be thinking of an "eager" approach where the memoryview keeps the underlying object's memory locked for as long as it exists, thus preventing it from being resized. Whereas I've been thinking of it as being "lazy", in the sense that the memoryview simply remembers the slice parameters it was given, and waits until you access it before making any GetBuffer calls. The lazy version would have the characteristic that creating a slice could succeed even though accessing it later fails due to a range error. I'm not sure that's necessarily a fatally bad thing. I'm also not sure that the eager version would be totally immune to such things. The PEP seems to permit the shape to change while the buffer is locked as long as the overall size and location of the memory doesn't change, so a subsequent access to a formerly-valid slice could still fail. 
In any case, I think it should be possible to implement either version without the memoryview having to own more than one Py_buffer and one set of shape/strides at a time. Slicing the memoryview creates another memoryview with its own Py_buffer and shape/strides. -- Greg From eckhardt at satorlaser.com Wed Dec 10 11:39:37 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Wed, 10 Dec 2008 11:39:37 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812091931.29905.eckhardt@satorlaser.com> Message-ID: <200812101139.37301.eckhardt@satorlaser.com> On Tuesday 09 December 2008, Adam Olsen wrote: > On Tue, Dec 9, 2008 at 11:31 AM, Ulrich Eckhardt > > wrote: > > On Monday 08 December 2008, Adam Olsen wrote: > >> At this point someone suggests we have a type that can store an > >> arbitrary mix of unicode and bytes, so the undecodable portions stay > >> in their original form. :P > > > > Well, not an arbitrary mix, but a type that just stores whatever comes > > from the system without further specifying it as either bytes or Unicode: > > > > * If you want a string for displaying it, you first have to extract a > > string from that thing and there you optionally specify the encoding and > > error behaviour. > > * If you want to append a string to it, it is automatically encoded in > > the default encoding, which obviously can fail. > > So the 2.x str, but with a more interesting default encoding than > ASCII. It'll work fine on the developer's system, but one day a user > will present it with strange input, and boom. If the system's representation of filenames can not represent a Unicode codepoint that the user entered, trying to open such a file must fail. If it can be represented, for convenience I would allow an implicit conversion. for i in readdir(): copy( i, i+".backup") ... > You have to be pessimistic here. The default operations should either > always work or never work. 
Using unicode internally and skipping > garbage input means the operations always work. Using a bytes API > means mixing with unicode never works, unless the programmer > explicitly converts, in which case the onus is on them to use proper > error handling. So, if I understand you correctly, you would prefer an explicit conversion to the system's representation: for i in readdir(): copy( i, i+path(".backup")) ... > The only thing separating this from a bikeshed discussion is that a > bikeshed has many equally good solutions, while we have no good > solutions. Instead we're trying to find the least-bad one. The > unicode/bytes separation is pretty close to that. Adding a warning > gets even closer. Adding magic makes it worse. Well, I see two cases: 1. Converting from an uncertain representation to a known one. 2. Converting from a known representation to a known one. The uncertain one is the one used by the filesystem or environment. The known representations are the expected(!) encoding for filesystem and environment and the internal text in Unicode. For case 1, I would require an explicit conversion to make the programmer really aware of the fact that it can fail. For the second case, I would allow an implicit conversion even though it can fail. Anyhow, that is a matter of taste, and I can actually live with your point of view. However, one question still remains: What about the approach in general, i.e. that these texts with an uncertain representation are handled as a separate type? I find this much more appealing that duplicating APIs like readdir() using either overloading on the arguments or a separate readdirb(). 
Uli From dickinsm at gmail.com Wed Dec 10 11:42:10 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Wed, 10 Dec 2008 10:42:10 +0000 Subject: [Python-Dev] Floating-point implementations In-Reply-To: <5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com> References: <5c6f2a5d0812090924x68297db3qfb0f95eb64a28b4c@mail.gmail.com> Message-ID: <5c6f2a5d0812100242m11042672q17d1a52027c54f68@mail.gmail.com> On Tue, Dec 9, 2008 at 5:24 PM, Mark Dickinson wrote: > I don't know of any. There are certainly places in the codebase that > assume 56 bits are enough. (I seem to recall it's something like > 56 bits for IBM, 53 bits for IEEE 754, 48 for Cray, and 52 or 56 for VAX.) Quick correction, after actually bothering to look things up rather than relying on my poor memory: VAX doubles have either *53* (not 52) or 56 bit mantissas. More precisely, the VAX G_floating format has a 53-bit mantissa (52 bits stored directly, one implicit 'hidden' bit), while the (now rare) D_floating format has a 56-bit mantissa (again, including the implicit 'hidden' bit).
Mark

From regebro at gmail.com Wed Dec 10 11:55:42 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Wed, 10 Dec 2008 11:55:42 +0100
Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time"
In-Reply-To:
References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com> <7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com>
Message-ID: <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com>

A funny thing just happened to me. I tried out causing this error, just to see how the error message was somehow different, by creating a time.py in /tmp, and running python from there. Then I removed the time.py, and went on working.

Two days later, my use of zc.buildout broke with a "module time has no attribute time". Huh?

Turns out, I created an empty time.py in /tmp, just to see the error message. But buildout will, when creating eggs from checked-out modules, copy them to a directory under /tmp, and evidently run python from /tmp to create the eggs. So that process finds the time.pyc, created from the empty time.py, which I hadn't deleted, and breaks!

Heh. That was funny.

Moral of the story: Don't create python modules with names that clash with built-in modules in /tmp, even for testing. Or at least, if you do, remember to remove the pyc. :-P Or, reboot your Linux every night. Or well. I guess this could have been avoided in many ways. ;-)

On Sun, Nov 16, 2008 at 17:43, Guilherme Polo wrote:
> On Sun, Nov 16, 2008 at 11:55 AM, Tal Einat wrote:
>> Steve Holden wrote:
>>> Tal Einat wrote:
>>>> Is this desired behavior?
>>>>
>>>> At the very least the exception should be more detailed, perhaps to
>>>> the point of suggesting the probable cause of the error (i.e.
>>>> overriding the time module).
>>>>
>>> How is this different from any other case where you import a module with
>>> a standard library name conflict, thereby confusing modules loaded later
>>> standard library.
Should we do the same for any error induced in such a way? >> >> The difference is that here the exception is generated directly in the >> C code so you don't get an intelligible traceback. The C code for >> datetime imports the time module via the Python C API. >> >> In other words, here a function from a module in the stdlib, datetime, >> barfs unexpectedly because I happen to have a file name time.py >> hanging around in some directory. There is no traceback and no >> intelligible exception message, just "AttributeError: time". I had to >> dig through datetime's C code to figure out which module was being >> imported via the Python C API, which turned out to be time. > > Just like Steve told you, this isn't different from other cases. But, > at least you get a message a bit more verbose in most cases, like: > > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'module' object has no attribute 'time' > > Then I went to look why this wasn't happening with datetime too, and I > found out that PyObject_CallMethod in abstract.c re sets the exception > message that would have been set by PyObject_GetAttr by now. Maybe > someone can tell me why it is doing that, for now a patch is attached > here (I didn't resist to not remove two trailing whitespaces). > >> >> This is rare enough that I've never had something like this happen to >> me in seven years of heavy Python programming. >> >> - Tal >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/ggpolo%40gmail.com >> > > > > -- > -- Guilherme H. 
Polo Goncalves

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From victor.stinner at haypocalc.com Wed Dec 10 12:06:49 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 10 Dec 2008 12:06:49 +0100
Subject: [Python-Dev] Trap SIGSEGV and SIGFPE
Message-ID: <200812101206.49316.victor.stinner@haypocalc.com>

Hi,

I published a new version of my fault handler: it installs a handler for the signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and continue the execution of your Python program. Example:

    try:
        call_evil_code()
    except MemoryError:
        print "A segfault? Haha, I don't care!"
    print "continue the execution"

(yes, it's possible to continue the execution after a segmentation fault!)

Handled errors:
 - Segmentation fault:
   * invalid memory read
   * invalid memory write
   * stack overflow (stack pointer outside the stack memory)
 - SIGFPE
   * division by zero
   * floating point error?

Such errors may occur in external libraries (written in C)... or Python builtin libraries (eg. imageop). The handler is now only used in PyEval_EvalFrameEx(), but it could be used anywhere.

The patch uses sigsetjmp() in PyEval_EvalFrameEx() to set a "check point", and siglongjmp() in the signal handler to go back to the check point. It also uses a separate stack for the signal handler, because on stack overflow you can not use the stack (ex: unable to call any function!). With MAXDEPTH=100, the memory footprint is ~20 KB. If you call PyEval_EvalFrameEx() more than MAXDEPTH times, the handler will go back to the frame #MAXDEPTH on error (you lose the last entries in the Python traceback).

sigsetjmp()/siglongjmp() should be available on many OS. I just know that it works perfectly on Linux.
sigaltstack() is needed to recover after a stack overflow, but other errors can be caught without it.

I didn't run any benchmark yet, but it would be interesting ;-) Changing the MAXDEPTH constant may change the speed with many recursive calls (eg. MAXDEPTH=1 only sets a check point for the first call to PyEval_EvalFrameEx()).

I would appreciate a review, especially for the patch in Python/ceval.c.

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

From ncoghlan at gmail.com Wed Dec 10 12:49:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 10 Dec 2008 21:49:47 +1000
Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer
In-Reply-To:
References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com>
Message-ID: <493FACDB.1030607@gmail.com>

Antoine Pitrou wrote:
> In all honesty, I admit I am annoyed by all the problems with the buffer API /
> memoryview object, many of which are caused by its utterly bizarre design (and
> the fact that the design team went missing in action after imposing such a
> bizarre and complex design on us), and I'm reluctant to add yet another level of
> byzantine complexity in order to solve those problems. It explains why I may sound a
> bit angry at times :-)
>
> If we really need to change things a lot to make them work, we should re-work
> the buffer API from the ground up, make the Py_buffer struct a true PyObject
> (that is, a true variable-length object so as to solve the shape and strides
> allocation issue) and merge it with the current memoryview implementation. It
> would make things both simpler and more flexible.

I don't see anything wrong with the PEP 3118 protocol. It does exactly what it is designed to do: allow the number crunching crowd to share large datasets between different libraries without copying things around in memory.
Yes, the protocol is complicated, but that is because it is trying to handle a complicated problem. The memoryview implementation on the other hand is pretty broken. I do have a theory on how it ended up in such an unusable state, but I'm not particularly inclined to share it - this kind of thing can happen sometimes, and the important question now is how we fix it. As I see it, memoryview is actually trying to do two things, but the design for supporting the second of them doesn't appear to have been adequately thought through in the current implementation. The first use of a memoryview object is merely to allow access to the Py_buffer of a data store. This is pretty simple, and aside from currently getting len() wrong when itemsize > 1, memoryview isn't terrible at it. If we left memoryview at that it *would* just be a simple wrapper around a Py_buffer struct, and its implementation wouldn't be difficult at all. Where it gets a bit more complicated is if we want to support slices (rather than just indexing) on memoryview objects. When you do that, the memoryview is no longer a simple wrapper around the Py_buffer of the underlying data store, because it isn't exposing the whole data store any more - it is only exposing part of it. Requesting access to only part of a data buffer is NOT part of the PEP 3118 API, and it doesn't need to be: it can be part of a separate object that adapts from the underlying data store to the desired subview. The object that is meant to be performing at least simple 1-dimensional cases of that adaptation is memoryview (or more to the point, memoryview slices), but it currently *sucks* at this because it relies too heavily on the info in the Py_buffer that it got from the underlying object. That Py_buffer describes the *whole* data store, but a memoryview slice may only be exposing part of it - so while the info in the Py_buffer is accurate for the underlying object, it is *not* accurate for the memoryview itself. 
Fixing that for the 1 dimensional case shouldn't actually be all that difficult - the memoryview just needs to maintain its own shape[0] entry that reflects the number of items in the view rather than the number in the underlying object. The multi-dimensional cases get pretty tricky though, since they will almost always end up dealing with non-contiguous data. The PEP 3118 protocol is up to handling the task, but the implementation of the index mapping to handle these multi-dimensional cases is highly non-trivial, and probably best left to third party libraries like numpy. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Wed Dec 10 12:54:01 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Dec 2008 21:54:01 +1000 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493F9834.8030100@canterbury.ac.nz> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493F057F.4070806@canterbury.ac.nz> <493F9834.8030100@canterbury.ac.nz> Message-ID: <493FADD9.4010109@gmail.com> Greg Ewing wrote: > In any case, I think it should be possible to implement > either version without the memoryview having to own > more than one Py_buffer and one set of shape/strides > at a time. Slicing the memoryview creates another > memoryview with its own Py_buffer and shape/strides. The important point is that the shape information in the Py_buffer filled in by the underlying object is the shape of *that* object. Except in the trivial case where the memoryview is exposing the entire underlying data buffer, the shape information in the Py_buffer has nothing to do with the shape of the memoryview object itself. Cheers, Nick. 
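The one-dimensional fix described above (each slice keeping its own shape[0]) can be illustrated with a minimal sketch. It assumes a later Python 3 release in which memoryview slices report a per-view shape; the variable names are illustrative only and do not come from the thread.

```python
from array import array

# Underlying data store: ten C ints, so itemsize > 1.
store = array('i', range(10))
whole = memoryview(store)

# A slice is a separate view with its own shape, independent of the store.
part = whole[2:6]

print(whole.shape, part.shape)    # shape of the store vs. shape of the view
print(len(part), part.itemsize)   # len() counts items, not bytes

# Writing through the view changes the underlying store in place (no copy).
part[0] = 99
print(store[2], part.tolist())
```

The point of the sketch is only that the view's shape describes the view, while the data itself is still shared with the exporter.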
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Wed Dec 10 13:58:20 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Dec 2008 12:58:20 +0000 (UTC) Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> Message-ID: Nick Coghlan <ncoghlan at gmail.com> writes: > > I don't see anything wrong with the PEP 3118 protocol. Apart from the fact that: - it uses something (Py_buffer) which is not a PyObject and has totally different allocation/lifetime semantics (which makes it non-trivial to adapt to for anyone used to the rest of the C API) - it has unsolved issues like allocation of the underlying shape and strides members - it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an existing one (which seem to be rather fundamental actions to me) ... I agree there's nothing wrong with it! > That Py_buffer describes the *whole* data store, but a memoryview slice > may only be exposing part of it - so while the info in the Py_buffer is > accurate for the underlying object, it is *not* accurate for the > memoryview itself. And the problem here is that Py_buffer is/was (*) not flexible enough to allow easy modification in order to take a sub-buffer without some annoying problems. (*) my patch solves the one-dimensional case. People interested in the multi-dimensional case will have to do their homework themselves! Regards Antoine. 
From fwierzbicki at gmail.com Wed Dec 10 15:18:39 2008 From: fwierzbicki at gmail.com (Frank Wierzbicki) Date: Wed, 10 Dec 2008 09:18:39 -0500 Subject: [Python-Dev] Holding a Python Language Summit at PyCon In-Reply-To: References: <20081203153128.GA6161@amk-desktop.matrixgroup.net> <4dab5f760812041205i6ef37f8djf418c2e4d1f0e1a1@mail.gmail.com> <4dab5f760812041702o72107c57h1a6ce72a4eafe671@mail.gmail.com> <20081209025317.GA1080@amk.local> Message-ID: <4dab5f760812100618x6dbca5e5o80895aa5c4aa73a5@mail.gmail.com> On Mon, Dec 8, 2008 at 10:31 PM, Brett Cannon wrote: > On Mon, Dec 8, 2008 at 18:53, A.M. Kuchling wrote: >> On Sat, Dec 06, 2008 at 02:42:38PM -0800, Brett Cannon wrote: >>> No, I am saying I had told AMK I was interested in championing the >>> session. He chose you, and that's that. One less thing for me to worry >>> about. =) >> >> Brett, I actually think you'd be a good champion for the 11AM >> transition-planning session. > > OK, so I guess I do have one more thing to worry about. =) I'd be > happy to do that session. Sounds good, and I'm still happy to do the other session even with all of the heckling :) -Frank From lie.1296 at gmail.com Tue Dec 9 22:48:38 2008 From: lie.1296 at gmail.com (Lie Ryan) Date: Tue, 9 Dec 2008 21:48:38 +0000 (UTC) Subject: [Python-Dev] Floating-point implementations References: Message-ID: On Tue, 09 Dec 2008 12:15:53 -0500, Steve Holden wrote: > Is anyone aware of any implementations that use other than 64-bit > floating-point? I'd be particularly interested in any that use greater > precision than the usual 56-bit mantissa. Do modern 64-bit systems > implement anything wider than the normal double? > > regards > Steve Why don't we create a DecimalFloat datatype which is a variable-width floating point number. Decimal is variable precision fixed-point number, while the plain ol' float would be system dependent floating point. 
From oliphant.travis at ieee.org Wed Dec 10 16:44:06 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 09:44:06 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: Message-ID: Antoine Pitrou wrote: > Hello, > > The Py_buffer struct has two pointers named `shape` and `strides`. Each points > to an array of Py_ssize_t values whose length is equal to the number of > dimensions of the buffer object. Unfortunately, the buffer protocol spec doesn't > explain how allocation of these arrays should be handled. > I'm coming in late to this discussion, so I apologize for being out of order. But, as Nick later clarifies, the PEP *does* specify how allocation of these arrays is handled. Specifically, it is the responsibility of the exporter to do it and keep them correct as long as the buffer is shared. I have not been able to keep up with the python-dev mailing lists since I have been working full time outside of academia. I apologize for the difficulty this may have caused. But, I have been available via email and am happy to respond to specific questions regarding the buffer protocol and its implementation. I will make some time during December to help clean up confusing issues. There are still pieces to implement as well (the enhancements to the struct module, for example), but I will not have time for this in the next 6 months because I would like to spend any time I can find on porting NumPy to use the new buffer protocol as part of getting NumPy ready for 3.0. -Travis From steve at holdenweb.com Wed Dec 10 16:46:55 2008 From: steve at holdenweb.com (Steve Holden) Date: Wed, 10 Dec 2008 10:46:55 -0500 Subject: [Python-Dev] Floating-point implementations In-Reply-To: References: Message-ID: Lie Ryan wrote: > On Tue, 09 Dec 2008 12:15:53 -0500, Steve Holden wrote: > >> Is anyone aware of any implementations that use other than 64-bit >> floating-point? 
I'd be particularly interested in any that use greater >> precision than the usual 56-bit mantissa. Do modern 64-bit systems >> implement anything wider than the normal double? >> >> regards >> Steve > > Why don't we create a DecimalFloat datatype which is a variable-width > floating point number. Decimal is variable precision fixed-point number, > while the plain ol' float would be system dependent floating point. > Because it's a large amount of work? For a limited return ... the implementation is bound to be hugely slow compared with hardware floating point, and as Martin already pointed out gmpy provides higher-precision arithmetic where required, and the Decimal module provides arbitrary-range fixed-point arithmetic. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From oliphant.travis at ieee.org Wed Dec 10 16:49:01 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 09:49:01 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: Alexander Belopolsky wrote: > On Mon, Dec 8, 2008 at 6:25 PM, Antoine Pitrou wrote: > .. >>> Alexander's suggestion of going and looking at what the numpy folks have >>> done in this area is probably a good idea too. >> Well, I'm open to others doing this, but I won't do it myself. My interest is in >> fixing the most glaring bugs of the buffer API and memoryview object. The numpy >> folks are welcome to voice their opinions and give advice on python-dev. >> > > I did not follow numpy development for the last year or more, so I > won't qualify as "the numpy folks," but my understanding is that numpy > does exactly what Nick recommended: the viewed object owns shape and > strides just as it owns the data. 
The viewing object increases the > reference count of the viewed object and thus assures that data, shape > and strides don't go away prematurely. > > I am copying Travis, the author of the PEP 3118, hoping that he would > step in on behalf of "the numpy folks." I appreciate the copy, as I mentioned I have not had time to follow python-dev in detail this year, but I'm glad to help maintain the buffer protocol and share any information I can. I think Nick understands the situation: the exporter is responsible for allocating and freeing shape, strides, and suboffsets memory (as well as formats, and buf memory). How it does this is not specified and open for interpretation by the objects. In the standard library there is nothing that needs anything complicated and I'm comfortable with what I wrote previously to support the objects in the standard library. There is a length bug in the memoryview implementation, but that is a separate issue and being handled. NumPy will have to handle sharing shape and strides information and will serve as a reference implementation when that support is added. -Travis From dickinsm at gmail.com Wed Dec 10 16:51:29 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Wed, 10 Dec 2008 15:51:29 +0000 Subject: [Python-Dev] Floating-point implementations In-Reply-To: References: Message-ID: <5c6f2a5d0812100751w47c7eeefqdb33968d067e384e@mail.gmail.com> On Tue, Dec 9, 2008 at 9:48 PM, Lie Ryan wrote: > Why don't we create a DecimalFloat datatype which is a variable-width > floating point number. Decimal is variable precision fixed-point number, > while the plain ol' float would be system dependent floating point. Decimal is *already* floating-point. Its handling of exponents and significant zeros mean that it can do a pretty good job of imitating fixed-point as well, but it's still at root a floating-point type. 
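A small sketch may make the point about Decimal being floating-point concrete. It uses only the standard decimal module; the precision of 5 is an arbitrary choice for the demonstration.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 5  # five significant digits, wherever the decimal point falls
    small = Decimal(1) / Decimal(3)
    large = Decimal(1000000) / Decimal(3)

print(small)  # five significant digits after the point
print(large)  # still five significant digits; only the exponent differs

# The exponent is stored separately from the digits and "floats" with the value.
print(small.as_tuple().exponent, large.as_tuple().exponent)

# Trailing zeros are kept, which is what lets Decimal imitate fixed-point.
print(Decimal('1.30') + Decimal('1.20'))
```

A fixed-point type would keep a fixed number of digits after the point in both divisions; here the number of significant digits is what stays constant.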
Mark From oliphant.travis at ieee.org Wed Dec 10 16:54:01 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 09:54:01 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> Message-ID: Antoine Pitrou wrote: > Alexander Belopolsky gmail.com> writes: >> I did not follow numpy development for the last year or more, so I >> won't qualify as "the numpy folks," but my understanding is that numpy >> does exactly what Nick recommended: the viewed object owns shape and >> strides just as it owns the data. The viewing object increases the >> reference count of the viewed object and thus assures that data, shape >> and strides don't go away prematurely. > > That doesn't work if e.g. you take a slice of a memoryview object, since the > shape changes in the process. > See http://bugs.python.org/issue4580 > I think there was some confusion about how to support slicing with memory view objects. I remember thinking about it but not getting to the code to write it. The memory object is both an exporter and consumer of the buffer protocol. It can have its own semantics about storing shape and strides information separate from the buffer protocol. The memory view object needs some way to translate the information it gets from the underlying object to the consumer of the information. My thinking is that the memory view object itself will allocate shape and strides information as it needs it. 
-Travis From oliphant.travis at ieee.org Wed Dec 10 17:12:10 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 10:12:10 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> Message-ID: Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> For the slicing problem in particular, memoryview is currently trying to >> get away with only one Py_buffer object when it needs TWO. > > Why should it need two? Why couldn't the embedded Py_buffer fulfill all the > needs of the memoryview object? If the memoryview can't be a relatively thin > object-oriented wrapper around a Py_buffer, then this all screams failure to me. > The advice to look at NumPy is good because memoryview is modeled after NumPy -- and never completed. When a slice view is made, a new memoryview object is created with a Py_buffer structure that needs to allocate its own shape and strides (or something that will allow correct shape and strides to be reported to any consumer). In this way, there are two Py_buffer structures. I do not remember implementing slicing for memoryview objects and it looks like the problem is there. > ---- > > In all honesty, I admit I am annoyed by all the problems with the buffer API / > memoryview object, many of which are caused by its utterly bizarre design (and > the fact that the design team went missing in action after imposing such a > bizarre and complex design on us), and I'm reluctant to add yet another level of > byzantine complexity in order to solve those problems. It explains why I may sound a > bit angry at times :-) I understand your frustration, but I've been here (just not able to follow python-dev), and I've tried to respond to issues that came to my attention. 
I did not have time to complete the memoryview implementation, but that does not mean the buffer API is "bizarre". Yes, the cobbled together memoryview object itself may be "bizarre", but that is sometimes the reality of volunteer work. Just ignore the memoryview object if it does not meet your needs. Please let me know what other problems exist. > > If we really need to change things a lot to make them work, we should re-work > the buffer API from the ground up, make the Py_buffer struct a true PyObject > (that is, a true variable-length object so as to solve the shape and strides > allocation issue) and merge it with the current memoryview implementation. It > would make things both simpler and more flexible. > The only place there is a shape/strides allocation issue is with the memoryview object itself. There is not an issue as far as I can see with the buffer protocol itself. I'm glad you are trying to help clean up the memoryview implementation. I welcome the eyes and the keystrokes. Are you familiar at all with NumPy? That may help you understand what you currently consider to be "utterly bizarre". Best regards, -Travis From oliphant.travis at ieee.org Wed Dec 10 17:30:09 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 10:30:09 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493F057F.4070806@canterbury.ac.nz> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493F057F.4070806@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Antoine Pitrou wrote: > >> Why should it need two? Why couldn't the embedded Py_buffer fulfill >> all the >> needs of the memoryview object? > > Two things here: > > 1) The memoryview should *not* be holding onto a Py_buffer > in between calls to its getitem and setitem methods. 
It > should request one from the underlying object when needed > and release it again as soon as possible. > This is actually a different design than the PEP calls for. From the PEP: This is functionally similar to the current buffer object except a reference to base is kept and the memory view is not re-grabbed. Thus, this memory view object holds on to the memory of base until it is deleted. I'm open to this changing, but it is the current PEP. > 2) The "second" Py_buffer referred to above only needs to > be materialized when someone makes a GetBuffer request on > the memoryview itself. It's not needed for Python getitem > and setitem calls. (The implementation might choose to > implement these by creating a temporary Py_buffer, but > again, it would only last as long as the call.) The memoryview object will need to store some information for re-calculating strides, shape, and sub-offsets for consumers. > >> If the memoryview can't be a relatively thin >> object-oriented wrapper around a Py_buffer, then this all screams >> failure to me. > > It shouldn't be a wrapper around a Py_buffer, it should be a > wrapper around the buffer *interface* of the underlying object. > This is a different object than what was proposed, but I'm not opposed to it. > It sounds to me like whoever wrote the memoryview implementation > didn't understand how the buffer interface is meant to be used. > That doesn't mean there's anything wrong with the buffer interface. > > I have some doubts myself about whether it needs to be as > complicated as it is, but I think the basic idea is sound: > that Py_buffer objects are ephemeral, to be obtained when > needed and not kept for any longer than necessary. > I'm all for simplifying as much as possible. There are some things I understand very well (like how strides and shape information can be shared with views), but others that I'm trying to understand better (like whether holding on to a view or re-grabbing the view is better). 
I think I'm leaning toward the re-grabbing concept. I'm all for improving the memoryview object, but let's not confuse that effort with the buffer API implementation. I do not think we need to worry about changes to the memoryview object, because I doubt anything outside of the standard library is using it yet. -Travis From oliphant.travis at ieee.org Wed Dec 10 17:34:09 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 10:34:09 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493FACDB.1030607@gmail.com> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> Message-ID: Nick Coghlan wrote: > Antoine Pitrou wrote: >> In all honesty, I admit I am annoyed by all the problems with the buffer API / >> memoryview object, many of which are caused by its utterly bizarre design (and >> the fact that the design team went missing in action after imposing such a >> bizarre and complex design on us), and I'm reluctant to add yet another level of >> byzantine complexity in order to solve those problems. It explains I may sound a >> bit angry at times :-) >> >> If we really need to change things a lot to make them work, we should re-work >> the buffer API from the ground up, make the Py_buffer struct a true PyObject >> (that is, a true variable-length object so as to solve the shape and strides >> allocation issue) and merge it with the current memoryview implementation. It >> would make things both more simpler and more flexible. > > I don't see anything wrong with the PEP 3118 protocol. It does exactly > what it is designed to do: allow the number crunching crowd to share > large datasets between different libraries without copying things around > in memory. Yes, the protocol is complicated, but that is because it is > trying to handle a complicated problem. 
> > The memoryview implementation on the other hand is pretty broken. I do > have a theory on how it ended up in such an unusable state, but I'm not > particularly inclined to share it - this kind of thing can happen > sometimes, and the important question now is how we fix it. > Thank you Nick. This is a correct assessment of the situation. I'd like to help improve memoryview as I can. It does need thought about what you want memoryview to be. I wanted memoryview to be able to be sliced and diced (much like NumPy arrays). But, I only was able to get around to implementing the (simple view of Py_buffer struct). -Travis From oliphant.travis at ieee.org Wed Dec 10 17:37:22 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 10:37:22 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> Message-ID: Antoine Pitrou wrote: > Nick Coghlan gmail.com> writes: >> I don't see anything wrong with the PEP 3118 protocol. > > Apart from the fact that: > - it uses something (Py_buffer) which is not a PyObject and has totally > different allocation/lifetime semantics (which makes it non-trivial to adapt to > for anyone used to the rest of the C API) * this is a non-issue. The Py_buffer struct is just a place-holder for a bunch of variables. It could be a Python-object but that was seen as unnecessary. > - it has unsolved issues like allocation of the underlying shape and strides > members * this is false. It does specify how this is handled. > - it doesn't specify how to obtain e.g. a sub-buffer, or even duplicate an > existing one (which seem to be rather fundamental actions to me) * this is not part of the PEP. Whether it's a deficiency or not is open to interpretation. > > ... I agree there's nothing wrong with it! 
I'm glad you agree. > >> That Py_buffer describes the *whole* data store, but a memoryview slice >> may only be exposing part of it - so while the info in the Py_buffer is >> accurate for the underlying object, it is *not* accurate for the >> memoryview itself. > > And the problem here is that Py_buffer is/was (*) not flexible enough to allow > easy modification in order to take a sub-buffer without some annoying problems. > You are confusing the intent of the memoryview with the Py_buffer struct. -Travis From oliphant.travis at ieee.org Wed Dec 10 17:39:45 2008 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 10 Dec 2008 10:39:45 -0600 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493EF1D4.5090803@canterbury.ac.nz> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E3569.6010408@gmail.com> <493EF1D4.5090803@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Nick Coghlan wrote: >> Maintaining a PyDict instance to map from view pointers to shapes >> and strides info doesn't strike me as a "complex scheme" though. > > I don't see why a given buffer provider should ever need > more than one set of shape/strides arrays at a time. It > can allocate them on creation, reallocate them as needed > if the shape of its internal data changes, and deallocate > them when it goes away. > I agree. NumPy has a single shape/strides array. The intent was to share this through the buffer interface. > If you are creating view objects that present slices or > some other alternative perspective, then the view object > itself is a buffer provider and should maintain shape/stride > arrays for its particular view of the underlying object. Yes, that is correct. 
-Travis From foom at fuhm.net Wed Dec 10 18:49:50 2008 From: foom at fuhm.net (James Y Knight) Date: Wed, 10 Dec 2008 12:49:50 -0500 Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time" In-Reply-To: <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com> References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com> <7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com> <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com> Message-ID: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net> On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote: > Turns out, I created an empty time.py in /tmp, just to see the error > message. By buildout will when creating eggs from checked out modules, > copy them to a directory under /tmp, and evidently run python from > /tmp to create the eggs. So that process finds the time.pyc, created > from the empty time.py, which I hadn't deleted, and breaks! Sounds like a security hole in zc.buildout. Imagine someone *else* made a time.py in /tmp... James From regebro at gmail.com Wed Dec 10 19:05:47 2008 From: regebro at gmail.com (Lennart Regebro) Date: Wed, 10 Dec 2008 19:05:47 +0100 Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time" In-Reply-To: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net> References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com> <7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com> <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com> <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net> Message-ID: <319e029f0812101005l898243ta05152f09fce92fc@mail.gmail.com> On Wed, Dec 10, 2008 at 18:49, James Y Knight wrote: > > On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote: > >> Turns out, I created an empty time.py in /tmp, just to see the error >> message. By buildout will when creating eggs from checked out modules, >> copy them to a directory under /tmp, and evidently run python from >> /tmp to create the eggs. 
So that process finds the time.pyc, created >> from the empty time.py, which I hadn't deleted, and breaks! > > Sounds like a security hole in zc.buildout. Imagine someone *else* made a > time.py in /tmp... Yup. Adam Olsen also reminded me of this, and I have filed a bug report. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From schmir at gmail.com Wed Dec 10 19:19:19 2008 From: schmir at gmail.com (Ralf Schmitt) Date: Wed, 10 Dec 2008 19:19:19 +0100 Subject: [Python-Dev] datetime.date.today() raises "AttributeError: time" In-Reply-To: <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net> References: <7afdee2f0811160500g44421c26o64765d2acf91a712@mail.gmail.com> <7afdee2f0811160555y3cb71afp460e267c29a96827@mail.gmail.com> <319e029f0812100255r54e019e4x8c3e74c7ba96ae4c@mail.gmail.com> <6589C688-ED96-4980-AFF7-671F3A9268F3@fuhm.net> Message-ID: <932f8baf0812101019p6c08798du34c3038d1e4cd83f@mail.gmail.com> On Wed, Dec 10, 2008 at 6:49 PM, James Y Knight wrote: > > On Dec 10, 2008, at 5:55 AM, Lennart Regebro wrote: > >> Turns out, I created an empty time.py in /tmp, just to see the error >> message. By buildout will when creating eggs from checked out modules, >> copy them to a directory under /tmp, and evidently run python from >> /tmp to create the eggs. So that process finds the time.pyc, created >> from the empty time.py, which I hadn't deleted, and breaks! > > Sounds like a security hole in zc.buildout. Imagine someone *else* made a > time.py in /tmp... > the current working directory is also added to sys.path if PYTHONPATH contains an empty element. might be the case here... 
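The shadowing effect discussed in this thread can be reproduced with a short sketch. It is a demonstration under stated assumptions, not a description of what zc.buildout does: the child interpreter is started with -c from a scratch directory, so the empty string (meaning the current directory) sits at the front of sys.path, and the pure-Python standard library module colorsys stands in for the time module of the original report.

```python
import os
import subprocess
import sys
import tempfile

def imported_colorsys_path(directory):
    """Start a child interpreter in `directory` and report which colorsys.py it imports."""
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # newer Pythons can opt out of the cwd entry
    code = "import colorsys, os; print(os.path.abspath(colorsys.__file__))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=directory, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

with tempfile.TemporaryDirectory() as tmp:
    # An empty file named after a standard library module, as in the report above.
    open(os.path.join(tmp, "colorsys.py"), "w").close()
    found = imported_colorsys_path(tmp)
    shadowed = os.path.realpath(os.path.dirname(found)) == os.path.realpath(tmp)

print(found)
print("stdlib module shadowed:", shadowed)
```

Anyone able to write into the directory a build tool runs from can plant such a file, which is why running Python from a shared location like /tmp is a concern.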
- Ralf From rhamph at gmail.com Wed Dec 10 19:14:30 2008 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 10 Dec 2008 11:14:30 -0700 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812101206.49316.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> Message-ID: On Wed, Dec 10, 2008 at 4:06 AM, Victor Stinner wrote: > Hi, > > I published a new version of my fault handler: it installs an handler for > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and > continue the execution of your Python program. Example: This will of course leave the program in an undefined state. It is very likely to crash again, emit garbage, hang, or otherwise be useless. sigsetjmp() is only safe for code explicitly designed for it. That will never be the case for CPython, let alone all the arbitrary libraries that may be used with it. -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Wed Dec 10 19:31:45 2008 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 10 Dec 2008 11:31:45 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812101139.37301.eckhardt@satorlaser.com> References: <200812091931.29905.eckhardt@satorlaser.com> <200812101139.37301.eckhardt@satorlaser.com> Message-ID: On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt wrote: > On Tuesday 09 December 2008, Adam Olsen wrote: >> The only thing separating this from a bikeshed discussion is that a >> bikeshed has many equally good solutions, while we have no good >> solutions. Instead we're trying to find the least-bad one. The >> unicode/bytes separation is pretty close to that. Adding a warning >> gets even closer. Adding magic makes it worse. > > Well, I see two cases: > 1. Converting from an uncertain representation to a known one. > 2. Converting from a known representation to a known one. Not quite: 1. Using a garbage file name locally (within a single process, not talking to any libs) 2. 
Using a unicode filename everywhere (libs, saved to config files, displayed to the user, etc.) Note that if you have a GUI doing the former, all you technically need is a placeholder like "". You might try to extract some ASCII out of it, but that's just a minor bonus. On linux the bytes/unicode separation is perfect for this. You decide which approach you're using and use it consistently. If you mess up (mixing bytes and unicode) you'll consistently get an error. We currently don't follow this model on windows, so a garbage file name gets passed around as if it was unicode, but fails when passed to a lib, saved to a config file, is displayed to a user, etc. (Depending on the API, as many won't validate either.) -- Adam Olsen, aka Rhamphoryncus From victor.stinner at haypocalc.com Wed Dec 10 19:37:16 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 10 Dec 2008 19:37:16 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> Message-ID: <200812101937.16467.victor.stinner@haypocalc.com> Oh, I forgot the issue URL: http://bugs.python.org/issue3999 I also attached an example of catching segfaults. > > I published a new version of my fault handler: it installs a handler for > > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and > > continue the execution of your Python program. Example: > > This will of course leave the program in an undefined state. It is > very likely to crash again, emit garbage, hang, or otherwise be > useless. Recovering after a segfault is dangerous, but my first goal was to get the Python backtrace instead of just one line: "Segmentation fault". It helps a lot for debugging! I didn't try it on a real-world application, but with a small script the program continues its execution without any problem. But yes, there is a big risk of: - memory leaks - deadlocks - context problems, eg. for the GIL, I call PyGILState_Ensure() - etc. 
I chose the exceptions MemoryError and ArithmeticError, but we could use specific exceptions based on BaseException instead of Exception to avoid catching them with "except Exception: ...". -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From musiccomposition at gmail.com Wed Dec 10 19:42:56 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Wed, 10 Dec 2008 12:42:56 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> Message-ID: <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> On Wed, Dec 10, 2008 at 12:37 PM, Victor Stinner wrote: > Oh, I forgot the issue URL: > http://bugs.python.org/issue3999 > > I also attached an example of catching segfaults. > >> > I published a new version of my fault handler: it installs an handler for >> > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and >> > continue the execution of your Python program. Example: >> >> This will of course leave the program in an undefined state. It is >> very likely to crash again, emit garbage, hang, or otherwise be >> useless. > > Recover after a segfault is dangerous, but my first goal was to get the Python > backtrace instead just one line: "Segmentation fault". It helps a lot for > debug! Exactly! That's why it doesn't belong in the Python core. We can't guarantee anything about its effects or encourage it. > > I didn't try on real world application, but with a small script the program > continues its execution without any problem. But as you say, it would be used on real-world programs! -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner."
From rhamph at gmail.com Wed Dec 10 19:59:09 2008 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 10 Dec 2008 11:59:09 -0700 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> Message-ID: On Wed, Dec 10, 2008 at 11:37 AM, Victor Stinner wrote: > Oh, I forgot the issue URL: > http://bugs.python.org/issue3999 > > I also attached an example of catching segfaults. > >> > I published a new version of my fault handler: it installs an handler for >> > signals SIGFPE and SIGSEGV. Using it, it's possible to catch them and >> > continue the execution of your Python program. Example: >> >> This will of course leave the program in an undefined state. It is >> very likely to crash again, emit garbage, hang, or otherwise be >> useless. > > Recover after a segfault is dangerous, but my first goal was to get the Python > backtrace instead just one line: "Segmentation fault". It helps a lot for > debug! It's possible to print the Python stack purely from C, without invoking any Python code. Even better, you could print the C stack while you're at it! Doing that in a signal handler, and then killing the process, could be seriously considered. Take a look at http://www.linuxjournal.com/article/6391 . You'll probably need #ifdef's to only use it on certain supported platforms, and probably disable it by default anyway (configure option? Not sure). Still, it'd be useful to have it there. 
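As a rough illustration of the "print the stack from the signal handler, then die" idea: later CPython releases (3.3 and newer) ship a faulthandler module that behaves this way. The sketch below assumes such an interpreter. The helper name crash_and_report and the deliberate NULL read through ctypes are only there for demonstration, and the crash happens in a child interpreter so the caller survives.

```python
import subprocess
import sys

# Source of the child program. It is expected to die; nothing is recovered.
CHILD_SOURCE = """
import ctypes
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that
# write the Python traceback to stderr before the process is terminated.
faulthandler.enable()

# Deliberate hard error: read a C string from address 0.
ctypes.string_at(0)
"""


def crash_and_report():
    """Run the crashing child; return its exit status and stderr text."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    status, report = crash_and_report()
    print("child exit status:", status)
    print(report)
```

The child still terminates abnormally; the only change is that stderr carries a Python-level traceback instead of a bare "Segmentation fault".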
-- Adam Olsen, aka Rhamphoryncus From tjreedy at udel.edu Wed Dec 10 20:04:00 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 10 Dec 2008 14:04:00 -0500 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> Message-ID: Benjamin Peterson wrote: > On Wed, Dec 10, 2008 at 12:37 PM, Victor Stinner >>> This will of course leave the program in an undefined state. It is >>> very likely to crash again, emit garbage, hang, or otherwise be >>> useless. >> Recover after a segfault is dangerous, but my first goal was to get the Python >> backtrace instead just one line: "Segmentation fault". It helps a lot for >> debug! > > Exactly! That's why it doesn't belong in the Python core. We can't > guarantee anything about its affects or encourage it. Would it be safe to catch SIGSEGV, output a trace, and then exit? I.e., make the 'first goal' the only goal? From bjourne at gmail.com Wed Dec 10 20:22:13 2008 From: bjourne at gmail.com (BJörn Lindqvist) Date: Wed, 10 Dec 2008 20:22:13 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> Message-ID: <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com> One thing I think it would be useful for in the real world is unit-testing extension modules. You can't profitably write unit tests for segfaults because that breaks the test harness. In situations like those, recovering would be likely (caveat emptor of course). 2008/12/10, Adam Olsen : > On Wed, Dec 10, 2008 at 4:06 AM, Victor Stinner > wrote: >> Hi, >> >> I published a new version of my fault handler: it installs an handler for >> signals SIGFPE and SIGSEGV.
Using it, it's possible to catch them and >> continue the execution of your Python program. Example: > > This will of course leave the program in an undefined state. It is > very likely to crash again, emit garbage, hang, or otherwise be > useless. > > sigsetjmp() is only safe for code explicitly designed for it. That > will never be the case for CPython, let alone all the arbitrary > libraries that may be used with it. > > > -- > Adam Olsen, aka Rhamphoryncus -- mvh Björn From rhamph at gmail.com Wed Dec 10 21:05:17 2008 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 10 Dec 2008 13:05:17 -0700 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com> Message-ID: On Wed, Dec 10, 2008 at 12:22 PM, BJörn Lindqvist wrote: > One thing i think it would be useful for in the real world is for > unittesting extension modules. You cant profitably write unit tests > for segfaults because that breaks the test harness. In situations like > those, recovering would be likely (caveat emptor of course). The only safe option there is a subprocess. -- Adam Olsen, aka Rhamphoryncus From mal at egenix.com Wed Dec 10 22:09:00 2008 From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 10 Dec 2008 22:09:00 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <740c3aec0812101122g75812be8l2877c6d5b5ee896d@mail.gmail.com> Message-ID: <49402FEC.7070303@egenix.com> On 2008-12-10 21:05, Adam Olsen wrote: > On Wed, Dec 10, 2008 at 12:22 PM, BJörn Lindqvist wrote: >> One thing i think it would be useful for in the real world is for >> unittesting extension modules. You cant profitably write unit tests >> for segfaults because that breaks the test harness. In situations like >> those, recovering would be likely (caveat emptor of course). > > The only safe option there is a subprocess. True, but that still makes it a little difficult to report the errors found in the module. mxTools has an optional safecall() function that allows calling functions which potentially segfault and still returns control back to the calling application: http://www.egenix.com/products/python/mxBase/mxTools/ It's not (yet) documented, but fairly straightforward to use once you've enabled it in egenix_mx_base.py: result = mx.Tools.safecall(callable, args, kws) Using such a function is handy when you have a multi-process application setup that sometimes needs to call out to external libraries of varying quality, which is not uncommon in real life. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 10 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany.
CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Thu Dec 11 00:12:43 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 11 Dec 2008 00:12:43 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812101206.49316.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> Message-ID: <49404CEB.8040900@v.loewis.de> > I would appreciate a review, especially for the patch in Python/ceval.c. In this specific case, it is not clear for what objective you want such review. For inclusion into Python? Several people already said (essentially) that: -1. I don't think such code should be added to the Python core, no matter how smart or correct it is. Regards, Martin From greg.ewing at canterbury.ac.nz Thu Dec 11 00:56:04 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Dec 2008 12:56:04 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> Message-ID: <49405714.6030108@canterbury.ac.nz> Travis Oliphant wrote: > When a slice view is made, a new memoryview object is created with a > Py_buffer structure that needs to allocate it's own shape and strides > (or something that will allow correct shape and strides to be reported > to any consumer). In this way, there are two Py_buffer structures. To be precise, the important thing is for the memoryview to allocate its own shape and strides. It's not strictly necessary to keep them internally in a Py_buffer struct, although that may be a convenient way to do it. 
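To make the point about a slice view needing its own shape and strides concrete, here is a small sketch in pure Python of how the new values follow from the old ones. The function name slice_layout and the idea of passing one slice object per dimension are assumptions made for illustration; nothing like it is part of the buffer API discussed here.

```python
def slice_layout(shape, strides, offset, slices):
    """Describe the view obtained by applying one slice per dimension.

    shape and strides are tuples as found in a Py_buffer (strides in
    bytes); offset is the byte offset of the first element. The result
    is the (shape, strides, offset) triple of the sliced view.
    """
    new_shape, new_strides = [], []
    for extent, stride, sl in zip(shape, strides, slices):
        # Normalise negative and missing bounds against this dimension.
        start, stop, step = sl.indices(extent)
        new_shape.append(len(range(start, stop, step)))
        # Stepping over elements multiplies the distance between them.
        new_strides.append(stride * step)
        # Move the starting point to the first element that is kept.
        offset += start * stride
    return tuple(new_shape), tuple(new_strides), offset


if __name__ == "__main__":
    # Every second byte of a 10-byte buffer, starting at index 1.
    print(slice_layout((10,), (1,), 0, (slice(1, 9, 2),)))
```

Because the sliced view's shape and strides differ from the parent's, they cannot simply point into the parent's arrays, which is why the view has to own storage for them. On interpreters where memoryview supports extended slicing, the one-dimensional result can be compared with the shape and strides attributes of memoryview(data)[1:9:2].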
-- Greg From jyasskin at gmail.com Thu Dec 11 01:12:02 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Wed, 10 Dec 2008 16:12:02 -0800 Subject: [Python-Dev] Merging flow In-Reply-To: References: Message-ID: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> Was there ever a conclusion to this? I need to merge the patches associated with issue 4597 from trunk to all the maintenance branches, and I'd like to avoid messing anyone up if possible. If I don't hear back, I'll plan to svnmerge directly from trunk to each of the branches, and then block my merge to py3k from being merged again to release30-maint. Thanks, Jeffrey On Thu, Dec 4, 2008 at 7:12 AM, Christian Heimes wrote: > Several people have asked about the patch and merge flow. Now that Python > 3.0 is out it's a bit more complicated. > > Flow diagram > ------------ > > trunk ---> release26-maint > \-> py3k ---> release30-maint > > > Patches for all versions of Python should land in the trunk. They are then > merged into release26-maint and py3k branches. Changes for Python 3.0 are > merged via the py3k branch. > > Christian From alexander.belopolsky at gmail.com Thu Dec 11 01:21:06 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 10 Dec 2008 19:21:06 -0500 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <49404CEB.8040900@v.loewis.de> References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: On Wed, Dec 10, 2008 at 6:12 PM, "Martin v. Löwis" wrote: >> I would appreciate a review, especially for the patch in Python/ceval.c. > > In this specific case, it is not clear for what objective you want such > review. For inclusion into Python? > Even if it does not result in an inclusion into Python, I personally would be quite interested in following this thread if discussion of Victor's patch continues. It may quite possibly yield some improvements to Python development tools (core and libraries' development).
Graceful handling of hard errors is an unsolved problem in Python, and it has become more important since ctypes made it into the standard library: it is now easy to trigger a hard error from pure Python code. > Several people already said (essentially) that: -1. I don't think such > code should be added to the Python core, no matter how smart or correct > it is. > Looking up the thread, I don't see anyone taking such an extreme position: never recover from SEGV even if it can be done 100% correctly. The sentiment that I see and the one that I share is that it is extremely difficult (and maybe impossible) to do correctly. However, if someone comes up with a smart solution, I would be very much interested to see it. While by the time you get a SIGSEGV, your process is likely to be beyond recovery, I don't think the same applies to SIGFPE. It may also be possible to get rid of the arbitrary recursion limit on Linux (I've heard this problem is solved on Windows) by being smart about handling SIGSEGV. Finally, providing some diagnostic before exiting on hard errors is not without precedent: I believe R has such a feature. It may be worthwhile to compare Victor's approach to what is done in R. It may, however, be better to move further discussion to the tracker (I understand that the patch is at ). From greg.ewing at canterbury.ac.nz Thu Dec 11 01:21:48 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Dec 2008 13:21:48 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: <493FACDB.1030607@gmail.com> References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> Message-ID: <49405D1C.60207@canterbury.ac.nz> Nick Coghlan wrote: > The multi-dimensional cases get pretty tricky though, since they will > almost always end up dealing with non-contiguous data.
The PEP 3118 > protocol is up to handling the task, but the implementation of the index > mapping to handle these multi-dimensional cases is highly non-trivial, > and probably best left to third party libraries like numpy. I'm wondering whether there should be some kind of utility function provided with the buffer API for doing this. It would take the shape/strides info from a Py_buffer together with a set of slicing parameters, and create you another set of shape/strides info describing the slice. It seems sensible to put the effort into doing this correctly once, rather than leave everyone implementing a memoryview-like object to come up with their own half-working and/or broken implementation. -- Greg From greg.ewing at canterbury.ac.nz Thu Dec 11 01:21:56 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Dec 2008 13:21:56 +1300 Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer In-Reply-To: References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> Message-ID: <49405D24.3010607@canterbury.ac.nz> Antoine Pitrou wrote: > - it uses something (Py_buffer) which is not a PyObject and has totally > different allocation/lifetime semantics This was a deliberate decision -- in fact I argued for it myself. The buffer interface is meant to be a minimal-overhead way for C code to get at the underlying data. Requiring allocation of a PyObject would be too expensive. The way to think about the Py_buffer struct is not as an object in its own right, but just a place to put some output parameters from the GetBuffer call. The lifetime of the information pointed to by the Py_buffer is the same as the lifetime of the underlying object, and that object is responsible for managing it. > - it doesn't specify how to obtain e.g. 
a sub-buffer, or even duplicate an > existing one (which seem to be rather fundamental actions to me) I don't think they're as fundamental as all that. But some utilities for doing things like this could be useful, as I mentioned in another post. -- Greg From solipsis at pitrou.net Thu Dec 11 01:35:05 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Dec 2008 00:35:05 +0000 (UTC) Subject: [Python-Dev] Allocation of shape and strides fields in Py_buffer References: <493D87BD.90106@gmail.com> <493D94CD.5040209@gmail.com> <493E65B1.5020004@gmail.com> <1228832876.18857.11.camel@localhost> <493EE1A7.6050405@gmail.com> <493FACDB.1030607@gmail.com> <49405D24.3010607@canterbury.ac.nz> Message-ID: Greg Ewing <greg.ewing at canterbury.ac.nz> writes: > > This was a deliberate decision -- in fact I argued for it myself. > The buffer interface is meant to be a minimal-overhead way for > C code to get at the underlying data. Requiring allocation of > a PyObject would be too expensive. Tuples are used everywhere throughout the interpreter and yet they are proper PyObjects. Even simple integers are often wrapped into PyLong objects (see the getitem/setitem protocol in Py3k). I doubt Py_buffers are more critical for performance than tuples and integers are. From martin at v.loewis.de Thu Dec 11 01:56:56 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Thu, 11 Dec 2008 01:56:56 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> Message-ID: <49406558.7020005@v.loewis.de> Jeffrey Yasskin wrote: > Was there ever a conclusion to this? I need to merge the patches > associated with issue 4597 from trunk to all the maintenance branches, > and I'd like to avoid messing anyone up if possible.
If I don't hear > back, I'll plan to svnmerge directly from trunk to each of the > branches, and then block my merge to py3k from being merged again to > release30-maint. No - you should merge from the py3k branch to the release30-maint branch. Regards, Martin From rhamph at gmail.com Thu Dec 11 02:01:45 2008 From: rhamph at gmail.com (Adam Olsen) Date: Wed, 10 Dec 2008 18:01:45 -0700 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: On Wed, Dec 10, 2008 at 5:21 PM, Alexander Belopolsky wrote: > On Wed, Dec 10, 2008 at 6:12 PM, "Martin v. Löwis" wrote: >> Several people already said (essentially) that: -1. I don't think such >> code should be added to the Python core, no matter how smart or correct >> it is. >> > > Looking up the thread, I don't see anyone taking such an extreme > position: never recover from SEGV even if it can be done 100% > correctly. The sentiment that I see and the one that I share is that > it is extremely difficult (and maybe impossible) to do correctly. > However, if someone comes up with a smart solution, I would be very > much interested to see it. It is impossible to do in general, and I am -1 on any misguided attempts to do so. > While by the time you get a SIGSEGV, you process is likely to be > beyond recovery, I don't think the same applies to SIGFPE. No, it's as much about the context as it is the error. We could write our own floating point code that can recover from SIGFPE (which isn't portable, but still mostly doable), but enabling it for arbitrary third-party libraries is completely unsafe. Printing a stack trace and then aborting would be possible and useful though. > It may > also be possible to get rid of the arbitrary recursion limit on Linux > (I've heard this problem is solved on Windows) by being smart about > handling SIGSEGV.
If we could calculate how much stack is left we'd have a much more robust way of doing recursion limits. I suppose this could be done by reading a byte from each page with a temporary SIGSEGV handler installed, but I'm not convinced you can't ask the platform directly somehow. I'd also be concerned about thread-safety. > Finally, providing some diagnostic before exiting on hard errors is > not without precedent: I believe R has such a feature. It may be > worthwhile to compare Victor's approach to what is done in R. > > It may, however, be better to move further discussion to the tracker > (I understand that the patch is at > ). -- Adam Olsen, aka Rhamphoryncus From alexander.belopolsky at gmail.com Thu Dec 11 02:22:23 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 10 Dec 2008 20:22:23 -0500 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: On Wed, Dec 10, 2008 at 8:01 PM, Adam Olsen wrote: .. > It is impossible to do in general, and I am -1 on any misguided > attempts to do so. > I agree, recovering from segfaults caused by buggy third party C modules is a losing proposition, but for a limited number of conditions that can be triggered from Python code running on a non-buggy interpreter (hopefully ctypes included, but that would be hard), converting signals into exceptions may be possible. .. > Printing a stack trace and then aborting would be possible and useful though. > Even a simple dialog in the interactive session, "Python has encountered a segfault, would you like to dump core? y/n", would be quite useful.
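One hedged way to get "signals converted into exceptions" without recovering inside the damaged process is to do the conversion across a process boundary, in line with the subprocess suggestion made earlier in this thread. The sketch below is only an illustration: ChildCrashed and run_isolated are invented names, and the negative return code convention is what the subprocess module reports on POSIX systems.

```python
import signal
import subprocess
import sys


class ChildCrashed(Exception):
    """Raised in the parent when the child interpreter died from a signal."""


def run_isolated(source):
    """Run Python source in a child interpreter and return its exit status.

    If the child is killed by a signal (for example SIGSEGV or SIGFPE),
    raise ChildCrashed in the parent instead. The parent's own state is
    never touched by the fault.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # POSIX: a negative return code is minus the signal number.
        signum = -proc.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal %d" % signum
        raise ChildCrashed(name)
    return proc.returncode


if __name__ == "__main__":
    try:
        run_isolated("import ctypes; ctypes.string_at(0)")
    except ChildCrashed as exc:
        print("child crashed with", exc)
```

A test harness built this way can report the crash as an ordinary failure and carry on, at the price of one interpreter start-up per isolated call.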
From hodgestar+pythondev at gmail.com Thu Dec 11 08:21:32 2008 From: hodgestar+pythondev at gmail.com (Simon Cross) Date: Thu, 11 Dec 2008 09:21:32 +0200 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812101937.16467.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> Message-ID: On Wed, Dec 10, 2008 at 8:37 PM, Victor Stinner wrote: > Recover after a segfault is dangerous, but my first goal was to get the Python > backtrace instead just one line: "Segmentation fault". It helps a lot for > debug! This would be extremely useful. I've had PyGTK segfault on me a number of times in an app I'm writing and I keep meaning to try to get to the bottom of the issue but it happens infrequently and somehow I never get around to it. Some indication of what Python was executing when the segfault occurred would help narrow down the possibilities rapidly. Schiavo Simon From eckhardt at satorlaser.com Thu Dec 11 10:19:16 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Thu, 11 Dec 2008 10:19:16 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> Message-ID: <200812111019.16950.eckhardt@satorlaser.com> On Wednesday 10 December 2008, Adam Olsen wrote: > On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt > > wrote: > > On Tuesday 09 December 2008, Adam Olsen wrote: > >> The only thing separating this from a bikeshed discussion is that a > >> bikeshed has many equally good solutions, while we have no good > >> solutions. Instead we're trying to find the least-bad one. The > >> unicode/bytes separation is pretty close to that. Adding a warning > >> gets even closer. Adding magic makes it worse. > > > > Well, I see two cases: > > 1. Converting from an uncertain representation to a known one. > > 2. Converting from a known representation to a known one. > > Not quite: > 1.
Using a garbage file name locally (within a single process, not > talking to any libs) > 2. Using a unicode filename everywhere (libs, saved to config files, > displayed to the user, etc.) I think there is some misunderstanding. I was referring to conversions and whether it is good to perform them implicitly. For that, I saw the above two cases. > On linux the bytes/unicode separation is perfect for this. You decide > which approach you're using and use it consistently. If you mess up > (mixing bytes and unicode) you'll consistently get an error. > > We currently don't follow this model on windows, so a garbage file > name gets passed around as if it was unicode, but fails when passed to > a lib, saved to a config file, is displayed to a user, etc. I'm not sure I agree with this. Facts I know are: 1. On POSIX systems, there is no reliable encoding for filenames while the system APIs use char/byte strings. 2. On MS Windows, the encoding for filenames is Unicode/UTF-16. Returning Unicode strings from readdir() is wrong because it can't handle case 1 above. Returning byte strings is wrong because it can't handle case 2 above: it gives you useless roundtrips from UTF-16 to either UTF-8 or, worst case, to the locale-dependent MBCS. Returning something different depending on the system is also broken because that would make Python code that uses this function and assumes a certain type unportable. Note that this doesn't get much better if you provide a separate readdirb() API or one that simply returns a byte string or Unicode string depending on its argument. It just shifts the brokenness from readdir() to the code that uses it, unless this code makes a distinction between the target systems. Since way too many programmers are not aware of the problem, they will not handle these systems differently, so code will become non-portable.
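For reference, the "byte string or Unicode string depending on its argument" variant is how os.listdir() behaves in Python 3: a str path yields str names and a bytes path yields bytes names. The short sketch below only demonstrates that behaviour; the helper name list_both_ways is made up for the example, and os.fsencode() assumes a reasonably recent Python 3.

```python
import os
import tempfile


def list_both_ways(directory):
    """Return the entries of directory once as str and once as bytes."""
    as_text = os.listdir(directory)                 # str path -> str names
    as_bytes = os.listdir(os.fsencode(directory))   # bytes path -> bytes names
    return sorted(as_text), sorted(as_bytes)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create one empty file so there is something to list.
        open(os.path.join(tmp, "example.txt"), "w").close()
        text_names, byte_names = list_both_ways(tmp)
        print(text_names, byte_names)
```

This is exactly the situation described above: the caller has to know which of the two forms it asked for and stay consistent, and nothing forces code written on one platform to consider the other.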
What I'd just like some feedback on is the approach of returning a distinct type (neither a byte string nor a Unicode string) from readdir(). In order to use this, a programmer will have to convert it explicitly, otherwise e.g. printing it will just produce . This will immediately confront every programmer with the issue of unknown encodings, and they will have to make the application-specific choice whether an approximation of the filename, an exception or ignoring the file is the right choice. Also, it presents the options for doing this conversion in a single class, which I personally find much better than providing overloads for hundreds of functions. Sorry for ranting, but I'm a bit confused and desperate, because either I'm unable to explain what I mean or I'm really not understanding something that everybody else here seems to agree upon. I just know that using a distinct path type has helped me in C++ in the past, and I don't see why it shouldn't in Python. Uli -- Sator Laser GmbH Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
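A minimal sketch of what such a distinct file-name type could look like follows, purely to make the proposal tangible. Everything in it is hypothetical: the class name RawName, its methods and its repr are invented for this illustration and do not correspond to any existing or proposed Python API.

```python
class RawName:
    """Hypothetical file name that is neither bytes nor str.

    It keeps the name exactly as the operating system delivered it and
    only converts when the caller explicitly asks for a conversion.
    """

    def __init__(self, raw):
        self._raw = bytes(raw)

    def as_bytes(self):
        """Return the untouched name, suitable for handing back to the OS."""
        return self._raw

    def as_text(self, encoding="utf-8", errors="strict"):
        """Decode the name; the caller chooses the encoding and error policy.

        errors="strict" raises on undecodable names, while
        errors="replace" yields an approximation for display.
        """
        return self._raw.decode(encoding, errors)

    def __repr__(self):
        # Deliberately unhelpful, so that printing without converting is noticed.
        return "<RawName of %d bytes>" % len(self._raw)


if __name__ == "__main__":
    name = RawName(b"caf\xe9.txt")   # Latin-1 bytes, not valid UTF-8
    print(repr(name))
    print(name.as_text(errors="replace"))
```

The three outcomes mentioned above (approximation, exception, ignoring the file) map onto the errors argument and onto catching UnicodeDecodeError in the calling code.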
From victor.stinner at haypocalc.com Thu Dec 11 10:34:24 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 11 Dec 2008 10:34:24 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> Message-ID: <200812111034.24319.victor.stinner@haypocalc.com> On Wednesday 10 December 2008 20:04:00, Terry Reedy wrote: > >> Recover after a segfault is dangerous, but my first goal was to get the > >> Python backtrace instead just one line: "Segmentation fault". It helps a > >> lot for debug! > > > > Exactly! That's why it doesn't belong in the Python core. We can't > > guarantee anything about its affects or encourage it. > > Would it be safe to catch SIGSEGV, output a trace, and then exit? > IE, make the 'first goal' the only goal? Oh yeah, good idea :-) Does it mean that the Python interpreter can't be used to display the trace? It would be nice to -at least- use the Python stderr (which is written in pure Python for Python3). It would be better if the user could set up a callback, like sys.excepthook. But if -as many people wrote- Python is totally broken after a segfault, it is maybe not a good idea :-) I guess that the sigsetjmp() and siglongjmp() hack can be avoided in PyEval_EvalFrameEx(), so ceval.c could be unchanged. New pseudocode: set checkpoint if error: get the backtrace display the backtrace fast exit (e.g. don't call atexit, don't free memory, ...)
else: normal execution -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From fijall at gmail.com Thu Dec 11 11:10:14 2008 From: fijall at gmail.com (Maciej Fijalkowski) Date: Thu, 11 Dec 2008 11:10:14 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: <693bc9ab0812110210i5174ce77u4309e5841b897a1a@mail.gmail.com> > > If we could calculate how much stack is left we'd have a much more > robust way of doing recursion limits. I suppose this could be done by > reading a byte from each page with a temporary SIGSEGV handler > installed, but I'm not convinced you can't ask the platform directly > somehow. I'd also be considered about thread-safety. > It's something as hard as taking address of local variable at the beginning of the program and at any arbitrary point. Of course 'how much is left' means additional arithmetics. Cheers, fijal From steve at holdenweb.com Thu Dec 11 13:13:49 2008 From: steve at holdenweb.com (Steve Holden) Date: Thu, 11 Dec 2008 07:13:49 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812111019.16950.eckhardt@satorlaser.com> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> Message-ID: <494103FD.5000101@holdenweb.com> Ulrich Eckhardt wrote: > On Wednesday 10 December 2008, Adam Olsen wrote: >> On Wed, Dec 10, 2008 at 3:39 AM, Ulrich Eckhardt >> >> wrote: >>> On Tuesday 09 December 2008, Adam Olsen wrote: >>>> The only thing separating this from a bikeshed discussion is that a >>>> bikeshed has many equally good solutions, while we have no good >>>> solutions. Instead we're trying to find the least-bad one. The >>>> unicode/bytes separation is pretty close to that. Adding a warning >>>> gets even closer. Adding magic makes it worse. >>> Well, I see two cases: >>> 1. Converting from an uncertain representation to a known one. >>> 2. 
Converting from a known representation to a known one. >> Not quite: >> 1. Using a garbage file name locally (within a single process, not >> talking to any libs) >> 2. Using a unicode filename everywhere (libs, saved to config files, >> displayed to the user, etc.) > > I think there is some misunderstanding. I was referring to conversions and > whether it is good to perform them implicitly. For that, I saw the above two > cases. > >> On linux the bytes/unicode separation is perfect for this. You decide >> which approach you're using and use it consistently. If you mess up >> (mixing bytes and unicode) you'll consistently get an error. >> >> We currently don't follow this model on windows, so a garbage file >> name gets passed around as if it was unicode, but fails when passed to >> a lib, saved to a config file, is displayed to a user, etc. > > I'm not sure I agree with this. Facts I know are: > 1. On POSIX systems, there is no reliable encoding for filenames while the > system APIs use char/byte strings. > 2. On MS Windows, the encoding for filenames is Unicode/UTF-16. > > Returning Unicode strings from readdir() is wrong because it can't handle the > case 1 above. Returning byte strings is wrong because it can't handle case 2 > above because it gives you useless roundtrips from UTF-16 to either UTF-8 or, > worst case, to the locale-dependent MBCS. Returning something different > depending on the system us also broken because that would make Python code > that uses this function and assumes a certain type unportable. > > Note that this doesn't get much better if you provide a separate readdirb() > API or one that simply returns a byte string or Unicode string depending on > its argument. It just shifts the brokenness from readdir() to the code that > uses it, unless this code makes a distinction between the target systems. 
> Since way too many programmers are not aware of the problem, they will not > handle these systems differently, so code will become non-portable. > > What I'd just like some feedback on is the approach to return a distinct type > (neither a byte string nor a Unicode string) from readdir(). In order to use > this, a programmer will have to convert it explicitly, otherwise e.g. > printing it will just produce . This will > immediately bump each programmer with their heads on the issue of unknown > encodings and they will have to make the application-specific choice whether > an approximation of the filename, an exception or ignoring the file is the > right choice. Also, it presents the options for doing this conversion in a > single class, which I personally find much better than providing overloads > for hundreds of functions. > > > Sorry for ranting, but I'm a bit confused and desperate, because either I'm > unable to explain what I mean or I'm really not understanding something that > everybody else here seems to agree upon. I just know that using a distinct > path type has helped me in C++ in the past, and I don't see why it shouldn't > in Python. > Seems to me this just threatens to add to the confusion. If you know what your filesystem produces, you can take the appropriate action to convert it into a type that makes sense to the user. If you don't, then at least if you have the string in its bytes form you can re-present it to the filesystem to manipulate the file. What are we supposed to do with the "special type"? 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
From ncoghlan at gmail.com Thu Dec 11 13:18:31 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Dec 2008 22:18:31 +1000 Subject: [Python-Dev] Merging flow In-Reply-To: <49406558.7020005@v.loewis.de> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> Message-ID: <49410517.1030601@gmail.com> Martin v. Löwis wrote: > Jeffrey Yasskin wrote: >> Was there ever a conclusion to this? I need to merge the patches >> associated with issue 4597 from trunk to all the maintenance branches, >> and I'd like to avoid messing anyone up if possible. If I don't hear >> back, I'll plan to svnmerge directly from trunk to each of the >> branches, and then block my merge to py3k from being merged again to >> release30-maint. > > No - you should merge from the py3k branch to the release30-maint branch. I believe that's difficult when you previously merged from the trunk to the py3k branch - the merged change to the svnmerge related properties on the root directory gets in the way when svnmerge attempts to update them on the maintenance branch. That's what started this thread, and so far nobody has come up with a workaround.
It seems to me that svnmerge.py should just be able to do a svn revert on the affected properties in the maintenance branch before it attempts to modify them, but my svn-fu isn't strong enough for me to say that for sure. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From skip at pobox.com Thu Dec 11 13:57:03 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 11 Dec 2008 06:57:03 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> Message-ID: <18753.3615.21624.999357@montanaro-dyndns-org.local> Simon> Some indictation of what Python was executing when the segfault Simon> occurred would help narrow now the possibilities rapidly. The Python distribution comes with a Misc/gdbinit file (you can grab it from the Subversion source tree via the web as well) that defines a pystack command. It will work with core files as well as running processes and should give you a very good idea where your Python code was executing when the segfault occurred. -- Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/ From solipsis at pitrou.net Thu Dec 11 14:10:48 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Dec 2008 13:10:48 +0000 (UTC) Subject: [Python-Dev] Trap SIGSEGV and SIGFPE References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> Message-ID: pobox.com> writes: > > The Python distribution comes with a Misc/gdbinit file (you can grab it from > the Subversion source tree via the web as well) that defines a pystack > command. It will work with core files as well as running processes and > should give you a very good idea where your Python code was executing when > the segfault occurred. 
Still, it would be much better if the stack trace could be printed by Python itself rather than having to resort to gdb wizardry. Especially if the problem is reported by one of your non-developer users. From skip at pobox.com Thu Dec 11 14:28:01 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 11 Dec 2008 07:28:01 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> Message-ID: <18753.5473.853788.617528@montanaro-dyndns-org.local> Antoine> Still, it would be much better if the stack trace could be Antoine> printed by Python itself rather than having to resort to gdb Antoine> wizardry. Especially if the problem is reported by one of your Antoine> non-developer users. I understand. The guy has a problem today for which there is a solution that I posted. If he's "been meaning to look into the problem" and he's posting to python-dev I presume he knows at least a little about running gdb if he's operating in a Unix environment. These two gdb commands source .gdbinit pystack shouldn't be too much of a barrier. Skip From solipsis at pitrou.net Thu Dec 11 14:37:39 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Dec 2008 13:37:39 +0000 (UTC) Subject: [Python-Dev] Trap SIGSEGV and SIGFPE References: <200812101206.49316.victor.stinner@haypocalc.com> <200812101937.16467.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> <18753.5473.853788.617528@montanaro-dyndns-org.local> Message-ID: pobox.com> writes: > > I understand. The guy has a problem today for which there is a solution > that I posted. If he's "been meaning to look into the problem" and he's > posting to python-dev I presume he knows at least a little about running gdb > if he's operating in a Unix environment. 
These two gdb commands > > source .gdbinit > pystack > > shouldn't be too much of a barrier. Well, but sometimes you don't have a core file (because you didn't run ulimit before launching Python and the crash wasn't expected; if the crash is very erratic, by the time you've fixed the system limits, you don't manage to reproduce it anymore, or it takes hours because it's at the end of a very long workload). Sometimes you don't have the gdbinit file around (for example, Mandriva doesn't ship it with any Python-related package). Sometimes you are under Windows. etc. :-) From eckhardt at satorlaser.com Thu Dec 11 14:41:46 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Thu, 11 Dec 2008 14:41:46 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <494103FD.5000101@holdenweb.com> References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> Message-ID: <200812111441.46739.eckhardt@satorlaser.com> On Thursday 11 December 2008, Steve Holden wrote: > Ulrich Eckhardt wrote: > > What I'd just like some feedback on is the approach to return a distinct > > type (neither a byte string nor a Unicode string) from readdir(). In > > order to use this, a programmer will have to convert it explicitly, > > otherwise e.g. printing it will just produce . > > This will immediately bump each programmer with their heads on the issue > > of unknown encodings and they will have to make the application-specific > > choice whether an approximation of the filename, an exception or ignoring > > the file is the right choice. Also, it presents the options for doing > > this conversion in a single class, which I personally find much better > > than providing overloads for hundreds of functions. [...] > > Seems to me this just threatens to add to the confusion. > > If you know what your filesystem produces, you can take the appropriate > action to convert it into a type that makes sense to the user. 
If you > don't, then at least if you have the string in its bytes form you can ^^^^^^^^^^^^^^^^^^^ There are operating systems that don't use bytes to represent a file path, namely all the MS Windows variants. Even worse, when you use a byte string there, it typically means that you want to use the obsolete encoding that is based on codepages. Why can we not preserve the representation of a path as it is? Why do we _have_ to convert it to anything at all, without even knowing if this conversion is needed? I just want to do something to a file's content, why does its path have to be converted to something and then be converted back in order for the system to digest it? > re-present it to the filesystem to manipulate the file. What are we > supposed to do with the "special type"? You receive from readdir() and pass it to stat(), simple as that. No conversions from the native representation needed. If you need a textual representation, then you have to convert it and you have to do so explicitly according to whatever logic your application requires. If readdir() returned Unicode text, people would start taking that for granted. If it returned bytes, just the same. Returning a completely unrelated type will give them enough hint that for this thing they have to rethink their assumptions. This runs along the lines of "In the face of ambiguity, refuse the temptation to guess.", as it makes guessing rather impossible. I just don't see a case where using a separate path class would break things. Further, the special handling that is required would be made even clearer by using such a class. 
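A minimal sketch of the distinct path type proposed above, under stated assumptions: `NativePath`, `read_dir` and `to_text` are hypothetical names invented for illustration, not an existing Python API, and the sketch relies on the `__fspath__` protocol that was only added to Python well after this thread. It keeps each directory entry in the form the OS returned, hands it back to `os.stat()` unconverted, and makes any conversion to text an explicit call with a caller-chosen error policy.

```python
import os
import tempfile


class NativePath:
    """Hypothetical opaque wrapper around a directory entry's native form."""

    def __init__(self, directory, raw_name):
        # raw_name is kept exactly as the OS returned it (bytes here).
        self._directory = directory
        self._raw_name = raw_name

    def __fspath__(self):
        # Lets os.stat(), open(), etc. accept the object without decoding.
        return os.path.join(self._directory, self._raw_name)

    def __repr__(self):
        # Deliberately not the filename: printing forces an explicit choice.
        return "<NativePath (undecoded)>"

    __str__ = __repr__

    def to_text(self, encoding, errors="strict"):
        # The caller picks the policy: "strict" raises, "replace" approximates.
        return self._raw_name.decode(encoding, errors)


def read_dir(directory):
    """Yield a NativePath for each entry of `directory`, undecoded."""
    raw_dir = os.fsencode(directory)
    for raw_name in os.listdir(raw_dir):
        yield NativePath(raw_dir, raw_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "report.txt"), "w").close()
        for entry in read_dir(tmp):
            print(entry)                    # placeholder text, not the name
            print(os.stat(entry).st_size)   # usable without any conversion
            print(entry.to_text("utf-8"))   # explicit, application-chosen
```

The wrapper here stores bytes, which matches POSIX; a Windows variant would presumably store the native wide-character form instead, which is exactly the platform detail the proposal wants to hide from callers.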
Uli From krstic at solarsail.hcs.harvard.edu Thu Dec 11 14:44:57 2008 From: krstic at solarsail.hcs.harvard.edu (Ivan Krstić) Date: Thu, 11 Dec 2008 14:44:57 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <49404CEB.8040900@v.loewis.de> References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: Hi Martin, On Dec 11, 2008, at 12:12 AM, Martin v. Löwis wrote: > Several people already said (essentially) that: -1. I don't think such > code should be added to the Python core, no matter how smart or > correct > it is. does your -1 apply only to attempts to resume execution after SIGSEGV, or also to the idea of dumping the stack and immediately exiting? The former strikes me as crazy talk, while the latter is genuinely useful. Cheers, -- Ivan Krstić
| http://radian.org From victor.stinner at haypocalc.com Thu Dec 11 15:19:07 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Thu, 11 Dec 2008 15:19:07 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <18753.3615.21624.999357@montanaro-dyndns-org.local> References: <200812101206.49316.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> Message-ID: <200812111519.07899.victor.stinner@haypocalc.com> On Thursday 11 December 2008 13:57:03, skip at pobox.com wrote: > Simon> Some indication of what Python was executing when the segfault > Simon> occurred would help narrow down the possibilities rapidly. > > The Python distribution comes with a Misc/gdbinit file Hum, do you really run *all* programs in gdb? Most of the time, you don't expect a crash (because you trust your software). You will have to try to reproduce the crash, but sometimes it's very hard (e.g. Heisenbugs!). My new proposal is to display the backtrace instead of just the message "segmentation fault". It's not a problem if displaying the backtrace produces a new fault because it's already better than just the message "segmentation fault". Even with my SIGSEGV handler, you can still use gdb because gdb catches the signal before the program. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From ijmorlan at uwaterloo.ca Thu Dec 11 14:58:51 2008 From: ijmorlan at uwaterloo.ca (Isaac Morland) Date: Thu, 11 Dec 2008 08:58:51 -0500 (EST) Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com> References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <200812111441.46739.eckhardt@satorlaser.com> Message-ID: On Thu, 11 Dec 2008, Ulrich Eckhardt wrote: > On Thursday 11 December 2008, Steve Holden wrote: >> Ulrich Eckhardt wrote: >> Seems to me this just threatens to add to the confusion.
>> >> If you know what your filesystem produces, you can take the appropriate >> action to convert it into a type that makes sense to the user. If you >> don't, then at least if you have the string in its bytes form you can > ^^^^^^^^^^^^^^^^^^^ > > There are operating systems that don't use bytes to represent a file path, > namely all the MS Windows variants. Even worse, when you use a byte string > there, it typically means that you want to use the obsolete encoding that is > based on codepages. > > Why can we not preserve the representation of a path as it is? Why do we > _have_ to convert it to anything at all, without even knowing if this > conversion is needed? I just want to do something to a file's content, why > does its path have to be converted to something and then be converted back in > order for the system to digest it? > >> re-present it to the filesystem to manipulate the file. What are we >> supposed to do with the "special type"? > > You receive from readdir() and pass it to stat(), simple as that. No > conversions from the native representation needed. If you need a textual > representation, then you have to convert it and you have to do so explicitly > according to whatever logic your application requires. Not only would this address the issue with the local filesystem, it would also provide a principled way to deal with remote filesystems. For example, an FTP interface library for Python could use this type to returns paths of the sort actually supported by the raw FTP protocol. Thinking of "the" filesystem is actually a misconception - always referring to "a" filesystem opens up all sorts of possibilities. 
There is a lot of coding to do to allow this, but allowing programs to work with paths and files in the local filesystem, remote filesystems, and filesystems constructed from others (e.g., by expanding symlinks, changing the root similar to chroot, or encoding/unencoding pathnames) would open up lots of possibilities, including better test environments. This is an interesting case of separating byte strings from character strings. As long as the two are conflated, everything appears simple. But when they are separated, not only are there two types where before there was only one, it turns out that which type is correct in some circumstances depends on the platform. Also, many objects which are byte strings at the protocol level are usually or always meant to be character strings of some sort, but how to translate them simply cannot be nailed down once and for all. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist From skip at pobox.com Thu Dec 11 15:27:27 2008 From: skip at pobox.com (skip at pobox.com) Date: Thu, 11 Dec 2008 08:27:27 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812111519.07899.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> <200812111519.07899.victor.stinner@haypocalc.com> Message-ID: <18753.9039.975554.300631@montanaro-dyndns-org.local> >> The Python distribution comes with a Misc/gdbinit file Victor> Hum, do you really run *all* programs in gdb? Most of the time, Victor> you don't expect a crash (because you trust your softwares). You Victor> will have to try to reproduce the crash, but sometimes it's very Victor> hard (eg. Heisenbugs!). Please folks! Get real. I was trying to help out a guy who responded to this thread saying that he gets intermittent segfaults in his PyGTK programs. I don't presume that he runs his app in gdb. If he has a core file this will work. 
I apologize profusely for any implication that a set of gdb commands is in any way superior to your patch. OTOH, it works today if you have a core file and are running Python at least as far back as 2.4. It doesn't require any changes to the interpreter. I use it frequently at work (a couple times a month anyway). We get notifications of all core files dropped each day. I make at least a cursory check of all core files dumped by Python. For that I use the pystack command defined in Misc/gdbinit. Victor> My new proposition is to display the backtrace instead of just Victor> the message "segmentation fault". It's not a problem if Victor> displaying the backtrace produces new fault because it's already Victor> better than just the message "segmentation fault". Even with my Victor> SIGSEVG handler, you can still use gdb because gdb catchs the Victor> signal before the program. Again, I meant no disrespect to your proposal. I was *simply trying to help the guy out*. Skip From jyasskin at gmail.com Thu Dec 11 17:08:53 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 11 Dec 2008 08:08:53 -0800 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812111034.24319.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> Message-ID: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com> On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner wrote: > But if -as many people wrote- > Python is totally broken after a segfault, it is maybe not a good idea :-) While it's true that after a segfault or unexpected longjmp, there are no guarantees whatsoever about the state of the python program, the program will often just happen to work, and there are at least some programs I've worked on that would rather take the risk in order to try to shut down gracefully. 
For example, an interactive app may want to give the user a chance to save her (not necessarily corrupted) work into a new file rather than unconditionally losing it. Or a webserver might want to catch the segfault, finish replying to the other requests that were in progress at the time, maybe reply to the request that caused the segfault, and then restart. Yes there's a possibility that the events around the segfault exposed some secret internal data (and they may do so even without segfaulting), but when the alternative is not replying to the users at all, this may be a risk the app wants to take. It would be nice for Python to at least expose the option so that developers (who are consenting adults, remember) can make their own decisions. It should _not_ be on by default, but something like sys.dangerous_turn_C_crashes_into_exceptions() would be useful. Jeffrey From jyasskin at gmail.com Thu Dec 11 17:17:49 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 11 Dec 2008 08:17:49 -0800 Subject: [Python-Dev] Merging flow In-Reply-To: <49410517.1030601@gmail.com> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> Message-ID: <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com> On Thu, Dec 11, 2008 at 4:18 AM, Nick Coghlan wrote: > Martin v. L?wis wrote: >> Jeffrey Yasskin wrote: >>> Was there ever a conclusion to this? I need to merge the patches >>> associated with issue 4597 from trunk to all the maintenance branches, >>> and I'd like to avoid messing anyone up if possible. If I don't hear >>> back, I'll plan to svnmerge directly from trunk to each of the >>> branches, and then block my merge to py3k from being merged again to >>> release30-maint. >> >> No - you should merge from the py3k branch to the release30-maint branch. 
> > I believe that's difficult when you previously merged from the trunk to > the py3k branch - the merged change to the svnmerge related properties > on the root directory gets in the way when svnmerge attempts to update > them on the maintenance branch. > > That's what started this thread, and so far nobody has come up with a > workaround. It seems to me that svnmerge.py should just be able to do a > svn revert on the affected properties in the maintenance branch before > it attempts to modify them, but my svn-fu isn't strong enough for me to > say that for sure. Yeah, that's why I asked. I tried what Martin suggested with r67698 by just saying I'd resolved the conflict, which added the single revision I was merging from to the svnmerge-integrated property. It didn't add the two original revisions. I don't know enough about how svnmerge works to know if that's the right outcome or who it's going to cause trouble for. Jeffrey From foom at fuhm.net Thu Dec 11 17:40:28 2008 From: foom at fuhm.net (James Y Knight) Date: Thu, 11 Dec 2008 11:40:28 -0500 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com> Message-ID: On Dec 11, 2008, at 11:08 AM, Jeffrey Yasskin wrote: > On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner > wrote: >> But if -as many people wrote- >> Python is totally broken after a segfault, it is maybe not a good >> idea :-) > > While it's true that after a segfault or unexpected longjmp, there are > no guarantees whatsoever about the state of the python program, the > program will often just happen to work, and there are at least some > programs I've worked on that would rather take the risk in order to > try to shut down gracefully. 
I ran an interactive game for years (written in C, mind you, not python), where the SIGSEGV handler simply recursively reinvoked the main loop, after disabling the command that caused a SEGV if it had caused a SEGV twice already. It almost always worked and continued running without issue. YMMV, of course. :) James From musiccomposition at gmail.com Thu Dec 11 17:38:36 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Thu, 11 Dec 2008 10:38:36 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> <5d44f72f0812110808l37d20644r3c1560eff5f927f5@mail.gmail.com> Message-ID: <1afaf6160812110838j2064385byca69bd01f7d9d06@mail.gmail.com> On Thu, Dec 11, 2008 at 10:08 AM, Jeffrey Yasskin wrote: > On Thu, Dec 11, 2008 at 1:34 AM, Victor Stinner > wrote: >> But if -as many people wrote- >> Python is totally broken after a segfault, it is maybe not a good idea :-) > > While it's true that after a segfault or unexpected longjmp, there are > no guarantees whatsoever about the state of the python program, the > program will often just happen to work, and there are at least some > programs I've worked on that would rather take the risk in order to > try to shut down gracefully. For example, an interactive app may want > to give the user a chance to save her (not necessarily corrupted) work > into a new file rather than unconditionally losing it. Or a webserver > might want to catch the segfault, finish replying to the other > requests that were in progress at the time, maybe reply to the request > that caused the segfault, and then restart. 
Yes there's a possibility > that the events around the segfault exposed some secret internal data > (and they may do so even without segfaulting), but when the > alternative is not replying to the users at all, this may be a risk > the app wants to take. It would be nice for Python to at least expose > the option so that developers (who are consenting adults, remember) > can make their own decisions. It should _not_ be on by default, but > something like sys.dangerous_turn_C_crashes_into_exceptions() would be > useful. Trying to recover (or save work etc.) is incredibly unpredictable, though. It could very well end up making the situation worse! I'm -1 on putting this in the core. -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From steve at holdenweb.com Thu Dec 11 18:46:57 2008 From: steve at holdenweb.com (Steve Holden) Date: Thu, 11 Dec 2008 12:46:57 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com> References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <200812111441.46739.eckhardt@satorlaser.com> Message-ID: Ulrich Eckhardt wrote: > On Thursday 11 December 2008, Steve Holden wrote: >> Ulrich Eckhardt wrote: >>> What I'd just like some feedback on is the approach to return a distinct >>> type (neither a byte string nor a Unicode string) from readdir(). In >>> order to use this, a programmer will have to convert it explicitly, >>> otherwise e.g. printing it will just produce . >>> This will immediately bump each programmer with their heads on the issue >>> of unknown encodings and they will have to make the application-specific >>> choice whether an approximation of the filename, an exception or ignoring >>> the file is the right choice.
Also, it presents the options for doing >>> this conversion in a single class, which I personally find much better >>> than providing overloads for hundreds of functions. > [...] >> Seems to me this just threatens to add to the confusion. >> >> If you know what your filesystem produces, you can take the appropriate >> action to convert it into a type that makes sense to the user. If you >> don't, then at least if you have the string in its bytes form you can > ^^^^^^^^^^^^^^^^^^^ > > There are operating systems that don't use bytes to represent a file path, > namely all the MS Windows variants. Even worse, when you use a byte string > there, it typically means that you want to use the obsolete encoding that is > based on codepages. > > Why can we not preserve the representation of a path as it is? Why do we > _have_ to convert it to anything at all, without even knowing if this > conversion is needed? I just want to do something to a file's content, why > does its path have to be converted to something and then be converted back in > order for the system to digest it? > You don't: that was my point. You only need to perform any kind of conversion when the filename has to be presented to something other than the file system. >> re-present it to the filesystem to manipulate the file. What are we >> supposed to do with the "special type"? > > You receive from readdir() and pass it to stat(), simple as that. No > conversions from the native representation needed. If you need a textual > representation, then you have to convert it and you have to do so explicitly > according to whatever logic your application requires. > Exactly. > If readdir() returned Unicode text, people would start taking that for > granted. If it returned bytes, just the same. Returning a completely > unrelated type will give them enough hint that for this thing they have to > rethink their assumptions. 
This runs along the lines of "In the face of > ambiguity, refuse the temptation to guess.", as it makes guessing rather > impossible. > So you are suggesting this "special object" be used only to represent files to users? Now I understand. > I just don't see a case where using a separate path class would break things. > Further, the special handling that is required would be made even clearer by > using such a class. > But it does have to be implemented ... regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From rhamph at gmail.com Thu Dec 11 19:04:20 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 11:04:20 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812111441.46739.eckhardt@satorlaser.com> References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <200812111441.46739.eckhardt@satorlaser.com> Message-ID: On Thu, Dec 11, 2008 at 6:41 AM, Ulrich Eckhardt wrote: > On Thursday 11 December 2008, Steve Holden wrote: >> re-present it to the filesystem to manipulate the file. What are we >> supposed to do with the "special type"? > > You receive from readdir() and pass it to stat(), simple as that. No > conversions from the native representation needed. If you need a textual > representation, then you have to convert it and you have to do so explicitly > according to whatever logic your application requires. The simplest solution there is to have windows bytes APIs that return raw UTF-16 bytes (note that windows filenames are NOT guaranteed to be valid unicode, despite being much more likely to be than on linux). The only real issue I see is that UTF-16 isn't an ASCII superset, so it won't print nicely. In other words, bytes can be your special type.
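The "keep the native bytes, convert only for display" idea can be sketched with the standard codecs alone. This is only an illustration under stated assumptions: the filename `caf\xe9.txt` is made up, a POSIX-style byte filename is assumed, and the `surrogateescape` error handler used here arrived later (PEP 383, Python 3.1), so nothing below was available when this thread was written.

```python
# Hypothetical filename: Latin-1 bytes that are not valid UTF-8.
raw = b"caf\xe9.txt"

# A strict decode refuses to guess, which is the failure mode under discussion.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape smuggles each undecodable byte into a lone surrogate
# (0xE9 becomes U+DCE9), so the result is a str that remembers its bytes.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly,
# so the name can be handed back to the filesystem unchanged.
roundtrip = name.encode("utf-8", "surrogateescape")
```

The design point is that the lossy step (choosing how to show the name to a human) stays explicit and separate from the lossless step (passing the name back to the OS).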
-- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Thu Dec 11 19:15:22 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 11:15:22 -0700 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812111034.24319.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> Message-ID: On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner wrote: > Le Wednesday 10 December 2008 20:04:00 Terry Reedy, vous avez écrit : >> >> Recover after a segfault is dangerous, but my first goal was to get the >> >> Python backtrace instead just one line: "Segmentation fault". It helps a >> >> lot for debug! >> > >> > Exactly! That's why it doesn't belong in the Python core. We can't >> > guarantee anything about its effects or encourage it. >> >> Would it be safe to catch SIGSEGV, output a trace, and then exit? >> IE, make the 'first goal' the only goal? > > Oh yeah, good idea :-) Does it mean that Python interpreter can't be used to > display the trace? It would be nice to -at least- use the Python stderr > (which is written in pure Python for Python3). It would be better if the user > can setup a callback, like sys.excepthook. But if -as many people wrote- > Python is totally broken after a segfault, it is maybe not a good idea :-) You have to use the low-level stderr, nothing that invokes Python. I'd hate to get a second segfault while printing the first. Just think about how indirect refcounting bugs tend to be. Another example is messing up GIL handling. There's heaps of things for which we'd want good stack traces, which can't be done from Python.
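For reference, the "write the Python stack to a raw file descriptor from the signal handler, then let the process die" approach can be sketched with the `faulthandler` module, which only entered the standard library later (Python 3.3). The temporary log file below is an assumption for illustration; by default the module writes to `sys.stderr`. The sketch does not trigger a real crash, it only installs the handlers and dumps the current stack on request.

```python
import faulthandler
import tempfile

# TemporaryFile gives a real file descriptor, which faulthandler requires:
# it writes to the descriptor directly instead of calling into Python I/O.
with tempfile.TemporaryFile() as log:
    # Install handlers for fatal signals such as SIGSEGV and SIGFPE.
    faulthandler.enable(file=log, all_threads=True)
    enabled = faulthandler.is_enabled()

    # Dump the current thread's Python stack explicitly, as a stand-in
    # for what the handler would write if a fault actually occurred.
    faulthandler.dump_traceback(file=log, all_threads=False)

    # Uninstall the handlers before the file is closed.
    faulthandler.disable()

    log.seek(0)
    report = log.read().decode("ascii", "replace")
```

After a real fault the handler does not try to resume execution: the process is still terminated by the original signal, so a core dump can be produced as usual.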
-- Adam Olsen, aka Rhamphoryncus From daniel at stutzbachenterprises.com Thu Dec 11 20:39:30 2008 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Thu, 11 Dec 2008 13:39:30 -0600 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> Message-ID: On Thu, Dec 11, 2008 at 12:15 PM, Adam Olsen wrote: > You have to use the low-level stderr, nothing that invokes Python. > I'd hate to get a second segfault while printing the first. > > Just think about how indirect refcounting bugs tend to be. Another > example is messing up GIL handling. There's heaps of things for which > we'd want good stack traces, which can't be done from Python. > +1 on functionality to print a stack trace on a fault -1 on translating the fault into an exception I suggest exposing some functions to control the functionality. Here are some things the user may wish to control: 1. Disable/enable the functionality altogether 2. Set the file descriptor that the stack trace should be written to 3. Set a file name that should be created and written to instead 4. Specify whether a core dump should be generated 5. Specify a program to run after the stack trace has been printed #3 combined with #5 would be very useful for automated bug reporting. For what it's worth, the functionality could be implemented under Windows using Structured Exception Handling. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Thu Dec 11 21:00:43 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 11 Dec 2008 21:00:43 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <1afaf6160812101042u45320fb2yaae484ffdb6a16a2@mail.gmail.com> <200812111034.24319.victor.stinner@haypocalc.com> Message-ID: <4941716B.6030401@egenix.com> On 2008-12-11 19:15, Adam Olsen wrote: > On Thu, Dec 11, 2008 at 2:34 AM, Victor Stinner > wrote: >> Le Wednesday 10 December 2008 20:04:00 Terry Reedy, vous avez ?crit : >>>>> Recover after a segfault is dangerous, but my first goal was to get the >>>>> Python backtrace instead just one line: "Segmentation fault". It helps a >>>>> lot for debug! >>>> Exactly! That's why it doesn't belong in the Python core. We can't >>>> guarantee anything about its affects or encourage it. >>> Would it be safe to catch SIGSEGV, output a trace, and then exit? >>> IE, make the 'first goal' the only goal? >> Oh yeah, good idea :-) Does it mean that Python interpreter can't be used to >> display the trace? It would be nice to -at least- use the Python stderr >> (which is written in pure Python for Python3). It would be better if the user >> can setup a callback, like sys.excepthook. But if -as many people wrote- >> Python is totally broken after a segfault, it is maybe not a good idea :-) > > You have to use the low-level stderr, nothing that invokes Python. > I'd hate to get a second segfault while printing the first. > > Just think about how indirect refcounting bugs tend to be. Another > example is messing up GIL handling. There's heaps of things for which > we'd want good stack traces, which can't be done from Python. Experience with mx.Tools.safecall() shows that there's a lot you can still do after a segfault in some library, including print the traceback in Python, so things are not as bad. However, I'd disable such functionality in Python per default, if it should ever get introduced. 
This has got to stay an expert option, unless we want to risk messing up user systems completely, e.g. by having some logging manager unintentionally overwrite important files on the disk. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 11 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Thu Dec 11 21:03:03 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 11 Dec 2008 21:03:03 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: <49410517.1030601@gmail.com> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> Message-ID: <494171F7.7050208@v.loewis.de> > I believe that's difficult when you previously merged from the trunk to > the py3k branch - the merged change to the svnmerge related properties > on the root directory gets in the way when svnmerge attempts to update > them on the maintenance branch. > > That's what started this thread, and so far nobody has come up with a > workaround. The work-around is fairly straight-forward: - inspect the conflict file (I forgot its name - something like dir-props), and verify that the only conflict is in the missing merge info from trunk to py3k - svn resolved . 
> It seems to me that svnmerge.py should just be able to do a > svn revert on the affected properties in the maintenance branch before > it attempts to modify them, but my svn-fu isn't strong enough for me to > say that for sure. See above. svnmerge overwrites the property after it has conflicted, so the only additional action to take is to declare that a resolution. Regards, Martin From martin at v.loewis.de Thu Dec 11 21:05:39 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 11 Dec 2008 21:05:39 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> Message-ID: <49417293.50506@v.loewis.de> > On Dec 11, 2008, at 12:12 AM, Martin v. L?wis wrote: >> Several people already said (essentially) that: -1. I don't think such >> code should be added to the Python core, no matter how smart or correct >> it is. > > > does your -1 apply only to attempts to resume execution after SIGSEGV, > or also to the idea of dumping the stack and immediately exiting? The > former strikes me as crazy talk, while the latter is genuinely useful. Only to the former. If it is actually possible to print a stack trace, that could be useful indeed. I'm then skeptical that this is possible in the general case (i.e. displaying the full C stack), but displaying (parts of) the Python stack might be possible. I think it should still proceed to dump core, so that you can then inspect the core with a proper debugger. 
Regards, Martin From martin at v.loewis.de Thu Dec 11 21:10:07 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Thu, 11 Dec 2008 21:10:07 +0100 Subject: [Python-Dev] Merging flow In-Reply-To: <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com> Message-ID: <4941739F.6020701@v.loewis.de> > Yeah, that's why I asked. I tried what Martin suggested with r67698 by > just saying I'd resolved the conflict, which added the single revision > I was merging from to the svnmerge-integrated property. It didn't add > the two original revisions. Can you elaborate? What are the "two original revisions" it didn't add? If you are referring to the trunk revisions - that's fine. As far as svnmerge is concerned, we merge revisions from the 3k branch to the 3.0 maintenance branch. The original revisions don't exist on the 3k branch (they have an empty changeset), so it's not a problem that they didn't get recorded as merged. Regards, Martin From jyasskin at gmail.com Thu Dec 11 21:33:09 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 11 Dec 2008 12:33:09 -0800 Subject: [Python-Dev] Merging flow In-Reply-To: <4941739F.6020701@v.loewis.de> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> <5d44f72f0812110817s74df22afk476c664acd5c8a6d@mail.gmail.com> <4941739F.6020701@v.loewis.de> Message-ID: <5d44f72f0812111233p2cae1249n31e8ddab857c1e03@mail.gmail.com> On Thu, Dec 11, 2008 at 12:10 PM, "Martin v. L?wis" wrote: >> Yeah, that's why I asked. I tried what Martin suggested with r67698 by >> just saying I'd resolved the conflict, which added the single revision >> I was merging from to the svnmerge-integrated property. It didn't add >> the two original revisions. 
> > Can you elaborate? What are the "two original revisions" it didn't add? > > If you are referring to the trunk revisions - that's fine. As far > as svnmerge is concerned, we merge revisions from the 3k branch > to the 3.0 maintenance branch. The original revisions don't exist > on the 3k branch (they have an empty changeset), so it's not a > problem that they didn't get recorded as merged. Yes, I was referring to the trunk revisions. Sounds like this (marking the conflicting property as resolved without changing it) is the way to go then. Thanks! From ncoghlan at gmail.com Thu Dec 11 21:39:29 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Dec 2008 06:39:29 +1000 Subject: [Python-Dev] Merging flow In-Reply-To: <494171F7.7050208@v.loewis.de> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> <494171F7.7050208@v.loewis.de> Message-ID: <49417A81.8050505@gmail.com> Martin v. L?wis wrote: >> I believe that's difficult when you previously merged from the trunk to >> the py3k branch - the merged change to the svnmerge related properties >> on the root directory gets in the way when svnmerge attempts to update >> them on the maintenance branch. >> >> That's what started this thread, and so far nobody has come up with a >> workaround. > > The work-around is fairly straight-forward: > > - inspect the conflict file (I forgot its name - something like > dir-props), and verify that the only conflict is in the missing > merge info from trunk to py3k > - svn resolved . Ah, that's the missing piece of info - thanks :) This should probably go in the dev FAQ somewhere though. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From eric at trueblade.com Thu Dec 11 21:45:09 2008 From: eric at trueblade.com (Eric Smith) Date: Thu, 11 Dec 2008 15:45:09 -0500 Subject: [Python-Dev] Merging flow In-Reply-To: <49417A81.8050505@gmail.com> References: <5d44f72f0812101612x1054c89dxc90d0346b7df76a@mail.gmail.com> <49406558.7020005@v.loewis.de> <49410517.1030601@gmail.com> <494171F7.7050208@v.loewis.de> <49417A81.8050505@gmail.com> Message-ID: <49417BD5.10109@trueblade.com> Nick Coghlan wrote: > Martin v. Löwis wrote: >>> I believe that's difficult when you previously merged from the trunk to >>> the py3k branch - the merged change to the svnmerge related properties >>> on the root directory gets in the way when svnmerge attempts to update >>> them on the maintenance branch. >>> >>> That's what started this thread, and so far nobody has come up with a >>> workaround. >> The work-around is fairly straight-forward: >> >> - inspect the conflict file (I forgot its name - something like >> dir-props), and verify that the only conflict is in the missing >> merge info from trunk to py3k >> - svn resolved . > > Ah, that's the missing piece of info - thanks :) > > This should probably go in the dev FAQ somewhere though. Indeed! Preferably with an example, if someone who understands it has the time. I have some changes I've been holding off on checking in until I see how someone else handles this. Eric.
From martin at v.loewis.de Thu Dec 11 22:02:16 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 11 Dec 2008 22:02:16 +0100 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <200812111519.07899.victor.stinner@haypocalc.com> References: <200812101206.49316.victor.stinner@haypocalc.com> <18753.3615.21624.999357@montanaro-dyndns-org.local> <200812111519.07899.victor.stinner@haypocalc.com> Message-ID: <49417FD8.7050307@v.loewis.de> >> The Python distribution comes with a Misc/gdbinit file > > Hum, do you really run *all* programs in gdb? Most of the time, you don't > expect a crash (because you trust your softwares). You will have to try to > reproduce the crash, but sometimes it's very hard (eg. Heisenbugs!). You don't have to run the program in gdb. You can also use the core dump that the operating system will generate, and study the crash after it happened. Regards, Martin From stephen at xemacs.org Fri Dec 12 02:55:52 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Dec 2008 10:55:52 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <494103FD.5000101@holdenweb.com> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> Message-ID: <871vwe9mxj.fsf@xemacs.org> Steve Holden writes: > Ulrich Eckhardt writes: > > What I'd just like some feedback on is the approach to return a > > distinct type (neither a byte string nor a Unicode string) from > > readdir(). This is presumably unacceptable on the grounds that it will break existing code that does something more or less useful more or less some of the time. > If you know what your filesystem produces, you can take the appropriate > action to convert it into a type that makes sense to the user. 
Unfortunately, even programmers experienced in I18N like Martin, and those with intuition-that-has-the-force-of-law like Guido, express deliberate disbelief on this point. They say that filesystem names and environment variable values are text, which is true from the semantic viewpoint but can't be fully supported by any implementation. The implementation issue is why you want bytes, but I don't think it is going to overcome the tide of (semantically-oriented) pragmatism. > If you don't, then at least if you have the string in its bytes > form you can re-present it to the filesystem to manipulate the > file. What are we supposed to do with the "special type"? Trivially convert it back to bytes and re-present it to the filesystem, of course. I gather that the BFDL's line on this thread of discussion is that forcing programmers to think about encodings every time they call out to the OS is unacceptable when most programs will work acceptably almost all of the time with a rather naive approach. This means that almost all Python programs will be technically broken for the forseeable future, sorry, Ulrich. And for the same pragmatic reasons, these functions are going to return strings (ie, Unicode), not bytes, I expect. Sorry, Steve. What needs to be determined here is the best way to provide reliability to those who will go to the effort of asking for it if it's available. I don't think "just return bytes" fits the bill for the reason above. What I would like to see is a type that is derived from string (so if you present it to an API expecting string, it is silently treated as string), but from which the original bytes can always be extracted on request. If the original bytes cannot be sensibly decoded to a string, then the string field in the object would either contain something that should normally cause an error in a string API, or some made-up string (presumably it would attempt to be a more or less faithful representation of the bytes) at the caller's option. 
Probably they'd also contain some metadata useful in guessing encodings (the read time locale in particular). These objects probably shouldn't support string-like operations in a general way (ie, maintaining both the string representation and the bytes "correctly"). Rather, using "proper" string operations on them would use the string content and produce strings. People who really want to handle mixed-encoding pathnames and the like would have to keep collections of these objects and handle them in an ad-hoc way. Unfortunately, implementing this is way beyond my skills and time availability. From sturla at molden.no Fri Dec 12 02:13:13 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 12 Dec 2008 02:13:13 +0100 (CET) Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? Message-ID: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> Last month there was a discussion on Python-Dev regarding removal of reference counting to remove the GIL. I hope you forgive me for continuing the debate. I think reference counting is a good feature. It prevents huge piles of garbage from building up. It makes the interpreter run more smoothly. It is not just important for games and multimedia applications, but also servers under high load. Python does not pause to look for garbage like Java or .NET. It only pauses to look for dead reference cycles. This can be safely turned off temporarily; it can be turned off completely if you do not create reference cycles. With Java and .NET, no garbage is ever reclaimed except by the intermittent garbage collection. Python always reclaims an object when the reference count drops to zero, whether the GC is enabled or not. This makes Python programs well-behaved. For this reason, I think removing reference counting is a genuinely bad idea. Even if the GIL is evil, this remedy is even worse.
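The behaviour described above can be observed directly. The following is a small sketch that assumes CPython (other implementations need not use reference counting); the class `Node` and the variable names are hypothetical and exist only for the demonstration.

```python
import gc
import weakref


class Node:
    """Plain object that can hold references and be weakly referenced."""


# Switch off the cyclic collector; reference counting keeps working.
gc.disable()

events = []
obj = Node()
# The weakref callback runs at the moment the object is reclaimed.
watcher = weakref.ref(obj, lambda _ref: events.append("freed"))
del obj  # last reference gone: reclaimed immediately, no collector needed
freed_without_gc = events == ["freed"]

# A reference cycle is the one case reference counting cannot handle.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_watcher = weakref.ref(a)
del a, b
cycle_alive = cycle_watcher() is not None  # still alive while the collector is off

# Re-enable the collector and run it once to break the cycle.
gc.enable()
gc.collect()
cycle_collected = cycle_watcher() is None
```

Only the cycle needs the collector; the acyclic object is freed as soon as its count reaches zero, which is the property the argument above relies on.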
I am not a Python core developer; I am a research scientist who use Python because Matlab is (or used to be) a bad programming language, albeit a good computing environment. As most people who have worked with scientific computing know, there are better paradigms for concurrency than threads. In particular, there are message-passing systems like MPI and Erlang, and there are autovectorizing compilers for OpenMP and Fortran 90/95. There are special LAPACK, BLAS and FFT libraries for parallel computer architectures. There are fork-join systems like cilk and java.util.concurrent. Threads seem to be used only because mediocre programmers don't know what else to use. I genuinely think the use of threads should be discouraged. It leads to code that are full of bugs and difficult to maintain - race conditions, deadlocks, and livelocks are common pitfalls. Very few developers are capable of implementing efficient load-balancing by hand. Multi-threaded programs tend to scale badly because they are badly written. If the GIL discourages the abuse of threads, it serves a purpose albeit being evil like the Linux kernel's BKL. Python could be better off doing what tcl does. Allow each process to embed multiple interpreters; run each interpreter in its own thread. Implement a fast message-passing system between the interpreters (e.g. copy-on-write by making communicated objects immutable), and Python would be closer to Erlang than Java. I thus think the main offender is the thread and threading modules - not the GIL. Without thread support in the interpreter, there would be no threads. Without threads, there would be no need for a GIL. Both sources of evil can be removed by just removing thread support from the Python interpreter. In addition, it would make Python faster at executing linear code. Just copy the concurrency model of Erlang instead of Java and get rid of those nasty threads. In the meanwhile, I'll continue to experiment with multiprocessing. 
Removing reference counting to encourage the use of threads is like shooting ourselves in the leg twice. That's my two cents on this issue. There is another issue to note as well: If you can endure a 200x loss of efficacy by using Python instead of Fortran, scalability on dual or quad-core processors may not be that important. Just move the bottlenecks out of Python and you are much better off. Regards, Sturla Molden From rhamph at gmail.com Fri Dec 12 06:22:37 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 22:22:37 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <871vwe9mxj.fsf@xemacs.org> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> Message-ID: On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull wrote: > Unfortunately, even programmers experienced in I18N like Martin, and > those with intuition-that-has-the-force-of-law like Guido, > express deliberate disbelief on this point. They say that filesystem > names and environment variable values are text, which is true from the > semantic viewpoint but can't be fully supported by any implementation. With all the focus on backup tools and file managers I think we've lost perspective. They're an important use case, but hardly the dominant one. Please, as a user, if your app is creating new files, do NOT use bytes! You have no excuse for creating garbage, and garbage doesn't help the user any. Getting the encoding right, use the unicode APIs, and don't pass the buck on to everything else. The fact that the unicode is easier is a bonus for doing the right thing.
-- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Fri Dec 12 06:41:57 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 11 Dec 2008 21:41:57 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> Message-ID: <4941F9A5.5040704@gmail.com> Adam Olsen wrote: > On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull wrote: >> Unfortunately, even programmers experienced in I18N like Martin, and >> those with intuition-that-has-the-force-of-law like Guido, >> express deliberate disbelief on this point. They say that filesystem >> names and environment variable values are text, which is true from the >> semantic viewpoint but can't be fully supported by any implementation. > > With all the focus on backup tools and file managers I think we've > lost perspective. They're an important use case, but hardly the > dominant one. > > Please, as a user, if your app is creating new files, do NOT use > bytes! You have no excuse for creating garbage, and garbage doesn't > help the user any. Getting the encoding right, use the unicode APIs, > and don't pass the buck on to everything else. > Uhmmm.... That's good advice but doesn't solve any problems :-(. No matter what I create, the filenames will be bytes when the next person reads them in. If my locale is shift-js and the person I'm sharing the file with uses utf-8 things won't work. Even if my locale is utf-8 (since I come from a European nation) and their locale is utf-16 (because they're from an Asian nation) the Unicode API won't work. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From rhamph at gmail.com Fri Dec 12 07:19:27 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 23:19:27 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4941F9A5.5040704@gmail.com> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: On Thu, Dec 11, 2008 at 10:41 PM, Toshio Kuratomi wrote: > Adam Olsen wrote: >> On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull wrote: >>> Unfortunately, even programmers experienced in I18N like Martin, and >>> those with intuition-that-has-the-force-of-law like Guido, >>> express deliberate disbelief on this point. They say that filesystem >>> names and environment variable values are text, which is true from the >>> semantic viewpoint but can't be fully supported by any implementation. >> >> With all the focus on backup tools and file managers I think we've >> lost perspective. They're an important use case, but hardly the >> dominant one. >> >> Please, as a user, if your app is creating new files, do NOT use >> bytes! You have no excuse for creating garbage, and garbage doesn't >> help the user any. Getting the encoding right, use the unicode APIs, >> and don't pass the buck on to everything else. >> > Uhmmm.... That's good advice but doesn't solve any problems :-(. No > matter what I create, the filenames will be bytes when the next person > reads them in. If my locale is shift-js and the person I'm sharing the > file with uses utf-8 things won't work. Even if my locale is utf-8 > (since I come from a European nation) and their locale is utf-16 > (because they're from an Asian nation) the Unicode API won't work. 
So you'll open up the dir and find this collection: ??????.txt ????????.png ???????.html ????????.html ???.png ??????.txt ??????.txt ??????.txt A half-broken setup is still a broken setup. Eventually you have to tell people to stop screwing around and pick one encoding. I doubt that UTF-16 is used very much (other than on windows). I haven't found any statistics on what distros use, but did find this one of the web itself: http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html I can't wait for next year's statistics. -- Adam Olsen, aka Rhamphoryncus From curt at hagenlocher.org Fri Dec 12 07:25:08 2008 From: curt at hagenlocher.org (Curt Hagenlocher) Date: Thu, 11 Dec 2008 22:25:08 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen wrote: > > I doubt that UTF-16 is used very much (other than on windows). > There's this other obscure platform called "Java"... ;) -- Curt Hagenlocher curt at hagenlocher.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhamph at gmail.com Fri Dec 12 07:25:31 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 23:25:31 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> Message-ID: On Thu, Dec 11, 2008 at 10:22 PM, Adam Olsen wrote: > On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull wrote: >> Unfortunately, even programmers experienced in I18N like Martin, and >> those with intuition-that-has-the-force-of-law like Guido, >> express deliberate disbelief on this point. 
They say that filesystem >> names and environment variable values are text, which is true from the >> semantic viewpoint but can't be fully supported by any implementation. > > With all the focus on backup tools and file managers I think we've > lost perspective. They're an important use case, but hardly the > dominant one. > > Please, as a user, if your app is creating new files, do NOT use > bytes! You have no excuse for creating garbage, and garbage doesn't > help the user any. Getting the encoding right, use the unicode APIs, > and don't pass the buck on to everything else. > > The fact that the unicode is easier is a bonus for doing the right thing. As a data point, firefox (when pointed at my home dir) DOES skip over garbage files. -- Adam Olsen, aka Rhamphoryncus From rhamph at gmail.com Fri Dec 12 07:26:46 2008 From: rhamph at gmail.com (Adam Olsen) Date: Thu, 11 Dec 2008 23:26:46 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: On Thu, Dec 11, 2008 at 11:25 PM, Curt Hagenlocher wrote: > On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen wrote: >> >> I doubt that UTF-16 is used very much (other than on windows). > > There's this other obscure platform called "Java"... ;) Sorry, I should have said "for interchange". :) (CPython doesn't use UTF-8 internally either. It uses UTF-16 or UTF-32.) 
-- Adam Olsen, aka Rhamphoryncus From a.badger at gmail.com Fri Dec 12 08:16:38 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 11 Dec 2008 23:16:38 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: <49420FD6.1040901@gmail.com> Adam Olsen wrote: > A half-broken setup is still a broken setup. Eventually you have to > tell people to stop screwing around and pick one encoding. > But it's not a broken setup. It's the way the world is because people share things with each other. > I doubt that UTF-16 is used very much (other than on windows). I > haven't found any statistics on what distros use, but did find this > one of the web itself: > http://googleblog.blogspot.com/2008/05/moving-to-unicode-51.html > UTF-16 is popular in Asian locales for the same reason that shift-js and big-5 are hanging in there. utf-8 takes many more bytes to encode Asian Unicode characters than utf-16. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From a.badger at gmail.com Fri Dec 12 08:33:28 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Thu, 11 Dec 2008 23:33:28 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> Message-ID: <494213C8.7040809@gmail.com> Adam Olsen wrote: > As a data point, firefox (when pointed at my home dir) DOES skip over > garbage files. > > That's not true. However, it looks like Firefox is actually broken. 
Take a look at this screenshot: firefox.png That shows a directory with a folder that's not decodable in my utf-8 locale. What's interesting to note is that I actually have two nondecodable folders there but only one of them showed up. So firefox is inconsistent with its treatment, rendering some non-decodable files and ignoring others. Also interesting, if you point your browser at: http://toshio.fedorapeople.org/u/ You should see two other test files. They're both (one-half)(enyei).html but one's encoded in utf-8 and the other in latin-1. Firefox has some bugs in it related to this. For instance, if you mouseover the two links you'll see that firefox displays the same symbolic names for each of the files (even though they're in two different encodings). Sometimes firefox is able to load both files and sometimes it only loads one of them. Firefox seems to be translating the characters from ASCII percent encoding of bytes into their unicode symbols and back to utf-8 in some circumstances related to whether it has the pages in its cache or not. In this case, it should be leaving things as percent encoded bytes as it's the only way that apache is going to know what to retrieve. -Toshio -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 197 bytes Desc: OpenPGP digital signature URL: From rhamph at gmail.com Fri Dec 12 09:00:26 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 12 Dec 2008 01:00:26 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <494213C8.7040809@gmail.com> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <494213C8.7040809@gmail.com> Message-ID: On Fri, Dec 12, 2008 at 12:33 AM, Toshio Kuratomi wrote: > Adam Olsen wrote: >> As a data point, firefox (when pointed at my home dir) DOES skip over >> garbage files. 
>> >> > That's not true. However, it looks like Firefox is actually broken. > Take a look at this screenshot: > firefox.png > > That shows a directory with a folder that's not decodable in my utf-8 > locale. What's interesting to note is that I actually have two > nondecodable folders there but only one of them showed up. So firefox > is inconsistent with its treatment, rendering some non-decodable files > and ignoring others. > > Also interesting, if you point your browser at: > http://toshio.fedorapeople.org/u/ > > You should see two other test files. They're both > (one-half)(enyei).html but one's encoded in utf-8 and the other in > latin-1. Firefox has some bugs in it related to this. For instance, if > you mouseover the two links you'll see that firefox displays the same > symbolic names for each of the files (even though they're in two > different encodings). Sometimes firefox is able to load both files and > sometimes it only loads one of them. Firefox seems to be translating > the characters from ASCII percent encoding of bytes into their unicode > symbols and back to utf-8 in some circumstances related to whether it > has the pages in its cache or not. In this case, it should be leaving > things as percent encoded bytes as it's the only way that apache is > going to know what to retrieve. UTF-8 in percent encodings is becoming a defacto standard. Otherwise the browser has to display the percent escapes in the address bar, rather than the intended text. IOW, inconsistent behaviour is a bug, but translating into UTF-8 is not. 
;) -- Adam Olsen, aka Rhamphoryncus From eckhardt at satorlaser.com Fri Dec 12 09:19:05 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 12 Dec 2008 09:19:05 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812111441.46739.eckhardt@satorlaser.com> Message-ID: <200812120919.05389.eckhardt@satorlaser.com> On Thursday 11 December 2008, Steve Holden wrote: > Ulrich Eckhardt wrote: > > If readdir() returned Unicode text, people would start taking that for > > granted. If it returned bytes, just the same. Returning a completely > > unrelated type will give them enough hint that for this thing they have > > to rethink their assumptions. This runs along the lines of "In the face > > of ambiguity, refuse the temptation to guess.", as it makes guessing > > rather impossible. > > So you are suggesting this "special object" be used only to represent > files to users? Now I understand. Not only files, the same problem crops up when handling sys.argv and os.environ. > > I just don't see a case where using a separate path class would break > > things. Further, the special handling that is required would be made even > > clearer by using such a class. > > But it does have to be implemented ... Well, it isn't really terribly difficult to do so, after all its just a container for either a byte string or Unicode string plus some helper code to convert it to/from Unicode. Uli -- Sator Laser GmbH Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932 ************************************************************************************** Visit our website at ************************************************************************************** Diese E-Mail einschlie?lich s?mtlicher Anh?nge ist nur f?r den Adressaten bestimmt und kann vertrauliche Informationen enthalten. Bitte benachrichtigen Sie den Absender umgehend, falls Sie nicht der beabsichtigte Empf?nger sein sollten. 
Die E-Mail ist in diesem Fall zu l?schen und darf weder gelesen, weitergeleitet, ver?ffentlicht oder anderweitig benutzt werden. E-Mails k?nnen durch Dritte gelesen werden und Viren sowie nichtautorisierte ?nderungen enthalten. Sator Laser GmbH ist f?r diese Folgen nicht verantwortlich. ************************************************************************************** From stefan_ml at behnel.de Fri Dec 12 09:35:25 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 12 Dec 2008 09:35:25 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> Message-ID: Hi, replying to the topic only: because many C libraries support threading and Python extension modules can integrate them in a way that allows concurrency in a safe way (although 'safe' is definitely something that is paid for in developer days). Stefan From eckhardt at satorlaser.com Fri Dec 12 09:31:16 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 12 Dec 2008 09:31:16 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812111441.46739.eckhardt@satorlaser.com> Message-ID: <200812120931.16231.eckhardt@satorlaser.com> On Thursday 11 December 2008, Adam Olsen wrote: > The simplest solution there is to have windows bytes APIs that return > raw UTF-16 bytes (note that windows does NOT guaranteed to be valid > unicode, despite being much more likely than on linux). Actually, I'm not aware of this case. I only know that the OS refuses to mount media it can't decode, but that is on the OS-level. Can you give me a hint? > The only real issue I see is that UTF-16 isn't an ASCII superset, so it > won't print nicely. True, but I personally couldn't care less. 
Actually, I would even prefer if printing a byte string always produced \x escaped byte values, that way it would at least be consistent. > In other words, bytes can be your special type. That would actually be a lot of work to do, but I do agree that it would be a way. The problem though is that I have seen quite a few places in Python where such a byte string is passed as 'char*' and treated with the assumption that strlen() would yield a meaningful value there, so this calls at least for a distinct 'Py_Byte' type. Also, this still doesn't even remotely handle the problem that you do have two valid encodings on win32, even though the MBCS one could be called deprecated. People will try to interface to other libraries that use win32 CHAR strings and that will be much harder or even impossible. Further, and that is IMHO the worst part of it, things will fail too silently and programmers aren't encouraged to write portable code, but maybe I'm just too pessimistic. Uli From stephen at xemacs.org Fri Dec 12 09:57:20 2008 From: stephen at xemacs.org (Stephen J.
Turnbull) Date: Fri, 12 Dec 2008 17:57:20 +0900 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4941F9A5.5040704@gmail.com> References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: <87oczh93f3.fsf@xemacs.org> Toshio Kuratomi writes: > Adam Olsen wrote: > > On Thu, Dec 11, 2008 at 6:55 PM, Stephen J. Turnbull wrote: > >> Unfortunately, even programmers experienced in I18N like Martin, and > >> those with intuition-that-has-the-force-of-law like Guido, > >> express deliberate disbelief on this point. They say that filesystem > >> names and environment variable values are text, which is true from the > >> semantic viewpoint but can't be fully supported by any implementation. > > > > With all the focus on backup tools and file managers I think we've > > lost perspective. They're an important use case, but hardly the > > dominant one. True. > > Please, as a user, if your app is creating new files, do NOT use > > bytes! You have no excuse for creating garbage, and garbage doesn't > > help the user any. Getting the encoding right, use the unicode APIs, > > and don't pass the buck on to everything else. > > > Uhmmm.... That's good advice but doesn't solve any problems :-(. Exactly. Furthermore, the problems *already exist*. My current locale is UTF-8 and all files dated since about 2002 have UTF-8 names, *except* in my MIME-bodies garbage can, where only recently have I got around to coercing my MUA into doing the right thing. And of course there are still legacy file names in EUC-JP, which I suppose I could search for but since I only access a directory containing one once in a pale blue moon, I'm not gonna bother. It's just not reasonable to expect users or even sysadmins to go around cleaning up legacy data.
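The situation described above, a mostly UTF-8 tree with a few legacy EUC-JP or Latin-1 names left over, can be probed without renaming anything. What follows is only a minimal sketch in Python 3: the helper name split_decodable and the sample names are made up for illustration and are not part of any proposal in this thread. It relies on the fact that os.listdir() called with a bytes argument returns the raw bytes names.

```python
def split_decodable(names, encoding="utf-8"):
    """Partition raw byte file names into decoded text and undecodable bytes."""
    decoded, undecodable = [], []
    for raw in names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            # Keep the original bytes so the file can still be opened.
            undecodable.append(raw)
    return decoded, undecodable


# Illustrative input: one UTF-8 name, one EUC-JP name, one Latin-1 name.
names = [
    "日本語.txt".encode("utf-8"),
    "日本語.txt".encode("euc_jp"),
    "café.html".encode("latin-1"),
]
text_names, legacy_names = split_decodable(names)
```

On a real directory the input would come from something like os.listdir(b"."), and the bytes in the second list can be passed back to open() unchanged, which is the property the "just return bytes" camp cares about.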
From nd at perlig.de Fri Dec 12 10:11:09 2008 From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=) Date: Fri, 12 Dec 2008 10:11:09 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <494213C8.7040809@gmail.com> Message-ID: <200812121011.09427.nd@perlig.de> * Adam Olsen wrote: > UTF-8 in percent encodings is becoming a defacto standard. Otherwise > the browser has to display the percent escapes in the address bar, > rather than the intended text. Duh! The address bar should contain the URL, which *is* the intended text. The escapes are there for a reason. If I pass some octets using percent escapes via the query string or request body, it's not text, not even intended. It's still a collection of octets. Translating them back (and forth when I press enter in the address bar) is a pretty ambigious operation and therefore pretty wrong. The defacto standard does not exist. There's a real one instead: RFC 2396. nd From rhamph at gmail.com Fri Dec 12 10:12:26 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 12 Dec 2008 02:12:26 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812120931.16231.eckhardt@satorlaser.com> References: <200812111441.46739.eckhardt@satorlaser.com> <200812120931.16231.eckhardt@satorlaser.com> Message-ID: On Fri, Dec 12, 2008 at 1:31 AM, Ulrich Eckhardt wrote: > On Thursday 11 December 2008, Adam Olsen wrote: >> The simplest solution there is to have windows bytes APIs that return >> raw UTF-16 bytes (note that windows does NOT guaranteed to be valid >> unicode, despite being much more likely than on linux). > > Actually, I'm not aware of this case. I only know that the OS refuses to mount > media it can't decode, but that is on the OS-level. Can you give me a hint? 
Only pages like this, which indicate the underlying API is an array of WCHAR: http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx >> The only real issue I see is that UTF-16 isn't an ASCII superset, so it >> won't print nicely. > > True, but I personally couldn't care less. Actually, I would even prefer if > printing a byte string always produced \x escaped byte values, that way it > would at least be consistent. > >> In other words, bytes can be your special type. > > That would actually be a lot of work to do, but I do agree that it would be a > way. > > The problem though is that I have seen quite a few places in Python where such > a byte string is passed as 'char*' and treated with the assumption that > strlen() would yield a meaningful value there, so this calls at least for a > distinct 'Py_Byte' type. Also, this still doesn't even remotely handle the > problem that you do have two valid encodings on win32, even though the MBCS > one could be called deprecated. People will try to interface to other > libraries that use win32 CHAR strings and that will be much harder or even > impossible. Further, and that is IMHO the worst part of it, things will fail > too silently and programmers aren't encouraged to write portable code, but > maybe I'm just too pessimistic. char * is just fine. You need only pass a length along with it. All internal APIs *must* already do this, as they support nul bytes. Also note that the underlying POSIX APIs prohibit nul bytes in filenames, so it's irrelevant for them. If your concern is that people will use MBCS byte strings (produced how?) in a WCHAR API.. I agree it would be confusing, but not nearly enough to warrant a special type (which would probably get passed a MBCS byte string anyway.) Although I haven't found an official claim that MBCS is deprecated, I see no reason why it wouldn't be effectively obsoleted by the UTF-16 APIs. (Certain outdated APIs may be the exception.) 
We could have a way to convert (locale-dependent codec?), but that's as much as we should care. -- Adam Olsen, aka Rhamphoryncus From eckhardt at satorlaser.com Fri Dec 12 10:10:13 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Fri, 12 Dec 2008 10:10:13 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <871vwe9mxj.fsf@xemacs.org> References: <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> Message-ID: <200812121010.13157.eckhardt@satorlaser.com> On Friday 12 December 2008, Stephen J. Turnbull wrote: > I gather that the BDFL's line on this thread of discussion is that > forcing programmers to think about encodings every time they call out > to the OS is unacceptable Exactly that is not necessary.

    # os.readdir() stands for the proposed call that returns the special path type
    for n in os.readdir('.'):
        f = open(n)
        if grep('foo', f):
            print('found "foo"!')

Now, if you actually wanted to output the filename, you could never do so reliably anyway, because even though it is supposed to be text, the encoding isn't known. So, an archiving program will probably do something like this:

    try:
        for n in os.readdir():
            b = n.encode('UTF-8')
            f = open(n)
            archive.write_file_header(b)
            archive.write_file(f)
    except ...:
        print("oops, couldn't decode file '%s'" % n.unicode(error='replace'))

If you're writing a filemanager, you would store the path alongside an approximated Unicode representation. > when most programs will work acceptably > almost all of the time with a rather naive approach. This means that > almost all Python programs will be technically broken for the > foreseeable future, sorry, Ulrich. Actually, they are already broken, only that few people notice it. :| > And for the same pragmatic reasons, these functions are going to > return strings (ie, Unicode), not bytes, I expect. Sorry, Steve. > > What needs to be determined here is the best way to provide > reliability to those who will go to the effort of asking for it if > it's available.
I don't think "just return bytes" fits the bill for > the reason above. > > What I would like to see is a type that is derived from string (so if > you present it to an API expecting string, it is silently treated as > string), but from which the original bytes can always be extracted on > request. I like that idea, this type would behave pretty much like the env_string I proposed. The main difference is that it does several implicit conversions where I personally would rather see explicit conversions. Other than that, I'm all for it. > If the original bytes cannot be sensibly decoded to a > string, then the string field in the object would either contain > something that should normally cause an error in a string API, or some > made-up string (presumably it would attempt to be a more or less > faithful representation of the bytes) at the caller's option. > Probably they'd also contain some metadata useful in guessing > encodings (the read time locale in particular). Well, I wouldn't provide an approximation. Considering the archiving software above, you would end up with a file name "" in an archive. For that kind of software, it would be fatal. But, and that is much more important than my preference, at least your approach would allow writing reliable software that properly handles such environment strings. Further, and that is where it differs from just returning bytes, it even makes it easy by using a distinct type. Uli
From rhamph at gmail.com Fri Dec 12 10:19:14 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 12 Dec 2008 02:19:14 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812121011.09427.nd@perlig.de> References: <494213C8.7040809@gmail.com> <200812121011.09427.nd@perlig.de> Message-ID: On Fri, Dec 12, 2008 at 2:11 AM, André Malo wrote: > * Adam Olsen wrote: > >> UTF-8 in percent encodings is becoming a defacto standard. Otherwise >> the browser has to display the percent escapes in the address bar, >> rather than the intended text. > > Duh! The address bar should contain the URL, which *is* the intended text. > The escapes are there for a reason. If I pass some octets using percent > escapes via the query string or request body, it's not text, not even > intended. It's still a collection of octets. Translating them back (and > forth when I press enter in the address bar) is a pretty ambigious > operation and therefore pretty wrong. > > The defacto standard does not exist. There's a real one instead: RFC 2396. All the heaps of people using non-English Wikipedia sites might disagree with you. There's only, what, a few *million* pages that would be affected? It'd be very interesting if someone at Google could provide some statistics on URL encodings. -- Adam Olsen, aka Rhamphoryncus From p.f.moore at gmail.com Fri Dec 12 11:03:14 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 12 Dec 2008 10:03:14 +0000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead?
In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> Message-ID: <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> 2008/12/12 Sturla Molden : > Last month there was a discussion on Python-Dev regarding removal of > reference counting to remove the GIL. I hope you forgive me for continuing > the debate. [...] > Python could be better off doing what tcl does. Allow each process to > embed multiple interpreters; run each interpreter in its own thread. > Implement a fast message-passing system between the interpreters (e.g. > copy-on-write by making communicated objects immutable), and Python would > be closer to Erlang than Java. Too much to comment individually here, but I'd agree that message-passing approaches are a better model in general. Some specific points: 1. The Queue module gives the bones of a message-passing model, building something based on that is possible now (and may already exist). You have to do isolation by convention rather than having it enforced by the system, but that's OK for coding. (It doesn't help the "remove the GIL" debate, though). 2. I'd like to see isolation based on multiple interpreters, but the problem lies with extensions (and at a lower level with the Python C API) which wasn't designed with isolation in mind. Changing that may be nice, but it's probably too late (or if not, it's likely to be a lot of work to do it in a compatible manner). 3. Exposing multiple interpreters at the Python level would let most of this be done outside the core. But it may result in pure Python code being able to crash the application if not done carefully. And of course, the overriding points: - This needs to be done in a backward compatible manner (Python 3.0 is out now!) 
- A working patch is hugely more likely to make progress, as all the evidence shows that the current core developers don't find this issue important enough to spend their limited coding time on. Paul. From ncoghlan at gmail.com Fri Dec 12 11:07:54 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Dec 2008 20:07:54 +1000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> Message-ID: <494237FA.7090500@gmail.com> Sturla Molden wrote: > Last month there was a discussion on Python-Dev regarding removal of > reference counting to remove the GIL. I hope you forgive me for continuing > the debate. Anything to do with removing the GIL/threads/whatever other core language feature someone doesn't like really belongs on c.l.p. or python-ideas rather than here. Ideas should be at least remotely feasible before they're brought to python-dev. That said, I'll bite anyway... Treating threads as communicating sequential processes (via the Queue module) actually makes them pretty easy to use correctly. They are then extraordinarily handy for performing multiple non-GIL bound tasks (such as IO operations or number crunching using an extension module like numpy) in parallel. For GIL bound tasks, switching from the threading module to the multiprocessing module now allows the activity to scale to multiple CPUs. Removing thread support merely because concurrent programming is hard (no matter how you do it) would be... odd (to say the least). Changing the underlying concurrency mechanism from threads to subinterpreters to processes to whole computers doesn't make understanding and coping with the concepts involved in concurrency any easier (and in fact will often make them harder to handle by increasing the communications latency). Cheers, Nick. 
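The "communicating sequential processes via the Queue module" style mentioned above can be sketched roughly as follows. This is an illustration only, written against the Python 3 module name queue (it is Queue in 2.x); the function names worker and run are invented here, and the squaring step merely stands in for a blocking I/O call or a call into an extension module that releases the GIL.

```python
import queue
import threading


def worker(inbox, outbox):
    """Consume items from inbox until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            break
        # Placeholder for real work (blocking I/O, numpy call, ...).
        outbox.put((item, item * item))


def run(items, n_workers=3):
    """Fan items out to worker threads and collect the results."""
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inbox, outbox))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        inbox.put(item)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have finished, so draining the queue here is race-free.
    results = {}
    while not outbox.empty():
        key, value = outbox.get()
        results[key] = value
    return results


squares = run(range(5))
```

The threads share nothing except the two queues, so the isolation is by convention only, as noted earlier in the thread; nothing stops a worker from touching global state.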
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Dec 12 11:09:26 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Dec 2008 20:09:26 +1000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> Message-ID: <49423856.30705@gmail.com> Paul Moore wrote: > 2. I'd like to see isolation based on multiple interpreters, but the > problem lies with extensions (and at a lower level with the Python C > API) which wasn't designed with isolation in mind. Changing that may > be nice, but it's probably too late (or if not, it's likely to be a > lot of work to do it in a compatible manner). Actually, I believe 3.0 already took a big step towards allowing this by changing the way modules are initialised. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From regebro at gmail.com Fri Dec 12 11:52:46 2008 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 12 Dec 2008 11:52:46 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> Message-ID: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> On Fri, Dec 12, 2008 at 02:13, Sturla Molden wrote: > I genuinely think the use of threads should be discouraged. It leads to > code that are full of bugs and difficult to maintain - race conditions, > deadlocks, and livelocks are common pitfalls. The use of threads for load balancing should be discouraged, yes. That is not what they are designed for. 
Threads are designed to allow blocking processes to go on in the background without blocking the main process. This, they are very useful for. Removing thread support would therefore be a very big mistake. It's needed, it has its uses, just not the one *you* want. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From sturla at molden.no Fri Dec 12 12:23:34 2008 From: sturla at molden.no (Sturla Molden) Date: Fri, 12 Dec 2008 12:23:34 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> Message-ID: <494249B6.6040206@molden.no> On 12/12/2008 11:52 AM, Lennart Regebro wrote: > The use of threads for load balancing should be discouraged, yes. That > is not what they are designed for. Threads are designed to allow > blocking processes to go on in the background without blocking the > main process. It seems that most programmers with Java or Windows experience don't understand this; hence the everlasting GIL debate. With multiple interpreters - one interpreter per thread - this could still be accomplished. Let one interpreter block while another continues to work. Then the result of the blocking operation is messaged back. Multi-threaded C libraries could be used in the same way. But there would be no need for a GIL, because each interpreter would be a single-threaded compartment. .NET has something similar in what is called 'appdomains'. I am not suggesting removal of threads but rather of the Java threading model. I just think it is a mistake to let multiple OS threads touch the same interpreter.
Sturla Molden From vext01 at gmail.com Fri Dec 12 13:29:11 2008 From: vext01 at gmail.com (Edd Barrett) Date: Fri, 12 Dec 2008 12:29:11 +0000 Subject: [Python-Dev] Build failure on OpenBSD 4.4-current Message-ID: Hi, I just had to move the "extern lstat..." outside the "ifndef HAVE_UNISTD_H" to get python 2.6.1 to build on OpenBSD 4.4-current/i386. I'm not suggesting this is correct, but it fixes the build for my platform at least.

--- Modules/posixmodule.c.orig Fri Dec 12 11:08:54 2008
+++ Modules/posixmodule.c Fri Dec 12 11:54:16 2008
@@ -208,10 +208,11 @@
 #ifdef HAVE_SYMLINK
 extern int symlink(const char *, const char *);
 #endif /* HAVE_SYMLINK */
+#endif /* !HAVE_UNISTD_H */
+
 #ifdef HAVE_LSTAT
 extern int lstat(const char *, struct stat *);
 #endif /* HAVE_LSTAT */
-#endif /* !HAVE_UNISTD_H */
 #endif /* !_MSC_VER */

I'm using gcc-4.2 Thanks -- Best Regards Edd http://students.dec.bournemouth.ac.uk/ebarrett From solipsis at pitrou.net Fri Dec 12 14:06:35 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Dec 2008 13:06:35 +0000 (UTC) Subject: [Python-Dev] Python-3.0, unicode, and os.environ References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: Curt Hagenlocher <curt at hagenlocher.org> writes: > > > On Thu, Dec 11, 2008 at 10:19 PM, Adam Olsen <rhamph at gmail.com> wrote: > > > I doubt that UTF-16 is used very much (other than on windows). > > There's this other obscure platform called "Java"... ;) Does it have a filesystem? From solipsis at pitrou.net Fri Dec 12 14:17:36 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Dec 2008 13:17:36 +0000 (UTC) Subject: [Python-Dev] Build failure on OpenBSD 4.4-current References: Message-ID: Hello, Edd Barrett <vext01 at gmail.com> writes: > > I just had to move the "extern lstat..." outside the "ifndef > HAVE_UNISTD_H" to get python 2.6.1 to build on OpenBSD 4.4-current/i386.
Could you please open an issue in http://bugs.python.org ? That way the problem is less likely to be overlooked. By the way, there are other bug entries regarding OpenBSD, at least one of them has a patch waiting for review: http://bugs.python.org/issue3920 Regards Antoine. From vext01 at gmail.com Fri Dec 12 15:12:38 2008 From: vext01 at gmail.com (Edd Barrett) Date: Fri, 12 Dec 2008 14:12:38 +0000 Subject: [Python-Dev] Build failure on OpenBSD 4.4-current In-Reply-To: References: Message-ID: Hi, On Fri, Dec 12, 2008 at 1:17 PM, Antoine Pitrou wrote: > Could you please open an issue in http://bugs.python.org ? That way the problem > is less likely to be overlooked. http://bugs.python.org/issue4639 Thanks -- Best Regards Edd http://students.dec.bournemouth.ac.uk/ebarrett From curt at hagenlocher.org Fri Dec 12 15:14:53 2008 From: curt at hagenlocher.org (Curt Hagenlocher) Date: Fri, 12 Dec 2008 06:14:53 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: On Fri, Dec 12, 2008 at 5:06 AM, Antoine Pitrou wrote: > > Curt Hagenlocher hagenlocher.org> writes: > > > There's this other obscure platform called "Java"... ;) > > Does it have a filesystem? No, but it also has to interact with filesystems of possibly invalid or indeterminate encodings. What does java.io do? 
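As a point of comparison for the question of filenames with invalid or indeterminate encodings, here is a small Python 3 sketch of three ways to turn undecodable bytes into text. It rests on assumptions that go beyond this thread: a UTF-8 locale is assumed, `raw_name` is an invented example of filename bytes, and the `surrogateescape` error handler was only added later (Python 3.1, PEP 383), so it was not available when this was written.

```python
raw_name = b"\xff\xff"  # hypothetical filename bytes that are not valid UTF-8

# Strict decoding refuses the name outright.
try:
    raw_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# "replace" yields displayable text, but the original bytes cannot be
# recovered from it, so the file could not be reopened by that name.
replaced = raw_name.decode("utf-8", "replace")
lossy = replaced.encode("utf-8")

# "surrogateescape" smuggles each undecodable byte through as a lone
# surrogate, so encoding again restores the exact original bytes.
escaped = raw_name.decode("utf-8", "surrogateescape")
restored = escaped.encode("utf-8", "surrogateescape")

print(strict_ok, lossy == raw_name, restored == raw_name)
```

The design trade-off is between a name that can be displayed and a name that can be passed back to the operating system unchanged; the replacing approach gives only the former.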
-- Curt Hagenlocher curt at hagenlocher.org From solipsis at pitrou.net Fri Dec 12 15:19:30 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Dec 2008 14:19:30 +0000 (UTC) Subject: [Python-Dev] Python-3.0, unicode, and os.environ References: <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: Curt Hagenlocher hagenlocher.org> writes: > > No, but it also has to interact with filesystems of possibly invalid > or indeterminate encodings. What does java.io do? My point was that Python doesn't have to interact with the Java IO libraries, while it has to interact with the Unix and Windows IO APIs. From curt at hagenlocher.org Fri Dec 12 15:23:16 2008 From: curt at hagenlocher.org (Curt Hagenlocher) Date: Fri, 12 Dec 2008 06:23:16 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou wrote: > Curt Hagenlocher hagenlocher.org> writes: >> >> No, but it also has to interact with filesystems of possibly invalid >> or indeterminate encodings. What does java.io do? > > My point was that Python doesn't have to interact with the Java IO libraries, > while it has to interact with the Unix and Windows IO APIs. Of course. But the Java IO libraries have to interact with the Unix and Windows IO APIs as well. It might be interesting to know how they handle similar situations. -- Curt Hagenlocher curt at hagenlocher.org From lists at cheimes.de Fri Dec 12 15:50:13 2008 From: lists at cheimes.de (Christian Heimes) Date: Fri, 12 Dec 2008 15:50:13 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? 
In-Reply-To: <49423856.30705@gmail.com> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> Message-ID: Nick Coghlan schrieb: > Actually, I believe 3.0 already took a big step towards allowing this by > changing the way modules are initialised. You are believing correctly. Martin has designed and implemented a nicely working API to store extension module data per interpreter state. For now interpreter states are used for sub interpreters only. http://www.python.org/dev/peps/pep-3121/ Christian From scott+python-dev at scottdial.com Fri Dec 12 16:21:35 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 12 Dec 2008 10:21:35 -0500 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: <4942817F.20202@scottdial.com> Curt Hagenlocher wrote: > On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou wrote: >> Curt Hagenlocher hagenlocher.org> writes: >>> No, but it also has to interact with filesystems of possibly invalid >>> or indeterminate encodings. What does java.io do? >> My point was that Python doesn't have to interact with the Java IO libraries, >> while it has to interact with the Unix and Windows IO APIs. > > Of course. But the Java IO libraries have to interact with the Unix > and Windows IO APIs as well. It might be interesting to know how they > handle similar situations. See the following email for a summary of existing practice (as of 2004): http://www.mail-archive.com/unicode at unicode.org/msg27352.html -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From regebro at gmail.com Fri Dec 12 17:39:33 2008 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 12 Dec 2008 17:39:33 +0100 Subject: [Python-Dev] 2to3 question about fix_imports. 
Message-ID: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> The fix_imports fix seems to fix only the first import per line that you have. So if you do for example import urllib2, cStringIO it will not fix cStringIO. Is this a bug or a feature? :-) If it's a feature it should warn at least, right? -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From victor.stinner at haypocalc.com Fri Dec 12 17:54:49 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Fri, 12 Dec 2008 17:54:49 +0100 Subject: [Python-Dev] 2to3 question about fix_imports. In-Reply-To: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> Message-ID: <200812121754.50123.victor.stinner@haypocalc.com> Le Friday 12 December 2008 17:39:33 Lennart Regebro, vous avez écrit : > The fix_imports fix seems to fix only the first import per line that you > have. So if you do for example > import urllib2, cStringIO > it will not fix cStringIO. > > Is this a bug or a feature? :-) I prefer to see that as a bug and so replace cStringIO by StringIO. So can you open an issue? -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From a.badger at gmail.com Fri Dec 12 17:56:19 2008 From: a.badger at gmail.com (Toshio Kuratomi) Date: Fri, 12 Dec 2008 08:56:19 -0800 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812101139.37301.eckhardt@satorlaser.com> <200812111019.16950.eckhardt@satorlaser.com> <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <494213C8.7040809@gmail.com> Message-ID: <494297B3.3000204@gmail.com> Adam Olsen wrote: > UTF-8 in percent encodings is becoming a de facto standard. Otherwise > the browser has to display the percent escapes in the address bar, > rather than the intended text. > > IOW, inconsistent behaviour is a bug, but translating into UTF-8 is not.
;) > > I think we should let this tangent drop because it's about bugs in Firefox, not in Python :-) -Toshio From theller at ctypes.org Fri Dec 12 18:32:07 2008 From: theller at ctypes.org (Thomas Heller) Date: Fri, 12 Dec 2008 18:32:07 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> Message-ID: Christian Heimes schrieb: > Nick Coghlan schrieb: >> Actually, I believe 3.0 already took a big step towards allowing this by >> changing the way modules are initialised. > > You are believing correctly. Martin has designed and implemented a > nicely working API to store extension module data per interpreter state. > For now interpreter states are used for sub interpreters only. > > http://www.python.org/dev/peps/pep-3121/ But the extension modules still have to be changed to use this mechanism, right? -- Thanks, Thomas From regebro at gmail.com Fri Dec 12 19:10:14 2008 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 12 Dec 2008 19:10:14 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <494249B6.6040206@molden.no> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <494249B6.6040206@molden.no> Message-ID: <319e029f0812121010p8dd97b9t8ccde78c037a42c2@mail.gmail.com> On Fri, Dec 12, 2008 at 12:23, Sturla Molden wrote: > It seems that most programmers with Java or Windows experience don't > understand this; hence the everlasting GIL debate. Yes. Maybe writing this with big letters in the thread module docs would help?
> I am not suggesting removal of threads but rather the Java threading model. > I just think it is a mistake to let multiple OS threads touch the same > interpreter. Does Python have a Java threading model? I don't know Java well enough to know what that is. :) -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From regebro at gmail.com Fri Dec 12 19:21:31 2008 From: regebro at gmail.com (Lennart Regebro) Date: Fri, 12 Dec 2008 19:21:31 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <4942817F.20202@scottdial.com> References: <4941F9A5.5040704@gmail.com> <4942817F.20202@scottdial.com> Message-ID: <319e029f0812121021v9214e89n506da07d347839a0@mail.gmail.com> On Fri, Dec 12, 2008 at 16:21, Scott Dial wrote: > See the following email for a summary of existing practice (as of 2004): > > http://www.mail-archive.com/unicode at unicode.org/msg27352.html Interesting. Quite a lot of them do just drop the undecodable filenames. The Java solution of replacing the undecodable parts seems to be a better idea at first glance, but what if you then end up with two filenames that are the same? Possibly replacing them with a replacement character is a good idea, to signal that the file is there, but then fail to open it. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From status at bugs.python.org Fri Dec 12 18:06:44 2008 From: status at bugs.python.org (Python tracker) Date: Fri, 12 Dec 2008 18:06:44 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20081212170644.9951B785BC@psf.upfronthosting.co.za> ACTIVITY SUMMARY (12/05/08 - 12/12/08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 2261 open (+58) / 14206 closed (+37) / 16467 total (+95) Open issues with patches: 763 Average duration of open issues: 699 days. Median duration of open issues: 2499 days.
Open Issues Breakdown open 2242 (+57) pending 19 ( +1) Issues Created Or Reopened (97) _______________________________ Remove mimetools usage from the stdlib 12/06/08 http://bugs.python.org/issue2848 reopened brett.cannon patch improve linecache: reuse tokenize.detect_encoding() and io.open( 12/12/08 http://bugs.python.org/issue4016 reopened benjamin.peterson patch Deprecated python 2.x syntax in "HOWTO Use Python in the web" 12/05/08 CLOSED http://bugs.python.org/issue4550 created jcsalterego patch The python 2.6.1 source distribution is missing Doc/tools/sphinx 12/05/08 CLOSED http://bugs.python.org/issue4551 created andreask Doc/tools/sphinxext not included in the 2.6.1 tarball 12/05/08 CLOSED http://bugs.python.org/issue4552 created doko Results from os.path.islink and os.stat S_ISLNK do not match 12/05/08 CLOSED http://bugs.python.org/issue4553 created npatters Missing make altframeworkinstall for Mac OS X 12/06/08 http://bugs.python.org/issue4554 created christian.heimes Smelly exports 12/06/08 http://bugs.python.org/issue4555 created christian.heimes cmp() function erroneously noted as gone in "What's New" 12/06/08 CLOSED http://bugs.python.org/issue4556 created mwatkins array('c') in python 3.0 produces error, doc says it is ok 12/06/08 CLOSED http://bugs.python.org/issue4557 created lopgok with_stdc89 12/06/08 http://bugs.python.org/issue4558 created christian.heimes patch, patch Whats new recommendation error 12/06/08 CLOSED http://bugs.python.org/issue4559 created lregebro "Flouted", not "flaunted" 12/06/08 CLOSED http://bugs.python.org/issue4560 created jdf Optimize new io library 12/06/08 http://bugs.python.org/issue4561 created christian.heimes patch zip() documentation was not updated 12/06/08 CLOSED http://bugs.python.org/issue4562 created mchouza Wrong formatting of contributor list in About page 12/06/08 http://bugs.python.org/issue4563 created salty-horse bytearray.fromhex doesn't respect bytearray subclass 12/06/08 CLOSED 
http://bugs.python.org/issue4564 created pitrou io write() performance very slow 12/06/08 http://bugs.python.org/issue4565 created ialbert 2.6.1 breaks many applications that embed Python on Windows 12/06/08 http://bugs.python.org/issue4566 created craigh Registry key not set if unattended installation used 12/06/08 http://bugs.python.org/issue4567 created stuaxo Improved optparse "varargs" callback example 12/06/08 http://bugs.python.org/issue4568 created gregg.lind patch Segfault when mutating a memoryview to an array.array 12/07/08 CLOSED http://bugs.python.org/issue4569 created pitrou Bad example in set tutorial 12/07/08 CLOSED http://bugs.python.org/issue4570 created jmarter write to stdout in binary mode - is it possible? 12/07/08 CLOSED http://bugs.python.org/issue4571 created lopgok add SEEK_* values to io and/or io.IOBase 12/07/08 http://bugs.python.org/issue4572 created gumpy zsh-style subpattern matching for fnmatch/glob 12/07/08 http://bugs.python.org/issue4573 created erickt patch reading UTF16-encoded text file crashes if \r on 64-char boundar 12/07/08 http://bugs.python.org/issue4574 created sjmachin patch Py_IS_INFINITY defect causes test_cmath failure on x86 12/07/08 http://bugs.python.org/issue4575 created marketdickinson patch "Defining new types" little outdated 12/07/08 CLOSED http://bugs.python.org/issue4576 created exe distutils: -3 warnings (apply) 12/07/08 http://bugs.python.org/issue4577 created srittau patch compiler: -3 warnings 12/07/08 http://bugs.python.org/issue4578 created srittau patch .read() and .readline() differ in failing 12/07/08 http://bugs.python.org/issue4579 created eggy patch, needs review slicing of memoryviews when itemsize != 1 is wrong 12/07/08 http://bugs.python.org/issue4580 created pitrou patch, needs review failed to import module from lib-dynload 12/07/08 CLOSED http://bugs.python.org/issue4581 created legerf type of __builtins__ changes if in main module or not 12/07/08 CLOSED http://bugs.python.org/issue4582 
created nnorwitz segfault when mutating memoryview to array.array when array is r 12/07/08 http://bugs.python.org/issue4583 created gumpy doctest fails to display bytes type 12/07/08 CLOSED http://bugs.python.org/issue4584 created msyang Build failure on OS X 10.5.5: make: *** [sharedmods] Error 1 12/07/08 CLOSED http://bugs.python.org/issue4585 created marketdickinson "Extending Embedded Python" documention uses removed Py_InitModu 12/07/08 CLOSED http://bugs.python.org/issue4586 created blakemadden Need to rework the dbm lib/include selection process 12/08/08 http://bugs.python.org/issue4587 created skip.montanaro patch, needs review Need a way to make my own bytes 12/08/08 CLOSED http://bugs.python.org/issue4588 created lopgok 'with' loses ->bool exceptions 12/08/08 CLOSED http://bugs.python.org/issue4589 created jyasskin patch 2to3 strips trailing L for long iterals in two fixers 12/08/08 CLOSED http://bugs.python.org/issue4590 created aronacher patch, needs review 32-bits unsigned user/group identifier 12/08/08 http://bugs.python.org/issue4591 created sjoerd patch, needs review Embedding example does not add created module 12/08/08 CLOSED http://bugs.python.org/issue4592 created blakemadden patch, needs review Documentation for multiprocessing - Pool.apply() 12/08/08 http://bugs.python.org/issue4593 created beazley easy Can't compile with -O3, on ARM, with gcc 3.4.4 12/08/08 http://bugs.python.org/issue4594 created metageek new types example is out of date 12/08/08 http://bugs.python.org/issue4595 created blakemadden 2to3 does not fail as early as possible. 12/08/08 http://bugs.python.org/issue4596 created LambertDW EvalFrameEx fails to set 'why' for some exceptions 12/10/08 CLOSED http://bugs.python.org/issue4597 reopened amaury.forgeotdarc patch IDLE not opening 12/08/08 CLOSED http://bugs.python.org/issue4598 created ec2929 Strings undisplayable with repr 12/08/08 CLOSED http://bugs.python.org/issue4599 created mfoord __class__ assignment: new-style? heap? 
== confusing 12/08/08 http://bugs.python.org/issue4600 created tjreedy directory permission error with make install in 3.0 12/08/08 http://bugs.python.org/issue4601 created legerf patch 2to3 drops executable bit with --write 12/08/08 CLOSED http://bugs.python.org/issue4602 created dato patch 3.0 document tab interpretation change 12/08/08 http://bugs.python.org/issue4603 created tjreedy close() seems to have limited effect 12/09/08 http://bugs.python.org/issue4604 created skip.montanaro patch 3.0 documentation mentions using maketrans from within the strin 12/09/08 http://bugs.python.org/issue4605 created suicideducky Passing 'None' if argtype is set to POINTER(...) doesn't always 12/09/08 http://bugs.python.org/issue4606 created robertluce patch uuid behavior with multiple threads 12/09/08 http://bugs.python.org/issue4607 created mortenab urllib.request.urlopen does not return an iterable object 12/09/08 http://bugs.python.org/issue4608 created jwilk Allow use of > 256 FD's on solaris in 32 bit mode 12/09/08 http://bugs.python.org/issue4609 created pajs at fodder.org.uk Unicode case mappings are incorrect 12/09/08 http://bugs.python.org/issue4610 created alexs Small error in "Extending Python with C or C++" 12/09/08 http://bugs.python.org/issue4611 created jakamkon PyModule_Create() doesn't add/import module 12/09/08 CLOSED http://bugs.python.org/issue4612 created blakemadden Can't figure out where SyntaxError: can not delete variable 'x' 12/09/08 http://bugs.python.org/issue4613 created marduk patch Document PyModule_Create() 12/09/08 http://bugs.python.org/issue4614 created brett.cannon needs review de-duping function in itertools 12/10/08 http://bugs.python.org/issue4615 created thomaspinckney3 tarfile does not set the creation date and time of the extracted 12/10/08 CLOSED http://bugs.python.org/issue4616 created throbi SyntaxError when free variable name is also an exception target 12/10/08 http://bugs.python.org/issue4617 created amaury.forgeotdarc patch, 
needs review print_function and unicode_literals don't work together 12/10/08 http://bugs.python.org/issue4618 created exarkun Invalid Behaviour When a Default Argument is a Mutable Object 12/10/08 CLOSED http://bugs.python.org/issue4619 created rhr Memory leak with datetime used with time.strptime 12/10/08 CLOSED http://bugs.python.org/issue4620 created sebegue zipfile returns string but expects binary 12/10/08 http://bugs.python.org/issue4621 created francescor SequenceMatcher bug with long sequences 12/10/08 http://bugs.python.org/issue4622 created eliben IDLE shutdown if I run an edited file contains chinese 12/11/08 http://bugs.python.org/issue4623 created bianpeng Can not import readline on python3.0 (ubuntu 8.04) 12/11/08 CLOSED http://bugs.python.org/issue4624 created xxiao IDLE won't open anymore, .idlerc unaccessible 12/11/08 http://bugs.python.org/issue4625 created skcheng compile() doesn't ignore the source encoding when a string is pa 12/11/08 http://bugs.python.org/issue4626 created brett.cannon Add Mac OS X Disk Images to Python.org homepage 12/11/08 http://bugs.python.org/issue4627 created carlj No universal newline support for compile() when using bytes 12/11/08 http://bugs.python.org/issue4628 created brett.cannon getopt should not accept no_argument that ends with '=' 12/11/08 http://bugs.python.org/issue4629 created wangchun patch IDLE no longer respects .Xdefaults insertOffTime 12/11/08 http://bugs.python.org/issue4630 created mark urlopen returns extra, spurious bytes 12/11/08 http://bugs.python.org/issue4631 created dato Wrong fix for range(42)[::-1] 12/11/08 CLOSED http://bugs.python.org/issue4632 created theller file.tell() gives wrong result 12/11/08 CLOSED http://bugs.python.org/issue4633 created yavuz164 2to3 should fix "import HTMLParser" 12/11/08 CLOSED http://bugs.python.org/issue4634 created mastrodomenico no reference for optparse methods 12/11/08 http://bugs.python.org/issue4635 created techtonik bdist_wininst installer with 
install script raises exception 12/11/08 http://bugs.python.org/issue4636 created theller Binary floating point and decimal floating point arithmetic 12/11/08 CLOSED http://bugs.python.org/issue4637 created Retro 1 is 1 is allways true while 1.0 is 1.0 may sometimes be true 12/12/08 CLOSED http://bugs.python.org/issue4638 created nassrat Build failure on OpenBSD 4.4-current regarding lstat() 12/12/08 http://bugs.python.org/issue4639 created vext01 optparse - dosn't distinguish between '--option' and '-option' 12/12/08 http://bugs.python.org/issue4640 created kszawala optparse - dosn't distinguish between '--option' and '-option' 12/12/08 http://bugs.python.org/issue4641 created kszawala optparse - dosn't distinguish between '--option' and '-option' 12/12/08 http://bugs.python.org/issue4642 created kszawala cgitb.html fails if getattr call raises exception 12/12/08 http://bugs.python.org/issue4643 created amc1 Minor documentation fault in 2to3 script 12/12/08 http://bugs.python.org/issue4644 created amc1 Issues Now Closed (74) ______________________ gdbm/ndbm 1.8.1+ needs libgdbm_compat.so 449 days http://bugs.python.org/issue1167 ocean-city patch Victor Stinner's GMP patch for longs 328 days http://bugs.python.org/issue1814 marketdickinson patch Python fails silently on bad locale 291 days http://bugs.python.org/issue2173 marketdickinson patch Full precision summation 214 days http://bugs.python.org/issue2819 marketdickinson patch Incorrect rounding in floating-point operations with gcc/x87 198 days http://bugs.python.org/issue2937 marketdickinson math test fails on Solaris 10 173 days http://bugs.python.org/issue3167 marketdickinson patch Multiprocessing Array and sharedctypes.Array error in docs/imple 165 days http://bugs.python.org/issue3206 amaury.forgeotdarc patch BufferedWriter not thread-safe 129 days http://bugs.python.org/issue3476 wplappert patch expm1 missing 123 days http://bugs.python.org/issue3501 marketdickinson test_math: math.log(-ninf) fails to 
raise exception on OpenBSD 108 days http://bugs.python.org/issue3682 marketdickinson math.log(x, 10) gives different result than math.log10(x) 98 days http://bugs.python.org/issue3724 marketdickinson patch _lsprof: clear() should call flush_unmatched() 79 days http://bugs.python.org/issue3952 haypo patch tokenize.detect_encoding(): raise SyntaxError on codecs.lookup() 70 days http://bugs.python.org/issue4021 benjamin.peterson patch, patch, needs review Decimal.max(NaN, x) gives incorrect results when x is finite and 63 days http://bugs.python.org/issue4084 marketdickinson patch ihooks incompatible with absolute_import feature 35 days http://bugs.python.org/issue4244 georg.brandl Update pydoc URLs 36 days http://bugs.python.org/issue4259 loewis smtplib.py initialisation defect 28 days http://bugs.python.org/issue4302 ocean-city patch state_reset not called on 'state' before sre_search invoked 16 days http://bugs.python.org/issue4416 amaury.forgeotdarc String allocations waste 3 bytes of memory on average. 
9 days http://bugs.python.org/issue4445 marketdickinson patch __import__ documentation obsolete 9 days http://bugs.python.org/issue4457 georg.brandl patch parameters of PyLong_FromString() are not checked for NULL 6 days http://bugs.python.org/issue4461 marketdickinson patch Windows installer crash 4 days http://bugs.python.org/issue4481 tjreedy Error to build _dbm module during make 6 days http://bugs.python.org/issue4483 skip.montanaro patch, easy Python Documentation not Newb Friendly 4 days http://bugs.python.org/issue4488 georg.brandl Compiler warnings in longobject.c 3 days http://bugs.python.org/issue4497 marketdickinson patch 3.0 test failure on Mac OS X 10.5.5 2 days http://bugs.python.org/issue4507 marketdickinson Decorators should have an index entry 2 days http://bugs.python.org/issue4511 georg.brandl problem with str.join - should work with list input, error says 1 days http://bugs.python.org/issue4534 lopgok webbrowser.UnixBrowser should use builtins.open 1 days http://bugs.python.org/issue4537 amaury.forgeotdarc A defect in -bool exceptions 3 days http://bugs.python.org/issue4589 amaury.forgeotdarc patch 2to3 strips trailing L for long iterals in two fixers 1 days http://bugs.python.org/issue4590 aronacher patch, needs review Embedding example does not add created module 1 days http://bugs.python.org/issue4592 georg.brandl patch, needs review EvalFrameEx fails to set 'why' for some exceptions 0 days http://bugs.python.org/issue4597 jyasskin patch IDLE not opening 2 days http://bugs.python.org/issue4598 loewis Strings undisplayable with repr 0 days http://bugs.python.org/issue4599 loewis 2to3 drops executable bit with --write 3 days http://bugs.python.org/issue4602 benjamin.peterson patch PyModule_Create() doesn't add/import module 0 days http://bugs.python.org/issue4612 amaury.forgeotdarc tarfile does not set the creation date and time of the extracted 2 days http://bugs.python.org/issue4616 lars.gustaebel Invalid Behaviour When a Default Argument is 
a Mutable Object 0 days http://bugs.python.org/issue4619 loewis Memory leak with datetime used with time.strptime 1 days http://bugs.python.org/issue4620 skip.montanaro Can not import readline on python3.0 (ubuntu 8.04) 0 days http://bugs.python.org/issue4624 benjamin.peterson Wrong fix for range(42)[::-1] 0 days http://bugs.python.org/issue4632 benjamin.peterson file.tell() gives wrong result 0 days http://bugs.python.org/issue4633 QuantumTim 2to3 should fix "import HTMLParser" 0 days http://bugs.python.org/issue4634 benjamin.peterson Binary floating point and decimal floating point arithmetic 0 days http://bugs.python.org/issue4637 gvanrossum 1 is 1 is allways true while 1.0 is 1.0 may sometimes be true 0 days http://bugs.python.org/issue4638 tim_one Proto 2 pickle vs dict subclass 1873 days http://bugs.python.org/issue826897 benjamin.peterson Python interpreter stalled on _PyPclose.WaitForSingleObject 1708 days http://bugs.python.org/issue928332 amaury.forgeotdarc Fix for #777597 - socketmodule.c connection handling incorec 1647 days http://bugs.python.org/issue965036 amaury.forgeotdarc patch distutils' dry-run wants to create some real build dirs 1545 days http://bugs.python.org/issue1030250 amaury.forgeotdarc patch correct/clarify documentation for super 1363 days http://bugs.python.org/issue1163367 rhettinger sys.settrace cause curried parms to show up as attributes 796 days http://bugs.python.org/issue1569356 loewis thread + import => crashes? 292 days http://bugs.python.org/issue1720705 forest Top Issues Most Discussed (10) ______________________________ 42 Get rid of more refercenes to __cmp__ 57 days open http://bugs.python.org/issue1717 23 slicing of memoryviews when itemsize != 1 is wrong 5 days open http://bugs.python.org/issue4580 17 Make conversions from long to float correctly rounded. 
174 days open http://bugs.python.org/issue3166 13 Optimize new io library 6 days open http://bugs.python.org/issue4561 12 bugs in array.array with exports (buffer protocol) 9 days open http://bugs.python.org/issue4509 11 Whats new recommendation error 1 days closed http://bugs.python.org/issue4559 11 Error to build _dbm module during make 6 days closed http://bugs.python.org/issue4483 10 with_stdc89 7 days open http://bugs.python.org/issue4558 9 tarfile does not set the creation date and time of the extracte 2 days closed http://bugs.python.org/issue4616 9 'with' loses ->bool exceptions 3 days closed http://bugs.python.org/issue4589 From glyph at divmod.com Fri Dec 12 21:54:07 2008 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 12 Dec 2008 20:54:07 -0000 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <494103FD.5000101@holdenweb.com> <871vwe9mxj.fsf@xemacs.org> <4941F9A5.5040704@gmail.com> Message-ID: <20081212205407.12555.311547571.divmod.xquotient.2122@weber.divmod.com> On 02:23 pm, curt at hagenlocher.org wrote: >On Fri, Dec 12, 2008 at 6:19 AM, Antoine Pitrou >wrote: >>Curt Hagenlocher hagenlocher.org> writes: >>> >>>No, but it also has to interact with filesystems of possibly invalid >>>or indeterminate encodings. What does java.io do? >> >>My point was that Python doesn't have to interact with the Java IO >>libraries, >>while it has to interact with the Unix and Windows IO APIs. > >Of course. But the Java IO libraries have to interact with the Unix >and Windows IO APIs as well. It might be interesting to know how they >handle similar situations. Apparently Java has the facilities to do the right thing, but actually it's just broken. My locale says UTF-8. However, if I create a non-decodable file with Python (2), there are three ways I can tell Java to open it: I can ask for it with a string (that won't work, because no valid UTF-8 string maps to an undecodable string, pretty much by definition). 
I can list the directory that it's in (presuming that *that's* a directory) and get a java.io.File, which could be retaining all the interesting information, or I can use a URI, which is a string that resolves to octets before it resolves to characters again. However, it looks like Java screws up in every case. Here's a transcript from the ever-helpful jython: glyph at nhuvasarim:~/tmp$ python Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52) [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>>file("\xff\xff", "wb").write("lolz\n") glyph at nhuvasarim:~/tmp$ jython Jython 2.2.1 on java1.6.0_07 Type "copyright", "credits" or "license" for more information. >>>from java.io import File >>>fileList = File(".").listFiles() >>>fileList array(java.io.File,[./ >>>fileList[0].__class__ >>>from java.io import FileReader >>>FileReader(fileList[0]) Traceback (innermost last): File "", line 1, in ? at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:106) at java.io.FileReader.(FileReader.java:55) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) java.io.FileNotFoundException: java.io.FileNotFoundException: ./?FD?FD (No such file or directory) >>>from java.net import URI >>>u = URI("file:///home/glyph/tmp/%ff%ff") >>>FileReader(File(u)) Traceback (innermost last): File "", line 1, in ? 
at java.io.FileInputStream.open(Native Method) at java.io.FileInputStream.(FileInputStream.java:106) at java.io.FileReader.(FileReader.java:55) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) java.io.FileNotFoundException: java.io.FileNotFoundException: /home/glyph/tmp/?FD?FD (No such file or directory) >>> From ncoghlan at gmail.com Fri Dec 12 22:34:01 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Dec 2008 07:34:01 +1000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> Message-ID: <4942D8C9.5080203@gmail.com> Thomas Heller wrote: > Christian Heimes schrieb: >> Nick Coghlan schrieb: >>> Actually, I believe 3.0 already took a big step towards allowing this by >>> changing the way modules are initialised. >> You are believing correctly. Martin has designed and implemented a >> nicely working API to store extension module data per interpreter state. >> For now interpreter states are used for sub interpreters only. >> >> http://www.python.org/dev/peps/pep-3121/ > > But the extension modules still have to changed to use this mechanism, right? Yep, but at least it's *possible* now. With 2.x, it isn't possible for an extension module to support subinterpreters properly, even if they want to. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From nd at perlig.de Sat Dec 13 05:47:47 2008 From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=) Date: Sat, 13 Dec 2008 05:47:47 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812121011.09427.nd@perlig.de> Message-ID: <200812130547.47351@news.perlig.de> * Adam Olsen wrote: > On Fri, Dec 12, 2008 at 2:11 AM, André Malo wrote: > > * Adam Olsen wrote: > >> UTF-8 in percent encodings is becoming a defacto standard. Otherwise > >> the browser has to display the percent escapes in the address bar, > >> rather than the intended text. > > > > Duh! The address bar should contain the URL, which *is* the intended > > text. The escapes are there for a reason. If I pass some octets using > > percent escapes via the query string or request body, it's not text, > > not even intended. It's still a collection of octets. Translating them > > back (and forth when I press enter in the address bar) is a pretty > > ambigious operation and therefore pretty wrong. > > > > The defacto standard does not exist. There's a real one instead: RFC > > 2396. > > All the heaps of people using non-english wikipedia sites might > disagree with you. There's only, what, a few *million* pages that > would be affected? I'm not sure what you're trying to pull here. Is that supposed to be an argument? There's no page affected at all. It's a browser UI issue, not a page issue. And even if it were interesting at all, how the URL escapes are displayed in the address bar, those millions of people would favourite KOI8-R or Big 5 over UTF-8 if you would ask them. Which leads to the exact point: The browser cannot know, nor should it even. It's opaque. The only entity which needs to understand the encoding of URL percent escapes in query or request body is the *server* selecting the resource. But I'm sure I'm not telling you any news here.
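An illustrative sketch of the "escapes are opaque octets" point, not part of the original message. It assumes Python 3's urllib.parse module; the helper name decode_escapes is hypothetical. It shows that percent escapes always map to bytes, and that whether those bytes are text depends entirely on which encoding the reader guesses.

```python
from urllib.parse import unquote_to_bytes


def decode_escapes(escaped, encoding):
    """Percent-decode to octets, then try to read the octets as text.

    Returns None when the octets are not valid in the guessed encoding.
    (decode_escapes is a hypothetical helper, used only for illustration.)
    """
    raw = unquote_to_bytes(escaped)  # always succeeds: escapes are just bytes
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# The octets from the file-name example earlier in the thread.
print(unquote_to_bytes("%ff%ff"))            # b'\xff\xff'
print(decode_escapes("%ff%ff", "utf-8"))     # None: not valid UTF-8
print(decode_escapes("%ff%ff", "latin-1"))   # two U+00FF characters

# The same character escaped under two different encodings.
print(decode_escapes("%C3%B1", "utf-8"))     # n with tilde
print(decode_escapes("%F1", "latin-1"))      # n with tilde
print(decode_escapes("%C3%B1", "latin-1"))   # mojibake: two unrelated characters
```

Without out-of-band knowledge of the encoding, a client can only guess which of these readings the server intended, which is the ambiguity being argued about here.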
nd -- "Das Verhalten von Gates hatte mir bewiesen, dass ich auf ihn und seine beiden Gefährten nicht zu zählen brauchte" -- Karl May, "Winnetou III" Im Westen was neues: From rhamph at gmail.com Sat Dec 13 07:12:47 2008 From: rhamph at gmail.com (Adam Olsen) Date: Fri, 12 Dec 2008 23:12:47 -0700 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <200812130547.47351@news.perlig.de> References: <200812121011.09427.nd@perlig.de> <200812130547.47351@news.perlig.de> Message-ID: On Fri, Dec 12, 2008 at 9:47 PM, André Malo wrote: > * Adam Olsen wrote: >> On Fri, Dec 12, 2008 at 2:11 AM, André Malo wrote: >> > * Adam Olsen wrote: >> >> UTF-8 in percent encodings is becoming a defacto standard. Otherwise >> >> the browser has to display the percent escapes in the address bar, >> >> rather than the intended text. >> > >> > Duh! The address bar should contain the URL, which *is* the intended >> > text. The escapes are there for a reason. If I pass some octets using >> > percent escapes via the query string or request body, it's not text, >> > not even intended. It's still a collection of octets. Translating them >> > back (and forth when I press enter in the address bar) is a pretty >> > ambigious operation and therefore pretty wrong. >> > >> > The defacto standard does not exist. There's a real one instead: RFC >> > 2396. >> >> All the heaps of people using non-english wikipedia sites might >> disagree with you. There's only, what, a few *million* pages that >> would be affected? > > I'm not sure what you're trying to pull here. Is that supposed to be an > argument? There's no page affected at all. It's a browser UI issue, not a > page issue. > > And even if it were interesting at all, how the URL escapes are displayed in > the address bar, those millions of people would favourite KOI8-R or Big 5 > over UTF-8 if you would ask them. > > Which leads to the exact point: The browser cannot know, nor should it even. > It's opaque.
The only entity which needs to understand the encoding of URL > percent escapes in query or request body is the *server* selecting the > resource. > > But I'm sure I'm not telling you any news here. You're arguing that text should be an opaque entity.. We've wasted enough of everybody's time on this already, I'm not going to continue on this thread. Send me a private email if you think it's really important. -- Adam Olsen, aka Rhamphoryncus From nd at perlig.de Sat Dec 13 07:34:06 2008 From: nd at perlig.de (=?iso-8859-1?q?Andr=E9_Malo?=) Date: Sat, 13 Dec 2008 07:34:06 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812130547.47351@news.perlig.de> Message-ID: <200812130734.06774@news.perlig.de> * Adam Olsen wrote: > On Fri, Dec 12, 2008 at 9:47 PM, André Malo wrote: > > * Adam Olsen wrote: > >> On Fri, Dec 12, 2008 at 2:11 AM, André Malo wrote: > >> > * Adam Olsen wrote: > >> >> UTF-8 in percent encodings is becoming a defacto standard. > >> >> Otherwise the browser has to display the percent escapes in the > >> >> address bar, rather than the intended text. > >> > > >> > Duh! The address bar should contain the URL, which *is* the intended > >> > text. The escapes are there for a reason. If I pass some octets > >> > using percent escapes via the query string or request body, it's not > >> > text, not even intended. It's still a collection of octets. > >> > Translating them back (and forth when I press enter in the address > >> > bar) is a pretty ambigious operation and therefore pretty wrong. > >> > > >> > The defacto standard does not exist. There's a real one instead: RFC > >> > 2396. > >> > >> All the heaps of people using non-english wikipedia sites might > >> disagree with you. There's only, what, a few *million* pages that > >> would be affected? > > > > I'm not sure what you're trying to pull here. Is that supposed to be an > > argument? There's no page affected at all.
It's a browser UI issue, not > > a page issue. > > > > And even if it were interesting at all, how the URL escapes are > > displayed in the address bar, those millions of people would favourite > > KOI8-R or Big 5 over UTF-8 if you would ask them. > > > > Which leads to the exact point: The browser cannot know, nor should it > > even. It's opaque. The only entity which needs to understand the > > encoding of URL percent escapes in query or request body is the > > *server* selecting the resource. > > > > But I'm sure I'm not telling you any news here. > > You're arguing that text should be an opaque entity.. No, actually I'm not. I'm arguing that escapes are opaque. > We've wasted enough of everybody's time on this already, I'm not going > to continue on this thread. Agreed. nd -- Da fällt mir ein, wieso gibt es eigentlich in Unicode kein "i" mit einem Herzchen als Tüpfelchen? Das wär sooo süüss! -- Björn Höhrmann in darw From lie.1296 at gmail.com Sat Dec 13 08:57:28 2008 From: lie.1296 at gmail.com (Lie Ryan) Date: Sat, 13 Dec 2008 07:57:28 +0000 (UTC) Subject: [Python-Dev] Psyco for -OO or -O Message-ID: I'm sure probably most of you knows about psyco[1], the optimizer. Python has an -O and -OO flag that is intended to be optimization flag, but we know that currently it doesn't do much. Why not add psyco as standard library and let -O or -OO invoke psyco? [1] http://psyco.sourceforge.net/index.html From fuzzyman at voidspace.org.uk Sat Dec 13 14:28:37 2008 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 13 Dec 2008 13:28:37 +0000 Subject: [Python-Dev] Psyco for -OO or -O In-Reply-To: References: Message-ID: <4943B885.1070605@voidspace.org.uk> Lie Ryan wrote: > I'm sure probably most of you knows about psyco[1], the optimizer. Python > has an -O and -OO flag that is intended to be optimization flag, but we > know that currently it doesn't do much. Why not add psyco as standard > library and let -O or -OO invoke psyco?
> This really belongs on Python-ideas and not Python-dev. The main reason why not is that someone(s) from the Python core team would then need to 'own' maintaining Psyco (which is x86 only as well). Psyco is so hard to maintain that even the original author wants to drop it. :-) Michael Foord > [1] http://psyco.sourceforge.net/index.html > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fuzzyman%40voidspace.org.uk > -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From fuzzyman at voidspace.org.uk Sat Dec 13 14:32:36 2008 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 13 Dec 2008 13:32:36 +0000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> Message-ID: <4943B974.6020407@voidspace.org.uk> Lennart Regebro wrote: > On Fri, Dec 12, 2008 at 02:13, Sturla Molden wrote: > >> I genuinely think the use of threads should be discouraged. It leads to >> code that are full of bugs and difficult to maintain - race conditions, >> deadlocks, and livelocks are common pitfalls. >> > > The use of threads for load balancing should be discouraged, yes. That > is not what they are designed for. Threads are designed to allow > blocking processes to go on in the background without blocking the > main process. This, they are very useful for. Removing thread support > would therefore be a very big mistake. It's needed, it has it's uses, > just not the one *you* want. > > That's an interesting assertion about what threads were designed for. Do you have anything to back it up? 
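An illustrative sketch of the use case Lennart describes above (a blocking operation running in the background while the main thread carries on), not part of the original message. It uses only the standard threading module; run_in_background, slow_operation and demo are hypothetical names, and the Event stands in for a blocking read or network call.

```python
import threading


def run_in_background(blocking_call, log):
    """Start blocking_call on a worker thread and return the thread at once."""
    def worker():
        log.append("background: " + blocking_call())

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def demo():
    log = []
    release = threading.Event()

    def slow_operation():
        # Stand-in for a blocking read or network call: waits until released.
        release.wait()
        return "blocking call returned"

    thread = run_in_background(slow_operation, log)
    # The main thread is not blocked while the worker waits.
    log.append("main: still responsive")
    release.set()   # let the blocking call complete
    thread.join()   # wait for the worker to record its result
    return log


print(demo())
```

The worker cannot append to the log until the main thread releases it, so the main thread's entry always comes first. This says nothing about load balancing across cores, which is the separate use that the thread is debating.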
Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From lie.1296 at gmail.com Sat Dec 13 16:19:47 2008 From: lie.1296 at gmail.com (Lie Ryan) Date: Sat, 13 Dec 2008 15:19:47 +0000 (UTC) Subject: [Python-Dev] Psyco for -OO or -O References: <4943B885.1070605@voidspace.org.uk> Message-ID: On Sat, 13 Dec 2008 13:28:37 +0000, Michael Foord wrote: > Lie Ryan wrote: >> I'm sure probably most of you knows about psyco[1], the optimizer. >> Python has an -O and -OO flag that is intended to be optimization flag, >> but we know that currently it doesn't do much. Why not add psyco as >> standard library and let -O or -OO invoke psyco? >> >> > This really belongs on Python-ideas and not Python-dev. Ah yes, sorry about that, I'm new here. This will be my last post about this here... From guido at python.org Sat Dec 13 17:14:57 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Dec 2008 08:14:57 -0800 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <4943B974.6020407@voidspace.org.uk> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> Message-ID: Yes, this is what threads were designed for. As an abstraction to have multiple "threads of control" on a *single* processor (in a single process). The whole multi-core business came decades later. (Classic multi-processors have something called threads too, but they, too, came later than the original single-core-single-CPU thread concept, and often threads on those systems have properties that don't match how threads work on modern multi-core CPUs.) On Sat, Dec 13, 2008 at 5:32 AM, Michael Foord wrote: > Lennart Regebro wrote: >> >> On Fri, Dec 12, 2008 at 02:13, Sturla Molden wrote: >> >>> >>> I genuinely think the use of threads should be discouraged. 
It leads to >>> code that are full of bugs and difficult to maintain - race conditions, >>> deadlocks, and livelocks are common pitfalls. >>> >> >> The use of threads for load balancing should be discouraged, yes. That >> is not what they are designed for. Threads are designed to allow >> blocking processes to go on in the background without blocking the >> main process. This, they are very useful for. Removing thread support >> would therefore be a very big mistake. It's needed, it has it's uses, >> just not the one *you* want. >> >> > > That's an interesting assertion about what threads were designed for. Do you > have anything to back it up? > > Michael > > -- > http://www.ironpythoninaction.com/ > http://www.voidspace.org.uk/blog > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steve at holdenweb.com Sat Dec 13 17:57:44 2008 From: steve at holdenweb.com (Steve Holden) Date: Sat, 13 Dec 2008 11:57:44 -0500 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> Message-ID: If I remember correctly (when threading was invented in the mid-1980s) threads were originally described as "lightweight processes". The perceived advantage at the time was the ability to have multiple threads of control with shared memory: this was much faster than the available inter-process communication mechanisms. On a single-processor computer synchronization was much less of a problem. regards Steve Guido van Rossum wrote: > Yes, this is what threads were designed for. 
As an abstraction to have > multiple "threads of control" on a *single* processor (in a single > process). The whole multi-core business came decades later. (Classic > multi-processors have something called threads too, but they, too, > came later than the original single-core-single-CPU thread concept, > and often threads on those systems have properties that don't match > how threads work on modern multi-core CPUs.) > > On Sat, Dec 13, 2008 at 5:32 AM, Michael Foord > wrote: >> Lennart Regebro wrote: >>> On Fri, Dec 12, 2008 at 02:13, Sturla Molden wrote: >>> >>>> I genuinely think the use of threads should be discouraged. It leads to >>>> code that are full of bugs and difficult to maintain - race conditions, >>>> deadlocks, and livelocks are common pitfalls. >>>> >>> The use of threads for load balancing should be discouraged, yes. That >>> is not what they are designed for. Threads are designed to allow >>> blocking processes to go on in the background without blocking the >>> main process. This, they are very useful for. Removing thread support >>> would therefore be a very big mistake. It's needed, it has it's uses, >>> just not the one *you* want. >>> >>> >> That's an interesting assertion about what threads were designed for. Do you >> have anything to back it up? >> -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From wilk at flibuste.net Sat Dec 13 16:31:00 2008 From: wilk at flibuste.net (William Dode) Date: Sat, 13 Dec 2008 15:31:00 +0000 (UTC) Subject: [Python-Dev] Psyco for -OO or -O References: <4943B885.1070605@voidspace.org.uk> Message-ID: On 13-12-2008, Michael Foord wrote: > Lie Ryan wrote: >> I'm sure probably most of you knows about psyco[1], the optimizer. Python >> has an -O and -OO flag that is intended to be optimization flag, but we >> know that currently it doesn't do much. Why not add psyco as standard >> library and let -O or -OO invoke psyco? 
>> > > This really belongs on Python-ideas and not Python-dev. > > The main reason why not is that someone(s) from the Python core team > would then need to 'own' maintaining Psyco (which is x86 only as well). > Psyco is so hard to maintain that even the original author wants to drop > it. :-) It could be the killer feature wich will push python3 adoption ;-) Bloggers like so much benchings ! Sorry... -- William Dodé - http://flibuste.net Informaticien Indépendant From roy.lowrance at gmail.com Sat Dec 13 18:08:59 2008 From: roy.lowrance at gmail.com (Roy Lowrance) Date: Sat, 13 Dec 2008 12:08:59 -0500 Subject: [Python-Dev] beginning developer: fastest way to learn how Python 3.0 works Message-ID: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> I'd like to learn how Python 3.0 works. I've downloaded the svn. I am wondering what the best way to learn is: - Just jump in? - Or perhaps learn A before B? - Or maybe there is a tutorial for those new to the internals? What's the best way to learn how Python 3.0 works? Roy From lists at cheimes.de Sat Dec 13 18:13:55 2008 From: lists at cheimes.de (Christian Heimes) Date: Sat, 13 Dec 2008 18:13:55 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> Message-ID: Steve Holden schrieb: > If I remember correctly (when threading was invented in the mid-1980s) > threads were originally described as "lightweight processes". The > perceived advantage at the time was the ability to have multiple threads > of control with shared memory: this was much faster than the available > inter-process communication mechanisms. On a single-processor computer > synchronization was much less of a problem. Initially one of Java's main target platforms were set-top boxes.
Back in the 90ties set-top boxes had limited hardware and dumb processors. Most of the boxes had no MMU and so didn't support multiple processes. Threads were the easiest way to have some kind of concurrency. Back in those days threads were the only solution for concurrency but today - about 15 years later with powerful processors even in cheap mobile phones - people are still indoctrinated with the same philosophy ... Christian From guido at python.org Sat Dec 13 18:48:03 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Dec 2008 09:48:03 -0800 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> Message-ID: On Sat, Dec 13, 2008 at 9:13 AM, Christian Heimes wrote: > Steve Holden schrieb: >> If I remember correctly (when threading was invented in the mid-1980s) >> threads were originally described as "lightweight processes". The >> perceived advantage at the time was the ability to have multiple threads >> of control with shared memory: this was much faster than the available >> inter-process communication mechanisms. On a single-processor computer >> synchronization was much less of a problem. > > Initially one of Java's main target platforms were set-top boxes. Back > in the 90ties set-top boxes had limited hardware and dumb processors. > Most of the boxes had no MMU and so didn't support multiple processes. > Threads were the easiest way to have some kind of concurrency. Just let's not rewrite history and believe Java invented threads. They were around well before that. > Back in those days threads were the only solution for concurrency but > today - about 15 years later with powerful processors even in cheap > mobile phones - people are still indoctrinated with the same philosophy ... It's not so much indoctrination. 
Threads are a useful tool. The problem is that some people perceive threads as the *only* tool. There's a whole spectrum of tools, from event handling to multiple processes, and they don't all solve the same problem. (I guess it doesn't help that the word process is given new meanings by some languages.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From steve at pearwood.info Fri Dec 12 13:01:29 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Dec 2008 23:01:29 +1100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: <494213C8.7040809@gmail.com> References: <494213C8.7040809@gmail.com> Message-ID: <200812122301.29729.steve@pearwood.info> On Fri, 12 Dec 2008 06:33:28 pm Toshio Kuratomi wrote: > Also interesting, if you point your browser at: > http://toshio.fedorapeople.org/u/ > > You should see two other test files. They're both > (one-half)(enyei).html but one's encoded in utf-8 and the other in > latin-1. For what it's worth, Konquorer 3.5 displays the two files as (1/2)(n+tilde).html (A+caret)(1/2)(A+tilde)(plusminus).html It doesn't seem to have any trouble opening either of them. -- Steven From aahz at pythoncraft.com Sat Dec 13 19:18:51 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 13 Dec 2008 10:18:51 -0800 Subject: [Python-Dev] beginning developer: fastest way to learn how Python 3.0 works In-Reply-To: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> Message-ID: <20081213181851.GA23531@panix.com> On Sat, Dec 13, 2008, Roy Lowrance wrote: > > What's the best way to learn how Python 3.0 works? Post to the correct mailing list. ;-) Use comp.lang.python or python-tutor or python-help python-dev is for people creating new versions of Python -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." 
--Bill Harlan From roy.lowrance at gmail.com Sat Dec 13 19:30:10 2008 From: roy.lowrance at gmail.com (Roy Lowrance) Date: Sat, 13 Dec 2008 13:30:10 -0500 Subject: [Python-Dev] beginning developer: fastest way to learn how Python 3.0 works In-Reply-To: <20081213181851.GA23531@panix.com> References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> <20081213181851.GA23531@panix.com> Message-ID: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com> Maybe this is the correct list, as my inquiry is about how to learn how the current implementation works so that I could consider how to implement new features. So, here's a modified question: If you want to learn how python works (not how to program in the python language), what's a productive way to proceed? Roy On Sat, Dec 13, 2008 at 1:18 PM, Aahz wrote: > On Sat, Dec 13, 2008, Roy Lowrance wrote: >> >> What's the best way to learn how Python 3.0 works? > > Post to the correct mailing list. ;-) > > Use comp.lang.python or python-tutor or python-help > > python-dev is for people creating new versions of Python > -- > Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ > > "It is easier to optimize correct code than to correct optimized code." > --Bill Harlan > -- Roy Lowrance home: 212 674 9777 mobile: 347 255 2544 From tjreedy at udel.edu Sat Dec 13 21:13:55 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Dec 2008 15:13:55 -0500 Subject: [Python-Dev] beginning developer: fastest way to learn how Python 3.0 works In-Reply-To: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com> References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> <20081213181851.GA23531@panix.com> <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com> Message-ID: Roy Lowrance wrote: > Maybe this is the correct list, as my inquiry is about how to learn > how the current implementation works so that I could consider how to > implement new features. 
> > So, here's a modified question: If you want to learn how python works > (not how to program in the python language), what's a productive way > to proceed? There are developer pages on the site, a wiki page on the ceval loop, the extending and embedding manual, and the code itself. From solipsis at pitrou.net Sat Dec 13 22:22:16 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Dec 2008 21:22:16 +0000 (UTC) Subject: [Python-Dev] Reindenting the C code base? Message-ID: Hello, I remember there were some talks of reindenting the C code base (from tabs to 4-space indents) after py3k is released, but I can't find the discussion thread again. Was a decision ever taken about it? Regards Antoine. From guido at python.org Sat Dec 13 22:26:50 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Dec 2008 13:26:50 -0800 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Sat, Dec 13, 2008 at 1:22 PM, Antoine Pitrou wrote: > I remember there were some talks of reindenting the C code base (from tabs to > 4-space indents) after py3k is released, but I can't find the discussion thread > again. Was a decision ever taken about it? I think we should not do this. We should use 4 space indents for new files, but existing files should not be reindented. If you reindent, much of the history of the file is essentially lost -- "svn blame" will blame whoever reindented the code, and it's a pain to go back. There's also the issue of merging between the 2.x and 3.x branches, which we still do. As far as a decision, I think the de facto decision is to keep the status quo, and I'm all for sticking with that. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From solipsis at pitrou.net Sat Dec 13 23:11:47 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Dec 2008 22:11:47 +0000 (UTC) Subject: [Python-Dev] Reindenting the C code base? 
References: Message-ID: Guido van Rossum python.org> writes: > > I think we should not do this. We should use 4 space indents for new > files, but existing files should not be reindented. Well, right now many files are indented with a mix of spaces and tabs, depending on who did the edit and how their editor was configured at the time. Perhaps a graceful policy would be to mandate that all new edits be made with spaces without touching other functions in the file. Then hopefully the code base would gradually converge to a tabless scheme. Regards Antoine. From martin at v.loewis.de Sat Dec 13 23:28:32 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 13 Dec 2008 23:28:32 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 Message-ID: <49443710.3060102@v.loewis.de> On behalf of the Python development team and the Python community, I'm happy to announce the release candidates of Python 2.4.6 and 2.5.3. 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases will only include security fixes. According to the release notes, over 100 bugs and patches have been addressed since Python 2.5.1, many of them improving the stability of the interpreter, and improving its portability. 2.4.6 includes only a small number of security fixes. Python 2.6 is the latest version of Python, we're making this release for people who are still running Python 2.4. See the release notes at the website (also available as Misc/NEWS in the source distribution) for details of bugs fixed; most of them prevent interpreter crashes (and now cause proper Python exceptions in cases where the interpreter may have crashed before). Assuming no major problems crop up, a final release of Python 2.4.6 and 2.5.3 will follow in about a week's time. 
For more information on Python 2.4.6 and 2.5.3, including download links for various platforms, release notes, and known issues, please see: http://www.python.org/2.4.6 http://www.python.org/2.5.3 Highlights of the previous major Python releases are available from the Python 2.5 page, at http://www.python.org/2.4/highlights.html http://www.python.org/2.5/highlights.html Enjoy this release, Martin Martin v. Loewis martin at v.loewis.de Python Release Manager (on behalf of the entire python-dev team) From mlobol at gmail.com Sat Dec 13 23:35:00 2008 From: mlobol at gmail.com (Miguel Lobo) Date: Sat, 13 Dec 2008 22:35:00 +0000 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: <10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com> > I think we should not do this. We should use 4 space indents for new > files, but existing files should not be reindented. If you reindent, > much of the history of the file is essentially lost -- "svn blame" > will blame whoever reindented the code, and it's a pain to go back. I believe "svn blame -x -w" ignores whitespace changes. -- Miguel Check out Gleam, an LGPL sound synthesizer library, at http://gleamsynth.sf.net From lists at cheimes.de Sat Dec 13 23:39:36 2008 From: lists at cheimes.de (Christian Heimes) Date: Sat, 13 Dec 2008 23:39:36 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: <49443710.3060102@v.loewis.de> References: <49443710.3060102@v.loewis.de> Message-ID: <494439A8.2030208@cheimes.de> Martin v. Löwis schrieb: > 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases > will only include security fixes. According to the release notes, over > 100 bugs and patches have been addressed since Python 2.5.1, many of ^^^^ Do you really mean 2.5.1?
Christian From martin at v.loewis.de Sat Dec 13 23:47:27 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 13 Dec 2008 23:47:27 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> Message-ID: <49443B7F.8020602@v.loewis.de> > If I remember correctly (when threading was invented in the mid-1980s) > threads were originally described as "lightweight processes". According to http://www.serpentine.com/blog/threads-faq/the-history-of-threads/ that's when threads where *reinvented*. They were originally invented in 1965, on Multics (1970) they were used to perform compilation in the background. When Unix came along, it *added* address space separation, introducing what is now known as processes. > The > perceived advantage at the time was the ability to have multiple threads > of control with shared memory: this was much faster than the available > inter-process communication mechanisms. On a single-processor computer > synchronization was much less of a problem. Historically, it was vice versa. First there were threads/processes/tasks with shared variables, semaphores, etc, and later address space separation was added. 
Regards, Martin From martin at v.loewis.de Sat Dec 13 23:51:25 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 13 Dec 2008 23:51:25 +0100 Subject: [Python-Dev] beginning developer: fastest way to learn how Python 3.0 works In-Reply-To: <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com> References: <162be4f00812130908x297c1b99k5c6e605f78835e25@mail.gmail.com> <20081213181851.GA23531@panix.com> <162be4f00812131030w123c5e37tb716e6a5d283a4d7@mail.gmail.com> Message-ID: <49443C6D.8040005@v.loewis.de> > Maybe this is the correct list, as my inquiry is about how to learn > how the current implementation works so that I could consider how to > implement new features. > > So, here's a modified question: If you want to learn how python works > (not how to program in the python language), what's a productive way > to proceed? Well, the question is what you want to learn it *for*. If you want to learn in order to contribute, I suggest you pick an old bug on the bug tracker and try to solve it. If you have a specific new feature in mind that you want to implement, I again suggest that you just start implementing it. If you don't know how, then you should ask on python-list how certain things are done that you might need for the feature, or you even explain to python-list readers what the feature is that you want to implement, and how people would go about implementing it. Regards, Martin From martin at v.loewis.de Sat Dec 13 23:55:38 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 13 Dec 2008 23:55:38 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: <494439A8.2030208@cheimes.de> References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de> Message-ID: <49443D6A.9020308@v.loewis.de> Christian Heimes wrote: > Martin v. Löwis schrieb: >> 2.5.3 is the last bug fix release of Python 2.5.
Future 2.5.x releases >> will only include security fixes. According to the release notes, over >> 100 bugs and patches have been addressed since Python 2.5.1, many of > ^^^^ > > Do you really mean 2.5.1? Oops, no - although the statement is technically correct; since 2.5.2, only 80 bugs have been added :-) Thanks for pointing that out. Martin From skip at pobox.com Sun Dec 14 04:04:09 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 13 Dec 2008 21:04:09 -0600 Subject: [Python-Dev] Problem with svn on community buildbot Message-ID: <18756.30633.439039.977094@montanaro-dyndns-org.local> I have a community buildbot: http://www.python.org/dev/buildbot/community/all/g5%20OSX%202.5/builds/14/step-svn/0 which is failing the svn checkout of the 2.5 branch: svn: PROPFIND request failed on '/projects/python/branches/release25-maint' svn: PROPFIND of '/projects/python/branches/release25-maint': Could not resolve hostname `svn.python.org': Temporary failure in name resolution (http://svn.python.org) The svn command is: /opt/local/bin/svn checkout --revision 67742 --non-interactive http://svn.python.org/projects/python/branches/release25-maint build Any idea what the problem might be? Thanks, Skip From martin at v.loewis.de Sun Dec 14 05:40:19 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Dec 2008 05:40:19 +0100 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: <18756.30633.439039.977094@montanaro-dyndns-org.local> References: <18756.30633.439039.977094@montanaro-dyndns-org.local> Message-ID: <49448E33.9080506@v.loewis.de> > svn: PROPFIND of '/projects/python/branches/release25-maint': Could not resolve hostname `svn.python.org': Temporary failure in name resolution (http://svn.python.org) > > Any idea what the problem might be? Well - can you resolve `svn.python.org' on that machine (e.g. when using ping(1))? 
Regards, Martin From skip at pobox.com Sun Dec 14 14:58:25 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 14 Dec 2008 07:58:25 -0600 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: <49448E33.9080506@v.loewis.de> References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> Message-ID: <18757.4353.603639.60602@montanaro-dyndns-org.local> Martin> Well - can you resolve `svn.python.org' on that machine Martin> (e.g. when using ping(1))? Yup: $ host svn.python.org svn.python.org has address 82.94.164.164 svn.python.org has IPv6 address 2001:888:2000:d::a4 $ ping svn.python.org PING svn.python.org (82.94.164.164): 56 data bytes 64 bytes from 82.94.164.164: icmp_seq=0 ttl=50 time=134.041 ms 64 bytes from 82.94.164.164: icmp_seq=1 ttl=50 time=135.441 ms 64 bytes from 82.94.164.164: icmp_seq=2 ttl=50 time=135.352 ms ^C --- svn.python.org ping statistics --- 3 packets transmitted, 3 packets received, 0% packet loss round-trip min/avg/max/stddev = 134.041/134.945/135.441/0.640 ms $ telnet svn.python.org 80 Trying 82.94.164.164... Connected to svn.python.org. Escape character is '^]'. ^] telnet> quit Connection closed. 
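Martin's question (can this machine resolve the name?) can also be checked from Python itself, which may be closer to what a Python-based tool sees than `host` or `ping`. The following is only a minimal sketch: the helper name `addresses_by_family` is made up for illustration, and the sample tuples are hard-coded from the `host` output above rather than resolved live.

```python
import socket


def addresses_by_family(addrinfo):
    """Group getaddrinfo()-style tuples into IPv4 and IPv6 address lists."""
    grouped = {"IPv4": [], "IPv6": []}
    for family, _socktype, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            grouped["IPv4"].append(sockaddr[0])
        elif family == socket.AF_INET6:
            grouped["IPv6"].append(sockaddr[0])
    return grouped


# Hard-coded sample mirroring the `host` output in the transcript.
# A live lookup would be
#   socket.getaddrinfo("svn.python.org", 80, proto=socket.IPPROTO_TCP)
# which raises socket.gaierror when the name cannot be resolved.
sample = [
    (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("82.94.164.164", 80)),
    (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:888:2000:d::a4", 80, 0, 0)),
]
print(addresses_by_family(sample))
```

If the live call raises `socket.gaierror` only under the account or environment that runs the buildbot, that would suggest a resolver configuration difference rather than a network outage; this is a guess, not something the thread establishes.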
Skip

From alexander.belopolsky at gmail.com Sun Dec 14 17:07:30 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 11:07:30 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To: <18757.4353.603639.60602@montanaro-dyndns-org.local>
References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local>
Message-ID:

I don't know if this is related, but from my end, access to
svn.python.org has been extremely slow recently:

$ time curl -o /dev/null http://svn.python.org
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   353  100   353    0     0      4      0  0:01:28  0:01:15  0:00:13     0

real    1m15.045s
user    0m0.004s
sys     0m0.004s

I've seen similar slowdowns accessing bugs.python.org, but not now.

It looks like it has something to do with IPv6:

$ host svn.python.org
svn.python.org has address 82.94.164.164
svn.python.org has IPv6 address 2001:888:2000:d::a4

$ time curl -v -o /dev/null http://svn.python.org
* About to connect() to svn.python.org port 80 (#0)
*   Trying 2001:888:2000:d::a4... Operation timed out
*   Trying 82.94.164.164... connected
...

No slowdown when IPv6 lookup is disabled with the -4 option to curl:

$ time curl -4 -o /dev/null http://svn.python.org
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   353  100   353    0     0    774      0 --:--:-- --:--:-- --:--:--     0

real    0m0.463s
user    0m0.004s
sys     0m0.004s

On Sun, Dec 14, 2008 at 8:58 AM, wrote:
>
>     Martin> Well - can you resolve `svn.python.org' on that machine
>     Martin> (e.g. when using ping(1))?
> > Yup: > > $ host svn.python.org > svn.python.org has address 82.94.164.164 > svn.python.org has IPv6 address 2001:888:2000:d::a4 > $ ping svn.python.org > PING svn.python.org (82.94.164.164): 56 data bytes > 64 bytes from 82.94.164.164: icmp_seq=0 ttl=50 time=134.041 ms > 64 bytes from 82.94.164.164: icmp_seq=1 ttl=50 time=135.441 ms > 64 bytes from 82.94.164.164: icmp_seq=2 ttl=50 time=135.352 ms > ^C > --- svn.python.org ping statistics --- > 3 packets transmitted, 3 packets received, 0% packet loss > round-trip min/avg/max/stddev = 134.041/134.945/135.441/0.640 ms > $ telnet svn.python.org 80 > Trying 82.94.164.164... > Connected to svn.python.org. > Escape character is '^]'. > ^] > telnet> quit > Connection closed. > > Skip > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexander.belopolsky%40gmail.com > From guido at python.org Sun Dec 14 17:26:06 2008 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Dec 2008 08:26:06 -0800 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> I think we should not do this. We should use 4 space indents for new >> files, but existing files should not be reindented. > > Well, right now many files are indented with a mix of spaces and tabs, depending > on who did the edit and how their editor was configured at the time. That's a shame. We used to have more rigorous standards than allowing that. > Perhaps a graceful policy would be to mandate that all new edits be made with > spaces without touching other functions in the file. Then hopefully the code > base would gradually converge to a tabless scheme. I don't think so. I find local consistency more important than global consistency. 
A file can become really hard to read when different indentation schemes are used in random parts of the code. If you have a problem configuring your editor, just say so and someone will explain how to do it. --Guido van Rossum (home page: http://www.python.org/~guido/) From skip at pobox.com Sun Dec 14 17:30:02 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 14 Dec 2008 10:30:02 -0600 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> Message-ID: <18757.13450.115714.797824@montanaro-dyndns-org.local> Alexander> It looks like it has something to do with IPv6: Alexander> $ host svn.python.org svn.python.org has address Alexander> 82.94.164.164 svn.python.org has IPv6 address Alexander> 2001:888:2000:d::a4 ... Alexander> No slowdown when IPv6 lookup is disabled with -4 option to Alexander> curl: ... But I have no problem on my laptop which is sitting right next to the G5 which is having problems. Both show an IPv6 address for svn.python.org. Skip From jyasskin at gmail.com Sun Dec 14 18:43:28 2008 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 14 Dec 2008 09:43:28 -0800 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum wrote: > On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou wrote: >> Guido van Rossum python.org> writes: >>> >>> I think we should not do this. We should use 4 space indents for new >>> files, but existing files should not be reindented. >> >> Well, right now many files are indented with a mix of spaces and tabs, depending >> on who did the edit and how their editor was configured at the time. > > That's a shame. We used to have more rigorous standards than allowing that. 
> >> Perhaps a graceful policy would be to mandate that all new edits be made with >> spaces without touching other functions in the file. Then hopefully the code >> base would gradually converge to a tabless scheme. > > I don't think so. I find local consistency more important than global > consistency. A file can become really hard to read when different > indentation schemes are used in random parts of the code. > > If you have a problem configuring your editor, just say so and someone > will explain how to do it. I've never figured out how to configure emacs to deduce whether the current file uses spaces or tabs and has a 4 or 8 space indent. I always try to get it right anyway, but it'd be a lot more convenient if my editor did it for me. If there are such instructions, perhaps they should be added to PEPs 7 and 8? Thanks, Jeffrey From solipsis at pitrou.net Sun Dec 14 18:49:39 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Dec 2008 17:49:39 +0000 (UTC) Subject: [Python-Dev] Reindenting the C code base? References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: Jeffrey Yasskin gmail.com> writes: > > I've never figured out how to configure emacs to deduce whether the > current file uses spaces or tabs and has a 4 or 8 space indent. Same question for Kate! Although I guess that if emacs isn't able to do it, Kate won't do it either... (Kate allows configuring on a directory basis, on a file extension basis, but not on a filename basis) Regards Antoine. From alexandre at peadrop.com Sun Dec 14 18:54:15 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sun, 14 Dec 2008 12:54:15 -0500 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Sat, Dec 13, 2008 at 5:11 PM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> >> I think we should not do this. We should use 4 space indents for new >> files, but existing files should not be reindented. 
>
> Well, right now many files are indented with a mix of spaces and tabs, depending
> on who did the edit and how their editor was configured at the time.
>

Personally, I think the indentation of, at least,
Objects/unicodeobject.c should be fixed. This file has become so
mixed-up with tab and space indents that I have no idea what to use
when I edit it. Just to give an idea of how messy it is, there are 5214
lines indented with tabs and 4272 indented with spaces (out of the 9733
in the file).

-- Alexandre

From alexandre at peadrop.com Sun Dec 14 18:57:14 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 12:57:14 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID:

On Sun, Dec 14, 2008 at 12:43 PM, Jeffrey Yasskin wrote:
> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent. I
> always try to get it right anyway, but it'd be a lot more convenient
> if my editor did it for me. If there are such instructions, perhaps
> they should be added to PEPs 7 and 8?
>

I know python-mode is able to detect indent configuration of python
code automatically, but I don't know if c-mode is able to. Personally,

From alexandre at peadrop.com Sun Dec 14 19:03:40 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:03:40 -0500
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To:
References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID:

On Sun, Dec 14, 2008 at 12:57 PM, Alexandre Vassalotti wrote:
> On Sun, Dec 14, 2008 at 12:43 PM, Jeffrey Yasskin wrote:
>> I've never figured out how to configure emacs to deduce whether the
>> current file uses spaces or tabs and has a 4 or 8 space indent.
I
>> always try to get it right anyway, but it'd be a lot more convenient
>> if my editor did it for me. If there are such instructions, perhaps
>> they should be added to PEPs 7 and 8?
>>
>
> I know python-mode is able to detect indent configuration of python
> code automatically, but I don't know if c-mode is able to. Personally,
>

[sorry, Gmail sent my unfinished email]

Personally, I use auto-mode-alist to make Emacs choose the indent
configuration to use automatically. Here's what it looks like for me:

(defmacro def-styled-c-mode (name style &rest body)
  "Define styled C modes."
  `(defun ,name ()
     (interactive)
     (c-mode)
     (c-set-style ,style)
     ,@body))

(def-styled-c-mode python-c-mode
  "python"
  (setq indent-tabs-mode t
        tab-width 8
        c-basic-offset 8))

(def-styled-c-mode py3k-c-mode
  "python"
  (setq indent-tabs-mode nil
        tab-width 4
        c-basic-offset 4))

(setq auto-mode-alist
      (append '(("/python.org/python/.*\\.[ch]\\'" . python-c-mode)
                ("/python.org/.*/.*\\.[ch]\\'" . py3k-c-mode))
              auto-mode-alist))

From alexandre at peadrop.com Sun Dec 14 19:19:17 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:19:17 -0500
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
Message-ID:

On Fri, Dec 12, 2008 at 11:39 AM, Lennart Regebro wrote:
> The fix_imports fix seems to fix only the first import per line that you have.
> So if you do for example
>     import urllib2, cStringIO
> it will not fix cStringIO.
>
> Is this a bug or a feature? :-) If it's a feature it should warn at
> least, right?
>

Which revision of python are you using? I tried the test-case you gave
and 2to3 translated it perfectly.
-- Alexandre

alex@helios:~$ cat test.py
import urllib2, cStringIO

s = cStringIO.StringIO(urllib2.randombytes(100))
alex@helios:~$ 2to3 test.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- test.py (original)
+++ test.py (refactored)
@@ -1,3 +1,3 @@
-import urllib2, cStringIO
+import urllib.request, urllib.error, io

-s = cStringIO.StringIO(urllib2.randombytes(100))
+s = io.StringIO(urllib2.randombytes(100))
RefactoringTool: Files that need to be modified:
RefactoringTool: test.py

From regebro at gmail.com Sun Dec 14 19:34:35 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 14 Dec 2008 19:34:35 +0100
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To:
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com>
Message-ID: <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>

On Sun, Dec 14, 2008 at 19:19, Alexandre Vassalotti wrote:
> Which revision of python are you using? I tried the test-case you gave
> and 2to3 translated it perfectly.

3.0, I haven't tried with trunk yet, and possibly it's a more
complicated usecase.

--
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From alexandre at peadrop.com Sun Dec 14 19:49:06 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Sun, 14 Dec 2008 13:49:06 -0500
Subject: [Python-Dev] 2to3 question about fix_imports.
In-Reply-To: <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com>
Message-ID:

On Sun, Dec 14, 2008 at 1:34 PM, Lennart Regebro wrote:
> On Sun, Dec 14, 2008 at 19:19, Alexandre Vassalotti
> wrote:
>> Which revision of python are you using? I tried the test-case you gave
>> and 2to3 translated it perfectly.
> > 3.0, I haven't tried with trunk yet, and possibly it's a more > complicated usecase. Strange, fix_imports in Python 3.0 (final) looks fine. If you can come up with a reproducible example, please open a bug on bugs.python.org and set me as the assignee (my user id is alexandre.vassalotti). Thanks, -- Alexandre From regebro at gmail.com Sun Dec 14 20:02:01 2008 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 14 Dec 2008 20:02:01 +0100 Subject: [Python-Dev] 2to3 question about fix_imports. In-Reply-To: References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com> Message-ID: <319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com> On Sun, Dec 14, 2008 at 19:49, Alexandre Vassalotti wrote: >> 3.0, I haven't tried with trunk yet, and possibly it's a more >> complicated usecase. > > Strange, fix_imports in Python 3.0 (final) looks fine. If you can come > up with a reproducible example, please open a bug on bugs.python.org > and set me as the assignee (my user id is alexandre.vassalotti). Actually, it wasn't more complex, but it was completely different. It doesn't have anything with the amount of statements, but it's specifically if you have urlparse in the imports that breaks it. I'll open a bug report. -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From regebro at gmail.com Sun Dec 14 20:08:09 2008 From: regebro at gmail.com (Lennart Regebro) Date: Sun, 14 Dec 2008 20:08:09 +0100 Subject: [Python-Dev] 2to3 question about fix_imports. 
In-Reply-To: <319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com> References: <319e029f0812120839o4f79b25aq8fd3e53719eb127a@mail.gmail.com> <319e029f0812141034g6d523922x1cf3b01b50c8f@mail.gmail.com> <319e029f0812141102y2818dca0v22e759a3cc73a3c7@mail.gmail.com> Message-ID: <319e029f0812141108j291e3fb1n70512fb7c20b0947@mail.gmail.com> On Sun, Dec 14, 2008 at 20:02, Lennart Regebro wrote: > On Sun, Dec 14, 2008 at 19:49, Alexandre Vassalotti > wrote: >>> 3.0, I haven't tried with trunk yet, and possibly it's a more >>> complicated usecase. >> >> Strange, fix_imports in Python 3.0 (final) looks fine. If you can come >> up with a reproducible example, please open a bug on bugs.python.org >> and set me as the assignee (my user id is alexandre.vassalotti). > > Actually, it wasn't more complex, but it was completely different. It > doesn't have anything with the amount of statements, but it's > specifically if you have urlparse in the imports that breaks it. I'll > open a bug report. I couldn't assign it to you, so here goes: http://bugs.python.org/issue4664 -- Lennart Regebro: Zope and Plone consulting. http://www.colliberty.com/ +33 661 58 14 64 From martin at v.loewis.de Sun Dec 14 20:55:40 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Dec 2008 20:55:40 +0100 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> Message-ID: <494564BC.4020000@v.loewis.de> > I don't know is this is related It shouldn't. AFAIK, buildbot makes its internet connections through twisted, and twisted doesn't use IPv6. Also, the diagnostics (cannot resolve name) doesn't match connectivity problems. > $ time curl -v -o /dev/null http://svn.python.org > * About to connect() to svn.python.org port 80 (#0) > * Trying 2001:888:2000:d::a4... Operation timed out Hmm. 
Can you debug this further?

Do you have IPv6 connectivity at all?
Do you have a global v6 address?
What happens if you do a v6 traceroute?

Regards,
Martin

From martin at v.loewis.de Sun Dec 14 21:42:10 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 14 Dec 2008 21:42:10 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com>
Message-ID: <49456FA2.70900@v.loewis.de>

> I've never figured out how to configure emacs to deduce whether the
> current file uses spaces or tabs and has a 4 or 8 space indent.

If it is now official policy that different files use different styles,
then I think it would be helpful to put Emacs variables at the end of
each file. See the end of Objects/unicodeobject.c for an example.

I'm not aware of a builtin function that adjusts c-mode automatically;
I could find a package that does some basic guessing, though:

http://members.iinet.net.au/~bethandmark/elisp/mst-guess-indentation.el
http://www.emacswiki.org/cgi-bin/emacs/guess-offset.el

I've tried the second one briefly. It guesses c-basic-offset fairly
well, but doesn't attempt to guess indent-tabs.

This one does; I haven't tried it yet:

https://savannah.nongnu.org/projects/dtrt-indent/
http://git.savannah.gnu.org/gitweb/?p=dtrt-indent.git;a=blob_plain;f=dtrt-indent.el;hb=HEAD

Regards,
Martin

From martin at v.loewis.de Sun Dec 14 21:43:47 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Sun, 14 Dec 2008 21:43:47 +0100
Subject: [Python-Dev] Reindenting the C code base?
In-Reply-To:
References:
Message-ID: <49457003.5060104@v.loewis.de>

> Personally, I think the indentation of, at least,
> Objects/unicodeobject.c should be fixed. This file has become so
> mixed-up with tab and space indents that I have no-idea what to use
> when I edit it.
Just to give an idea how messy it is, they are 5214 > lines indented with tabs and 4272 indented with spaces (out the 9733 > of the file). As an Emacs variables block is present in the file, I would consider this normative, and declare that the official indenting is 4 spaces for the file, no tabs. Regards, Martin From dickinsm at gmail.com Sun Dec 14 21:49:31 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 14 Dec 2008 20:49:31 +0000 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? Message-ID: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com> Hi all, I'm having some trouble making some bits of the Python core code available to extension modules. Specifically, I'm trying to add a function 'Py_force_to_memory' to Python/pymath.c and then use it (via a macro) from Modules/cmathmodule.c. But importing of the cmath module fails with a 'Symbol not found' error. The function is declared with a 'PyAPI_FUNC' in Python/pymath.h. Here's the relevant portion of the make output: *** WARNING: renaming "cmath" since importing it failed: dlopen(build/lib.macosx-10.3-i386-2.7/cmath.so, 2): Symbol not found: _Py_force_to_memory Referenced from: /Users/dickinsm/python_source/branches/trunk/build/lib.macosx-10.3-i386-2.7/cmath.so Expected in: dynamic lookup This is a non-debug trunk build, on OS X (10.5.5), with all the defaults. I'm using Apple's standard toolchain (gcc 4.0.1, Darwin linker). The patch I'm building with can be seen at: http://bugs.python.org/issue4575 (It's the first of the two patches there, called 'force_to_memory.patch'.) I think I understand the cause of this problem; I just don't know how to fix it. The cause seems to be that none of the symbols in pymath.o is used in the Python executable; they're used only in the extension modules. So while the '_Py_force_to_memory' symbol appears in libpython2.7.a, it doesn't appear in the python.exe executable; hence the above error. 
If I move the definition of Py_force_to_memory from Python/pymath.c to Objects/floatobject.c then everything works as expected. Questions: (1) Is this an OS X only problem? (2) Is there an easy way to force a particular symbol (or all the symbols from a particular object file) to be exported in the Python executable, so that it's available to a dynamically loaded extension module? I've found the -u option to gcc, but this doesn't seem like a particularly portable solution. Of course, if this problem exists only on OS X, then the solution doesn't need to be portable. Thanks, Mark From martin at v.loewis.de Sun Dec 14 21:53:09 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Dec 2008 21:53:09 +0100 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: <49457235.7060701@v.loewis.de> > Same question for Kate! Although I guess that if emacs isn't able to do it, Kate > won't do it either... > > (Kate allows configuring on a directory basis, on a file extension basis, but > not on a filename basis) I guess it would be possible to write a Kate plugin that does that. Regards, Martin From martin at v.loewis.de Sun Dec 14 22:06:19 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Dec 2008 22:06:19 +0100 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com> References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com> Message-ID: <4945754B.3010201@v.loewis.de> > (1) Is this an OS X only problem? Probably not. If nothing of pymath.c is actually needed when linking the python executable, pymath.o will be excluded by the linker. 
> (2) Is there an easy way to force a particular symbol (or all the
> symbols from a particular object file) to be exported in the Python
> executable, so that it's available to a dynamically loaded extension
> module?

That's not the issue. Had pymath.o been linked into python, its
symbols would have been exported (is that proper use of English
tenses?)

To fix this, I see three solutions:

1. Explicitly link the module to extensions which are known to require
   it, e.g. by explicitly adding it to the sources in setup.py. That
   might cause duplications, but would IMO be the cleanest solution
   (python.exe has no business in exporting standard math functions,
   IMO)
2. Explicitly link pymath.o to python.exe, instead of integrating it
   into libpythonxy.a. If the symbols need to be exposed through
   python.exe (for whatever reason), this is the clean way to do it.
3. Implicitly force linkage, by adding a dummy symbol to pymath.o
   which gets referenced from an object known to be linked into
   the interpreter. This has the least impact on the build process,
   but is the most hackish approach (IMO).

Regards,
Martin

From solipsis at pitrou.net Sun Dec 14 22:08:14 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 14 Dec 2008 21:08:14 +0000 (UTC)
Subject: [Python-Dev] Reindenting the C code base?
References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> <49457235.7060701@v.loewis.de>
Message-ID:

Martin v. Löwis v.loewis.de> writes:
>
> I guess it would be possible to write a Kate plugin that does that.

Or perhaps more simply, Kate allows modelines at the beginning and at the end of
source files. I don't know if it's ok to add these to the code base though.
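For anyone who wants to check how mixed a given file is before deciding on editor variables or modelines, the tab/space tally Alexandre quotes earlier in this thread can be approximated with a few lines of Python. This is a minimal sketch, not the method he actually used: the function name `count_indent_styles` is made up for illustration, and "indented with tabs" is assumed to mean that the line starts with a tab character.

```python
def count_indent_styles(lines):
    """Tally lines by the first character of their leading whitespace."""
    counts = {"tabs": 0, "spaces": 0, "unindented": 0}
    for line in lines:
        if line.startswith("\t"):
            counts["tabs"] += 1
        elif line.startswith(" "):
            counts["spaces"] += 1
        else:
            # Blank lines and lines starting in column 0 land here.
            counts["unindented"] += 1
    return counts


# Small in-memory sample: one tab-indented line, one space-indented line,
# one top-level line and one blank line.
sample = ["\tint x;\n", "    int y;\n", "static int z;\n", "\n"]
print(count_indent_styles(sample))
```

To run it on a real source file, pass an open file object instead of the sample list, for example `count_indent_styles(open("Objects/unicodeobject.c"))`. Lines that mix a leading tab with following spaces are counted as tab-indented here, which may differ from how other tools classify them.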
From alexander.belopolsky at gmail.com Sun Dec 14 22:52:36 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 14 Dec 2008 16:52:36 -0500 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: <494564BC.4020000@v.loewis.de> References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> <494564BC.4020000@v.loewis.de> Message-ID: Please see below for more svn debugging, but now I also traced down the delays I observe when I go to bugs.python.com to the same issue. The offending download is the style sheet and that explains why curl does not show it when pointed to the main page: $ curl -v -o /dev/null http://python.org/styles/screen-switcher-default.css * About to connect() to python.org port 80 (#0) * Trying 2001:888:2000:d::a2... Operation timed out The offending main page element is: $ curl http://bugs.python.org 2>/dev/null | grep screen-switcher-default On Sun, Dec 14, 2008 at 2:55 PM, "Martin v. L?wis" wrote: .. >> $ time curl -v -o /dev/null http://svn.python.org >> * About to connect() to svn.python.org port 80 (#0) >> * Trying 2001:888:2000:d::a4... Operation timed out > > Hmm. Can you debug this further? > > Do you have IPv6 connectivity at all? I don't think so. > Do you have a global v6 address? No, only private inet6 address: $ ifconfig en0 en0: flags=8863 mtu 1500 inet6 fe80::21f:5bff:fef3:c0a4%en0 prefixlen 64 scopeid 0x4 inet 192.168.1.6 netmask 0xffffff00 broadcast 192.168.1.255 ... > What happens if you do a v6 traceroute? 
> $ traceroute6 -v svn.python.org traceroute6 to svn.python.org (2001:888:2000:d::a4) from fdbd:a375:403a:51c6:21f:5bff:fef3:c0a4, 30 hops max, 12 byte packets 1 * 24 bytes from fe80::216:cbff:fec1:c94c%en0 to fe80::21f:5bff:fef3:c0a4: icmp type 136 (Neighbor Advertisement) code 0 0000: fe800000 00000000 0216cbff fec1c94c 0010: 00000000 00000000 32 bytes from fe80::216:cbff:fec1:c94c%en0 to fe80::21f:5bff:fef3:c0a4: icmp type 135 (Neighbor Solicitation) code 0 0000: fe800000 00000000 021f5bff fef3c0a4 0010: 01010016 cbc1c94c 00000000 00000000 * * From dickinsm at gmail.com Sun Dec 14 22:57:41 2008 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 14 Dec 2008 21:57:41 +0000 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: <4945754B.3010201@v.loewis.de> References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com> <4945754B.3010201@v.loewis.de> Message-ID: <5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com> On Sun, Dec 14, 2008 at 9:06 PM, "Martin v. L?wis" wrote: > That's not the issue. Had pymath.o been linked into python, it's > symbols would have been exported (is that proper use of English > tenses?) Sounds right to me. > > To fix this, I see three solutions > > [...] Thanks for this; this gives me a clearer idea of how things might be solved. > (python.exe has no business in exporting > standard math functions, IMO) It's a little bit messy: some bits of pymath.c (hypot, and possibly copysign) are needed in the core, but only on platforms whose math libraries haven't caught up with C99. The rest is only (possibly) needed in the math and cmath modules. In fact, on OS X none of pymath.c is needed at all, which results in lots of "ranlib: file: libpython2.7.a(pymath.o) has no symbols" in the build output... I'll try to find a non-hackish solution. 
Mark

From alexander.belopolsky at gmail.com Sun Dec 14 23:03:17 2008
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 14 Dec 2008 17:03:17 -0500
Subject: [Python-Dev] Problem with svn on community buildbot
In-Reply-To:
References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> <494564BC.4020000@v.loewis.de>
Message-ID:

I've found a work-around in Firefox: go to the about:config page and change
network.dns.disableIPv6 to true. Does anyone know a similar setting
in Safari?

On Sun, Dec 14, 2008 at 4:52 PM, Alexander Belopolsky wrote:
> Please see below for more svn debugging, but now I also traced down
> the delays I observe when I go to bugs.python.com to the same issue.
> The offending download is the style sheet and that explains why curl
> does not show it when pointed to the main page:
>
> $ curl -v -o /dev/null http://python.org/styles/screen-switcher-default.css
> * About to connect() to python.org port 80 (#0)
> *   Trying 2001:888:2000:d::a2... Operation timed out
>
> The offending main page element is:
> $ curl http://bugs.python.org 2>/dev/null | grep screen-switcher-default
> href="http://python.org/styles/screen-switcher-default.css"
> type="text/css" id="screen-switcher-stylesheet" rel="stylesheet" />
>
>
> On Sun, Dec 14, 2008 at 2:55 PM, "Martin v. Löwis" wrote:
> ..
>>> $ time curl -v -o /dev/null http://svn.python.org
>>> * About to connect() to svn.python.org port 80 (#0)
>>> *   Trying 2001:888:2000:d::a4... Operation timed out
>>
>> Hmm. Can you debug this further?
>>
>> Do you have IPv6 connectivity at all?
> I don't think so.
>
>> Do you have a global v6 address?
> No, only private inet6 address:
>
> $ ifconfig en0
> en0: flags=8863 mtu 1500
>        inet6 fe80::21f:5bff:fef3:c0a4%en0 prefixlen 64 scopeid 0x4
>        inet 192.168.1.6 netmask 0xffffff00 broadcast 192.168.1.255
> ...
>
>> What happens if you do a v6 traceroute?
>> > $ traceroute6 -v svn.python.org > traceroute6 to svn.python.org (2001:888:2000:d::a4) from > fdbd:a375:403a:51c6:21f:5bff:fef3:c0a4, 30 hops max, 12 byte packets > 1 * > 24 bytes from fe80::216:cbff:fec1:c94c%en0 to > fe80::21f:5bff:fef3:c0a4: icmp type 136 (Neighbor Advertisement) code > 0 > 0000: fe800000 00000000 0216cbff fec1c94c > 0010: 00000000 00000000 > > 32 bytes from fe80::216:cbff:fec1:c94c%en0 to > fe80::21f:5bff:fef3:c0a4: icmp type 135 (Neighbor Solicitation) code 0 > 0000: fe800000 00000000 021f5bff fef3c0a4 > 0010: 01010016 cbc1c94c 00000000 00000000 > * * > From martin at v.loewis.de Sun Dec 14 23:15:41 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 14 Dec 2008 23:15:41 +0100 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: <5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com> References: <5c6f2a5d0812141249p45fc064bkbfac08a9450cb6bc@mail.gmail.com> <4945754B.3010201@v.loewis.de> <5c6f2a5d0812141357y14462e28p96569b7d61a0cd92@mail.gmail.com> Message-ID: <4945858D.4050309@v.loewis.de> > It's a little bit messy: some bits of pymath.c (hypot, and possibly > copysign) are needed in the core, but only on platforms whose > math libraries haven't caught up with C99. It would be possible to only build the module if it defines any functions; that should be checked in configure. Alternatively, I believe that autoconf offers a mechanism to have fallback functions in files named like the function; autoconf will then build itself a list of all additional source files. Using that would require to split pymath.c into multiple files. 
Regards, Martin From martin at v.loewis.de Sun Dec 14 23:18:33 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Sun, 14 Dec 2008 23:18:33 +0100 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> <494564BC.4020000@v.loewis.de> Message-ID: <49458639.4020507@v.loewis.de> > I've found a work-around in Firefox: go to about:config page an change > network.dns.disableIPv6 to true. I'd advise against using such a work-around. The infrastructure is designed to cope with that case transparently; if it is not transparent, your system must be somehow misconfigured (it could also be the case that applications are buggy - but I don't think this is the case you are facing). The proper solution is to fix your system (although I'm still uncertain what precisely the problem might be). Regards, Martin From alexander.belopolsky at gmail.com Sun Dec 14 23:38:12 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 14 Dec 2008 17:38:12 -0500 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: <49458639.4020507@v.loewis.de> References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> <494564BC.4020000@v.loewis.de> <49458639.4020507@v.loewis.de> Message-ID: On Sun, Dec 14, 2008 at 5:18 PM, "Martin v. Löwis" wrote: >> I've found a work-around in Firefox: go to about:config page an change >> network.dns.disableIPv6 to true. > > I'd advise against using such a work-around. The infrastructure is > designed to cope with that case transparently; if it is not transparent, > your system must be somehow misconfigured ...
I've never had similar issues with any site other than those in python.org domain and I had these problems with bug.python.org on several systems in different locations. Another work-around, which happens to work for all browsers and svn is to disable IPv6 in network preferences (my system is Mac OS 10.5.5). since I don't have IPv6 connectivity, I think this is a solution I can live with, but I wonder why is it necessary for python.org to be registered as both an IPv4 and v6 domain? Google does not do that: $ host google.com google.com has address 72.14.205.100 google.com has address 74.125.45.100 google.com has address 209.85.171.100 google.com mail is handled by 10 smtp4.google.com. google.com mail is handled by 10 smtp1.google.com. google.com mail is handled by 10 smtp2.google.com. google.com mail is handled by 10 smtp3.google.com. $ host ipv6.google.com ipv6.google.com is an alias for ipv6.l.google.com. ipv6.l.google.com has IPv6 address 2001:4860:0:2001::68 From martin at v.loewis.de Sun Dec 14 23:57:59 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 14 Dec 2008 23:57:59 +0100 Subject: [Python-Dev] Problem with svn on community buildbot In-Reply-To: References: <18756.30633.439039.977094@montanaro-dyndns-org.local> <49448E33.9080506@v.loewis.de> <18757.4353.603639.60602@montanaro-dyndns-org.local> <494564BC.4020000@v.loewis.de> <49458639.4020507@v.loewis.de> Message-ID: <49458F77.7050709@v.loewis.de> > live with, but I wonder why is it necessary for python.org to be > registered as both an IPv4 and v6 domain? 
Google does not do that: Google is working on changing that: http://www3.ietf.org/proceedings/08jul/slides/plenaryw-4.pdf Other systems have been doing it for many years now: martin at mira:~$ host www.freebsd.org www.freebsd.org has address 69.147.83.33 www.freebsd.org has IPv6 address 2001:4f8:fff6::21 Regards, Martin From alexander.belopolsky at gmail.com Mon Dec 15 04:12:41 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 14 Dec 2008 22:12:41 -0500 Subject: [Python-Dev] sys.stdout.write encoding failure Message-ID: There is currently a unit test in the trunk that fails in verbose mode: $ ./python.exe Lib/test/test_doctest.py -v ... UnicodeEncodeError: 'ascii' codec can't encode characters in position 338-339: ordinal not in range(128) Apparently, the problem is that stdout cannot encode non-ascii characters: >>> sys.stdout.write(u'f\xf6\xf6') Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128) which is strange because >>> sys.stdout.encoding 'UTF-8' and print has no problem with the same string: >>> print u'f\xf6\xf6' föö Where does the 'ascii' codec come from? From jeremy at alum.mit.edu Mon Dec 15 05:06:51 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sun, 14 Dec 2008 23:06:51 -0500 Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses Message-ID: This bug is pretty serious, because urllib will insert garbage into the application-visible data for a chunked response. It simply ignores the fact that it's reading a chunked response and includes the chunked header data as payload data. The original bug was reported in September, but no one noticed it. It was reported again recently. http://bugs.python.org/issue3761 http://bugs.python.org/issue4631 I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but that's not my call.
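For context on the chunked-response report above, here is a minimal sketch of what decoding a chunked HTTP/1.1 body involves: each chunk is preceded by a hexadecimal size line, and a reader that skips this step hands the size lines to the application as if they were payload. The function name decode_chunked and the sample body are hypothetical, chosen only for illustration; this is not the code in urllib or in the patch mentioned in the thread, and it deliberately ignores trailers.

```python
def decode_chunked(body: bytes) -> bytes:
    """Return the payload of an HTTP/1.1 chunked message body (illustrative only)."""
    payload = bytearray()
    pos = 0
    while True:
        # Each chunk begins with a hexadecimal size line terminated by CRLF.
        line_end = body.index(b"\r\n", pos)
        # Anything after ';' on the size line is a chunk extension; drop it.
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii").strip(), 16)
        pos = line_end + 2
        if size == 0:
            # A zero-size chunk marks the end; trailers are not handled here.
            break
        payload += body[pos:pos + size]
        # Skip the chunk data and the CRLF that follows it.
        pos += size + 2
    return bytes(payload)


# Hypothetical sample body: two chunks followed by the terminating chunk.
raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
print(decode_chunked(raw))
```

A reader that returned raw unchanged would expose the "4" and "5" size lines to the caller, which appears to be the kind of garbage the report describes.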
Jeremy From g.brandl at gmx.net Mon Dec 15 09:20:44 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 15 Dec 2008 09:20:44 +0100 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: Jeffrey Yasskin schrieb: > On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum wrote: >> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou wrote: >>> Guido van Rossum python.org> writes: >>>> >>>> I think we should not do this. We should use 4 space indents for new >>>> files, but existing files should not be reindented. >>> >>> Well, right now many files are indented with a mix of spaces and tabs, depending >>> on who did the edit and how their editor was configured at the time. >> >> That's a shame. We used to have more rigorous standards than allowing that. >> >>> Perhaps a graceful policy would be to mandate that all new edits be made with >>> spaces without touching other functions in the file. Then hopefully the code >>> base would gradually converge to a tabless scheme. >> >> I don't think so. I find local consistency more important than global >> consistency. A file can become really hard to read when different >> indentation schemes are used in random parts of the code. >> >> If you have a problem configuring your editor, just say so and someone >> will explain how to do it. > > I've never figured out how to configure emacs to deduce whether the > current file uses spaces or tabs and has a 4 or 8 space indent. I > always try to get it right anyway, but it'd be a lot more convenient > if my editor did it for me. If there are such instructions, perhaps > they should be added to PEPs 7 and 8? I use this little hack to detect indentation in Python's C files: (defun c-select-style () "Hack: Select the C style to use from buffer indentation." 
(save-excursion (if (re-search-forward "^\t" 3000 t) (c-set-style "python") (c-set-style "python-new")))) (add-hook 'c-mode-hook 'c-select-style) -- where "python" and "python-new" are two appropriate c-mode styles. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From eckhardt at satorlaser.com Mon Dec 15 09:40:00 2008 From: eckhardt at satorlaser.com (Ulrich Eckhardt) Date: Mon, 15 Dec 2008 09:40:00 +0100 Subject: [Python-Dev] Python-3.0, unicode, and os.environ In-Reply-To: References: <200812120931.16231.eckhardt@satorlaser.com> Message-ID: <200812150940.00352.eckhardt@satorlaser.com> On Friday 12 December 2008, Adam Olsen wrote: > Only pages like this, which indicate the underlying API is an array of > WCHAR: > > http://blogs.msdn.com/michkap/archive/2005/05/11/416552.aspx Hmm, true. So even there, the encoding isn't known... > char * is just fine. You need only pass a length along with it. All > internal APIs *must* already do this, as they support nul bytes. Also > note that the underlying POSIX APIs prohibit nul bytes in filenames, > so it's irrelevant for them. Hmmm, I see things like Py_GetPath() in the 2.7 sourcecode, which returns a plain char*. I really need to check if 3.0 is better. thanks for the info Uli -- Sator Laser GmbH Gesch?ftsf?hrer: Thorsten F?cking, Amtsgericht Hamburg HR B62 932
From amauryfa at gmail.com Mon Dec 15 09:47:31 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Mon, 15 Dec 2008 09:47:31 +0100 Subject: [Python-Dev] sys.stdout.write encoding failure In-Reply-To: References: Message-ID: Hi, Alexander Belopolsky wrote: > There is currently a unit test in the trunk that fails in verbose mode: > > $ ./python.exe Lib/test/test_doctest.py -v > ... > UnicodeEncodeError: 'ascii' codec can't encode characters in position > 338-339: ordinal not in range(128) > > Apparently, the problem is that stdout cannot encode non-ascii characters: > >>>> sys.stdout.write(u'f\xf6\xf6') > Traceback (most recent call last): > File "", line 1, in > UnicodeEncodeError: 'ascii' codec can't encode characters in position > 1-2: ordinal not in range(128) > > which is strange because > >>>> sys.stdout.encoding > 'UTF-8' > > and print has no problem with the same string: >>>> print u'f\xf6\xf6' > f?? > > > Where does 'ascii' codec come from? It's sys.getdefaultencoding default value. sys.stdout.write() expects a bytes string. What you see here is the coercion of the unicode to a string. -- Amaury Forgeot d'Arc From mal at egenix.com Mon Dec 15 11:27:29 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 15 Dec 2008 11:27:29 +0100 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: <49457003.5060104@v.loewis.de> References: <49457003.5060104@v.loewis.de> Message-ID: <49463111.6040800@egenix.com> On 2008-12-14 21:43, Martin v.
L?wis wrote: >> Personally, I think the indentation of, at least, >> Objects/unicodeobject.c should be fixed. This file has become so >> mixed-up with tab and space indents that I have no-idea what to use >> when I edit it. Just to give an idea how messy it is, they are 5214 >> lines indented with tabs and 4272 indented with spaces (out the 9733 >> of the file). > > As an Emacs variables block is present in the file, I would consider > this normative, and declare that the official indenting is 4 spaces > for the file, no tabs. All the Unicode C code I wrote at the time used 4 space indents. I would welcome this being restored. It got diluted over time. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 15 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From steve at holdenweb.com Mon Dec 15 14:59:21 2008 From: steve at holdenweb.com (Steve Holden) Date: Mon, 15 Dec 2008 08:59:21 -0500 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: <10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com> References: <10b800400812131435l6f42da16mc9d2c5e69eddd959@mail.gmail.com> Message-ID: Miguel Lobo wrote: >> I think we should not do this. We should use 4 space indents for new >> files, but existing files should not be reindented. 
If you reindent, >> much of the history of the file is essentially lost -- "svn blame" >> will blame whoever reindented the code, and it's a pain to go back. > > I believe "svn blame -x -w" ignores whitespace changes. > Sounds like Uncle Timmy's whitespace management needs to become a little more draconian. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From josiah.carlson at gmail.com Mon Dec 15 19:50:42 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Mon, 15 Dec 2008 10:50:42 -0800 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: <49443D6A.9020308@v.loewis.de> References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de> <49443D6A.9020308@v.loewis.de> Message-ID: Would anyone mind terribly if I backported a version of: http://bugs.python.org/issue4501 to 2.4 and 2.5? It fixes some strange duplicate data issues on poll() with packets with a nonstandard flag set. - Josiah On Sat, Dec 13, 2008 at 2:55 PM, "Martin v. L?wis" wrote: > Christian Heimes wrote: >> Martin v. L?wis schrieb: >>> 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases >>> will only include security fixes. According to the release notes, over >>> 100 bugs and patches have been addressed since Python 2.5.1, many of >> ^^^^ >> >> Do you really mean 2.5.1? > > Oops, no - although the statement is technically correct; since 2.5.2, > only 80 bugs have been added :-) > > Thanks for pointing that out. 
> > Martin
From jeremy at alum.mit.edu Mon Dec 15 20:19:39 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Mon, 15 Dec 2008 14:19:39 -0500 Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses In-Reply-To: References: Message-ID: I have a patch that appears to fix this bug http://bugs.python.org/file12361/urllib-chunked.diff but I'm not sure about its interaction with the io module and RawIOBase. Is there a new IO expert who could take a look at it for me? Jeremy On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton wrote: > This bug is pretty serious, because urllib will insert garbage into > the application-visible data for a chunked response. It simply > ignores the fact that it's reading a chunked response and includes the > chunked header data is payload data. The original bug was reported in > September, but no one noticed it. It was reported again recently. > > http://bugs.python.org/issue3761 > http://bugs.python.org/issue4631 > > I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but > that's not my call. > > Jeremy > From brett at python.org Mon Dec 15 20:21:27 2008 From: brett at python.org (Brett Cannon) Date: Mon, 15 Dec 2008 11:21:27 -0800 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: On Mon, Dec 15, 2008 at 00:20, Georg Brandl wrote: > Jeffrey Yasskin schrieb: >> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum wrote: >>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou wrote: >>>> Guido van Rossum python.org> writes: >>>>> >>>>> I think we should not do this. We should use 4 space indents for new >>>>> files, but existing files should not be reindented.
>>>> >>>> Well, right now many files are indented with a mix of spaces and tabs, depending >>>> on who did the edit and how their editor was configured at the time. >>> >>> That's a shame. We used to have more rigorous standards than allowing that. >>> >>>> Perhaps a graceful policy would be to mandate that all new edits be made with >>>> spaces without touching other functions in the file. Then hopefully the code >>>> base would gradually converge to a tabless scheme. >>> >>> I don't think so. I find local consistency more important than global >>> consistency. A file can become really hard to read when different >>> indentation schemes are used in random parts of the code. >>> >>> If you have a problem configuring your editor, just say so and someone >>> will explain how to do it. >> >> I've never figured out how to configure emacs to deduce whether the >> current file uses spaces or tabs and has a 4 or 8 space indent. I >> always try to get it right anyway, but it'd be a lot more convenient >> if my editor did it for me. If there are such instructions, perhaps >> they should be added to PEPs 7 and 8? > > I use this little hack to detect indentation in Python's C files: > > (defun c-select-style () > "Hack: Select the C style to use from buffer indentation." > (save-excursion > (if (re-search-forward "^\t" 3000 t) > (c-set-style "python") > (c-set-style "python-new")))) > > (add-hook 'c-mode-hook 'c-select-style) > > -- where "python" and "python-new" are two appropriate c-mode styles. > Anyone have something similar for Vim? 
-Brett From mike.klaas at gmail.com Mon Dec 15 20:40:47 2008 From: mike.klaas at gmail.com (Mike Klaas) Date: Mon, 15 Dec 2008 11:40:47 -0800 Subject: [Python-Dev] Psyco for -OO or -O In-Reply-To: <4943B885.1070605@voidspace.org.uk> References: <4943B885.1070605@voidspace.org.uk> Message-ID: <8EEA1438-116A-4226-8C01-E32F36445D00@gmail.com> On 13-Dec-08, at 5:28 AM, Michael Foord wrote: > Lie Ryan wrote: >> I'm sure probably most of you knows about psyco[1], the optimizer. >> Python has an -O and -OO flag that is intended to be optimization >> flag, but we know that currently it doesn't do much. Why not add >> psyco as standard library and let -O or -OO invoke psyco? >> > > This really belongs on Python-ideas and not Python-dev. > > The main reason why not is that someone(s) from the Python core team > would then need to 'own' maintaining Psyco (which is x86 only as well Worse, it is 32bit only, which has greatly diminished its usefulness in the last few years. -Mike From guido at python.org Mon Dec 15 21:59:30 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Dec 2008 12:59:30 -0800 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: Aha! A specific file. I'm supportive of fixing that specific file. Now if you can figure out how to do it and still allow merging between 2.6 and 3.0 that would be cool. --Guido van Rossum (home page: http://www.python.org/~guido/) On Sun, Dec 14, 2008 at 9:54 AM, Alexandre Vassalotti wrote: > On Sat, Dec 13, 2008 at 5:11 PM, Antoine Pitrou wrote: >> Guido van Rossum python.org> writes: >>> >>> I think we should not do this. We should use 4 space indents for new >>> files, but existing files should not be reindented. >> >> Well, right now many files are indented with a mix of spaces and tabs, depending >> on who did the edit and how their editor was configured at the time. >> > > Personally, I think the indentation of, at least, > Objects/unicodeobject.c should be fixed. 
This file has become so > mixed-up with tab and space indents that I have no-idea what to use > when I edit it. Just to give an idea how messy it is, they are 5214 > lines indented with tabs and 4272 indented with spaces (out the 9733 > of the file). From jmurphy41 at mac.com Mon Dec 15 20:59:51 2008 From: jmurphy41 at mac.com (Jim Murphy) Date: Mon, 15 Dec 2008 14:59:51 -0500 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? Message-ID: Martin: You wrote: "That's not the issue. Had pymath.o been linked into python, it's symbols would have been exported (is that proper use of English tenses?)" Yes, it's a proper and idiomatic use of the subjunctive mood, which many native (American) English speakers manage to mangle. I also noticed you wrote the following a few emails later on the python-dev list: "Using that would require to split pymath.c into multiple files." My ear tells me that either "that would require splitting pymath,c ..." or "that would require one to split pymat.c ..." is much more grammatical than "that would require to split ...," but I can't cite a rule. It is frequently acceptable to use either the infinitive or the gerund form of a verb, which would imply that "to split" should be interchangeable with "splitting," but I believe that some verbs have preferences for one form over the other. My ear seems to be thinking of the template "require someone to do something," and rebels at hearing the "to" without a "someone." That's the best excuse for a rule I could come up with. I actually spent a half-hour trying to find rules on the uses in English of infinitives versus gerunds and did not find anything definitive. I realize now, to my disgust, that English usage is very badly afflicted with "special casing." Jim Murphy 326 Sunnyview Lane Ithaca, New York 14850-6258 Tel (home): +1 607-319-4161
From martin at v.loewis.de Mon Dec 15 22:08:22 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 15 Dec 2008 22:08:22 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: References: <49443710.3060102@v.loewis.de> <494439A8.2030208@cheimes.de> <49443D6A.9020308@v.loewis.de> Message-ID: <4946C746.3020009@v.loewis.de> > Would anyone mind terribly if I backported a version of: > http://bugs.python.org/issue4501 to 2.4 and 2.5? Yes, I would. These branches are frozen right now until the final release is made. Afterwards, only security-critical patches are allowed, which this one is not, AFAICT. > It fixes some strange duplicate data issues on poll() with packets > with a nonstandard flag set. People experiencing this should upgrade to 2.6 (when it is fixed there). Regards, Martin From steve at holdenweb.com Mon Dec 15 22:09:49 2008 From: steve at holdenweb.com (Steve Holden) Date: Mon, 15 Dec 2008 16:09:49 -0500 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: References: Message-ID: <4946C79D.5060109@holdenweb.com> Jim Murphy wrote: > Martin: > > You wrote: > > "That's not the issue. Had pymath.o been linked into python, it's > symbols would have been exported (is that proper use of English > tenses?)" > It does, however, make the common mistake of putting an apostrophe in a possessive personal pronoun. > Yes, it's a proper and idiomatic use of the subjunctive mood, which > many native (American) English speakers manage to mangle. > > I also noticed you wrote the following a few emails later on the > python-dev list: > > "Using that would require to split pymath.c into multiple files." > > My ear tells me that either "that would require splitting pymath,c ..." > or "that would require one to split pymat.c ..."
is much more grammatical > than "that would require to split ...," but I can't cite a rule. It is > frequently > acceptable to use either the infinitive or the gerund form of a verb, > which > would imply that "to split" should be interchangeable with "splitting," but > I believe that some verbs have preferences for one form over the other. > > My ear seems to be thinking of the template "require someone to do > something," and rebels at hearing the "to" without a "someone." That's > the best excuse for a rule I could come up with. > > I actually spent a half-hour trying to find rules on the uses in English of > infinitives versus gerunds and did not find anything definitive. I realize > now, to my disgust, that English usage is very badly afflicted with > "special casing." > This is only significant because Martin is a perfectionist who wants to write better English. I can't remember a time when his slightly-less-than-perfect command of the language rendered anything he wrote incomprehensible. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
From victor.stinner at haypocalc.com Mon Dec 15 22:14:16 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Mon, 15 Dec 2008 22:14:16 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: References: <49443710.3060102@v.loewis.de> <49443D6A.9020308@v.loewis.de> Message-ID: <200812152214.16260.victor.stinner@haypocalc.com> Le Monday 15 December 2008 19:50:42 Josiah Carlson, vous avez ?crit?: > Would anyone mind terribly if I backported a version of: > http://bugs.python.org/issue4501 to 2.4 and 2.5? First the patch have be reviewed and at least applied to trunk :-) Can you give an short example to describe the bug? Maybe write an unit test? I don't know poll(), so I can't help for this issue.
-- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From ncoghlan at gmail.com Mon Dec 15 22:17:57 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Dec 2008 07:17:57 +1000 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: <4946C79D.5060109@holdenweb.com> References: <4946C79D.5060109@holdenweb.com> Message-ID: <4946C985.40701@gmail.com> Steve Holden wrote: > This is only significant because Martin is a perfectionist who wants to > write better English. I can't remember a time when his > slightly-less-than-perfect command of the language rendered anything he > wrote incomprehensible. I'd actually criticise the written communication abilities of many of my native English speaking friends long before I'd criticise the English writing of most of the non-Native English speakers on this list (i.e. most of the writing here is of a higher standard than many native English speakers could manage). This particular thread of discussion does appear to be veering a little off topic though :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Mon Dec 15 22:24:18 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 15 Dec 2008 22:24:18 +0100 Subject: [Python-Dev] How to force export of a particular symbol from python.exe? In-Reply-To: <4946C985.40701@gmail.com> References: <4946C79D.5060109@holdenweb.com> <4946C985.40701@gmail.com> Message-ID: <4946CB02.6000206@v.loewis.de> > This particular thread of discussion does appear to be veering a little > off topic though :) And I apologize for starting it :-) Martin From scott+python-dev at scottdial.com Mon Dec 15 22:25:24 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 15 Dec 2008 16:25:24 -0500 Subject: [Python-Dev] Reindenting the C code base? 
In-Reply-To: References: Message-ID: <4946CB44.9070800@scottdial.com> Guido van Rossum wrote: > Aha! A specific file. I'm supportive of fixing that specific file. Now > if you can figure out how to do it and still allow merging between 2.6 > and 3.0 that would be cool. Like "svn blame", you can use "svn merge -x -w" to avoid merging whitespace changes. However, svnmerge.py does not support any of these command-line flags being passed along to the svn command-line. It should be pretty easy to hack in, if it was desirable. -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From martin at v.loewis.de Mon Dec 15 22:28:42 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 15 Dec 2008 22:28:42 +0100 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: <4946CC0A.6050109@v.loewis.de> > Aha! A specific file. I'm supportive of fixing that specific file. Now > if you can figure out how to do it and still allow merging between 2.6 > and 3.0 that would be cool. In the specific case, I think it's best to fix the 2.7 source, and then merge the changes into 3k. The 3.x version is still similar to the 2.x version, except for a number of additions (such as interning). The changes should probably then also merged into the 2.6 and 3.0 branches, to allow easy merging in the future. Backporting to 2.5 will become difficult; it will also become unnecessary. Regards, Martin From alexandre at peadrop.com Mon Dec 15 22:40:47 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Mon, 15 Dec 2008 16:40:47 -0500 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Mon, Dec 15, 2008 at 3:59 PM, Guido van Rossum wrote: > Aha! A specific file. I'm supportive of fixing that specific file. Now > if you can figure out how to do it and still allow merging between 2.6 > and 3.0 that would be cool. 
> Here's the simplest solution I've thought of so far to allow smooth merging subsequently. First, fix the 2.6 version with 4-space indent. Over a third of the file is already using spaces for indentation, so I don't think losing consistency is a big deal. Then, block the trunk commit with svnmerge to prevent it from being merged back to the py3k branch. Finally, fix the 3.0 version. -- Alexandre From josiah.carlson at gmail.com Mon Dec 15 22:58:34 2008 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Mon, 15 Dec 2008 13:58:34 -0800 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3, release candidate 1 In-Reply-To: <200812152214.16260.victor.stinner@haypocalc.com> References: <49443710.3060102@v.loewis.de> <49443D6A.9020308@v.loewis.de> <200812152214.16260.victor.stinner@haypocalc.com> Message-ID: On Mon, Dec 15, 2008 at 1:14 PM, Victor Stinner wrote: > On Monday 15 December 2008 19:50:42, Josiah Carlson wrote: >> Would anyone mind terribly if I backported a version of: >> http://bugs.python.org/issue4501 to 2.4 and 2.5? > > First the patch has to be reviewed and at least applied to trunk :-) > > Can you give a short example to describe the bug? Maybe write a unit test? > > I don't know poll(), so I can't help with this issue. One of our 3rd party users of asyncore, pyftpdlib by Giampaolo Rodola, discovered a duplicate data issue related to data with the urgent data flag attached to TCP packets. I don't know the underlying source of the issue (it smells like a buffer duplication bug, but I can't see that asyncore is doing it), but the patch does fix the issue. But with policies being "only security issues are backported to 2.4 and 2.5", and this is definitely not a security issue, I won't backport it. 
- Josiah From skip at pobox.com Tue Dec 16 01:44:05 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 15 Dec 2008 18:44:05 -0600 Subject: [Python-Dev] [Python-3000] python-3000 list is closed In-Reply-To: <4946D870.7000308@v.loewis.de> References: <4946D870.7000308@v.loewis.de> Message-ID: <18758.63957.659035.419926@montanaro-dyndns-org.local> Martin> The mailing list python-3000 at python.org is now closed. All Martin> further discussion of Python 3.x takes place on Martin> python-dev at python.org. Maybe set up a simple email alias reflecting python-3000 to python-dev? Skip From barry at python.org Tue Dec 16 02:07:19 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 15 Dec 2008 20:07:19 -0500 Subject: [Python-Dev] [Python-3000] python-3000 list is closed In-Reply-To: <18758.63957.659035.419926@montanaro-dyndns-org.local> References: <4946D870.7000308@v.loewis.de> <18758.63957.659035.419926@montanaro-dyndns-org.local> Message-ID: <22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 15, 2008, at 7:44 PM, skip at pobox.com wrote: > > Martin> The mailing list python-3000 at python.org is now closed. All > Martin> further discussion of Python 3.x takes place on > Martin> python-dev at python.org. > > Maybe set up a simple email alias reflecting python-3000 to python- > dev? 
Or, https://launchpad.net/replybot - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSUb/R3EjvBPtnXfVAQJ4sgP/Wy8ma4nzcYQ5gXVCw2TpODq5l/duzB+I f3ej5tSyvI2wzf+OTQQwth5A0xySB8LoGbSQsYwhvbA+3xXOe1lIYeVYUGru9Y4T xs1axRgydTwxAFgHBdjrY7tLhXH4GOed0xYvbu6b3tRslb+4agmOhluX4WCBRZH+ sgIW0XL7nsI= =Nrdo -----END PGP SIGNATURE----- From brad at python.org Tue Dec 16 02:55:09 2008 From: brad at python.org (Brad Knowles) Date: Mon, 15 Dec 2008 19:55:09 -0600 Subject: [Python-Dev] [Python-3000] python-3000 list is closed In-Reply-To: <22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org> References: <4946D870.7000308@v.loewis.de> <18758.63957.659035.419926@montanaro-dyndns-org.local> <22196DA6-7DBD-48C2-B15F-42DCD1C0F88F@python.org> Message-ID: <49470A7D.3000301@python.org> Barry Warsaw wrote: >> Maybe set up a simple email alias reflecting python-3000 to python-dev? > > > Or, > > https://launchpad.net/replybot If we're going to leave something configured in Mailman, it already has an auto-reply functionality. It would be nearly trivial to set that up. -- Brad Knowles Member of the Python.org Postmaster Team & Co-Moderator of the mailman-users and mailman-developers mailing lists From bharat.satsangi at gmail.com Tue Dec 16 06:02:18 2008 From: bharat.satsangi at gmail.com (bharat satsangi) Date: Tue, 16 Dec 2008 10:32:18 +0530 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: <15c21a910812152102i6e0bf349k6981f4f04ff5808d@mail.gmail.com> please unsubscribe me On Tue, Dec 16, 2008 at 12:51 AM, Brett Cannon wrote: > On Mon, Dec 15, 2008 at 00:20, Georg Brandl wrote: > > Jeffrey Yasskin schrieb: > >> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum > wrote: > >>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou > wrote: > >>>> Guido van Rossum python.org> writes: > >>>>> > >>>>> I think we should not do this. 
We should use 4 space indents for new > >>>>> files, but existing files should not be reindented. > >>>> > >>>> Well, right now many files are indented with a mix of spaces and tabs, > depending > >>>> on who did the edit and how their editor was configured at the time. > >>> > >>> That's a shame. We used to have more rigorous standards than allowing > that. > >>> > >>>> Perhaps a graceful policy would be to mandate that all new edits be > made with > >>>> spaces without touching other functions in the file. Then hopefully > the code > >>>> base would gradually converge to a tabless scheme. > >>> > >>> I don't think so. I find local consistency more important than global > >>> consistency. A file can become really hard to read when different > >>> indentation schemes are used in random parts of the code. > >>> > >>> If you have a problem configuring your editor, just say so and someone > >>> will explain how to do it. > >> > >> I've never figured out how to configure emacs to deduce whether the > >> current file uses spaces or tabs and has a 4 or 8 space indent. I > >> always try to get it right anyway, but it'd be a lot more convenient > >> if my editor did it for me. If there are such instructions, perhaps > >> they should be added to PEPs 7 and 8? > > > > I use this little hack to detect indentation in Python's C files: > > > > (defun c-select-style () > > "Hack: Select the C style to use from buffer indentation." > > (save-excursion > > (if (re-search-forward "^\t" 3000 t) > > (c-set-style "python") > > (c-set-style "python-new")))) > > > > (add-hook 'c-mode-hook 'c-select-style) > > > > -- where "python" and "python-new" are two appropriate c-mode styles. > > > > Anyone have something similar for Vim? 
> > -Brett > -- Thanks and Regards Bharat +91-9888674137 From tjreedy at udel.edu Tue Dec 16 08:15:33 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 16 Dec 2008 02:15:33 -0500 Subject: [Python-Dev] [Python-3000] python-3000 list is closed In-Reply-To: <18758.63957.659035.419926@montanaro-dyndns-org.local> References: <4946D870.7000308@v.loewis.de> <18758.63957.659035.419926@montanaro-dyndns-org.local> Message-ID: skip at pobox.com wrote: > Martin> The mailing list python-3000 at python.org is now closed. All > Martin> further discussion of Python 3.x takes place on > Martin> python-dev at python.org. > > Maybe set up a simple email alias reflecting python-3000 to python-dev? It is currently mirrored to news.gmane.org (which is how I have read it and all other Python lists). I presume they will want to keep their archive (or maybe not), but whatever bot is set up might take their bots into consideration, unless there is a way to explicitly close it on their end. (I am ignorant of this sort of stuff, but appreciate the mirror.) tjr From kirklin.mcdonald at gmail.com Tue Dec 16 08:58:09 2008 From: kirklin.mcdonald at gmail.com (Kirk McDonald) Date: Mon, 15 Dec 2008 23:58:09 -0800 Subject: [Python-Dev] Reindenting the C code base? 
In-Reply-To: References: <5d44f72f0812140943y652c89dej7f09e36fcb3242a6@mail.gmail.com> Message-ID: <25bd58d10812152358n38928e51j89fe19288f5a7cfd@mail.gmail.com> On Mon, Dec 15, 2008 at 11:21 AM, Brett Cannon wrote: > On Mon, Dec 15, 2008 at 00:20, Georg Brandl wrote: > > Jeffrey Yasskin schrieb: > >> On Sun, Dec 14, 2008 at 8:26 AM, Guido van Rossum > wrote: > >>> On Sat, Dec 13, 2008 at 2:11 PM, Antoine Pitrou > wrote: > >>>> Guido van Rossum python.org> writes: > >>>>> > >>>>> I think we should not do this. We should use 4 space indents for new > >>>>> files, but existing files should not be reindented. > >>>> > >>>> Well, right now many files are indented with a mix of spaces and tabs, > depending > >>>> on who did the edit and how their editor was configured at the time. > >>> > >>> That's a shame. We used to have more rigorous standards than allowing > that. > >>> > >>>> Perhaps a graceful policy would be to mandate that all new edits be > made with > >>>> spaces without touching other functions in the file. Then hopefully > the code > >>>> base would gradually converge to a tabless scheme. > >>> > >>> I don't think so. I find local consistency more important than global > >>> consistency. A file can become really hard to read when different > >>> indentation schemes are used in random parts of the code. > >>> > >>> If you have a problem configuring your editor, just say so and someone > >>> will explain how to do it. > >> > >> I've never figured out how to configure emacs to deduce whether the > >> current file uses spaces or tabs and has a 4 or 8 space indent. I > >> always try to get it right anyway, but it'd be a lot more convenient > >> if my editor did it for me. If there are such instructions, perhaps > >> they should be added to PEPs 7 and 8? > > > > I use this little hack to detect indentation in Python's C files: > > > > (defun c-select-style () > > "Hack: Select the C style to use from buffer indentation." 
> >   (save-excursion
> >     (if (re-search-forward "^\t" 3000 t)
> >         (c-set-style "python")
> >       (c-set-style "python-new"))))
> >
> > (add-hook 'c-mode-hook 'c-select-style)
> >
> > -- where "python" and "python-new" are two appropriate c-mode styles.
> >
>
> Anyone have something similar for Vim?
>
> -Brett
>

Something along the lines of:

:fu Select_c_style()
:    if search('^\t')
:        set noet " etc.
:    el
:        set et " etc.
:    en
:endf
:au BufRead *.[ch] call Select_c_style()

-Kirk McDonald

From syfou at users.sourceforge.net Tue Dec 16 09:44:06 2008 From: syfou at users.sourceforge.net (Sylvain Fourmanoit) Date: Tue, 16 Dec 2008 03:44:06 -0500 (EST) Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Sat, 13 Dec 2008, Guido van Rossum wrote: > If you reindent, much of the history of the file is essentially lost -- > "svn blame" will blame whoever reindented the code, and it's a pain to > go back. I am not a subversion specialist, but it appears this part can be handled gracefully by passing -b (ignore space change) to an external diff command svn blame can rely on (svn blame -x -ub ...). At least, it seems to work on my station (GNU Diffutils, Subversion 1.5.1)! -- Sylvain From techtonik at gmail.com Tue Dec 16 10:26:55 2008 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 16 Dec 2008 11:26:55 +0200 Subject: [Python-Dev] Reindenting the C code base? In-Reply-To: References: Message-ID: On Sat, Dec 13, 2008 at 11:26 PM, Guido van Rossum wrote: > > I think we should not do this. We should use 4 space indents for new > files, but existing files should not be reindented. If you reindent, > much of the history of the file is essentially lost -- "svn blame" > will blame whoever reindented the code, and it's a pain to go back. > There's also the issue of merging between the 2.x and 3.x branches, > which we still do. 
"svnadmin dump" produces pretty munchable text file to pretend that there were no tabs at all. The problem may be to sync working copies with old new repository. http://svnbook.red-bean.com/en/1.5/svn.ref.svnadmin.c.dump.html svn pre-commit hook can be used to avoid any unescaped tabs in future commits. http://svnbook.red-bean.com/en/1.5/svn.ref.reposhooks.pre-commit.html Adding pre-commit hook is better than adding editor-specific comments, because it doesn't require your editor to support the syntax - regardless of editor you will have to convert tabs file to spaces anyway. -- --anatoly t. From krstic at solarsail.hcs.harvard.edu Tue Dec 16 22:37:26 2008 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=) Date: Tue, 16 Dec 2008 16:37:26 -0500 Subject: [Python-Dev] Trap SIGSEGV and SIGFPE In-Reply-To: <49417293.50506@v.loewis.de> References: <200812101206.49316.victor.stinner@haypocalc.com> <49404CEB.8040900@v.loewis.de> <49417293.50506@v.loewis.de> Message-ID: <68D5B02F-A716-4E66-86FF-B50A0FAEFF4E@solarsail.hcs.harvard.edu> On Dec 11, 2008, at 3:05 PM, Martin v. L?wis wrote: > If it is actually possible to print a stack trace, that could be > useful indeed. I'm then skeptical that this is possible in the > general case (i.e. displaying the full C stack), but displaying > (parts of) the Python stack might be possible. I think it should > still proceed to dump core, so that you can then inspect the core > with a proper debugger. +1. Victor, any interest in attempting to retool your patch in this direction? -- Ivan Krsti? | http://radian.org From krstic at solarsail.hcs.harvard.edu Tue Dec 16 22:43:40 2008 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=) Date: Tue, 16 Dec 2008 16:43:40 -0500 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? 
In-Reply-To: <49443B7F.8020602@v.loewis.de> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <319e029f0812120252n515087acrfab5f8934e7603c4@mail.gmail.com> <4943B974.6020407@voidspace.org.uk> <49443B7F.8020602@v.loewis.de> Message-ID: <8C214F23-C8D3-49F3-BC9B-0D945218EB0E@solarsail.hcs.harvard.edu> On Dec 13, 2008, at 5:47 PM, Martin v. Löwis wrote: > They were originally invented in 1965, on Multics (1970) they were > used to perform compilation in the background. When Unix came along, > it *added* address space separation, introducing what is now known > as processes. Yes, and a lot of the subsequent interest in threads came due to the historically debilitating overhead of fork() on some important Unices, notably Solaris. -- Ivan Krstić | http://radian.org From solipsis at pitrou.net Tue Dec 16 22:53:23 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Dec 2008 21:53:23 +0000 (UTC) Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects Message-ID: Hello, There are recurring complaints about the garbage collector degrading performance when lots of objects are created in a row. In issue #4074, I've proposed a patch which basically implements Martin's suggestion in http://mail.python.org/pipermail/python-dev/2008-June/080579.html to base the decision to do a full collection on the ratio between the number of objects surviving the (n-1) generation collection and the number of long-lived objects. I've also added a condition so that this new behaviour is only triggered when there are more than 10000 long-lived objects -- therefore, cycles will still get collected quickly in lightweight programs. In Gregory's simple test of storing many tuples in a list, the behaviour has indeed changed from exponential to linear. Is anybody opposed to the principle of this proposal? Antoine. 
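The decision rule described in the message above can be sketched in a few lines of Python. This is only an illustration of the idea, not the patch itself (which is C code in the collector): the function name, the parameter names and the 25% ratio are assumptions made for this example; only the 10000-object cut-off is taken from the message.

```python
def should_run_full_collection(survivors, long_lived_total,
                               ratio=0.25, min_long_lived=10000):
    """Decide whether a full (oldest-generation) collection is worthwhile.

    survivors        -- objects that survived the last (n-1) generation
                        collection and have not yet been through a full one
    long_lived_total -- objects that survived the previous full collection
    ratio            -- hypothetical trigger ratio (assumed for this sketch)
    min_long_lived   -- at or below this many long-lived objects, keep the
                        old behaviour and always allow the full collection
    """
    if long_lived_total <= min_long_lived:
        # Lightweight program: cycles should still be collected quickly.
        return True
    # Large heap: only pay for a full pass once enough new objects
    # have accumulated relative to what is already long-lived.
    return survivors > long_lived_total * ratio


if __name__ == "__main__":
    # Few long-lived objects, so collect as before.
    print(should_run_full_collection(50, 2000))
    # Large heap, few new survivors: skip the expensive full pass.
    print(should_run_full_collection(1000, 1000000))
    # Large heap, many new survivors: a full pass is justified.
    print(should_run_full_collection(400000, 1000000))
```

With a rule of this shape, the number of full collections grows much more slowly than the number of objects being stored, which is the effect reported for the test of storing many tuples in a list.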
From greg.ewing at canterbury.ac.nz Wed Dec 17 01:27:39 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Dec 2008 13:27:39 +1300 Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects In-Reply-To: References: Message-ID: <4948477B.20609@canterbury.ac.nz> Antoine Pitrou wrote: > I've proposed a patch > which basically implements Martin's suggestion in > http://mail.python.org/pipermail/python-dev/2008-June/080579.html > > Is anybody opposed to the principle of this proposal? Sounds okay to me. -- Greg From lists at cheimes.de Wed Dec 17 01:51:21 2008 From: lists at cheimes.de (Christian Heimes) Date: Wed, 17 Dec 2008 01:51:21 +0100 Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects In-Reply-To: References: Message-ID: <49484D09.4040202@cheimes.de> Antoine Pitrou schrieb: > Is anybody opposed to the principle of this proposal? Is it reasonable to implement multiple policies so the user can switch between them? Or is the new algorithm superior in all cases? From solipsis at pitrou.net Wed Dec 17 02:00:56 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Dec 2008 01:00:56 +0000 (UTC) Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects References: <49484D09.4040202@cheimes.de> Message-ID: Christian Heimes cheimes.de> writes: > > Is it reasonable to implement multiple policies so the user can switch > between them? Or is the new algorithm superior in all cases? We could let the user configure the threshold between the old policy and the new policy. Currently it is hard-wired to a value of 10000 (that is, 10000 long-lived objects tracked by the GC). From martin.hellwig at dcuktec.org Tue Dec 16 21:12:51 2008 From: martin.hellwig at dcuktec.org (Martin P. 
Hellwig) Date: Tue, 16 Dec 2008 20:12:51 +0000 Subject: [Python-Dev] [ANN] EuroPython 2009 – Call for Participation! Message-ID: <49480BC3.9030002@dcuktec.org>

On behalf of the EuroPython 2009 organisation it is my privilege and honour to announce the 'Call for Participation' for EuroPython 2009! EuroPython is the conference for the communities around Python, including the Django, Zope and Plone communities. This year's conference will be held in Birmingham, UK from Monday 29th June to Saturday 4th July 2009.

Talks & Themes
Do you have something you wish to present at EuroPython? Go to http://www.europython.eu/talks/cfp/ for this year's themes and submission criteria; the deadline is 5th April 2009.

Other Talks, Activities and Events
Have you got something which does not fit the above? Visit http://www.europython.eu/talks/ .

Help Us Out
We could use a hand; any contribution is welcome. Please take a look at http://www.europython.eu/contact/ .

Sponsors
A unique opportunity to affiliate with the prestigious EuroPython conference! http://www.europython.eu/sponsors/

Spread the Word
Improve our publicity by distributing this announcement in your corner of the community. Please coordinate this with the organizers: http://www.europython.eu/contact/

General Information
For more information about the conference, please visit http://www.europython.eu/

Looking forward to seeing you!
The EuroPython Team From bioinformed at gmail.com Wed Dec 17 16:37:08 2008 From: bioinformed at gmail.com (Kevin Jacobs) Date: Wed, 17 Dec 2008 10:37:08 -0500 Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects In-Reply-To: References: <49484D09.4040202@cheimes.de> Message-ID: <2e1434c10812170737q603acb03va3d3a46a459546dd@mail.gmail.com> On Tue, Dec 16, 2008 at 8:00 PM, Antoine Pitrou wrote: > Christian Heimes cheimes.de> writes: > > > > Is it reasonable to implement multiple policies so the user can switch > > between them? Or is the new algorithm superior in all cases? > > > I'll test your patch, as I currently have to micro-manage the garbage collector in several of my algorithms or else they degenerate into almost continuous collection. Results in a day or two. ~Kevin From guido at python.org Wed Dec 17 19:05:29 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Dec 2008 10:05:29 -0800 Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses In-Reply-To: References: Message-ID: The inheritance from io.RawIOBase seems fine. --Guido van Rossum (home page: http://www.python.org/~guido/) On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton wrote: > I have a patch that appears to fix this bug > http://bugs.python.org/file12361/urllib-chunked.diff > but I'm not sure about its interaction with the io module and > RawIOBase. Is there a new IO expert who could take a look at it for > me? > > Jeremy > > On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton wrote: >> This bug is pretty serious, because urllib will insert garbage into >> the application-visible data for a chunked response. It simply >> ignores the fact that it's reading a chunked response and includes the >> chunked header data as payload data. The original bug was reported in >> September, but no one noticed it. It was reported again recently. 
>> >> http://bugs.python.org/issue3761 >> http://bugs.python.org/issue4631 >> >> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but >> that's not my call. >> >> Jeremy >> From martin at v.loewis.de Wed Dec 17 22:05:50 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 17 Dec 2008 22:05:50 +0100 Subject: [Python-Dev] Please test OSX installer Message-ID: <494969AE.3060805@v.loewis.de> I just created an OSX installer for 2.5.3c1. As it's the first time I've done that, I'd appreciate it if somebody could test it and report whether it works (as well as the 2.5.2 one did). http://www.python.org/download/releases/2.5.3/ Regards, Martin From solipsis at pitrou.net Wed Dec 17 23:02:10 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Dec 2008 22:02:10 +0000 (UTC) Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects References: <49484D09.4040202@cheimes.de> Message-ID: Antoine Pitrou pitrou.net> writes: > > We could let the user configure the threshold between the old policy and the new > policy. Currently it is hard-wired to a value of 10000 (that is, 10000 > long-lived objects tracked by the GC). I've removed the threshold in the latest patches because it didn't make much sense when a few long-lived objects contained a lot of objects not tracked by the GC. Another improvement I've included in the latest patches (but which is orthogonal to the algorithmic change) is that simple tuples and even simple dicts are not tracked by the GC if they don't need to be. 
A few examples (gc.is_tracked() is a new function which returns True if an object is tracked by the GC):

>>> import gc
>>> gc.is_tracked(())
False
>>> gc.is_tracked((1,2))
False
>>> gc.is_tracked((1,(2, "a", None)))
False
>>> gc.is_tracked((1,(2, "a", None, {})))
True
>>> d = {}
>>> gc.is_tracked(d)
False
>>> d[1,2] = 3,4
>>> gc.is_tracked(d)
False
>>> d[5] = None, "a", (1,2,3)
>>> gc.is_tracked(d)
False
>>> d[6] = {}
>>> gc.is_tracked(d)
True
>>> gc.is_tracked(d[6])
False

Regards Antoine. From guido at python.org Wed Dec 17 23:03:38 2008 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Dec 2008 14:03:38 -0800 Subject: [Python-Dev] Please test OSX installer In-Reply-To: <494969AE.3060805@v.loewis.de> References: <494969AE.3060805@v.loewis.de> Message-ID: Worked flawlessly both on an x86 MacBook Pro running Leopard (10.5) and a ppc PowerBook G4 running Tiger (10.4). The only issue is that the Python logo makes the text in the sidebar of the installer hard to read. I didn't test the GUI app. Thanks for doing this! --Guido van Rossum (home page: http://www.python.org/~guido/) On Wed, Dec 17, 2008 at 1:05 PM, "Martin v. Löwis" wrote: > I just created an OSX installer for 2.5.3c1. As it's the first time > I do that, I'd appreciate if somebody could test it and report whether > it works (as well as the 2.5.2 one did). > > http://www.python.org/download/releases/2.5.3/ From alexander.belopolsky at gmail.com Wed Dec 17 23:04:12 2008 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 17 Dec 2008 17:04:12 -0500 Subject: [Python-Dev] Please test OSX installer In-Reply-To: <494969AE.3060805@v.loewis.de> References: <494969AE.3060805@v.loewis.de> Message-ID: I've installed it on a MacBook Air running Leopard (10.5.6). Installer ran like a charm, but when I ran the following in IDLE:

>>> from test.regrtest import main
>>> main()

I got a "Problem Report for Python" pop-up. Skip to "///" for "Problem Details".
Interestingly, the test completed with the following report:

286 tests OK.
3 tests failed:
    test_descr test_file test_subprocess
...
3 skips unexpected on darwin:
    test_ioctl test_bsddb185 test_univnewlines

This suggests that the crash was in a subprocess.

///

Process:         Python [1203]
Path:            /Applications/MacPython 2.5/IDLE.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       X86 (Native)
Parent Process:  Python [1027]

Date/Time:       2008-12-17 16:57:52.804 -0500
OS Version:      Mac OS X 10.5.6 (9G55)
Report Version:  6

Exception Type:  EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Crashed Thread:  0

Thread 0 Crashed:
0   libSystem.B.dylib    0x90c70e42 __kill + 10
1   libSystem.B.dylib    0x90ce323a raise + 26
2   libSystem.B.dylib    0x90cef679 abort + 73
3   org.python.python    0x004bd33f posix_getloadavg + 0 (posixmodule.c:7961)
4   org.python.python    0x0048571e PyEval_EvalFrameEx + 18973 (ceval.c:3596)
5   org.python.python    0x00487731 PyEval_EvalCodeEx + 1819 (ceval.c:2875)
6   org.python.python    0x004878e5 PyEval_EvalCode + 87 (ceval.c:520)
7   org.python.python    0x004ab810 PyRun_StringFlags + 243 (pythonrun.c:1273)
8   org.python.python    0x004ab8d7 PyRun_SimpleStringFlags + 72 (pythonrun.c:900)
9   org.python.python    0x004b84c5 Py_Main + 1296 (main.c:521)
10  Python               0x00001f8e 0x1000 + 3982
11  Python               0x00001eb5 0x1000 + 3765

Thread 0 crashed with X86 Thread State (32-bit):
  eax: 0x00000000  ebx: 0x90cef639  ecx: 0xbffff1dc  edx: 0x90c70e42
  edi: 0x008001c0  esi: 0x00000000  ebp: 0xbffff1f8  esp: 0xbffff1dc
   ss: 0x0000001f  efl: 0x00000282  eip: 0x90c70e42   cs: 0x00000007
   ds: 0x0000001f   es: 0x0000001f   fs: 0x00000000   gs: 0x00000037
  cr2: 0x0048d191

Binary Images:
    0x1000 -     0x1fff +Python ??? (???) /Applications/MacPython 2.5/IDLE.app/Contents/MacOS/Python
  0x3f1000 -   0x4e7fe3 +org.python.python 2.5a0 (2.5) /Library/Frameworks/Python.framework/Versions/2.5/Python
0x8fe00000 - 0x8fe2db43  dyld 97.1 (???) <100d362e03410f181a34e04e94189ae5> /usr/lib/dyld
0x90c02000 - 0x90d69ff3  libSystem.B.dylib ??? (???) /usr/lib/libSystem.B.dylib
0x946bd000 - 0x946c1fff  libmathCommon.A.dylib ??? (???) /usr/lib/system/libmathCommon.A.dylib
0xffff0000 - 0xffff1780  libSystem.B.dylib ??? (???) /usr/lib/libSystem.B.dylib

On Wed, Dec 17, 2008 at 4:05 PM, "Martin v. Löwis" wrote: > I just created an OSX installer for 2.5.3c1. As it's the first time > I do that, I'd appreciate if somebody could test it and report whether > it works (as well as the 2.5.2 one did). > > http://www.python.org/download/releases/2.5.3/ > > Regards, > Martin From greg.ewing at canterbury.ac.nz Wed Dec 17 23:52:36 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Dec 2008 11:52:36 +1300 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? In-Reply-To: <49423856.30705@gmail.com> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> Message-ID: <494982B4.5040602@canterbury.ac.nz> Nick Coghlan wrote: > Actually, I believe 3.0 already took a big step towards allowing this by > changing the way modules are initialised. It's a step, but I wouldn't call it a big one. There are many other problems to be solved before fully independent interpreters are possible. 
-- Greg From martin at v.loewis.de Wed Dec 17 23:55:54 2008 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 17 Dec 2008 23:55:54 +0100 Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects In-Reply-To: References: <49484D09.4040202@cheimes.de> Message-ID: <4949837A.7080900@v.loewis.de> > I've removed the threshold in the latest patches because it didn't make much > sense when a few long-lived objects contained a lot of objects not tracked by > the GC. > > Another improvement I've included in the latest patches (but which is > orthogonal to the algorithmic change) is that simple tuples and even simple > dicts are not tracked by the GC if they don't need to. A few examples > (gc.is_tracked() is a new function which returns True if an object is tracked > by the GC): As they are orthogonal, I think they should be considered separately, and in particular committed separately. FWIW, I'm in favor of both (but haven't reviewed the non-cyclic tuples one yet). So despite the organizational overhead, I'd appreciate it if you could create separate patches, if not separate issues. Regards, Martin From solipsis at pitrou.net Thu Dec 18 00:06:53 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 17 Dec 2008 23:06:53 +0000 (UTC) Subject: [Python-Dev] Calling the GC less often when there are lots of long-lived objects References: <49484D09.4040202@cheimes.de> <4949837A.7080900@v.loewis.de> Message-ID: Martin v. Löwis v.loewis.de> writes: > > So despite the organizational overhead, I'd appreciate if you could > create separate patches, if not separate issues. Ok, I'm gonna do that. Regards Antoine. 
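The tracking behaviour discussed in this thread can be probed with gc.is_tracked(), which is available in released CPython versions (2.7 and 3.1 onwards). A small sketch follows. The helper name tracking_report is hypothetical, and the results for tuples and dicts depend on the interpreter version and on whether a collection has already run, so only the atomic-object and list cases should be read as stable.

```python
import gc


def tracking_report(objects):
    """Return a list of (repr, is_tracked) pairs for the given objects."""
    return [(repr(obj), gc.is_tracked(obj)) for obj in objects]


if __name__ == "__main__":
    # Atomic objects are never tracked; lists always are. Tuples and
    # dicts are the interesting, version-dependent cases.
    samples = [0, "a", [], (1, 2), {}, {"key": []}]
    for text, tracked in tracking_report(samples):
        print("%-14s tracked=%s" % (text, tracked))
```

Running this under different interpreter versions is a quick way to see which containers a given build leaves untracked.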
From arnarbi at gmail.com Thu Dec 18 02:33:35 2008 From: arnarbi at gmail.com (Arnar Birgisson) Date: Thu, 18 Dec 2008 01:33:35 +0000 Subject: [Python-Dev] Atomic instructions for reference count increment/decrement Message-ID: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com> Hi all, I'm new here, so bear with me. I tried googling this, but the closest I came up with was a post from 2000. From the discussion about getting rid of the GIL lately, what I read from it is that reference counting is the main obstacle. My question is, why aren't hardware-supported atomic increments and decrements being used for the reference counters? As far as I'm told they are available on most modern platforms (on x86 it is the LOCK instruction prefix) and these incur little overhead. I'd be very happy with pointers to previous discussion on the matter or simple arguments why this would not apply to the Python reference counting mechanism. cheers, Arnar From daniel at stutzbachenterprises.com Thu Dec 18 04:18:26 2008 From: daniel at stutzbachenterprises.com (Daniel Stutzbach) Date: Wed, 17 Dec 2008 21:18:26 -0600 Subject: [Python-Dev] Atomic instructions for reference count increment/decrement In-Reply-To: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com> References: <28012bc60812171733h5cd315cjbaf82e28eac202de@mail.gmail.com> Message-ID: On Wed, Dec 17, 2008 at 7:33 PM, Arnar Birgisson wrote:

> From the discussion about getting rid of the GIL lately, what I read
> from it is that reference counting is the main obstacle. My question
> is, why aren't hardware supported atomic increments and decrements
> being used for the reference counters? As far as I'm told they are
> available on most modern platforms (on x86 it is the LOCK instruction
> prefix)

True.

> and these incur little overhead.

False, due to the costs of maintaining cache coherency.
> I'd be very happy with pointers to previous discussion on the matter
> or simple arguments why this would not apply to the Python reference
> counting mechanism.
>

Adam Olsen actually tried it. See:
http://mail.python.org/pipermail/python-dev/2007-September/074645.html

Other messages in that thread describe the problem in more detail.

-- 
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC

From p.f.moore at gmail.com Thu Dec 18 12:47:37 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 18 Dec 2008 11:47:37 +0000
Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead?
In-Reply-To: <494982B4.5040602@canterbury.ac.nz>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
    <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
    <49423856.30705@gmail.com>
    <494982B4.5040602@canterbury.ac.nz>
Message-ID: <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>

2008/12/17 Greg Ewing:
> Nick Coghlan wrote:
>
>> Actually, I believe 3.0 already took a big step towards allowing this by
>> changing the way modules are initialised.
>
> It's a step, but I wouldn't call it a big one. There are
> many other problems to be solved before fully independent
> interpreters are possible.

Do you know if these remaining problems are listed anywhere? AIUI,
certain software (for example mod_python) has been using multiple
interpreters for a long while now - admittedly not without issues, but
certainly enough to imply that multiple interpreters are at least
"possible", although not perfect. Experience with such software would
probably be a great guide to where the issues exist.

Maybe a page on the Python Wiki, or a FAQ entry, would be useful here,
if only to make things explicit and clear up some of the FUD around
multiple interpreters.

Paul.
From jeremy at alum.mit.edu Thu Dec 18 14:22:29 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu, 18 Dec 2008 08:22:29 -0500
Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
In-Reply-To:
References:
Message-ID:

On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum wrote:
> The inheritance from io.RawIOBase seems fine.

There is a small problem with the interaction between HTTPResponse and
RawIOBase, but I think the problem is more on the http side. You may
recall that the HTTP code has a habit of closing the connection for
you. In a variety of cases, once you've read the last bytes of the
response, the HTTPResponse object calls its own close() method. This
interacts poorly with RawIOBase, because it raises a ValueError for
any operation on a closed io object. This prevents iterators from
working correctly. The iterator implementation expects the final call
to readline() to return an empty string and converts that to a
StopIteration. Instead, it's seeing a ValueError that propagates out.

It's always been odd to me that the connection closed itself. It's
going to be tricky to fix the current bug (chunked responses) and keep
the self-closing behavior, but I worry that changing the self-closing
behavior too dramatically isn't appropriate for a bug fix. Will look
some more at this tomorrow.

Jeremy

> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton wrote:
>> I have a patch that appears to fix this bug
>> http://bugs.python.org/file12361/urllib-chunked.diff
>> but I'm not sure about its interaction with the io module and
>> RawIOBase. Is there a new IO expert who could take a look at it for
>> me?
>>
>> Jeremy
>>
>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton wrote:
>>> This bug is pretty serious, because urllib will insert garbage into
>>> the application-visible data for a chunked response.
It simply >>> ignores the fact that it's reading a chunked response and includes the >>> chunked header data is payload data. The original bug was reported in >>> September, but no one noticed it. It was reported again recently. >>> >>> http://bugs.python.org/issue3761 >>> http://bugs.python.org/issue4631 >>> >>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but >>> that's not my call. >>> >>> Jeremy >>> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > From guido at python.org Thu Dec 18 18:27:42 2008 From: guido at python.org (Guido van Rossum) Date: Thu, 18 Dec 2008 09:27:42 -0800 Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses In-Reply-To: References: Message-ID: It sounds like the self-closing is an implementation detail, meant to make sure the socket is closed as early as possible (which I suppose is a good thing if there's a server waiting for the final ACK on the other side). Perhaps it should not use close() but something slightly lower level that affects the socket directly? --Guido van Rossum (home page: http://www.python.org/~guido/) On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton wrote: > On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum wrote: >> The inheritance from io.RawIOBase seems fine. > > There is a small problem with the interaction between HTTPResponse and > RawIOBase, but I think the problem is more on the http side. You may > recall that the HTTP code has a habit of closing the connection for > you. In a variety of cases, once you've read the last bytes of the > response, the HTTPResponse object calls its own close() method. This > interacts poorly with RawIOBase, because it raises a ValueError for > any operation on a closed io object. This prevents iterators from > working correctly. 
The iterator implementation expects the final call > to readline() to return an empty string and converts that to a > StopIteration. Instead, it's seeing a ValueError that propagates out. > > It's always been odd to me that the connection closed itself. It's > going to be tricky to fix the current bug (chunked responses) and keep > the self-closing behavior, but I worry that change the self-closing > behavior too dramatically isn't appropriate for a bug fix. Will look > some more at this tomorrow. > > Jeremy > >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> >> >> >> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton wrote: >>> I have a patch that appears to fix this bug >>> http://bugs.python.org/file12361/urllib-chunked.diff >>> but I'm not sure about its interaction with the io module and >>> RawIOBase. Is there a new IO expert who could take a look at it for >>> me? >>> >>> Jeremy >>> >>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton wrote: >>>> This bug is pretty serious, because urllib will insert garbage into >>>> the application-visible data for a chunked response. It simply >>>> ignores the fact that it's reading a chunked response and includes the >>>> chunked header data is payload data. The original bug was reported in >>>> September, but no one noticed it. It was reported again recently. >>>> >>>> http://bugs.python.org/issue3761 >>>> http://bugs.python.org/issue4631 >>>> >>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but >>>> that's not my call. 
>>>> >>>> Jeremy >>>> >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> >> > From janssen at parc.com Thu Dec 18 19:12:50 2008 From: janssen at parc.com (Bill Janssen) Date: Thu, 18 Dec 2008 10:12:50 PST Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses In-Reply-To: References: Message-ID: <8926.1229623970@parc.com> Jeremy Hylton wrote: > but I worry that change the self-closing > behavior too dramatically isn't appropriate for a bug fix. Will look > some more at this tomorrow. Reading through the code, it looks like you've already fixed bug 1348. Thanks! Bill From jeremy at alum.mit.edu Thu Dec 18 20:10:28 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Thu, 18 Dec 2008 14:10:28 -0500 Subject: [Python-Dev] Python 3.0 urllib fails with chunked HTTP responses In-Reply-To: References: Message-ID: On Thu, Dec 18, 2008 at 12:27 PM, Guido van Rossum wrote: > It sounds like the self-closing is an implementation detail, meant to > make sure the socket is closed as early as possible (which I suppose > is a good thing if there's a server waiting for the final ACK on the > other side). Perhaps it should not use close() but something slightly > lower level that affects the socket directly? That's what I'm thinking, too. I had 10 minutes last night after the kids went to bed, and my first attempt didn't work :-). Jeremy > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton wrote: >> On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum wrote: >>> The inheritance from io.RawIOBase seems fine. >> >> There is a small problem with the interaction between HTTPResponse and >> RawIOBase, but I think the problem is more on the http side. 
You may >> recall that the HTTP code has a habit of closing the connection for >> you. In a variety of cases, once you've read the last bytes of the >> response, the HTTPResponse object calls its own close() method. This >> interacts poorly with RawIOBase, because it raises a ValueError for >> any operation on a closed io object. This prevents iterators from >> working correctly. The iterator implementation expects the final call >> to readline() to return an empty string and converts that to a >> StopIteration. Instead, it's seeing a ValueError that propagates out. >> >> It's always been odd to me that the connection closed itself. It's >> going to be tricky to fix the current bug (chunked responses) and keep >> the self-closing behavior, but I worry that change the self-closing >> behavior too dramatically isn't appropriate for a bug fix. Will look >> some more at this tomorrow. >> >> Jeremy >> >>> --Guido van Rossum (home page: http://www.python.org/~guido/) >>> >>> >>> >>> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton wrote: >>>> I have a patch that appears to fix this bug >>>> http://bugs.python.org/file12361/urllib-chunked.diff >>>> but I'm not sure about its interaction with the io module and >>>> RawIOBase. Is there a new IO expert who could take a look at it for >>>> me? >>>> >>>> Jeremy >>>> >>>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton wrote: >>>>> This bug is pretty serious, because urllib will insert garbage into >>>>> the application-visible data for a chunked response. It simply >>>>> ignores the fact that it's reading a chunked response and includes the >>>>> chunked header data is payload data. The original bug was reported in >>>>> September, but no one noticed it. It was reported again recently. >>>>> >>>>> http://bugs.python.org/issue3761 >>>>> http://bugs.python.org/issue4631 >>>>> >>>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but >>>>> that's not my call. 
>>>>>
>>>>> Jeremy
>>>>>
>>>>
>>>
>>
>

From greg.ewing at canterbury.ac.nz Thu Dec 18 23:52:34 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 19 Dec 2008 11:52:34 +1300
Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead?
In-Reply-To: <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
    <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
    <49423856.30705@gmail.com>
    <494982B4.5040602@canterbury.ac.nz>
    <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
Message-ID: <494AD432.5060406@canterbury.ac.nz>

Paul Moore wrote:

> Do you know if these remaining problems are listed anywhere?

There was a big discussion about this in comp.lang.python
not long ago. Basically all the built-in types and constants
are shared between interpreters, which means you still need
a GIL to stop different interpreters stepping on each other's
toes.

> AIUI,
> certain software (for example mod_python) has been using multiple
> interpreters for a long while now

Multiple interpreters are possible, they're just not completely
independent. Whether this is a problem depends on the reason
you want multiple interpreters. In the Apache case, it's
probably more about providing virtual Python environments than
free-threading between interpreters.

-- 
Greg

From ncoghlan at gmail.com Fri Dec 19 00:05:17 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 19 Dec 2008 09:05:17 +1000
Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead?
In-Reply-To: <494AD432.5060406@canterbury.ac.nz> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> <494982B4.5040602@canterbury.ac.nz> <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com> <494AD432.5060406@canterbury.ac.nz> Message-ID: <494AD72D.8050208@gmail.com> Greg Ewing wrote: > Paul Moore wrote: >> Do you know if these remaining problems are listed anywhere? > > There was a big discussion about this in comp.lang.python > not long ago. Basically all the built-in types and constants > are shared between interpreters, which means you still need > a GIL to stop different interpreters stepping on each other's > toes. That kind of thing is under the core's control though - the 2.x module initialisation problem means that you can't write a multiple interpreter friendly extension module even if you want to. The new per-interpreter state mechanism could also be used internally by the core to duplicate some of that global state for each new interpreter. I see the introduction of the interpreter specific state mechanism as a big step because it provides an underlying mechanism that makes the problem solvable *in principle* through a combination of per-interpreter state and finer grained shared locking, making it just a practical implementation problem to see if that can be done without adversely impacting single interpreter performance. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Fri Dec 19 00:18:07 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 18 Dec 2008 23:18:07 +0000 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? 
In-Reply-To: <494AD432.5060406@canterbury.ac.nz> References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no> <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com> <49423856.30705@gmail.com> <494982B4.5040602@canterbury.ac.nz> <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com> <494AD432.5060406@canterbury.ac.nz> Message-ID: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com> 2008/12/18 Greg Ewing : > Paul Moore wrote: >> >> Do you know if these remaining problems are listed anywhere? > > There was a big discussion about this in comp.lang.python > not long ago. Basically all the built-in types and constants > are shared between interpreters, which means you still need > a GIL to stop different interpreters stepping on each other's > toes. > >> AIUI, >> certain software (for example mod_python) has been using multiple >> interpreters for a long while now > > Multiple interpeters are possible, they're just not completely > independent. Whether this is a problem depends on the reason > you want multiple interpreters. In the Apache case, it's > probably more about providing virtual Python environments than > free-threading between interpreters. OK, but how close is it to providing isolation for threads running under the control of the GIL? I'm thinking of something along the lines of an in-process version of fork(), which spawns a new interpreter and runs the 2 interpreters as threads, still using the GIL to enforce serialisation, but otherwise independent. I believe that Perl uses this model for its "interpreter threads" implementation. Paul. From lists at cheimes.de Fri Dec 19 00:28:13 2008 From: lists at cheimes.de (Christian Heimes) Date: Fri, 19 Dec 2008 00:28:13 +0100 Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead? 
In-Reply-To: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
    <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
    <49423856.30705@gmail.com>
    <494982B4.5040602@canterbury.ac.nz>
    <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
    <494AD432.5060406@canterbury.ac.nz>
    <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
Message-ID:

Paul Moore wrote:
> OK, but how close is it to providing isolation for threads running
> under the control of the GIL? I'm thinking of something along the
> lines of an in-process version of fork(), which spawns a new
> interpreter and runs the 2 interpreters as threads, still using the
> GIL to enforce serialisation, but otherwise independent. I believe
> that Perl uses this model for its "interpreter threads"
> implementation.

How is your idea different from subinterpreters? Today you can have
multiple subinterpreters inside a single process. Each subinterpreter
has its own state and can see only its own objects.

Christian

From martin at v.loewis.de Fri Dec 19 00:55:19 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Fri, 19 Dec 2008 00:55:19 +0100
Subject: [Python-Dev] The endless GIL debate: why not remove thread support instead?
In-Reply-To: <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
References: <0799fefd483ff61e08e7772768ad3194.squirrel@webmail.uio.no>
    <79990c6b0812120203w7cc841f2x23c497c2856183f1@mail.gmail.com>
    <49423856.30705@gmail.com>
    <494982B4.5040602@canterbury.ac.nz>
    <79990c6b0812180347q7e654866x3c808f66edaf373a@mail.gmail.com>
    <494AD432.5060406@canterbury.ac.nz>
    <79990c6b0812181518s7ad1e1adi7e3710eeae28d27e@mail.gmail.com>
Message-ID: <494AE2E7.2080401@v.loewis.de>

> OK, but how close is it to providing isolation for threads running
> under the control of the GIL?

They won't be independent.
If an extension module has a global variable, that will be shared
across interpreters. If that variable supports modifiable state,
such modifications will "leak" across interpreters.

For example, there will be only a single object class. With that
in mind, take a look at object.__subclasses__(); it would provide
access to all classes, including those in the other interpreters.
Likewise, gc.get_objects() will give you the complete list of all
objects.

So the isolation is not strong enough to run untrusted code isolated
from other code.

Regards,
Martin

From kristjan at ccpgames.com Fri Dec 19 11:25:00 2008
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Fri, 19 Dec 2008 10:25:00 +0000
Subject: [Python-Dev] try/except in io.py
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>

Greetings!

Yesterday, I committed revision r67843 to py3k.

Re-enabling the Windows CRT runtime checks showed me that close() was
being called with an invalid file descriptor.

Now, the problem was in tokenizer.c, but the reason this wasn't caught
earlier was:

1) Incorrect error checking for close() in _fileio.c, which I fixed, and

2) Line 384 in io.py, where all exceptions are caught for self.close().

Fixing 1 and patching 2 would bring the problem to light when running
the test_imp.py part of the testsuite and, indeed, applying the fix to
tokenizer.c would then remove it again.

I am a bit worried about 2) though. I didn't modify that, but having a
catch-all clause just to be clean on system exit seems shaky to me. I
wonder, is there a way to make such behaviour, if it is indeed
necessary, active only when exit is in progress?

Something like:

    try:
        self.close()
    except:
        try:
            if not sys.exiting(): raise
        except:
            pass

Or better yet, do as we have often done here and catch just the
particular problem that occurs during shutdown, most often a name error:

    try:
        self.close()
    except (AttributeError, NameError):
        pass

What do you think?
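[Editorial sketch, not part of the original messages.] The concern
about the catch-all clause can be made concrete with a small,
self-contained example. Everything here is hypothetical: BrokenStream
and the two helper functions are invented for illustration and are not
the io.py code. The sketch only shows that a bare "except:" hides a
genuine close() failure, while the narrow clause from the second
snippet lets it propagate.

```python
class BrokenStream:
    """Hypothetical stand-in for a stream whose close() hits a real bug."""

    def close(self):
        # Simulates close() being called with an invalid file descriptor.
        raise OSError(9, "Bad file descriptor")


def close_with_bare_except(stream):
    """Mimic a catch-all clause: every error from close() is discarded."""
    try:
        stream.close()
    except:  # deliberately broad, to show the problem
        pass
    return "error hidden"


def close_with_narrow_except(stream):
    """Ignore only the errors expected while the interpreter shuts down."""
    try:
        stream.close()
    except (AttributeError, NameError):
        pass
    return "closed"


print(close_with_bare_except(BrokenStream()))

try:
    close_with_narrow_except(BrokenStream())
except OSError as exc:
    print("propagated:", exc)
```

With the narrow clause, the invalid-descriptor error reaches the
caller, which is the kind of failure that went unnoticed here.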
From amauryfa at gmail.com Fri Dec 19 11:49:00 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Fri, 19 Dec 2008 11:49:00 +0100
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
Message-ID:

Hello,

Kristján Valur Jónsson wrote:
> Greetings!
>
> Yesterday, I committed revision r67843 to py3k.
>
> Re-enabling the Windows CRT runtime checks showed me that close() was being
> called with an invalid file descriptor.
>
> Now, the problem was in tokenizer.c, but the reason this wasn't caught
> earlier was,
>
> 1) Incorrect error checking for close() in _fileio.c, which I fixed,
> and
>
> 2) Line 384 in io.py, where all exceptions are caught for self.close().
>
>
>
> Fixing 1 and patching 2 would bring the problem to light when running the
> test_imp.py part of the testsuite and, indeed, applying the fix to
> tokenizer.c would then remove it again.
>
> I am a bit worried about 2) though. I didn't modify that, but having a catch
> all clause just to be clean on system exit seems shaky to me. I wonder, is
> there a way to make such behaviour, if it is indeed necessary, just to be
> active when exit is in progress?
>
> Something like:
>
> try:
>     self.close()
> except:
>     try:
>         if not sys.exiting(): raise
>     except:
>         pass
>
>
> Or better yet, do as we have done often here, just catch the particular
> problem that occurs during shutdown, most often name error:
>
> try:
>     self.close()
> except (AttributeError, NameError):
>     pass

I suggest "except Exception": SystemExit and KeyboardInterrupt inherit
from BaseException, not from Exception.
And close() is likely to raise IOErrors.
-- 
Amaury Forgeot d'Arc

From kristjan at ccpgames.com Fri Dec 19 11:56:46 2008
From: kristjan at ccpgames.com (Kristján Valur Jónsson)
Date: Fri, 19 Dec 2008 10:56:46 +0000
Subject: [Python-Dev] try/except in io.py
In-Reply-To:
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>

> > try:
> >     self.close()
> > except:
> >     try:
> >         if not sys.exiting(): raise
> >     except:
> >         pass
> >
> >
> > Or better yet, do as we have done often here, just catch the particular
> > problem that occurs during shutdown, most often name error:
> >
> > try:
> >     self.close()
> > except (AttributeError, NameError):
> >     pass
>
> From: Amaury Forgeot d'Arc [mailto:amauryfa at gmail.com]
> I suggest "except Exception": SystemExit and KeyboardInterrupt inherit
> from BaseException, not from Exceptions
> And close() is likely to raise IOErrors.

Ah, but that is not what the clause is intended to guard against,
according to the comments. During exit, modules have been deleted and
all sorts of things have gone away. It is therefore likely that code
that executes during exit will encounter NameErrors (when a module is
being cleaned up and its globals removed) and AttributeErrors.
ImportErrors too, in fact.

It would be good to see the actual repro case that caused this to be
added in the first place, so that we could selectively catch those
errors.
Kristján

From ncoghlan at gmail.com Fri Dec 19 14:50:37 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 19 Dec 2008 23:50:37 +1000
Subject: [Python-Dev] try/except in io.py
In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>
References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local>
    <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local>
Message-ID: <494BA6AD.2090208@gmail.com>

Kristján Valur Jónsson wrote:
> Ah, but that is not what the intent is to guard against, according to the
> comments. During exit, modules have been deleted and all sorts of
> things have gone away. It is therefore likely that code that executes
> during exit will encounter NameErrors (when a module is being cleaned
> up and its globals removed) And AttributeErrors. ImportErrors too, in
> fact.
>
> It would be good to see the actual repro case that caused this to be
> added in the first place, so that we could selectively catch those
> errors.

Generally speaking, close() and __del__() methods that can be invoked
during interpreter shutdown should avoid referencing module globals at
all. Necessary globals (including members of other modules) should
either be cached on the relevant class or captured in a closure.

Now, it may be that the relevant close() method in io.py touches too
much code for that to be practical, but it certainly isn't the case in
general that encountering Name/Attribute/ImportError during shutdown is
inevitable.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From dima at hlabs.spb.ru Fri Dec 19 15:20:55 2008
From: dima at hlabs.spb.ru (Dmitry Vasiliev)
Date: Fri, 19 Dec 2008 17:20:55 +0300
Subject: [Python-Dev] Py3k: magical dir()
Message-ID: <494BADC7.7040404@hlabs.spb.ru>

Hello!
I think it's a strange behavior: Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32) [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> hash(range(10)) Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'range' >>> dir(range(10)) ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] >>> hash(range(10)) -1211318616 >>> hash(range(1000)) -1211318472 -- Dmitry Vasiliev (dima at hlabs.spb.ru) http://hlabs.spb.ru From lists at cheimes.de Fri Dec 19 16:02:24 2008 From: lists at cheimes.de (Christian Heimes) Date: Fri, 19 Dec 2008 16:02:24 +0100 Subject: [Python-Dev] Py3k: magical dir() In-Reply-To: <494BADC7.7040404@hlabs.spb.ru> References: <494BADC7.7040404@hlabs.spb.ru> Message-ID: Dmitry Vasiliev schrieb: > Hello! > > I think it's a strange behavior: > > Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32) > [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> hash(range(10)) > Traceback (most recent call last): > File "", line 1, in > TypeError: unhashable type: 'range' >>>> dir(range(10)) > ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', > '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', > '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', > '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', > '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] >>>> hash(range(10)) > -1211318616 >>>> hash(range(1000)) > -1211318472 Yes, it is. I'm able to reproduce the problem. 
Christian From eric at trueblade.com Fri Dec 19 16:22:06 2008 From: eric at trueblade.com (Eric Smith) Date: Fri, 19 Dec 2008 10:22:06 -0500 Subject: [Python-Dev] Py3k: magical dir() In-Reply-To: References: <494BADC7.7040404@hlabs.spb.ru> Message-ID: <494BBC1E.1090209@trueblade.com> Christian Heimes wrote: > Dmitry Vasiliev schrieb: >> Hello! >> >> I think it's a strange behavior: >> >> Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32) >> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> hash(range(10)) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: unhashable type: 'range' >>>>> dir(range(10)) >> ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', >> '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', >> '__init__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', >> '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', >> '__setattr__', '__sizeof__', '__str__', '__subclasshook__'] >>>>> hash(range(10)) >> -1211318616 >>>>> hash(range(1000)) >> -1211318472 > > Yes, it is. I'm able to reproduce the problem. It's not just dir(). Same behavior with help(): Python 3.1a0 (py3k:67856, Dec 19 2008, 10:18:03) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> hash(range(10)) Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'range' [43173 refs] >>> help(range(10)) [77213 refs] >>> hash(range(10)) 5041912 [77215 refs] >>> From ggpolo at gmail.com Fri Dec 19 16:23:55 2008 From: ggpolo at gmail.com (Guilherme Polo) Date: Fri, 19 Dec 2008 13:23:55 -0200 Subject: [Python-Dev] Py3k: magical dir() In-Reply-To: <494BADC7.7040404@hlabs.spb.ru> References: <494BADC7.7040404@hlabs.spb.ru> Message-ID: On Fri, Dec 19, 2008 at 12:20 PM, Dmitry Vasiliev wrote: > Hello! 
>
> I think it's a strange behavior:
>
> Python 3.1a0 (py3k:67851, Dec 19 2008, 16:50:32)
> [GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> hash(range(10))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unhashable type: 'range'
> >>> dir(range(10))
> ['__class__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__',
> '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__',
> '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__',
> '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__setattr__',
> '__sizeof__', '__str__', '__subclasshook__']
> >>> hash(range(10))
> -1211318616
> >>> hash(range(1000))
> -1211318472
>

There are other ways to reproduce it without using dir, like
range(10).__class__; hash(range(10))

Is there some reason not to set tp_hash for rangeobject to
PyObject_HashNotImplemented?

> --
> Dmitry Vasiliev (dima at hlabs.spb.ru)
> http://hlabs.spb.ru

-- 
-- Guilherme H. Polo Goncalves

From hagenf at CoLi.Uni-SB.DE Fri Dec 19 16:27:48 2008
From: hagenf at CoLi.Uni-SB.DE (Hagen Fürstenau)
Date: Fri, 19 Dec 2008 16:27:48 +0100
Subject: [Python-Dev] Py3k: magical dir()
In-Reply-To:
References: <494BADC7.7040404@hlabs.spb.ru>
Message-ID: <494BBD74.7030605@coli.uni-saarland.de>

> Is there some reason not to set tp_hash for rangeobject to
> PyObject_HashNotImplemented?

http://bugs.python.org/issue4701

- Hagen

From status at bugs.python.org Fri Dec 19 18:06:43 2008
From: status at bugs.python.org (Python tracker)
Date: Fri, 19 Dec 2008 18:06:43 +0100 (CET)
Subject: [Python-Dev] Summary of Python tracker Issues
Message-ID: <20081219170643.2994C7857C@psf.upfronthosting.co.za>

ACTIVITY SUMMARY (12/12/08 - 12/19/08)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue
number. Do NOT respond to this message.

 2266 open (+37) / 14258 closed (+20) / 16524 total (+57)

Open issues with patches: 762

Average duration of open issues: 704 days.
Median duration of open issues: 2530 days.

Open Issues Breakdown
   open  2248 (+37)
pending    18 ( +0)

Issues Created Or Reopened (58)
_______________________________

Doctest module does not work with zipped packages                12/15/08
    CLOSED http://bugs.python.org/issue4197    reopened ncoghlan
    patch

configparser DEFAULT                                             12/12/08
    CLOSED http://bugs.python.org/issue4645    created  shawn.ashlee

distutils chokes on empty options arg in the setup function      12/12/08
    http://bugs.python.org/issue4646    created  theller
    patch, patch

Builtin parser module fails to parse relative imports            12/12/08
    CLOSED http://bugs.python.org/issue4647    created  schluehk

Fix n//x to n/x in the Docs                                      12/12/08
    CLOSED http://bugs.python.org/issue4648    created  Retro

Fix a+b to a + b                                                 12/13/08
    CLOSED http://bugs.python.org/issue4649    created  Retro

getopt need re-factor...                                         12/13/08
    http://bugs.python.org/issue4650    created  wangchun

getopt need re-factor...
12/13/08 CLOSED http://bugs.python.org/issue4651 created wangchun IDLE does not work with Unicode 12/13/08 http://bugs.python.org/issue4652 created zzyzx Patch to fix typos for Py3K 12/13/08 http://bugs.python.org/issue4653 created typo.pl os.path.realpath() get the wrong result 12/13/08 http://bugs.python.org/issue4654 created dirlt during Python installation, setup.py should not use .pydistutils 12/14/08 http://bugs.python.org/issue4655 created jah Python 3 tutorial has old information about dicts 12/14/08 CLOSED http://bugs.python.org/issue4656 created mdcowles Doctest gets line numbers wrongs with <> in name 12/14/08 http://bugs.python.org/issue4657 created ncoghlan missing closing bracket in Functional Programming HOWTO 12/14/08 CLOSED http://bugs.python.org/issue4658 created bgeron compilation warning in Modules/zipimport.c 12/14/08 http://bugs.python.org/issue4659 created pitrou multiprocessing.JoinableQueue task_done() issue 12/14/08 http://bugs.python.org/issue4660 created merrellb email.parser: impossible to read messages encoded in a different 12/14/08 http://bugs.python.org/issue4661 created dato posix module lacks several DeprecationWarning's 12/14/08 http://bugs.python.org/issue4662 created mishok13 patch Increase TextIOWrapper._CHUNK_SIZE 12/14/08 CLOSED http://bugs.python.org/issue4663 created pitrou Regression fix_imports does not refactor multiple imports correc 12/14/08 CLOSED http://bugs.python.org/issue4664 created lregebro Failure to compile trunk on Solaris10/SPARC using C++ compiler 12/14/08 CLOSED http://bugs.python.org/issue4665 created skip.montanaro test_bad_address in test_urllib2_localnet often fails 12/14/08 CLOSED http://bugs.python.org/issue4666 created pitrou Patch with a couple of 2.0isms in tutorial 12/14/08 CLOSED http://bugs.python.org/issue4667 created sgala patch examples in the functional howto are not consistent with 3.X beh 12/14/08 CLOSED http://bugs.python.org/issue4668 created sgala patch bytes,join and bytearray.join 
not in manual; help for bytes.join 12/15/08 http://bugs.python.org/issue4669 created sjmachin setup.py exception when db_setup_debug = True 12/15/08 http://bugs.python.org/issue4670 created djmdjm pydoc executes the code to be documented 12/15/08 http://bugs.python.org/issue4671 created Jim_C Distutils SWIG support blocks use of SWIG -outdir option 12/15/08 http://bugs.python.org/issue4672 created andybuckley Distutils should provide an uninstall command 12/15/08 http://bugs.python.org/issue4673 created andybuckley test_normalization failures on some buildbot 12/16/08 CLOSED http://bugs.python.org/issue4674 created pitrou urllib's splitpasswd does not accept newline chars in passwords 12/16/08 http://bugs.python.org/issue4675 created mibanescu patch python3 closes + home keys 12/16/08 http://bugs.python.org/issue4676 created Somelauw a list comprehensions tests for pybench 12/16/08 http://bugs.python.org/issue4677 created pitrou patch Unicode: multiple chars for high code points 12/16/08 CLOSED http://bugs.python.org/issue4678 created ede Fork + shelve causes shelve corruption and backtrace 12/16/08 http://bugs.python.org/issue4679 created calmofthestorm deque class should include high-water mark 12/17/08 CLOSED http://bugs.python.org/issue4680 created roysmith mmap offset should be off_t instead of ssize_t, and size calcula 12/17/08 http://bugs.python.org/issue4681 created saa patch 'b' formatter is actually unsigned char 12/17/08 http://bugs.python.org/issue4682 created vt urllib2.HTTPDigestAuthHandler fails on third hostname? 
12/17/08 http://bugs.python.org/issue4683 created cmb sys.exit() exits program when non-daemonic threads are still run 12/17/08 http://bugs.python.org/issue4684 created eggy IDLE will not open (2.6.1 on WinXP pro) 12/17/08 http://bugs.python.org/issue4685 created Yo Exceptions in ConfigParser don't set .args 12/17/08 http://bugs.python.org/issue4686 created beazley GC stats not accurate because of debug overhead 12/17/08 http://bugs.python.org/issue4687 created pitrou patch GC optimization: don't track simple tuples and dicts 12/17/08 http://bugs.python.org/issue4688 created pitrou patch Typo in PyObjC URL on "GUI Programming on the Mac" 12/17/08 http://bugs.python.org/issue4689 created mevans asyncore calls handle_write() on closed sockets when use_poll=Tr 12/18/08 http://bugs.python.org/issue4690 created forest IDLE Code Caching Windows 12/18/08 CLOSED http://bugs.python.org/issue4691 created brandon.dixon Framework build fails if OS X on case-sensitive file system 12/18/08 CLOSED http://bugs.python.org/issue4692 created nad patch Idle for Python 3.0 is default even without doing make fullinsta 12/18/08 http://bugs.python.org/issue4693 created orsenthil _call_method() in multiprocessing documentation 12/18/08 CLOSED http://bugs.python.org/issue4694 created beazley Bad AF_PIPE address in multiprocessing documentation 12/18/08 http://bugs.python.org/issue4695 created beazley email module does not fold headers 12/18/08 http://bugs.python.org/issue4696 created bromine patch Clarification needed for subprocess convenience functions in Pyt 12/18/08 http://bugs.python.org/issue4697 created Erik Sternerson Solaris buildbot failure on trunk in test_hostshot 12/18/08 http://bugs.python.org/issue4698 created pitrou Typo in documentation of "signal" 12/19/08 CLOSED http://bugs.python.org/issue4699 created yam850 UnicodeEncodeError in license() 12/19/08 http://bugs.python.org/issue4700 created mnewman range objects becomes hashable after attribute access 12/19/08 
http://bugs.python.org/issue4701 created hagen patch Issues Now Closed (54) ______________________ Thread local storage and PyGILState_* mucked up by os.fork() 15 days http://bugs.python.org/issue1683 loewis optimize list comprehensions 297 days http://bugs.python.org/issue2183 pitrou patch, patch gc.DEBUG_STATS reports invalid "elapsed" times 269 days http://bugs.python.org/issue2467 pitrou patch create a numbits() method for int and long types 148 days http://bugs.python.org/issue3439 marketdickinson patch, needs review use string_print() in gdb 116 days http://bugs.python.org/issue3632 amaury.forgeotdarc patch urllib.request and urllib.response cannot handle HTTP1.1 chunked 103 days http://bugs.python.org/issue3761 jhylton 2.6rc1: test_threading hangs on FreeBSD 6.3 i386 90 days http://bugs.python.org/issue3863 loewis patch _hotshot: invalid error control in logreader() 83 days http://bugs.python.org/issue3954 amaury.forgeotdarc patch __main__.__file__ not set correctly when -m switch gets __main__ 67 days http://bugs.python.org/issue4082 ncoghlan textwrap wordsep_re Unicode 53 days http://bugs.python.org/issue4163 pitrou patch Doctest module does not work with zipped packages 0 days http://bugs.python.org/issue4197 ncoghlan patch Pdb cannot access source code in zipped packages. 
51 days http://bugs.python.org/issue4201 ncoghlan patch inspect.getsource doesn't work on functions imported from a zipf 47 days http://bugs.python.org/issue4223 ncoghlan cycle created by profile.run 40 days http://bugs.python.org/issue4273 darrenr [2.5 regression] ctypes fails to build on arm-linux-gnu 31 days http://bugs.python.org/issue4303 loewis (Tkinter) Please backport these 26 days http://bugs.python.org/issue4342 loewis A bug in ncurses.h still exists in FreeBSD 4.9 - 4.11 23 days http://bugs.python.org/issue4368 loewis patch Distutils Metadata Documentation Missing "platforms" Keyword 18 days http://bugs.python.org/issue4446 georg.brandl patch CVE-2008-5031 multiple integer overflows 13 days http://bugs.python.org/issue4469 loewis Speed up PyEval_EvalFrameEx when tracing is off. 12 days http://bugs.python.org/issue4477 jyasskin patch logging module __init__ uses has_key 9 days http://bugs.python.org/issue4523 benjamin.peterson patch Registry key not set if unattended installation used 13 days http://bugs.python.org/issue4567 loewis Improved optparse "varargs" callback example 9 days http://bugs.python.org/issue4568 georg.brandl patch reading UTF16-encoded text file crashes if \r on 64-char boundar 7 days http://bugs.python.org/issue4574 pitrou patch compiler: -3 warnings 8 days http://bugs.python.org/issue4578 georg.brandl patch segfault when mutating memoryview to array.array when array is r 11 days http://bugs.python.org/issue4583 pitrou patch, needs review new types example is out of date 7 days http://bugs.python.org/issue4595 georg.brandl 3.0 document tab interpretation change 6 days http://bugs.python.org/issue4603 georg.brandl 3.0 documentation mentions using maketrans from within the strin 4 days http://bugs.python.org/issue4605 benjamin.peterson Small error in "Extending Python with C or C++" 6 days http://bugs.python.org/issue4611 georg.brandl tarfile does not set the creation date and time of the extracted 3 days 
http://bugs.python.org/issue4616 lars.gustaebel optparse - dosn't distinguish between '--option' and '-option' 0 days http://bugs.python.org/issue4641 marketdickinson optparse - dosn't distinguish between '--option' and '-option' 0 days http://bugs.python.org/issue4642 marketdickinson Minor documentation fault in 2to3 script 0 days http://bugs.python.org/issue4644 benjamin.peterson configparser DEFAULT 2 days http://bugs.python.org/issue4645 loewis Builtin parser module fails to parse relative imports 0 days http://bugs.python.org/issue4647 benjamin.peterson Fix n//x to n/x in the Docs 0 days http://bugs.python.org/issue4648 rhettinger Fix a+b to a + b 1 days http://bugs.python.org/issue4649 gvanrossum getopt need re-factor... 0 days http://bugs.python.org/issue4651 gvanrossum Python 3 tutorial has old information about dicts 0 days http://bugs.python.org/issue4656 benjamin.peterson missing closing bracket in Functional Programming HOWTO 0 days http://bugs.python.org/issue4658 benjamin.peterson Increase TextIOWrapper._CHUNK_SIZE 1 days http://bugs.python.org/issue4663 pitrou Regression fix_imports does not refactor multiple imports correc 0 days http://bugs.python.org/issue4664 benjamin.peterson Failure to compile trunk on Solaris10/SPARC using C++ compiler 1 days http://bugs.python.org/issue4665 loewis test_bad_address in test_urllib2_localnet often fails 1 days http://bugs.python.org/issue4666 pitrou Patch with a couple of 2.0isms in tutorial 0 days http://bugs.python.org/issue4667 georg.brandl patch examples in the functional howto are not consistent with 3.X beh 0 days http://bugs.python.org/issue4668 georg.brandl patch test_normalization failures on some buildbot 0 days http://bugs.python.org/issue4674 pitrou Unicode: multiple chars for high code points 0 days http://bugs.python.org/issue4678 lemburg deque class should include high-water mark 1 days http://bugs.python.org/issue4680 tim_one IDLE Code Caching Windows 0 days http://bugs.python.org/issue4691 
amaury.forgeotdarc Framework build fails if OS X on case-sensitive file system 0 days http://bugs.python.org/issue4692 marketdickinson patch _call_method() in multiprocessing documentation 0 days http://bugs.python.org/issue4694 benjamin.peterson Typo in documentation of "signal" 0 days http://bugs.python.org/issue4699 benjamin.peterson Top Issues Most Discussed (10) ______________________________ 49 create a numbits() method for int and long types 148 days closed http://bugs.python.org/issue3439 14 Optimize new io library 13 days open http://bugs.python.org/issue4561 13 GC optimization: don't track simple tuples and dicts 2 days open http://bugs.python.org/issue4688 11 urlopen returns extra, spurious bytes 8 days open http://bugs.python.org/issue4631 10 Py_IS_INFINITY defect causes test_cmath failure on x86 12 days open http://bugs.python.org/issue4575 10 Building a list of tuples has non-linear performance 73 days open http://bugs.python.org/issue4074 10 optimize list comprehensions 297 days closed http://bugs.python.org/issue2183 8 deque class should include high-water mark 1 days closed http://bugs.python.org/issue4680 8 datetime module missing some important methods 656 days open http://bugs.python.org/issue1673409 7 mmap offset should be off_t instead of ssize_t, and size calcul 2 days open http://bugs.python.org/issue4681 From ziade.tarek at gmail.com Fri Dec 19 19:55:28 2008 From: ziade.tarek at gmail.com (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 19 Dec 2008 19:55:28 +0100 Subject: [Python-Dev] Distutils maintenance Message-ID: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com> Hello I would like to request a commit access to work specifically on distutils maintenance. Regards Tarek -- Tarek Ziadé 
| Association AfPy | www.afpy.org Blog FR | http://programmation-python.org Blog EN | http://tarekziade.wordpress.com/ From musiccomposition at gmail.com Fri Dec 19 19:59:26 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Fri, 19 Dec 2008 12:59:26 -0600 Subject: [Python-Dev] Distutils maintenance In-Reply-To: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com> References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com> Message-ID: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com> On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé wrote: > Hello > > I would like to request a commit access to work specifically on > distutils maintenance. +1 We are currently without an active distutils maintainer, and many stale distutil tickets are in need of attention I'm sure Tarek could provide. Tarek has also been providing many useful patches of his own. -- Cheers, Benjamin From martin at v.loewis.de Fri Dec 19 21:45:17 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 19 Dec 2008 21:45:17 +0100 Subject: [Python-Dev] [ANN] Python 2.4.6 and 2.5.3 (final) Message-ID: <494C07DD.5090702@v.loewis.de> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.4.6 and 2.5.3 (final). 2.5.3 is the last bug fix release of Python 2.5. Future 2.5.x releases will only include security fixes. According to the release notes, about 80 bugs and patches have been addressed since Python 2.5.2, many of them improving the stability of the interpreter, and improving its portability. Since the release candidate, the only change was an update to the Macintosh packaging procedure. 2.4.6 includes only a small number of security fixes. Python 2.6 is the latest version of Python; we're making this release for people who are still running Python 2.4. 
See the release notes at the website (also available as Misc/NEWS in the source distribution) for details of bugs fixed; most of them prevent interpreter crashes (and now cause proper Python exceptions in cases where the interpreter may have crashed before). For more information on Python 2.4.6 and 2.5.3, including download links for various platforms, release notes, and known issues, please see: http://www.python.org/2.4.6 http://www.python.org/2.5.3 Highlights of the previous major Python releases are available from the Python 2.5 page, at http://www.python.org/2.4/highlights.html http://www.python.org/2.5/highlights.html Enjoy this release, Martin Martin v. Loewis martin at v.loewis.de Python Release Manager (on behalf of the entire python-dev team) From kristjan at ccpgames.com Fri Dec 19 22:00:29 2008 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 19 Dec 2008 21:00:29 +0000 Subject: [Python-Dev] try/except in io.py In-Reply-To: <494BA6AD.2090208@gmail.com> References: <930F189C8A437347B80DF2C156F7EC7F04D1702A4C@exchis.ccp.ad.local> <930F189C8A437347B80DF2C156F7EC7F04D1702A70@exchis.ccp.ad.local> <494BA6AD.2090208@gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BBF@exchis.ccp.ad.local> Ok, in this case I move that we remove this try/except and see where it leads us. If we see problems during teardown, we should deal with them in a more targeted manner. Kristján -----Original Message----- From: Nick Coghlan [mailto:ncoghlan at gmail.com] Sent: 19 December 2008 13:51 To: Kristján Valur Jónsson Cc: Amaury Forgeot d'Arc; Python-Dev Subject: Re: [Python-Dev] try/except in io.py Generally speaking, close() and __delete__() methods that can be invoked during interpreter shutdown should avoid referencing module globals at all. Necessary globals (including members of other modules) should either be cached on the relevant class or captured in a closure. 
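[Editorial sketch, not part of the original message.] The two techniques named in the paragraph above (caching a needed global on the class, or capturing it in a closure) might look roughly like the following. The class TempFile and the function make_closer are hypothetical names invented for illustration; they are not taken from io.py. The sketch uses __del__, the finalizer hook, on the assumption that this is what the message means by "__delete__()".

```python
import os


class TempFile:
    """Hypothetical resource holder, used only to illustrate the advice."""

    # Cache the function on the class at definition time, so that close()
    # never has to look up the module-level name 'os' during shutdown.
    _unlink = staticmethod(os.unlink)

    def __init__(self, path):
        self.path = path
        self.closed = False

    def close(self):
        if self.closed:
            return
        self.closed = True
        self._unlink(self.path)  # attribute lookup on the class, not a global

    def __del__(self):
        self.close()


def make_closer(path):
    """Closure variant: 'unlink' is bound once, when make_closer() runs."""
    unlink = os.unlink

    def closer():
        unlink(path)  # refers to the captured local, not to the global 'os'

    return closer
```

In both variants the cleanup code reaches os.unlink through a reference taken early, so it does not depend on the module namespace still being intact when it finally runs.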
Now, it may be that the relevant close() method in io.py touches too much code for that to be practical, but it certainly isn't the case in general that encountering Name/Attribute/ImportError during shutdown is inevitable. From martin at v.loewis.de Fri Dec 19 22:20:22 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Fri, 19 Dec 2008 22:20:22 +0100 Subject: [Python-Dev] Please test OSX installer In-Reply-To: References: <494969AE.3060805@v.loewis.de> Message-ID: <494C1016.2000803@v.loewis.de> > I got a "Problem Report for Python" pop-up. Skip to "///" for > "Problem Details". Interestingly, the test completed with the > following report: Thanks for the report. I have tested that with 2.5.2, which fails in the same way. So this is not a regression, and I have not attempted to fix it. Regards, Martin From fabiofz at gmail.com Fri Dec 19 22:43:01 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Fri, 19 Dec 2008 19:43:01 -0200 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? Message-ID: Hi, I'm currently having problems to get the output of Python 3.0 into the Eclipse console (integrating it into Pydev). The problem appears to be that stdout and stderr are not running unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and the content only appears to me when a flush() is done or when the process finishes. 
So, in the search of a solution, I found a suggestion from http://stackoverflow.com/questions/107705/python-output-buffering to use the following construct: sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) But that gives the error below in Python 3.0: sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) File "D:\bin\Python30\lib\os.py", line 659, in fdopen return io.open(fd, *args, **kwargs) File "D:\bin\Python30\lib\io.py", line 243, in open raise ValueError("can't have unbuffered text I/O") ValueError: can't have unbuffered text I/O So, I'd like to know if there's some way I can make it run unbuffered (to get the output contents without having to flush() after each write). Thanks, Fabio From brett at python.org Fri Dec 19 23:03:01 2008 From: brett at python.org (Brett Cannon) Date: Fri, 19 Dec 2008 14:03:01 -0800 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: Message-ID: On Fri, Dec 19, 2008 at 13:43, Fabio Zadrozny wrote: > Hi, > > I'm currently having problems to get the output of Python 3.0 into the > Eclipse console (integrating it into Pydev). > > The problem appears to be that stdout and stderr are not running > unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and > the content only appears to me when a flush() is done or when the > process finishes. 
> > So, in the search of a solution, I found a suggestion from > http://stackoverflow.com/questions/107705/python-output-buffering > > to use the following construct: > > sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) > > But that gives the error below in Python 3.0: > > sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) > File "D:\bin\Python30\lib\os.py", line 659, in fdopen > return io.open(fd, *args, **kwargs) > File "D:\bin\Python30\lib\io.py", line 243, in open > raise ValueError("can't have unbuffered text I/O") > ValueError: can't have unbuffered text I/O > > So, I'd like to know if there's some way I can make it run unbuffered > (to get the output contents without having to flush() after each > write). Notice how the exception specifies test I/O cannot be unbuffered. This restriction does not apply to bytes I/O. Simply open it as 'wb' instead of 'w' and it works. -Brett From ncoghlan at gmail.com Fri Dec 19 23:15:00 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Dec 2008 08:15:00 +1000 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? Message-ID: <494C1CE4.5080102@gmail.com> Some strangeness was recently reported for the range() type in Py3k where instances are unhashable until an attribute is retrieved from the range type itself, and then they become hashable. [1] While there is definitely an associated bug in the range implementation (it doesn't block inheritance of the default object.__hash__ implementation), there's also the fact that when the interpreter *starts* the hash implementation hasn't been inherited yet, but it does get inherited later. It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of the builtin types - they're left to have it called implicitly when an operation using them needs tp_dict filled in. 
Such operations (which includes retrieving an attribute from the type object) will implicitly call PyType_Ready to populate tp_dict, which also has the side effect of inheriting slot implementations from base classes. Is there a specific reason for not fully initialising the builtin types? Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init? Cheers, Nick. [1] http://bugs.python.org/issue4701 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Fri Dec 19 23:18:14 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Dec 2008 08:18:14 +1000 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: Message-ID: <494C1DA6.1080202@gmail.com> Brett Cannon wrote: > Notice how the exception specifies test I/O cannot be unbuffered. This > restriction does not apply to bytes I/O. Simply open it as 'wb' > instead of 'w' and it works. s/test/text/ :) (For anyone else that is like me and skipped over the exception detail on first reading, thus becoming a little confused...) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From fabiofz at gmail.com Fri Dec 19 23:20:22 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Fri, 19 Dec 2008 20:20:22 -0200 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: Message-ID: You're right, thanks (guess I'll use that option then). Now, is it a bug that Python 3.0 doesn't run unbuffered when specifying -u or PYTHONUNBUFFERED, or was this support dropped? Thanks, Fabio On Fri, Dec 19, 2008 at 8:03 PM, Brett Cannon wrote: > On Fri, Dec 19, 2008 at 13:43, Fabio Zadrozny wrote: >> Hi, >> >> I'm currently having problems to get the output of Python 3.0 into the >> Eclipse console (integrating it into Pydev). 
>> >> The problem appears to be that stdout and stderr are not running >> unbuffered (even passing -u or trying to set PYTHONUNBUFFERED), and >> the content only appears to me when a flush() is done or when the >> process finishes. >> >> So, in the search of a solution, I found a suggestion from >> http://stackoverflow.com/questions/107705/python-output-buffering >> >> to use the following construct: >> >> sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) >> >> But that gives the error below in Python 3.0: >> >> sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0) >> File "D:\bin\Python30\lib\os.py", line 659, in fdopen >> return io.open(fd, *args, **kwargs) >> File "D:\bin\Python30\lib\io.py", line 243, in open >> raise ValueError("can't have unbuffered text I/O") >> ValueError: can't have unbuffered text I/O >> >> So, I'd like to know if there's some way I can make it run unbuffered >> (to get the output contents without having to flush() after each >> write). > > Notice how the exception specifies test I/O cannot be unbuffered. This > restriction does not apply to bytes I/O. Simply open it as 'wb' > instead of 'w' and it works. > > -Brett > From barry at python.org Fri Dec 19 23:28:32 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 19 Dec 2008 17:28:32 -0500 Subject: [Python-Dev] Python 3.0.1 Message-ID: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I'd like to get Python 3.0.1 out before the end of the year. There are no showstoppers, but I haven't yet looked at the deferred blockers or the buildbots. Do you think we can get 3.0.1 out on December 24th? Or should we wait until after Christmas and get it out, say on the 29th? Do we need an rc? This question goes mostly to Martin and Georg. What would work for you guys? 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSUwgEXEjvBPtnXfVAQIthgP7BDS6xfBHhADKc50ANvZ5aAfWhGSU9GH/ DR+IRduVmvosu9gm92hupCOaLCN4IbtyFx27A8LQuPNVc4BVrhWfDKDSzpxO2MJu xLJntkF2BRWODSbdrLGdZ6H6WDT0ZAhn6ZjlWXwxhGxQ5FwEJb7moMuY7jAIEeor 5n6Ag5zT+e8= =oU/g -----END PGP SIGNATURE----- From bcannon at gmail.com Fri Dec 19 23:33:38 2008 From: bcannon at gmail.com (bcannon at gmail.com) Date: Fri, 19 Dec 2008 22:33:38 +0000 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? Message-ID: <0016e64f68207a52a5045e6de625@google.com> On Dec 19, 2008 2:20pm, Fabio Zadrozny wrote: > You're right, thanks (guess I'll use that option then). > > > > Now, is it a bug that Python 3.0 doesn't run unbuffered when > > specifying -u or PYTHONUNBUFFERED, or was this support dropped? > > Well, ``python -h`` still lists it. That means either the output for -h needs to be fixed or the feature needs to be supported. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Dec 19 23:42:49 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Dec 2008 08:42:49 +1000 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> Message-ID: <494C2369.5030901@gmail.com> Barry Warsaw wrote: > I'd like to get Python 3.0.1 out before the end of the year. There are > no showstoppers, but I haven't yet looked at the deferred blockers or > the buildbots. > > Do you think we can get 3.0.1 out on December 24th? Or should we wait > until after Christmas and get it out, say on the 29th? Do we need an rc? There are some memoryview issues [1] I'd like to have fixed for 3.0.1 - the 29th would be a much easier date to hit. A quick review pass through the other 3.0 highs and criticals might also be worthwhile. Cheers, Nick. 
http://bugs.python.org/issue4580 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From barry at python.org Fri Dec 19 23:46:30 2008 From: barry at python.org (Barry Warsaw) Date: Fri, 19 Dec 2008 17:46:30 -0500 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <494C2369.5030901@gmail.com> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> Message-ID: <16D50043-22B0-4711-BE91-E752953444EA@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 19, 2008, at 5:42 PM, Nick Coghlan wrote: > Barry Warsaw wrote: >> I'd like to get Python 3.0.1 out before the end of the year. There >> are >> no showstoppers, but I haven't yet looked at the deferred blockers or >> the buildbots. >> >> Do you think we can get 3.0.1 out on December 24th? Or should we >> wait >> until after Christmas and get it out, say on the 29th? Do we need >> an rc? > > There are some memoryview issues [1] I'd like to have fixed for > 3.0.1 - > the 29th would be a much easier date to hit. A quick review pass > through > the other 3.0 highs and criticals might also be worthwhile. Thanks. I've bumped that to release blocker for now. If there are any other 'high' bugs that you want considered for 3.0.1, please make the release blockers too, for now. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSUwkRnEjvBPtnXfVAQKQ4QP/eRmWBgyuijbe9vnXkRkTkAmd4qyrAD2s Forp4hKGvoc4A03Q4x2uVweI4oSdFrKIN2NlcM3JVlSrsU07DTElFoCEA/A8DB3N +6Sp9bC98iVqGUmle54rFIm0F/iCoFQ59mp9jNGeiKVwjojUDkbJNXulHuYIb1co RuICfsatRc0= =zjQz -----END PGP SIGNATURE----- From solipsis at pitrou.net Fri Dec 19 23:47:27 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Dec 2008 22:47:27 +0000 (UTC) Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: > Well, ``python -h`` still lists it. 
Precisely, it says: -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x see man page for details on internal buffering relating to '-u' Note the "binary". And indeed: ./python -u Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.stdout.buffer.write(b"y") y1 >>> I don't know what it would take to enable unbuffered text IO while keeping the current TextIOWrapper implementation... Regards Antoine. From solipsis at pitrou.net Fri Dec 19 23:59:49 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Dec 2008 22:59:49 +0000 (UTC) Subject: [Python-Dev] Python 3.0.1 References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> Message-ID: Nick Coghlan <ncoghlan at gmail.com> writes: > > There are some memoryview issues [1] I'd like to have fixed for 3.0.1 - > the 29th would be a much easier date to hit. A quick review pass through > the other 3.0 highs and criticals might also be worthwhile. What about #1717 "Get rid of more refercenes to __cmp__"? (although I like the typo a lot) From guido at python.org Sat Dec 20 00:03:10 2008 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Dec 2008 15:03:10 -0800 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: For truly unbuffered text output you'd have to make changes to the io.TextIOWrapper class to flush after each write() call. That's an API change -- the constructor currently has a line_buffering option but no option for completely unbuffered mode. It would also require some changes to io.open() which currently rejects buffering=0 in text mode. All that suggests that it should wait until 3.1. However it might make sense to at least turn on line buffering when -u or PYTHONUNBUFFERED is given; that doesn't require API changes and so can be considered a bug fix. 
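[Editorial sketch, not part of the original message.] The two behaviours discussed above can be approximated from user code without changing the io module. FlushingWriter and line_buffered are hypothetical helper names made up for this illustration; only io.TextIOWrapper and its line_buffering argument come from the thread. The first helper flushes after every write() call, the second relies on the existing line_buffering option.

```python
import io


class FlushingWriter:
    """Hypothetical wrapper: forwards to a text stream, flushing after each write()."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, text):
        count = self._stream.write(text)
        self._stream.flush()  # push the text through to the underlying stream now
        return count

    def __getattr__(self, name):
        # Delegate everything else (flush, fileno, encoding, ...) to the stream.
        return getattr(self._stream, name)


def line_buffered(binary_stream, encoding="utf-8"):
    """Wrap a binary stream so output is flushed whenever a newline is written."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, line_buffering=True)


# Small demonstration against in-memory streams.
raw = io.BytesIO()
writer = FlushingWriter(io.TextIOWrapper(raw, encoding="utf-8"))
writer.write("abc")          # reaches 'raw' immediately

raw_lines = io.BytesIO()
lines = line_buffered(raw_lines)
lines.write("partial")       # held back: no newline yet
lines.write(" line\n")       # newline triggers the flush
```

Under these assumptions, something like sys.stdout = FlushingWriter(sys.stdout) would give per-write flushing for a single program, at the cost of one flush call per write.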
--Guido van Rossum (home page: http://www.python.org/~guido/) On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou wrote: > >> Well, ``python -h`` still lists it. > > Precisely, it says: > > -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x > see man page for details on internal buffering relating to '-u' > > Note the "binary". And indeed: > > ./python -u > Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) > [GCC 4.3.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import sys >>>> sys.stdout.buffer.write(b"y") > y1 >>>> > > I don't know what it would take to enable unbuffered text IO while keeping the > current TextIOWrapper implementation... > > Regards > > Antoine. > > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > From solipsis at pitrou.net Sat Dec 20 00:16:10 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 19 Dec 2008 23:16:10 +0000 (UTC) Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: Antoine Pitrou <solipsis at pitrou.net> writes: > > Note the "binary". And indeed: [...] And I realize I should have thought a bit before giving that "proof". Sorry! From amauryfa at gmail.com Sat Dec 20 00:38:04 2008 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Sat, 20 Dec 2008 00:38:04 +0100 Subject: [Python-Dev] Distutils maintenance In-Reply-To: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com> References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com> <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com> Message-ID: On Fri, Dec 19, 2008 at 19:59, Benjamin Peterson wrote: > On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziadé 
wrote: >> Hello >> >> I would like to request a commit access to work specifically on >> distutils maintenance. > > +1 > > We are currently without an active distutils maintainer, and many > stale distutil tickets are in need of attention I'm sure Tarek could > provide. Tarek has also been providing many useful patches of his own. +1 from me as well. -- Amaury Forgeot d'Arc From tutufan at gmail.com Sat Dec 20 00:29:38 2008 From: tutufan at gmail.com (Mike Coleman) Date: Fri, 19 Dec 2008 17:29:38 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) Message-ID: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> I have a program that creates a huge (45GB) defaultdict. (The keys are short strings, the values are short lists of pairs (string, int).) Nothing but possibly the strings and ints is shared. The program takes around 10 minutes to run, but longer than 20 minutes to exit (I gave up at that point). That is, after executing the final statement (a print), it is apparently spending a huge amount of time cleaning up before exiting. I haven't installed any exit handlers or anything like that, all files are already closed and stdout/stderr flushed, and there's nothing special going on. I have done 'gc.disable()' for performance (which is hideous without it)--I have no reason to think there are any loops. Currently I am working around this by doing an os._exit(), which is immediate, but this seems like a bit of hack. Is this something that needs fixing, or that has already been fixed? Mike From martin at v.loewis.de Sat Dec 20 03:44:22 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 20 Dec 2008 03:44:22 +0100 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> Message-ID: <494C5C06.30109@v.loewis.de> > Do you think we can get 3.0.1 out on December 24th? 
I won't have physical access to my build machine from December 24th to January 3rd. > Or should we wait > until after Christmas and get it out, say on the 29th? Do we need an rc? If you want to get it quickly, it should happen on December 23rd (my time, meaning that the tag should be created on December 22nd). December 29th might work as well; I'd create the binaries remotely (in this case, the tag would need to be created on December 28th). Overall, I think a week more or less doesn't really matter, and would prefer to see the release created in January. There are 13 release blockers, and I'm skeptical that they can all get resolved within the next few days. Regards, Martin From martin at v.loewis.de Sat Dec 20 04:12:21 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 20 Dec 2008 04:12:21 +0100 Subject: [Python-Dev] 2.6 and 3.0 buildbot slaves Message-ID: <494C6295.9080506@v.loewis.de> I have now set up buildbot slaves for 2.6 and 3.0, and turned off the 2.5 ones. Regards, Martin From ncoghlan at gmail.com Sat Dec 20 08:17:28 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Dec 2008 17:17:28 +1000 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? In-Reply-To: <494C1CE4.5080102@gmail.com> References: <494C1CE4.5080102@gmail.com> Message-ID: <494C9C08.5030702@gmail.com> Nick Coghlan wrote: > Is there a specific reason for not fully initialising the builtin types? > Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init? I need to correct this slightly: some builtin types *are* initialised properly by _Py_ReadyTypes. So the question is actually whether or not the missing builtin types should be added to that function. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From kristjan at ccpgames.com Sat Dec 20 11:02:38 2008 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sat, 20 Dec 2008 10:02:38 +0000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> Can you distill the program into something reproducible? Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance? I can try to point our commercial profiling tools at it and see what it is doing. K -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Mike Coleman Sent: 19. desember 2008 23:30 To: python-dev at python.org Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) I have a program that creates a huge (45GB) defaultdict. (The keys are short strings, the values are short lists of pairs (string, int).) Nothing but possibly the strings and ints is shared. The program takes around 10 minutes to run, but longer than 20 minutes to exit (I gave up at that point). That is, after executing the final statement (a print), it is apparently spending a huge amount of time cleaning up before exiting. I haven't installed any exit handlers or anything like that, all files are already closed and stdout/stderr flushed, and there's nothing special going on. I have done 'gc.disable()' for performance (which is hideous without it)--I have no reason to think there are any loops. 
Currently I am working around this by doing an os._exit(), which is immediate, but this seems like a bit of hack. Is this something that needs fixing, or that has already been fixed? Mike From steve at pearwood.info Sat Dec 20 11:55:26 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 20 Dec 2008 21:55:26 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> Message-ID: <200812202155.28024.steve@pearwood.info> On Sat, 20 Dec 2008 09:02:38 pm Kristján Valur Jónsson wrote: > Can you distill the program into something reproducible? > Maybe with something slightly less than 45Gb but still exhibiting > some degradation of exit performance? I can try to point our > commercial profiling tools at it and see what it is doing. K In November 2007, a similar problem was reported on the comp.lang.python newsgroup. 370MB was large enough to demonstrate the problem. I don't know if a bug was ever reported. The thread starts here: http://mail.python.org/pipermail/python-list/2007-November/465498.html or if you prefer Google Groups: http://preview.tinyurl.com/97xsso and it describes extremely long times to populate and destroy large dicts even with garbage collection turned off. 
My summary at the time was: "On systems with multiple CPUs or 64-bit systems, or both, creating and/or deleting a multi-megabyte dictionary in recent versions of Python (2.3, 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minutes, compared to seconds if the system only has a single CPU. Turning garbage collection off doesn't help." I make no guarantee that the above is a correct description of the problem, only that this is what I believed at the time. I'm afraid it is a very long thread, with multiple red herrings, lots of people unable to reproduce the problem, and the usual nonsense that happens on comp.lang.python. I was originally one of the skeptics until I reproduced the original poster's problem. I generated a sample file of 8 million key/value pairs as a 370MB text file. Reading it into a dict took two and a half minutes on my relatively slow computer. But deleting the dict took more than 30 minutes even with garbage collection switched off. Sample code reproducing the problem on my machine is here: http://mail.python.org/pipermail/python-list/2007-November/465513.html According to this post of mine: http://mail.python.org/pipermail/python-list/2007-November/466209.html deleting 8 million (key, value) pairs stored as a list of tuples was very fast. It was only if they were stored as a dict that deleting it was horribly slow. Please note that other people have tried and failed to replicate the problem. I suspect the fault (if it is one, and not human error) is specific to some combinations of Python version and hardware. Even if this is a Will Not Fix, I'd be curious if anyone else can reproduce the problem. Hope this is helpful, Steven. > -----Original Message----- > From: python-dev-bounces+kristjan=ccpgames.com at python.org > [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On > Behalf Of Mike Coleman Sent: 19. 
desember 2008 23:30 > To: python-dev at python.org > Subject: [Python-Dev] extremely slow exit for program having huge > (45G) dict (python 2.5.2) > > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, > int).) Nothing but possibly the strings and ints is shared. > > The program takes around 10 minutes to run, but longer than 20 > minutes to exit (I gave up at that point). That is, after executing > the final statement (a print), it is apparently spending a huge > amount of time cleaning up before exiting. I haven't installed any > exit handlers or anything like that, all files are already closed and > stdout/stderr flushed, and there's nothing special going on. I have > done > 'gc.disable()' for performance (which is hideous without it)--I have > no reason to think there are any loops. > > Currently I am working around this by doing an os._exit(), which is > immediate, but this seems like a bit of hack. Is this something that > needs fixing, or that has already been fixed? > > Mike -- Steven D'Aprano From andymac at bullseye.apana.org.au Sat Dec 20 11:08:00 2008 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sat, 20 Dec 2008 21:08:00 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> Message-ID: <494CC400.7070404@bullseye.andymac.org> Mike Coleman wrote: > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, int).) > Nothing but possibly the strings and ints is shared. > > The program takes around 10 minutes to run, but longer than 20 minutes > to exit (I gave up at that point). 
That is, after executing the final > statement (a print), it is apparently spending a huge amount of time > cleaning up before exiting. I haven't installed any exit handlers or > anything like that, all files are already closed and stdout/stderr > flushed, and there's nothing special going on. I have done > 'gc.disable()' for performance (which is hideous without it)--I have > no reason to think there are any loops. > > Currently I am working around this by doing an os._exit(), which is > immediate, but this seems like a bit of hack. Is this something that > needs fixing, or that has already been fixed? You don't mention the platform, but... This behaviour was not unknown in the distant past, with much smaller datasets. Most of the problems then related to the platform malloc() doing funny things as stuff was free()ed, like coalescing free space. [I once sat and watched a Python script run in something like 30 seconds and then take nearly 10 minutes to terminate, as you describe (Python 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of hundred MB of memory - the Solaris 2.5 malloc() had some undesirable properties from Python's point of view] PyMalloc effectively removed this as an issue for most cases and platform malloc()s have also become considerably more sophisticated since then, but I wonder whether the sheer size of your dataset is unmasking related issues. Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus accumulates (2.3 & 2.4 never free()ed arenas). Your platform malloc() might have odd behaviour with 45GB of arenas returned to it piecemeal. This is something that could be checked with a small C program. Calling os._exit() circumvents the free()ing of the arenas. Also consider that, with the exception of small integers (-1..256), no interning of integers is done. 
If your data contains large quantities of integers with non-unique values (that aren't in the small integer range) you may find it useful to do your own interning. -- ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac at pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From steve at holdenweb.com Sat Dec 20 14:14:49 2008 From: steve at holdenweb.com (Steve Holden) Date: Sat, 20 Dec 2008 08:14:49 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494CC400.7070404@bullseye.andymac.org> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> Message-ID: Andrew MacIntyre wrote: > Mike Coleman wrote: >> I have a program that creates a huge (45GB) defaultdict. (The keys >> are short strings, the values are short lists of pairs (string, int).) >> Nothing but possibly the strings and ints is shared. >> >> The program takes around 10 minutes to run, but longer than 20 minutes >> to exit (I gave up at that point). That is, after executing the final >> statement (a print), it is apparently spending a huge amount of time >> cleaning up before exiting. I haven't installed any exit handlers or >> anything like that, all files are already closed and stdout/stderr >> flushed, and there's nothing special going on. I have done >> 'gc.disable()' for performance (which is hideous without it)--I have >> no reason to think there are any loops. >> >> Currently I am working around this by doing an os._exit(), which is >> immediate, but this seems like a bit of hack. Is this something that >> needs fixing, or that has already been fixed? > > You don't mention the platform, but... > > This behaviour was not unknown in the distant past, with much smaller > datasets. 
Most of the problems then related to the platform malloc() > doing funny things as stuff was free()ed, like coalescing free space. > > [I once sat and watched a Python script run in something like 30 seconds > and then take nearly 10 minutes to terminate, as you describe (Python > 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of > hundred MB of memory - the Solaris 2.5 malloc() had some undesirable > properties from Python's point of view] > > PyMalloc effectively removed this as an issue for most cases and platform > malloc()s have also become considerably more sophisticated since then, > but I wonder whether the sheer size of your dataset is unmasking related > issues. > > Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus > accumulates (2.3 & 2.4 never free()ed arenas). Your platform malloc() > might have odd behaviour with 45GB of arenas returned to it piecemeal. > This is something that could be checked with a small C program. > Calling os._exit() circumvents the free()ing of the arenas. > > Also consider that, with the exception of small integers (-1..256), no > interning of integers is done. If your data contains large quantities > of integers with non-unique values (that aren't in the small integer > range) you may find it useful to do your own interning. > It's a pity a simplistic approach that redefines all space reclamation activities as null functions won't work. I hate to think of all the cycles that are being wasted reclaiming space just because a program has terminated, when in fact an os.exit() call would work just as well from the user's point of view. Unfortunately there are doubtless programs out there that do rely on actions being taken at shutdown. Maybe os.exit() could be more widely advertised, though ... 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From g.brandl at gmx.net Sat Dec 20 14:26:22 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 20 Dec 2008 14:26:22 +0100 Subject: [Python-Dev] Distutils maintenance In-Reply-To: <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com> References: <94bdd2610812191055yabf58b5sd3563ab1e1f63e42@mail.gmail.com> <1afaf6160812191059p55bda745ta00597b6e043835d@mail.gmail.com> Message-ID: Benjamin Peterson schrieb: > On Fri, Dec 19, 2008 at 12:55 PM, Tarek Ziad? wrote: >> Hello >> >> I would like to request a commit access to work specifically on >> distutils maintenance. > > +1 > > We are currently without an active distutils maintainer, and many > stale distutil tickets are in need of attention I'm sure Tarek could > provide. Tarek has also been providing many useful patches of his own. FWIW, +1. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From g.brandl at gmx.net Sat Dec 20 14:29:15 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 20 Dec 2008 14:29:15 +0100 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> Message-ID: Barry Warsaw schrieb: > I'd like to get Python 3.0.1 out before the end of the year. There > are no showstoppers, but I haven't yet looked at the deferred blockers > or the buildbots. > > Do you think we can get 3.0.1 out on December 24th? Or should we wait > until after Christmas and get it out, say on the 29th? Do we need an > rc? > > This question goes mostly to Martin and Georg. What would work for > you guys? 
Since the 24th is the most important Christmas day around here, I'll not be available then :) Either 23rd or 29th is fine with me. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out. From skip at pobox.com Sat Dec 20 16:55:32 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 20 Dec 2008 09:55:32 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> Message-ID: <18765.5492.200918.790182@montanaro-dyndns-org.local> Steve> Unfortunately there are doubtless programs out there that do rely Steve> on actions being taken at shutdown. Indeed. I believe any code which calls atexit.register. Steve> Maybe os.exit() could be more widely advertised, though ... That would be os._exit(). Calling it avoids calls to exit functions registered with atexit.register(). I believe it is both safe, and reasonable programming practice for modules to register exit functions. Both the logging and multiprocessing modules call it. It's incumbent on the application programmer to know these details of the modules the app uses (perhaps indirectly) to know whether or not it's safe/wise to call os._exit(). -- Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/ From aahz at pythoncraft.com Sat Dec 20 18:01:55 2008 From: aahz at pythoncraft.com (Aahz) Date: Sat, 20 Dec 2008 09:01:55 -0800 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? 
In-Reply-To: <494C1CE4.5080102@gmail.com> References: <494C1CE4.5080102@gmail.com> Message-ID: <20081220170154.GA28166@panix.com> On Sat, Dec 20, 2008, Nick Coghlan wrote: > > It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of > the builtin types - they're left to have it called implicitly when an > operation using them needs tp_dict filled in. This seems like a release blocker for 3.0.1 to me -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "It is easier to optimize correct code than to correct optimized code." --Bill Harlan From tutufan at gmail.com Sat Dec 20 17:57:47 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 10:57:47 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> Message-ID: <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson wrote: > Can you distill the program into something reproducible? > Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance? > I can try to point our commercial profiling tools at it and see what it is doing. I will try next week to see if I can come up with a smaller, submittable example. Thanks. 
From tutufan at gmail.com Sat Dec 20 18:09:03 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 11:09:03 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494CC400.7070404@bullseye.andymac.org> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> Message-ID: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't recall the maker or details of the architecture off the top of my head, but it would be something "off the rack" from Dell or maybe HP. There were other users on the box at the time, but nothing heavy or that gave me any reason to think was affecting my program. It's running CentOS 5 I think, so that might make glibc several years old. Your malloc idea sounds plausible to me. If it is a libc problem, it would be nice if there was some way we could tell malloc to "live for today because there is no tomorrow" in the terminal phase of the program. I'm not sure exactly how to attack this. Callgrind is cool, but no way will work on something this size. Timed ltrace output might be interesting. Or maybe a gprof'ed Python, though that's more work. Regarding interning, I thought this only worked with strings. Is there some way to intern integers? I'm probably creating 300M integers more or less uniformly distributed across range(10000). Mike On Sat, Dec 20, 2008 at 4:08 AM, Andrew MacIntyre wrote: > Mike Coleman wrote: >> >> I have a program that creates a huge (45GB) defaultdict. (The keys >> are short strings, the values are short lists of pairs (string, int).) >> Nothing but possibly the strings and ints is shared. >> >> The program takes around 10 minutes to run, but longer than 20 minutes >> to exit (I gave up at that point). 
That is, after executing the final >> statement (a print), it is apparently spending a huge amount of time >> cleaning up before exiting. I haven't installed any exit handlers or >> anything like that, all files are already closed and stdout/stderr >> flushed, and there's nothing special going on. I have done >> 'gc.disable()' for performance (which is hideous without it)--I have >> no reason to think there are any loops. >> >> Currently I am working around this by doing an os._exit(), which is >> immediate, but this seems like a bit of hack. Is this something that >> needs fixing, or that has already been fixed? > > You don't mention the platform, but... > > This behaviour was not unknown in the distant past, with much smaller > datasets. Most of the problems then related to the platform malloc() > doing funny things as stuff was free()ed, like coalescing free space. > > [I once sat and watched a Python script run in something like 30 seconds > and then take nearly 10 minutes to terminate, as you describe (Python > 2.1/Solaris 2.5/Ultrasparc E3500)... and that was only a couple of > hundred MB of memory - the Solaris 2.5 malloc() had some undesirable > properties from Python's point of view] > > PyMalloc effectively removed this as an issue for most cases and platform > malloc()s have also become considerably more sophisticated since then, > but I wonder whether the sheer size of your dataset is unmasking related > issues. > > Note that in Python 2.5 PyMalloc does free() unused arenas as a surplus > accumulates (2.3 & 2.4 never free()ed arenas). Your platform malloc() > might have odd behaviour with 45GB of arenas returned to it piecemeal. > This is something that could be checked with a small C program. > Calling os._exit() circumvents the free()ing of the arenas. > > Also consider that, with the exception of small integers (-1..256), no > interning of integers is done. 
If your data contains large quantities > of integers with non-unique values (that aren't in the small integer > range) you may find it useful to do your own interning. > > -- > ------------------------------------------------------------------------- > Andrew I MacIntyre "These thoughts are mine alone..." > E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370 > andymac at pcug.org.au (alt) | Belconnen ACT 2616 > Web: http://www.andymac.org/ | Australia > From Scott.Daniels at Acm.Org Sat Dec 20 18:41:39 2008 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Sat, 20 Dec 2008 09:41:39 -0800 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> Message-ID: Mike Coleman wrote: > ... Regarding interning, I thought this only worked with strings. > Is there some way to intern integers? I'm probably creating 300M > integers more or less uniformly distributed across range(10000)? held = list(range(10000)) ... troublesome_dict[string] = held[number_to_hold] ... 
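[Editorial sketch, not part of the original message.] The two-line idea above can be spelled out as a runnable example. The names held and troublesome_dict come from the message; make_interner, intern_int and the sample records are hypothetical, added only for illustration. Each integer parsed from input is swapped for one canonical object from a lookup table, so equal values share a single object instead of each occupying its own allocation.

```python
def make_interner(limit):
    """Return a function mapping each int in range(limit) to one shared object."""
    held = list(range(limit))  # one canonical int object per value

    def intern_int(n):
        # Indexing returns the stored object, whatever object n itself is.
        return held[n]

    return intern_int


intern_int = make_interner(10000)

# Hypothetical input records: (key, integer parsed from text).
records = [("a", "5000"), ("b", "5000"), ("c", "9999")]

troublesome_dict = {}
for key, raw in records:
    # int(raw) may build a fresh object; intern_int swaps it for the shared one.
    troublesome_dict.setdefault(key, []).append((key, intern_int(int(raw))))

# Both 5000 entries now refer to the very same int object.
shared = troublesome_dict["a"][0][1] is troublesome_dict["b"][0][1]
```

With roughly 300M integers spread over range(10000), a table like this would cut the number of live int objects to at most 10000, which also means far fewer objects to free at exit.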
--Scott David Daniels Scott.Daniels at Acm.Org From kristjan at ccpgames.com Sat Dec 20 19:25:25 2008 From: kristjan at ccpgames.com (Kristján Valur Jónsson) Date: Sat, 20 Dec 2008 18:25:25 +0000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702BDD@exchis.ccp.ad.local> You can always try poor-man's profiling, which is surprisingly useful in the face of massive performance problems. Just attach a debugger to the program, and when it is suffering from a performance problem, break the execution on a regular basis. You are statistically very likely to get a callstack representative of the problem you are having. Do this a few times and you will get a fair impression of what the program is spending its time on. From the debugger, you can also examine the python callstack of the program by examining the 'f' local variable in the Frame Evaluation function. Have fun, K -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Mike Coleman Sent: 20. desember 2008 17:09 To: Andrew MacIntyre Cc: Python Dev Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) I'm not sure exactly how to attack this. Callgrind is cool, but no way will work on something this size. Timed ltrace output might be interesting. Or maybe a gprof'ed Python, though that's more work. 
From mikko+python at redinnovation.com Sat Dec 20 20:27:15 2008 From: mikko+python at redinnovation.com (Mikko Ohtamaa) Date: Sat, 20 Dec 2008 21:27:15 +0200 Subject: [Python-Dev] VM imaging based launch optimizations for CPython? Message-ID: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> Hi fellow snakemen and lizard ladies, We have recently done lots of Python work on Nokia Series 60 phones and even managed to roll out some commercial Python based applications. In the future we plan to create some iPhone Python apps also. Python runs fine in phones - after it has been launched. Currently the biggest issue preventing the world dominance of Python based mobile applications is the start up time. We cope with the issue by using fancy splash screens and progress indicators, but it doesn't cure the fact that it takes a minute to show the main user interface of the application. Most of the time is spent in import, executing opcodes and forming function and class structures in memory - something which cannot be easily boosted. Now, we have been thinking. Maemo has a fork() based Python launcher ( http://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/) which greatly speeds up the start up time by holding Python in memory all the time. We cannot afford such luxury on Symbian and iPhone, since we do not control the operating system. So how about this 1. A Python application is launched normally 2. After VM has initialized module importing and reached a static launch state (meaning that the state is same on every launch) the VM state is written on to disk 3. Application continues execution and starts doing dynamic stuff 4. On the following launches, special init code is used which directly blits VM image from disk back to memory and we have reached the static state again without going through the hoops of executing import related opcodes 5. Also, I have heard a suggestion that VM image could be defragmented and analyzed offline Any opinions? 
Cheers, Mikko -- Mikko Ohtamaa Red Innovation Ltd. Oulu, Finland http://www.redinnovation.com From solipsis at pitrou.net Sat Dec 20 20:45:11 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 20 Dec 2008 19:45:11 +0000 (UTC) Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <200812202155.28024.steve@pearwood.info> Message-ID: Steven D'Aprano pearwood.info> writes: > > In November 2007, a similar problem was reported on the comp.lang.python > newsgroup. 370MB was large enough to demonstrate the problem. I don't > know if a bug was ever reported. Do you still reproduce it on trunk? I've tried your scripts on my machine and they work fine, even if I leave garbage collecting enabled during the process. (dual core 64-bit machine but in 32-bit mode) From mal at egenix.com Sat Dec 20 21:04:32 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 20 Dec 2008 21:04:32 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> Message-ID: <494D4FD0.4020202@egenix.com> On 2008-12-20 17:57, Mike Coleman wrote: > On Sat, Dec 20, 2008 at 4:02 AM, Kristján Valur Jónsson > wrote: >> Can you distill the program into something reproducible? >> Maybe with something slightly less than 45Gb but still exhibiting some degradation of exit performance? >> I can try to point our commercial profiling tools at it and see what it is doing. 
> > I will try next week to see if I can come up with a smaller, > submittable example. Thanks. These long exit times are usually caused by the garbage collection of objects. This can be a very time consuming task. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 20 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From leif.walsh at gmail.com Sat Dec 20 21:20:22 2008 From: leif.walsh at gmail.com (Leif Walsh) Date: Sat, 20 Dec 2008 15:20:22 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494D4FD0.4020202@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> Message-ID: On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote: > These long exit times are usually caused by the garbage collection > of objects. This can be a very time consuming task. In that case, the question would be "why is the interpreter collecting garbage when it knows we're trying to exit anyway?". 
-- Cheers, Leif From fuzzyman at voidspace.org.uk Sat Dec 20 21:25:42 2008 From: fuzzyman at voidspace.org.uk (Michael Foord) Date: Sat, 20 Dec 2008 20:25:42 +0000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> Message-ID: <494D54C6.3000500@voidspace.org.uk> Leif Walsh wrote: > On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote: > >> These long exit times are usually caused by the garbage collection >> of objects. This can be a very time consuming task. >> > > In that case, the question would be "why is the interpreter collecting > garbage when it knows we're trying to exit anyway?". > > Because finalizers are only called when an object is destroyed presumably. Michael -- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog From skip at pobox.com Sat Dec 20 21:26:20 2008 From: skip at pobox.com (skip at pobox.com) Date: Sat, 20 Dec 2008 14:26:20 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> Message-ID: <18765.21740.137339.943481@montanaro-dyndns-org.local> Leif> In that case, the question would be "why is the interpreter Leif> collecting garbage when it knows we're trying to exit anyway?". Because useful side effects are sometimes performed as a result of this activity (flushing disk buffers, closing database connections, etc). 
Skip

From tim.peters at gmail.com Sat Dec 20 21:34:19 2008
From: tim.peters at gmail.com (Tim Peters)
Date: Sat, 20 Dec 2008 15:34:19 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To:
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com>
Message-ID: <1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com>

[M.-A. Lemburg]
>> These long exit times are usually caused by the garbage collection
>> of objects. This can be a very time consuming task.

[Leif Walsh]
> In that case, the question would be "why is the interpreter collecting
> garbage when it knows we're trying to exit anyway?".

Because user-defined destructors (like __del__ methods and weakref callbacks) may be associated with garbage, and users presumably want those to execute. Doing so requires identifying garbage and releasing it, same as if the interpreter didn't happen to be exiting.

BTW, the original poster should try this: use whatever tools the OS supplies to look at CPU and disk usage during the long exit. What I /expect/ is that almost no CPU time is being used, while the disk is grinding itself to dust. That's what happens when a large number of objects have been swapped out to disk, and exit processing has to page them all back into memory again (in order to decrement their refcounts).

Python's cyclic gc (the `gc` module) has nothing to do with this -- it's typically the been-there-forever refcount-based non-cyclic gc that accounts for supernaturally long exit times.

If that is the case here, there's no evident general solution. If you have millions of objects still alive at exit, refcount-based reclamation has to visit all of them, and if they've been swapped out to disk it can take a very long time to swap them all back into memory again.
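
[Editorial illustration, not part of the original message.] The distinction drawn above between the cyclic collector and refcount-based reclamation can be checked with a small sketch. It assumes CPython (other implementations may delay finalizers), and the `Noisy` class and `demo` function are hypothetical names made up for this example. With the `gc` module disabled, the `__del__` method still runs as soon as the last reference goes away, because `gc.disable()` only switches off cycle detection.

```python
import gc


class Noisy:
    """Hypothetical object whose finalizer records that it ran."""

    finalized = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        # Called by refcount-based reclamation, not by the gc module.
        Noisy.finalized.append(self.name)


def demo():
    gc.disable()  # turns off only the cyclic collector
    try:
        obj = Noisy("a")
        del obj  # last reference dropped: CPython finalizes it right here
        return list(Noisy.finalized)
    finally:
        gc.enable()


result = demo()
print(result)
```

The same mechanism is what runs at interpreter exit: every object still alive has to be visited to have its reference count decremented, whether or not the cyclic collector is enabled.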
From mal at egenix.com Sat Dec 20 21:50:19 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 20 Dec 2008 21:50:19 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> Message-ID: <494D5A8B.8060000@egenix.com> On 2008-12-20 21:20, Leif Walsh wrote: > On Sat, Dec 20, 2008 at 3:04 PM, M.-A. Lemburg wrote: >> These long exit times are usually caused by the garbage collection >> of objects. This can be a very time consuming task. > > In that case, the question would be "why is the interpreter collecting > garbage when it knows we're trying to exit anyway?". It cannot know until the very end, because there may still be some try: ... except SystemExit: ... somewhere in the code waiting to trigger and stop the system exit. If you want a really fast exit, try this: import os os.kill(os.getpid(), 9) But you better know what you're doing if you take this approach... -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 20 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From leif.walsh at gmail.com Sat Dec 20 22:01:59 2008 From: leif.walsh at gmail.com (Leif Walsh) Date: Sat, 20 Dec 2008 16:01:59 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <18765.21740.137339.943481@montanaro-dyndns-org.local> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> Message-ID: (@Skip, Michael, Tim) On Sat, Dec 20, 2008 at 3:26 PM, wrote: > Because useful side effects are sometimes performed as a result of this > activity (flushing disk buffers, closing database connections, etc). Of course they are. But what about the case given above: On Sat, Dec 20, 2008 at 5:55 AM, Steven D'Aprano wrote: > I was originally one of the skeptics until I reproduced the original > posters problem. I generated a sample file 8 million key/value pairs as > a 370MB text file. Reading it into a dict took two and a half minutes > on my relatively slow computer. But deleting the dict took more than 30 > minutes even with garbage collection switched off. It might be a semantic change that I'm looking for here, but it seems to me that if you turn off the garbage collector, you should be able to expect that either it also won't run on exit, or it should have a way of letting you tell it not to run on exit. If I'm running without a garbage collector, that assumes I'm at least cocky enough to think I know when I'm done with my objects, so I should know to delete the objects that have __del__ functions I care about before I exit. 
Well, maybe; I'm sure one of you could drag out a programmer that would make that mistake, but turning off the garbage collector to me seems to send the experience message, at least a little. Does the garbage collector run any differently when the process is exiting? It seems that it wouldn't need to do anything more that run through all objects in the heap and delete them, which doesn't require anything fancy, and should be able to sort by address to aid with caching. If it's already this fast, then I guess it really is the sheer number of function calls necessary that are causing such a slowdown in the cases we've seen, but I find this hard to believe. -- Cheers, Leif From tim.peters at gmail.com Sat Dec 20 22:03:11 2008 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 20 Dec 2008 16:03:11 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> Message-ID: <1f7befae0812201303x39b21d00qcda6e897a29371db@mail.gmail.com> [Mike Coleman] >> ... Regarding interning, I thought this only worked with strings. Implementation details. Recent versions of CPython also, e.g., "intern" the empty tuple, and very small integers. >> Is there some way to intern integers? I'm probably creating 300M >> integers more or less uniformly distributed across range(10000)? Interning would /vastly/ reduce memory use for ints in that case, from gigabytes down to less than half a megabyte. [Scott David Daniels] > held = list(range(10000)) > ... > troublesome_dict[string] = held[number_to_hold] > ... 
More generally, but a bit slower, for objects usable as dict keys, change code of the form: x = whatever_you_do_to_get_a_new_object() use(x) to: x = whatever_you_do_to_get_a_new_object() x = intern_it(x, x) use(x) where `intern_it` is defined like so once at the start of the program: intern_it = {}.setdefault This snippet may make the mechanism clearer: >>> intern_it = {}.setdefault >>> x = 3000 >>> id(intern_it(x, x)) 36166156 >>> x = 1000 + 2000 >>> id(intern_it(x, x)) 36166156 >>> x = "works for computed strings too" >>> id(intern_it(x, x)) 27062696 >>> x = "works for computed strings t" + "o" * 2 >>> id(intern_it(x, x)) 27062696 From tim.peters at gmail.com Sat Dec 20 22:11:30 2008 From: tim.peters at gmail.com (Tim Peters) Date: Sat, 20 Dec 2008 16:11:30 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> Message-ID: <1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com> [Leif Walsh] > ... > It might be a semantic change that I'm looking for here, but it seems > to me that if you turn off the garbage collector, you should be able > to expect that either it also won't run on exit, It won't then, but "the garbage collector" is the gc module, and that only performs /cyclic/ garbage collection. There is no way to stop refcount-based garbage collection. Read my message again. > or it should have a > way of letting you tell it not to run on exit. If I'm running without > a garbage collector, that assumes I'm at least cocky enough to think I > know when I'm done with my objects, so I should know to delete the > objects that have __del__ functions I care about before I exit. 
Well, > maybe; I'm sure one of you could drag out a programmer that would make > that mistake, but turning off the garbage collector to me seems to > send the experience message, at least a little. This probably isn't a problem with cyclic gc (reread my msg). > Does the garbage collector run any differently when the process is > exiting? No. > It seems that it wouldn't need to do anything more that run > through all objects in the heap and delete them, which doesn't require > anything fancy, Reread my msg -- already explained the likely cause here (if "all the objects in the heap" have in fact been swapped out to disk, it can take an enormously long time to just "run through" them all). > and should be able to sort by address to aid with > caching. That one isn't possible. There is no list of "all objects" to /be/ sorted. The only way to find all the objects is to traverse the object graph from its roots, which is exactly what non-cyclic gc does anyway. > If it's already this fast, then I guess it really is the > sheer number of function calls necessary that are causing such a > slowdown in the cases we've seen, but I find this hard to believe. My guess remains that CPU usage is trivial here, and 99.99+% of the wall-clock time is consumed waiting for disk reads. Either that, or that platform malloc is going nuts. 
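
[Editorial illustration, not part of the original message.] Since the number of live objects is what makes exit slow, the `intern_it = {}.setdefault` idiom suggested earlier in this thread is worth a concrete sketch. The code below is written for a modern Python 3 and uses made-up data (values 1000 to 1099 parsed from strings, so that CPython's small-integer cache does not hide the effect); only the `setdefault` trick itself comes from the thread.

```python
# One shared table: setdefault(x, x) returns the first object ever stored
# under a key equal to x, so equal values collapse to a single object.
intern_it = {}.setdefault

# Two equal ints built at run time; interning maps both to one object.
a = intern_it(int("3000"), int("3000"))
b = intern_it(int("3000"), int("3000"))

# Hypothetical workload: 10,000 values drawn from only 100 distinct numbers.
raw = (int(str(1000 + i % 100)) for i in range(10000))
values = [intern_it(v, v) for v in raw]

# Count how many separate objects the list actually references.
distinct = len({id(v) for v in values})
print(a is b, distinct)
```

Without the `intern_it` call the list would typically hold thousands of separate int objects, each of which has to be allocated and later freed individually.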
From solipsis at pitrou.net Sat Dec 20 22:13:11 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 20 Dec 2008 21:13:11 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?extremely_slow_exit_for_program_having_hug?= =?utf-8?b?ZSAoNDVHKQlkaWN0IChweXRob24gMi41LjIp?= References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> Message-ID: Leif Walsh gmail.com> writes: > > It might be a semantic change that I'm looking for here, but it seems > to me that if you turn off the garbage collector, you should be able > to expect that either it also won't run on exit, or it should have a > way of letting you tell it not to run on exit. [...] I'm skeptical that it's a garbage collector problem. The script creates one dict containing lots of strings and ints. The thing is, strings and ints aren't tracked by the GC as they are simple atomic objects. Therefore, the /only/ object created by the script which is tracked by the GC is the dict. Moreover, since there is no cycle created, the dict should be directly destroyed when its last reference dies (the "del" statement), not go through the garbage collection process. Given that the problem is reproduced on certain systems and not others, it can be related to an interaction between allocation patterns of the dict implementation, the Python memory allocator, and the implementation of the C malloc() / free() functions. I'm no expert enough to find out more on the subject. From fabiofz at gmail.com Sat Dec 20 22:45:18 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sat, 20 Dec 2008 19:45:18 -0200 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? 
In-Reply-To: References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: It appears that this bug was already reported: http://bugs.python.org/issue4705 Any chance that it gets in the next 3.0.x bugfix release? Just as a note, if I do: sys.stdout._line_buffering = True, it also works, but doesn't seem right as it's accessing an internal attribute. Note 2: the solution that said to pass 'wb' does not work, because I need the output as text and not binary or text becomes garbled when it's not ascii. Thanks, Fabio On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum wrote: > Fror truly unbuffered text output you'd have to make changes to the > io.TextIOWrapper class to flush after each write() call. That's an API > change -- the constructor currently has a line_buffering option but no > option for completely unbuffered mode. It would also require some > changes to io.open() which currently rejects buffering=0 in text mode. > All that suggests that it should wait until 3.1. > > However it might make sense to at least turn on line buffering when -u > or PYTHONUNBUFFERED is given; that doesn't require API changes and so > can be considered a bug fix. > > --Guido van Rossum (home page: http://www.python.org/~guido/) > > > > On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou wrote: >> >>> Well, ``python -h`` still lists it. >> >> Precisely, it says: >> >> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x >> see man page for details on internal buffering relating to '-u' >> >> Note the "binary". And indeed: >> >> ./python -u >> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) >> [GCC 4.3.2] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >>>>> import sys >>>>> sys.stdout.buffer.write(b"y") >> y1 >>>>> >> >> I don't know what it would take to enable unbuffered text IO while keeping the >> current TextIOWrapper implementation... >> >> Regards >> >> Antoine. 
>> >> >> _______________________________________________ >> Python-Dev mailing list >> Python-Dev at python.org >> http://mail.python.org/mailman/listinfo/python-dev >> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/fabiofz%40gmail.com > From martin at v.loewis.de Sat Dec 20 22:55:30 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 20 Dec 2008 22:55:30 +0100 Subject: [Python-Dev] VM imaging based launch optimizations for CPython? In-Reply-To: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> Message-ID: <494D69D2.5090601@v.loewis.de> > Any opinions? I would use a different marshal implementation. Instead of defining a stream format for marshal, make marshal dump its graph of objects along with the actual memory layout. On load, copying can be avoided; just a few pointers need to be updated. The resulting marshal files would be platform-specific (wrt. endianness and pointer width). On marshaling, you copy all objects into a contiguous block of memory (8-aligned), and dump that. On unmarshaling, you just map that block. If the target supports true memory mapping with page boundaries, you might be able to store multiple .pyc files into a single page. This reformatting could be done offline also. A few things need to be considered: - compatibility. The original marshal code would probably need to be preserved for the "marshal" module. - relative pointers. Code objects, tuples, etc. contain pointers. Assuming the marshaled object cannot be loaded back into the same address, you need to adjust pointers. 
A common trick is to put a desired load address into the memory block, then try to load into that address. If the address is already taken, load into a different address, and walk though all objects, adjusting pointers. - type references. On loading, you will need to patch all ob_type fields. Put the marshal codes into the ob_type field on marshalling, then switch on unmarshalling. - references to interned strings. On loading, you can either intern them all, or you have a "fast interning" algorithm that assigns a fixed table of interned-string numbers. - reference counting. Make sure all these objects start out with a reference count of 1, so they will never become garbage. If you use a container file for multiple .pyc files, you can have additional savings by sharing strings across modules; this should help in particular for reference to builtin symbols, and for common method names. A fixed interning might become unnecessary as the unique single string object in the container will either become the interned string itself, or point it it after being interned once. With such a container system, unmarshalling should be lazy; e.g. for each object, the value of ob_type can be used to determine whether the object was unmarshalled. Of course, you still have the actual interpretation of the top-level module code - if it's not the marshalling but this part that actually costs performance, this efficient marshalling algorithm won't help. It would be interesting to find out which modules have a particularly high startup cost - perhaps they can be rewritten. 
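
[Editorial illustration, not part of the original message.] Finding out which modules have a high startup cost can be approximated with a short timing loop. This is a minimal sketch for a modern Python 3 (`time.perf_counter` and `importlib.import_module` did not exist in this form in 2008); the function name `time_imports` and the module names passed to it are placeholders chosen for the example.

```python
import importlib
import sys
import time


def time_imports(module_names):
    """Return {name: seconds} for each first-time import.

    Modules already present in sys.modules are reported as None, because
    re-importing them is only a cache lookup and says nothing about cost.
    """
    timings = {}
    for name in module_names:
        if name in sys.modules:
            timings[name] = None
            continue
        start = time.perf_counter()
        importlib.import_module(name)
        timings[name] = time.perf_counter() - start
    return timings


timings = time_imports(["json", "colorsys"])
for name, seconds in timings.items():
    print(name, "already imported" if seconds is None else f"{seconds:.6f}s")
```

Each measured time includes whatever the module imports in turn, so the order of the list matters. On Python 3.7 and later, running the interpreter with `-X importtime` gives a per-module breakdown without any code changes.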
Regards, Martin From tutufan at gmail.com Sat Dec 20 23:06:02 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 16:06:02 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494D5A8B.8060000@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D5A8B.8060000@egenix.com> Message-ID: <3c6c07c20812201406j198acad7y8e04bae80324be0a@mail.gmail.com> On Sat, Dec 20, 2008 at 2:50 PM, M.-A. Lemburg wrote: > If you want a really fast exit, try this: > > import os > os.kill(os.getpid(), 9) > > But you better know what you're doing if you take this approach... This would work, but I think os._exit(EX_OK) is probably just as fast, and allows you to control the exit status... From martin at v.loewis.de Sat Dec 20 23:16:09 2008 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 20 Dec 2008 23:16:09 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494D4FD0.4020202@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> Message-ID: <494D6EA9.2040201@v.loewis.de> >> I will try next week to see if I can come up with a smaller, >> submittable example. Thanks. > > These long exit times are usually caused by the garbage collection > of objects. This can be a very time consuming task. I doubt that. The long exit times are usually caused by a bad malloc implementation. 
Regards, Martin From arfrever.fta at gmail.com Sat Dec 20 23:28:18 2008 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Sat, 20 Dec 2008 23:28:18 +0100 Subject: [Python-Dev] 2.6.1 documentation not available for download Message-ID: <200812202328.20045.Arfrever.FTA@gmail.com> Python 2.6.1 documentation currently isn't available for download at: http://docs.python.org/ftp/python/doc/ Additionally please include version numbers in documentation archives (e.g. python-docs-html-2.6.1.tar.bz2). -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From steve at holdenweb.com Sat Dec 20 23:37:12 2008 From: steve at holdenweb.com (Steve Holden) Date: Sat, 20 Dec 2008 17:37:12 -0500 Subject: [Python-Dev] 2.6.1 license Message-ID: It might be helpful if http://www.python.org/download/releases/2.6.1/license/ said it was also the official license for the 2.6.1 release (though I don't suppose it matters that it's still called the 2.5 license, since that's its origin). Another detail to go into the release manage PEP? 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From steve at holdenweb.com Sat Dec 20 23:44:28 2008 From: steve at holdenweb.com (Steve Holden) Date: Sat, 20 Dec 2008 17:44:28 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> Message-ID: <494D754C.1050109@holdenweb.com> Antoine Pitrou wrote: > Leif Walsh gmail.com> writes: >> It might be a semantic change that I'm looking for here, but it seems >> to me that if you turn off the garbage collector, you should be able >> to expect that either it also won't run on exit, or it should have a >> way of letting you tell it not to run on exit. > [...] > > I'm skeptical that it's a garbage collector problem. The script creates one dict > containing lots of strings and ints. The thing is, strings and ints aren't > tracked by the GC as they are simple atomic objects. Therefore, the /only/ > object created by the script which is tracked by the GC is the dict. Moreover, > since there is no cycle created, the dict should be directly destroyed when its > last reference dies (the "del" statement), not go through the garbage collection > process. > > Given that the problem is reproduced on certain systems and not others, it can > be related to an interaction between allocation patterns of the dict > implementation, the Python memory allocator, and the implementation of the C > malloc() / free() functions. I'm no expert enough to find out more on the > subject. > I believe the OP engendered a certain amount of confusion by describing object deallocation as being performed by the garbage collector. 
So he perhaps didn't understand that even decref'ing all the objects only referenced by the dict will take a huge amount of time unless there's enough real memory to hold it. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
From musiccomposition at gmail.com Sat Dec 20 23:46:15 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sat, 20 Dec 2008 16:46:15 -0600 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <200812202328.20045.Arfrever.FTA@gmail.com> References: <200812202328.20045.Arfrever.FTA@gmail.com> Message-ID: <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis wrote: > Python 2.6.1 documentation currently isn't available for download at: > http://docs.python.org/ftp/python/doc/ It is avaiable here, though: http://www.python.org/ftp/python/doc/current/ > > Additionally please include version numbers in documentation > archives (e.g. python-docs-html-2.6.1.tar.bz2). > -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From musiccomposition at gmail.com Sat Dec 20 23:56:46 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sat, 20 Dec 2008 16:56:46 -0600 Subject: [Python-Dev] 2.6.1 license In-Reply-To: References: Message-ID: <1afaf6160812201456kf192bf6r389fd6896bfb4fbd@mail.gmail.com> On Sat, Dec 20, 2008 at 4:37 PM, Steve Holden wrote: > It might be helpful if > > http://www.python.org/download/releases/2.6.1/license/ > > said it was also the official license for the 2.6.1 release (though I > don't suppose it matters that it's still called the 2.5 license, since > that's its origin). I've updated the website and the PEP.
-- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From arfrever.fta at gmail.com Sun Dec 21 00:02:05 2008 From: arfrever.fta at gmail.com (Arfrever Frehtes Taifersar Arahesis) Date: Sun, 21 Dec 2008 00:02:05 +0100 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> Message-ID: <200812210002.05587.Arfrever.FTA@gmail.com> 2008-12-20 23:46:15 Benjamin Peterson napisa?(a): > On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis > wrote: > > Python 2.6.1 documentation currently isn't available for download at: > > http://docs.python.org/ftp/python/doc/ > > It is avaiable here, though: > > http://www.python.org/ftp/python/doc/current/ I need documentation created from the 'r261' tag, not from the HEAD of the 'release26-maint' branch. -- Arfrever Frehtes Taifersar Arahesis -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 197 bytes Desc: This is a digitally signed message part. URL: From brett at python.org Sun Dec 21 00:15:22 2008 From: brett at python.org (Brett Cannon) Date: Sat, 20 Dec 2008 15:15:22 -0800 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: On Sat, Dec 20, 2008 at 13:45, Fabio Zadrozny wrote: > It appears that this bug was already reported: http://bugs.python.org/issue4705 > > Any chance that it gets in the next 3.0.x bugfix release? > > Just as a note, if I do: sys.stdout._line_buffering = True, it also > works, but doesn't seem right as it's accessing an internal attribute. 
> > Note 2: the solution that said to pass 'wb' does not work, because I > need the output as text and not binary or text becomes garbled when > it's not ascii. > Can't you decode the bytes after you receive them? -Brett > Thanks, > > Fabio > > On Fri, Dec 19, 2008 at 9:03 PM, Guido van Rossum wrote: >> For truly unbuffered text output you'd have to make changes to the >> io.TextIOWrapper class to flush after each write() call. That's an API >> change -- the constructor currently has a line_buffering option but no >> option for completely unbuffered mode. It would also require some >> changes to io.open() which currently rejects buffering=0 in text mode. >> All that suggests that it should wait until 3.1. >> >> However it might make sense to at least turn on line buffering when -u >> or PYTHONUNBUFFERED is given; that doesn't require API changes and so >> can be considered a bug fix. >> >> --Guido van Rossum (home page: http://www.python.org/~guido/) >> >> >> >> On Fri, Dec 19, 2008 at 2:47 PM, Antoine Pitrou wrote: >>> >>>> Well, ``python -h`` still lists it. >>> >>> Precisely, it says: >>> >>> -u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x >>> see man page for details on internal buffering relating to '-u' >>> >>> Note the "binary". And indeed: >>> >>> ./python -u >>> Python 3.1a0 (py3k:67839M, Dec 18 2008, 17:56:54) >>> [GCC 4.3.2] on linux2 >>> Type "help", "copyright", "credits" or "license" for more information. >>>>>> import sys >>>>>> sys.stdout.buffer.write(b"y") >>> y1 >>>>>> >>> >>> I don't know what it would take to enable unbuffered text IO while keeping the >>> current TextIOWrapper implementation... >>> >>> Regards >>> >>> Antoine.
>>> >>> >>> _______________________________________________ >>> Python-Dev mailing list >>> Python-Dev at python.org >>> http://mail.python.org/mailman/listinfo/python-dev >>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org >>> From solipsis at pitrou.net Sun Dec 21 00:25:11 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 20 Dec 2008 23:25:11 +0000 (UTC) Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <494D754C.1050109@holdenweb.com> Message-ID: Steve Holden holdenweb.com> writes: > I believe the OP engendered a certain amount of confusion by describing > object deallocation as being performed by the garbage collector. So he > perhaps didn't understand that even decref'ing all the objects only > referenced by the dict will take a huge amount of time unless there's > enough real memory to hold it. He said he has 64GB RAM so I assume all his working set was in memory, not swapped out.
From alexandre at peadrop.com Sun Dec 21 00:28:23 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sat, 20 Dec 2008 18:28:23 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> Message-ID: On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote: > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, int).) > Nothing but possibly the strings and ints is shared. > > That is, after executing the final statement (a print), it is apparently spending a > huge amount of time cleaning up before exiting. > I have done 'gc.disable()' for performance (which is hideous without it)--I have > no reason to think there are any loops. From alexandre at peadrop.com Sun Dec 21 00:40:25 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Sat, 20 Dec 2008 18:40:25 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> Message-ID: [Sorry, for the previous garbage post.] > On Fri, Dec 19, 2008 at 6:29 PM, Mike Coleman wrote: > I have a program that creates a huge (45GB) defaultdict. (The keys > are short strings, the values are short lists of pairs (string, int).) > Nothing but possibly the strings and ints is shared. Could you give us more information about the dictionary. For example, how many objects does it contain? Is 45GB the actual size of the dictionary or of the Python process? > That is, after executing the final statement (a print), it is apparently > spending a huge amount of time cleaning up before exiting. Most of this time is probably spent on DECREF'ing objects in the dictionary. 
As others mentioned, it would be useful to have a self-contained example to examine the behavior more closely. > I have done 'gc.disable()' for performance (which is hideous without it)--I > have no reason to think there are any loops. Have you seen any significant difference in the exit time when the cyclic GC is disabled or enabled? -- Alexandre From ncoghlan at gmail.com Sun Dec 21 01:14:44 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Dec 2008 10:14:44 +1000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <1f7befae0812201234h71fffc0cnf3f01ce08bc70ffa@mail.gmail.com> Message-ID: <494D8A74.9050306@gmail.com> Tim Peters wrote: > If that is the case here, there's no evident general solution. If you > have millions of objects still alive at exit, refcount-based > reclamation has to visit all of them, and if they've been swapped out > to disk it can take a very long time to swap them all back into memory > again. In that case, it sounds like using os._exit() to get out of the program without visiting all that memory *is* the right answer (or as right an answer as is available at least). Cheers, Nick.
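[Editorial sketch, not part of the original message.] The effect of os._exit() discussed above can be seen in a small experiment. The following is a minimal sketch, assuming a Python 3 interpreter; the CHILD script and the run_child() helper are hypothetical names made up for this illustration. It runs a tiny child program twice, once exiting normally and once through os._exit(), and captures what each prints.

```python
import subprocess
import sys

# Hypothetical child program: registers an atexit handler, then either
# falls off the end normally or leaves through os._exit().
CHILD = r'''
import atexit, os, sys
atexit.register(lambda: sys.stdout.write("atexit handler ran\n"))
sys.stdout.write("main finished\n")
sys.stdout.flush()   # os._exit() does not flush stdio buffers for us
if sys.argv[1] == "fast":
    os._exit(0)      # skip interpreter teardown and atexit handlers
'''


def run_child(mode):
    """Run the child in 'normal' or 'fast' mode and return its output lines."""
    out = subprocess.check_output([sys.executable, "-c", CHILD, mode])
    return out.decode().splitlines()


if __name__ == "__main__":
    print("normal exit:", run_child("normal"))
    print("os._exit() :", run_child("fast"))
```

In the "fast" run the atexit handler never fires, which is the trade-off raised in this thread: the exit is quick because nothing is torn down, so any cleanup the application depends on has to be done explicitly before the call.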
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From andrew-pythondev at puzzling.org Sun Dec 21 01:15:30 2008 From: andrew-pythondev at puzzling.org (Andrew Bennetts) Date: Sun, 21 Dec 2008 11:15:30 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <18765.5492.200918.790182@montanaro-dyndns-org.local> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <18765.5492.200918.790182@montanaro-dyndns-org.local> Message-ID: <20081221001530.GA32606@steerpike.home.puzzling.org> skip at pobox.com wrote: > > Steve> Unfortunately there are doubtless programs out there that do rely > Steve> on actions being taken at shutdown. > > Indeed. I believe any code which calls atexit.register. > > Steve> Maybe os.exit() could be more widely advertised, though ... > > That would be os._exit(). Calling it avoids calls to exit functions > registered with atexit.register(). I believe it is both safe, and > reasonable programming practice for modules to register exit functions. > Both the logging and multiprocessing modules call it. It's incumbent on the > application programmer to know these details of the modules the app uses > (perhaps indirectly) to know whether or not it's safe/wise to call > os._exit(). You could call sys.exitfunc() just before os._exit(). -Andrew. From ncoghlan at gmail.com Sun Dec 21 01:28:25 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Dec 2008 10:28:25 +1000 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? 
In-Reply-To: <20081220170154.GA28166@panix.com> References: <494C1CE4.5080102@gmail.com> <20081220170154.GA28166@panix.com> Message-ID: <494D8DA9.6010307@gmail.com> Aahz wrote: > On Sat, Dec 20, 2008, Nick Coghlan wrote: >> It turns out that _PyBuiltin_Init doesn't call PyType_Ready on any of >> the builtin types - they're left to have it called implicitly when an >> operation using them needs tp_dict filled in. > > This seems like a release blocker for 3.0.1 to me The problem isn't actually as bad as I first thought (it turns out most of the builtin types *are* fully initialised in _Py_ReadyTypes, which is called from Py_InitializeEx). However, xrange/range are definitely missing from that function (which is the actual proximate cause of the strange range() hashing behaviour in Py3k), and I'm still hoping someone knows why the numeric types aren't being readied there when certain parts of the core need additional handling to cope with the possibility that those types aren't fully initialised (e.g. PyObject_Format has a lazy call to PyType_Ready with a comment noting that it may be asked to format floating point numbers before PyType_Ready has otherwise been called for the float type). That said, I have still added the range() hashing problem to the list of release blockers. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From tutufan at gmail.com Sun Dec 21 01:05:19 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 18:05:19 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> Message-ID: <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> Tim, I left out some details that I believe probably rule out the "swapped out" theory. The machine in question has 64GB RAM, but only 16GB swap. I'd prefer more swap, but in any case only around ~400MB of the swap was actually in use during my program's entire run. Furthermore, during my program's exit, it was using 100% CPU, and I'm 95% sure there was no significant "system" or "wait" CPU time for the system. (All observations via 'top'.) So, I think that the problem is entirely a computational one within this process. The system does have 8 CPUs. I'm not sure about it's memory architecture, but if it's some kind of NUMA box, I guess access to memory could be slower than what we'd normally expect. I'm skeptical about that being a significant factor here, though. Just to clarify, I didn't gc.disable() to address this problem, but rather because it destroys performance during the creation of the huge dict. I don't have a specific number, but I think disabling gc reduced construction from something like 70 minutes to 5 (or maybe 10). Quite dramatic. Mike >From Tim Peters: BTW, the original poster should try this: use whatever tools the OS supplies to look at CPU and disk usage during the long exit. 
What I /expect/ is that almost no CPU time is being used, while the disk is grinding itself to dust. That's what happens when a large number of objects have been swapped out to disk, and exit processing has to page them all back into memory again (in order to decrement their refcounts). From tutufan at gmail.com Sun Dec 21 01:22:40 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 18:22:40 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> Message-ID: <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com> Re "held" and "intern_it": Haha! That's evil and extremely evil, respectively. :-) I will add these to the Python wiki if they're not already there... Mike From leif.walsh at gmail.com Sun Dec 21 01:34:35 2008 From: leif.walsh at gmail.com (Leif Walsh) Date: Sat, 20 Dec 2008 19:34:35 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <1f7befae0812201311t974df22m75096fe48391c153@mail.gmail.com> Message-ID: On Sat, Dec 20, 2008 at 4:11 PM, Tim Peters wrote: > [Lots of answers] Thanks. Wish I could have offered something useful. 
-- Cheers, Leif From solipsis at pitrou.net Sun Dec 21 01:35:40 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Dec 2008 00:35:40 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?extremely_slow_exit_for_program_having_hug?= =?utf-8?b?ZSAoNDVHKQlkaWN0IChweXRob24gMi41LjIp?= References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> Message-ID: Mike Coleman gmail.com> writes: > > Just to clarify, I didn't gc.disable() to address this problem, but > rather because it destroys performance during the creation of the huge > dict. I don't have a specific number, but I think disabling gc > reduced construction from something like 70 minutes to 5 (or maybe > 10). Quite dramatic. There's a pending patch which should fix that problem: http://bugs.python.org/issue4074 Regards Antoine. From tutufan at gmail.com Sun Dec 21 02:09:00 2008 From: tutufan at gmail.com (Mike Coleman) Date: Sat, 20 Dec 2008 19:09:00 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> Message-ID: <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti wrote: > Could you give us more information about the dictionary. For example, > how many objects does it contain? Is 45GB the actual size of the > dictionary or of the Python process? The 45G was the VM size of the process (resident size was similar). The dict keys were all uppercase alpha strings of length 7. I don't have access at the moment, but maybe something like 10-100M of them (not sure how redundant the set is). 
The values are all lists of pairs, where each pair is a (string, int). The pair strings are of length around 30, and drawn from a "small" fixed set of around 60K strings. As mentioned previously, I think the ints are drawn pretty uniformly from something like range(10000). The length of the lists depends on the redundancy of the key set, but I think there are around 100-200M pairs total, for the entire dict. (If you're curious about the application domain, see 'http://greylag.org'.) > Have you seen any significant difference in the exit time when the > cyclic GC is disabled or enabled? Unfortunately, with GC enabled, the application is too slow to be useful, because of the greatly increased time for dict creation. I suppose it's theoretically possible that with this increased time, the long time for exit will look less bad by comparison, but I'd be surprised if it makes any difference at all. I'm confident that there are no loops in this dict, and nothing for cyclic gc to collect. Mike From solipsis at pitrou.net Sun Dec 21 02:18:52 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Dec 2008 01:18:52 +0000 (UTC) Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> Message-ID: Mike Coleman gmail.com> writes: > > The 45G was the VM size of the process (resident size was similar). Can you reproduce it with a smaller working set? Something between 1 and 2GB, possibly randomly-generated, and post both the generation script and the problematic script on the bug tracker?
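[Editorial sketch, not part of the original messages.] A scaled-down generator for a dict of the shape Mike describes might look like the following. This is only a sketch, assuming Python 3; build_dict(), time_teardown() and all of their parameters are hypothetical names chosen for illustration, and the default sizes are far smaller than the real data set. Keys are 7-letter uppercase strings, values are lists of (string, int) pairs, the pair strings come from a small fixed pool of 30-character strings, and the ints are drawn from range(10000).

```python
import gc
import random
import string
import time
from collections import defaultdict


def build_dict(n_keys, n_pairs, pool_size=1000, seed=0):
    """Build a small defaultdict shaped like the one described in the thread."""
    rng = random.Random(seed)
    # Fixed pool of 30-character strings shared between many pairs.
    pool = ["".join(rng.choice(string.ascii_lowercase) for _ in range(30))
            for _ in range(pool_size)]
    # Keys: uppercase alphabetic strings of length 7.
    keys = ["".join(rng.choice(string.ascii_uppercase) for _ in range(7))
            for _ in range(n_keys)]
    d = defaultdict(list)
    for _ in range(n_pairs):
        d[rng.choice(keys)].append((rng.choice(pool), rng.randrange(10000)))
    return d


def time_teardown(n_keys=100000, n_pairs=200000):
    """Return (number of pairs, seconds spent releasing the dict)."""
    was_enabled = gc.isenabled()
    gc.disable()  # mirror the original program, which ran with gc disabled
    try:
        d = build_dict(n_keys, n_pairs)
        n = sum(len(v) for v in d.values())
        start = time.perf_counter()
        del d  # last reference goes away: every contained object is decref'ed
        elapsed = time.perf_counter() - start
    finally:
        if was_enabled:
            gc.enable()
    return n, elapsed


if __name__ == "__main__":
    pairs, seconds = time_teardown()
    print("released %d pairs in %.3f s" % (pairs, seconds))
```

Scaling n_keys and n_pairs up until the process reaches the 1 to 2GB range requested above would show whether teardown time grows faster than linearly with size; whether it does on a given platform is exactly the open question in this thread.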
From musiccomposition at gmail.com Sun Dec 21 04:25:16 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sat, 20 Dec 2008 21:25:16 -0600 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <200812210002.05587.Arfrever.FTA@gmail.com> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> <200812210002.05587.Arfrever.FTA@gmail.com> Message-ID: <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> On Sat, Dec 20, 2008 at 5:02 PM, Arfrever Frehtes Taifersar Arahesis wrote: > 2008-12-20 23:46:15 Benjamin Peterson napisa?(a): >> On Sat, Dec 20, 2008 at 4:28 PM, Arfrever Frehtes Taifersar Arahesis >> wrote: >> > Python 2.6.1 documentation currently isn't available for download at: >> > http://docs.python.org/ftp/python/doc/ >> >> It is avaiable here, though: >> >> http://www.python.org/ftp/python/doc/current/ > > I need documentation created from the 'r261' tag, not from the HEAD of > the 'release26-maint' branch. I've made documentation for 2.6.1 now. It's at http://www.python.org/ftp/python/doc/2.6.1 > -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From jeremy at alum.mit.edu Sun Dec 21 05:21:59 2008 From: jeremy at alum.mit.edu (Jeremy Hylton) Date: Sat, 20 Dec 2008 23:21:59 -0500 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> Message-ID: 4631 should be a release blocker. I'll have a bit of time on Monday and Tuesday to wrap it up. Jeremy On Fri, Dec 19, 2008 at 5:28 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > I'd like to get Python 3.0.1 out before the end of the year. There are no > showstoppers, but I haven't yet looked at the deferred blockers or the > buildbots. > > Do you think we can get 3.0.1 out on December 24th? 
Or should we wait until > after Christmas and get it out, say on the 29th? Do we need an rc? > > This question goes mostly to Martin and Georg. What would work for you > guys? > > - -Barry > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.9 (Darwin) > > iQCVAwUBSUwgEXEjvBPtnXfVAQIthgP7BDS6xfBHhADKc50ANvZ5aAfWhGSU9GH/ > DR+IRduVmvosu9gm92hupCOaLCN4IbtyFx27A8LQuPNVc4BVrhWfDKDSzpxO2MJu > xLJntkF2BRWODSbdrLGdZ6H6WDT0ZAhn6ZjlWXwxhGxQ5FwEJb7moMuY7jAIEeor > 5n6Ag5zT+e8= > =oU/g > -----END PGP SIGNATURE----- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/jeremy%40alum.mit.edu > From andymac at bullseye.apana.org.au Sun Dec 21 07:30:29 2008 From: andymac at bullseye.apana.org.au (Andrew MacIntyre) Date: Sun, 21 Dec 2008 17:30:29 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> Message-ID: <494DE285.6040301@bullseye.andymac.org> Mike Coleman wrote: > Andrew, this is on an (intel) x86_64 box with 64GB of RAM. I don't > recall the maker or details of the architecture off the top of my > head, but it would be something "off the rack" from Dell or maybe HP. > There were other users on the box at the time, but nothing heavy or > that gave me any reason to think was affecting my program. > > It's running CentOS 5 I think, so that might make glibc several years > old. Your malloc idea sounds plausible to me. If it is a libc > problem, it would be nice if there was some way we could tell malloc > to "live for today because there is no tomorrow" in the terminal phase > of the program. 
> > I'm not sure exactly how to attack this. Callgrind is cool, but no > way will work on something this size. Timed ltrace output might be > interesting. Or maybe a gprof'ed Python, though that's more work. Some malloc()s (notably FreeBSD's) can be externally tuned at runtime via options in environment variables or other mechanisms - the malloc man page on your system might be helpful if your platform has something like this. It is likely that PyMalloc would be better with a way to disable the free()ing of empty arenas, or move to an arrangement where (like the various type free-lists in 2.6+) explicit action can force pruning of empty arenas - there are other usage patterns than yours which would benefit (performance wise) from not freeing arenas automatically. -- ------------------------------------------------------------------------- Andrew I MacIntyre "These thoughts are mine alone..." E-mail: andymac at bullseye.apana.org.au (pref) | Snail: PO Box 370 andymac at pcug.org.au (alt) | Belconnen ACT 2616 Web: http://www.andymac.org/ | Australia From yinon.me at gmail.com Sun Dec 21 10:19:39 2008 From: yinon.me at gmail.com (Yinon Ehrlich) Date: Sun, 21 Dec 2008 11:19:39 +0200 Subject: [Python-Dev] os.defpath for Windows Message-ID: <494E0A2B.4080704@gmail.com> Hi, just saw that os.defpath for Windows is defined as Lib/ntpath.py:30:defpath = '.;C:\\bin' Most Windows machines I saw has no c:\bin directory. Any reason why it was defined this way ? 
Thanks, Yinon From martin at v.loewis.de Sun Dec 21 10:46:46 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 21 Dec 2008 10:46:46 +0100 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> <200812210002.05587.Arfrever.FTA@gmail.com> <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> Message-ID: <494E1086.5030608@v.loewis.de> > I've made documentation for 2.6.1 now. It's at > http://www.python.org/ftp/python/doc/2.6.1 In previous releases (back to 1.2), these files had version numbers in them. It would be good if those could be added for the more recent documentation sets as well. Regards, Martin From martin at v.loewis.de Sun Dec 21 10:48:13 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 21 Dec 2008 10:48:13 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494DE285.6040301@bullseye.andymac.org> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494CC400.7070404@bullseye.andymac.org> <3c6c07c20812200909kae56c35wbb4a7bc9fe6b40e4@mail.gmail.com> <494DE285.6040301@bullseye.andymac.org> Message-ID: <494E10DD.2000505@v.loewis.de> > It is likely that PyMalloc would be better with a way to disable the > free()ing of empty arenas, or move to an arrangement where (like the > various type free-lists in 2.6+) explicit action can force pruning of > empty arenas - there are other usage patterns than yours which would > benefit (performance wise) from not freeing arenas automatically. Before such a mechanism is added, I'd like to establish for a fact that this is an actual problem. 
Regards, Martin From fabiofz at gmail.com Sun Dec 21 12:28:39 2008 From: fabiofz at gmail.com (Fabio Zadrozny) Date: Sun, 21 Dec 2008 09:28:39 -0200 Subject: [Python-Dev] Can't have unbuffered text I/O in Python 3.0? In-Reply-To: References: <0016e64f68207a52a5045e6de625@google.com> Message-ID: >> It appears that this bug was already reported: http://bugs.python.org/issue4705 >> >> Any chance that it gets in the next 3.0.x bugfix release? >> >> Just as a note, if I do: sys.stdout._line_buffering = True, it also >> works, but doesn't seem right as it's accessing an internal attribute. >> >> Note 2: the solution that said to pass 'wb' does not work, because I >> need the output as text and not binary or text becomes garbled when >> it's not ascii. >> > > Can't you decode the bytes after you receive them? > Well, in short, no (long answer is that I probably could if I spent a long time doing my own console instead of relying on what's already done and working in Eclipse for all the currently available languages it supports, but that just doesn't seem right). Also, it seems easily solvable (enabling line buffering for the python streams when -u is passed) on the Python side... My current workaround is doing that on a custom site-initialization when a Python 3 interpreter is found, but I find that this is not the right way to do it, and it really feels like a Python bug. -- Fabio From dima at hlabs.spb.ru Sun Dec 21 12:56:31 2008 From: dima at hlabs.spb.ru (Dmitry Vasiliev) Date: Sun, 21 Dec 2008 14:56:31 +0300 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <16D50043-22B0-4711-BE91-E752953444EA@python.org> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> <16D50043-22B0-4711-BE91-E752953444EA@python.org> Message-ID: <494E2EEF.3080207@hlabs.spb.ru> Barry Warsaw wrote: > Thanks. I've bumped that to release blocker for now.
If there are any > other 'high' bugs that you want considered for 3.0.1, please make the > release blockers too, for now. I think wsgiref package needs to be fixed. For now it's totally broken. I've already found 4 issues about that: http://bugs.python.org/issue3348 http://bugs.python.org/issue3401 http://bugs.python.org/issue3795 http://bugs.python.org/issue4522 What needs to be fixed: 1. Headers handling in wsgiref.simple_server. Not so hard actually - in a few places headers expected as a list object instead of a dict. 2. wsgiref.handlers should support bytes instead of str. I think WSGI applications must return bytes as a result but we can allow Unicode strings in start_response() because the resulting encoding for headers is known and strings can be safely encoded. So the fix won't be so hard too - few asserts needs to be fixed and headers output needs to be directed through auxiliary encoding method. 3. Tests 4. Documentation examples. I can create the patch before December 24th if needed. -- Dmitry Vasiliev http://hlabs.spb.ru From musiccomposition at gmail.com Sun Dec 21 17:37:03 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sun, 21 Dec 2008 10:37:03 -0600 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <494E1086.5030608@v.loewis.de> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> <200812210002.05587.Arfrever.FTA@gmail.com> <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> <494E1086.5030608@v.loewis.de> Message-ID: <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com> On Sun, Dec 21, 2008 at 3:46 AM, "Martin v. L?wis" wrote: > In previous releases (back to 1.2), these files had version > numbers in them. It would be good if those could be added for > the more recent documentation sets as well. 
I agree that adding version numbers would be nice, but I'm also afraid of breaking people's automatic downloads of the documentation. Perhaps add symlinks? -- Cheers, Benjamin Peterson "There's nothing quite as beautiful as an oboe... except a chicken stuck in a vacuum cleaner." From stijn.deweirdt at ugent.be Sun Dec 21 17:35:38 2008 From: stijn.deweirdt at ugent.be (Stijn De Weirdt) Date: Sun, 21 Dec 2008 17:35:38 +0100 Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2 Message-ID: <1229877338.14751.34.camel@spike.ugent.be> hi all, i'm trying to build python 2.5.3 on centos5.2 x86_64 (base gcc is 4.1.2). output of env, configure, make -j and make test is at http://users.ugent.be/~stdweird/python-gcc-seg.tar.gz this all seems ok (at least to me ;) but the following code gives a segfault instead of an IOError: fname='test123'; f=open(fname,'w'); f.read() (test123 doesn't exist; it is a reduced problem from a scipy unittest). with system python (2.4.3) i get: IOError: [Errno 9] Bad file descriptor any hints what might cause this (or how i can figure it out)? i have a coredump, but have no clue what to look for. many thanks, stijn -- The system will shutdown in 5 minutes. From skip at pobox.com Sun Dec 21 18:53:18 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 21 Dec 2008 11:53:18 -0600 Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2 In-Reply-To: <1229877338.14751.34.camel@spike.ugent.be> References: <1229877338.14751.34.camel@spike.ugent.be> Message-ID: <18766.33422.496164.601910@montanaro-dyndns-org.local> Stijn> any hints what might cause this (or how i can figure it out). i Stijn> have a coredump, but have no clue what to look for. I can reproduce it on my Mac. The croak happens while it is attempting to raise the exception about a bad file descriptor. Unfortunately, in PyErr_Restore the call to PyThreadState_GET() returns NULL which means that _PyThreadState_Current is NULL.
I see no differences between pystate.[ch] in the 2.5 and 2.6 branches. There must be something different about the way PyThreadState_Swap or PyThreadState_DeleteCurrent are used. Those are the only two routines which appear to set it. Did this not happen with 2.5.2? -- Skip Montanaro - skip at pobox.com - http://smontanaro.dyndns.org/ From martin at v.loewis.de Sun Dec 21 18:57:38 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 21 Dec 2008 18:57:38 +0100 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> <200812210002.05587.Arfrever.FTA@gmail.com> <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> <494E1086.5030608@v.loewis.de> <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com> Message-ID: <494E8392.40102@v.loewis.de> > I agree that adding version numbers would be nice, but I'm also afraid > of breaking people's automatic downloads of the documentation. Perhaps > add symlinks? For the releases that have been made, yes (or, actually, hard links would work as well). For the releases yet to come, it would be good if the release process created version-numbered files. Regards, Martin From lists at cheimes.de Sun Dec 21 19:27:29 2008 From: lists at cheimes.de (Christian Heimes) Date: Sun, 21 Dec 2008 19:27:29 +0100 Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2 In-Reply-To: <18766.33422.496164.601910@montanaro-dyndns-org.local> References: <1229877338.14751.34.camel@spike.ugent.be> <18766.33422.496164.601910@montanaro-dyndns-org.local> Message-ID: skip at pobox.com schrieb: > Stijn> any hints what might cause this (or how i can figure it out). i > Stijn> have a coredump, but have no clue what to look for. > > I can reproduce it on my Mac. 
The croak happens while it is attempting to > raise the exception about a bad file descriptor. Unfortunately, in > PyErr_Restore the call to PyThreadState_GET() returns NULL which means that > _PyThreadState_Current is NULL. I see no differences between pystate.[ch] > in the 2.5 and 2.6 branches. There must be something different about the > way PyThreadState_Swap or PyThreadState_DeleteCurrent are used. Those are > the only two routines which appear to set it. > > Did this not happen with 2.5.2? Wild guess: the bug might be related to http://bugs.python.org/issue1683. From the top of my head it's the only major change in the thread state code that I can recall. Christian From rhamph at gmail.com Sun Dec 21 19:44:12 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 21 Dec 2008 11:44:12 -0700 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> Message-ID: On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman wrote: > On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti >> Have you seen any significant difference in the exit time when the >> cyclic GC is disabled or enabled? > > Unfortunately, with GC enabled, the application is too slow to be > useful, because of the greatly increased time for dict creation. I > suppose it's theoretically possible that with this increased time, the > long time for exit will look less bad by comparison, but I'd be > surprised if it makes any difference at all. I'm confident that there > are no loops in this dict, and nothing for cyclic gc to collect. Try putting an explicit gc.collect() at the end, with the usual timestamps before and after. After that try deleting your dict, then calling gc.collect(), with timestamps throughout. 
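The measurement suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the dict built here is a small stand-in for the 45G one in the thread, and the names `timed`, `measure` and `n_items` are hypothetical, not taken from anyone's actual program.

```python
import gc
import time


def timed(label, func):
    """Call func(), print the elapsed wall-clock time, and return it."""
    start = time.time()
    func()
    elapsed = time.time() - start
    print("%-32s %.3f s" % (label, elapsed))
    return elapsed


def measure(n_items=200000):
    """Time gc.collect() before and after dropping a large dict."""
    was_enabled = gc.isenabled()
    gc.disable()  # as in the thread: cyclic GC is off while the dict is built
    try:
        # Hypothetical stand-in data: small lists keyed by integers.
        holder = {"big": dict((i, [i, str(i)]) for i in range(n_items))}
        return [
            timed("gc.collect(), dict alive", gc.collect),
            # clear() drops the only reference, so the big dict is freed here
            timed("deleting the dict", holder.clear),
            timed("gc.collect(), dict deleted", gc.collect),
        ]
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    measure()
```

Comparing the three timings should indicate whether the cost sits in the cyclic collector or in plain reference-count deallocation of the dict's contents.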
-- Adam Olsen, aka Rhamphoryncus From musiccomposition at gmail.com Sun Dec 21 19:54:07 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sun, 21 Dec 2008 12:54:07 -0600 Subject: [Python-Dev] 2.6.1 documentation not available for download In-Reply-To: <494E8392.40102@v.loewis.de> References: <200812202328.20045.Arfrever.FTA@gmail.com> <1afaf6160812201446r55c93eb6s6702e65611d11bcf@mail.gmail.com> <200812210002.05587.Arfrever.FTA@gmail.com> <1afaf6160812201925l43bd765at102379a1d81e951d@mail.gmail.com> <494E1086.5030608@v.loewis.de> <1afaf6160812210837t23788b40jd53f3eaaf8674244@mail.gmail.com> <494E8392.40102@v.loewis.de> Message-ID: <1afaf6160812211054x1f6fba0eg8336280eaeba0284@mail.gmail.com> On Sun, Dec 21, 2008 at 11:57 AM, "Martin v. Löwis" wrote: >> I agree that adding version numbers would be nice, but I'm also afraid >> of breaking people's automatic downloads of the documentation. Perhaps >> add symlinks? > > For the releases that have been made, yes (or, actually, hard links > would work as well). For the releases yet to come, it would be good > if the release process created version-numbered files. Ok. I will add hardlinks for past releases and modify the Doc/Makefile to add version numbers. -- Regards, Benjamin Peterson From scott+python-dev at scottdial.com Sun Dec 21 20:33:51 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sun, 21 Dec 2008 14:33:51 -0500 Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2 In-Reply-To: <18766.33422.496164.601910@montanaro-dyndns-org.local> References: <1229877338.14751.34.camel@spike.ugent.be> <18766.33422.496164.601910@montanaro-dyndns-org.local> Message-ID: <494E9A1F.80808@scottdial.com> skip at pobox.com wrote: > Did this not happen with 2.5.2? I have 2.5.1 and 2.5.2 and it produces an IOError, just as it should. So this was indeed introduced by 2.5.3.
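For anyone wanting to check a given build, here is a minimal sketch of the behaviour being compared: reading from a file opened only for writing should raise IOError rather than crash. The helper name `read_from_write_only_file` is hypothetical, and a temporary file is used instead of `test123` so nothing is left behind. The `except ... as` spelling needs Python 2.6 or later; on 2.5.x it would have to be written `except IOError, exc`.

```python
import os
import tempfile


def read_from_write_only_file():
    """Return the exception raised by read() on a file opened with mode 'w'."""
    fd, fname = tempfile.mkstemp()
    os.close(fd)
    try:
        f = open(fname, "w")
        try:
            f.read()
        except IOError as exc:
            # The behaviour reported for 2.4.3, 2.5.1 and 2.5.2
            return exc
        finally:
            f.close()
    finally:
        os.remove(fname)
    # Reached only if read() unexpectedly succeeded
    return None


if __name__ == "__main__":
    print(repr(read_from_write_only_file()))
```

On an affected 2.5.3 build the interpreter would be expected to segfault inside `f.read()` instead of reaching the `except` clause.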
-Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From scott+python-dev at scottdial.com Sun Dec 21 21:57:29 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Sun, 21 Dec 2008 15:57:29 -0500 Subject: [Python-Dev] python 2.5.3 segmentation fault with gcc 4.1.2 In-Reply-To: <1229877338.14751.34.camel@spike.ugent.be> References: <1229877338.14751.34.camel@spike.ugent.be> Message-ID: <494EADB9.4020408@scottdial.com> Stijn De Weirdt wrote: > but the following code gives a segfault instead of an IOerror > fname='test123' > f=open(fname,'w') > f.read() I've tracked this down to r67740: """ Issue #1706039: Support continued reading from a file even after EOF was hit. """ Looking at the diff, I question the correctness of this patch. I believe the actual issue is the Py_UniversalNewlineFread() was changed to make calls to PyErr_SetFromErrno(), but then these calls occur within an ALLOW_THREADS block. I was going to try to make a new patch, but the test case that was added for it succeeded *before* the patch was applied (I reverted fileobject.c to r67739) on many platforms. I don't have access to a platform which exhibits the problem described in the tracker. Reading people's assessment, I *think* the correct patch is merely to add a call to clearerr() just before calling fread() in each function (to clear the EOF flag before performing the fread()). I don't really understand what the point of all the other changes are in the diff. I can't test my assessment because it seems the only platform discussed that had a problem was OS X (and I don't have one of those). -Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From martin at v.loewis.de Mon Dec 22 11:06:10 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 11:06:10 +0100 Subject: [Python-Dev] Releasing 2.5.4 Message-ID: <494F6692.8000001@v.loewis.de> It seems r67740 shouldn't have been committed. 
Since this is a severe regression, I think I'll have to revert it, and release 2.5.4 with just that change. Unless I hear otherwise, I would release Python 2.5.4 (without a release candidate) tomorrow. Regards, Martin From mal at egenix.com Mon Dec 22 13:20:59 2008 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 22 Dec 2008 13:20:59 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494D6EA9.2040201@v.loewis.de> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> Message-ID: <494F862B.60701@egenix.com> On 2008-12-20 23:16, Martin v. Löwis wrote: >>> I will try next week to see if I can come up with a smaller, >>> submittable example. Thanks. >> These long exit times are usually caused by the garbage collection >> of objects. This can be a very time consuming task. > > I doubt that. The long exit times are usually caused by a bad > malloc implementation. With "garbage collection" I meant the process of Py_DECREF'ing the objects in large containers or deeply nested structures, not the GC mechanism for breaking circular references in Python. This will usually also involve free() calls, so the malloc implementation affects this as well. However, I've seen such long exit times on Linux and Windows, which both have rather good malloc implementations. I don't think there's anything much we can do about it at the interpreter level. Deleting millions of objects takes time and that's not really surprising at all. It takes even longer if you have instances with .__del__() methods written in Python. Applications can choose other mechanisms for speeding up the exit process in various (less clean) ways, if they have a need for this.
BTW: Rather than using a huge in-memory dict, I'd suggest to either use an on-disk dictionary such as the ones found in mxBeeBase or a database. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 22 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ggpolo at gmail.com Mon Dec 22 13:45:36 2008 From: ggpolo at gmail.com (Guilherme Polo) Date: Mon, 22 Dec 2008 10:45:36 -0200 Subject: [Python-Dev] [capi-sig] Exceptions with additional instance variables In-Reply-To: References: Message-ID: On Mon, Dec 22, 2008 at 10:06 AM, wrote: > On Mon, Dec 22, 2008 at 03:29, Guilherme Polo wrote: >> On Sun, Dec 21, 2008 at 11:02 PM, wrote: >>> Hello, >>> >>> I'm trying to implement custom exception that have to carry some >>> useful info by means of instance members, to be used like: >>> >>> try: >>> // some code >>> except MyException, data: >>> // use data.errorcode, data.errorcategory, data.errorlevel, >>> data.errormessage and some others >>> >>> The question is - how to implement the instance variables with >>> PyErr_NewException? >> >> Using PyErr_NewException is fine. You must understand that an >> exception is a class, and thus PyErr_NewException creates one for you >> and returns it. >> Just like you would do with a class that has __dict__, set some >> attributes to what you want. 
That is, use PyObject_SetAttrString or
>> something more appropriated for you.
>
> Ok so I did the following. In init function (forget refcounting and
> error checking for a moment ;-)
>
> PyObject *dict = PyDict_New();
> PyDict_SetItemString(dict, "errorcode", PyInt_FromLong(0));
> static PyObject *myexception =
> PyErr_NewException("module.MyException", NULL, dict);

You do not really have to create a dict here; one will be created for
you if you pass NULL there.

> PyModule_AddObject(module, "MyException", myexception);
>
> It worked more or less as expected, the help shown:
>
> | ----------------------------------------------------------------------
> | Data and other attributes defined here:
> |
> | errorcode = 0
> |
> | ----------------------------------------------------------------------
>
> Then I did the following when raising the exception:
>
> PyObject_SetAttrString(myexception, "errorcode", PyInt_FromLong(111));
> PyErr_SetString(myexception, "Bad thing happened");
> return NULL;
>
> and the test code was:
> try:
> do_bad_thing();
> except MyException, data:
>
> and you surely already guessed it -- data.errorcode was 0.... Not only
> that, module.MyException.errorcode was also 0...
>
> What I'm doing wrong? I certainly don't get the idea of exceptions in
> Python, especially what is being raised - a class or an instance?

There are two forms raise can take; both end up involving a class
and an instance.

> If
> the latter - how's the class instantiated?

You can call a class to instantiate it.

> If not - what about values
> in different threads? The docs are so vague about that...
>
>
> Thanks again in advance,
> Chojrak
>

Again, an exception is a class, so you could create a new type in C, and do anything you wanted.
But you probably don't want to create a new type to achieve this, so there are two simple ways I'm going to paste below:

#include "Python.h"

static PyObject *MyErr;

/* Forward declaration; pick one of the two definitions further down. */
static PyObject *raise_test(PyObject *self);

static PyMethodDef module_methods[] = {
    {"raise_test", (PyCFunction)raise_test, METH_NOARGS, NULL},
    {NULL},
};

PyMODINIT_FUNC
initfancy_exc(void)
{
    PyObject *m;

    m = Py_InitModule("fancy_exc", module_methods);
    if (m == NULL)
        return;

    MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL);
    if (MyErr == NULL)
        return;

    /* PyModule_AddObject steals a reference; keep one for MyErr. */
    Py_INCREF(MyErr);
    if (PyModule_AddObject(m, "err", MyErr) < 0)
        return;
}

The raise_test function is missing; pick one of these:

static PyObject *
raise_test(PyObject *self)
{
    /* These attributes are set on the exception class itself.
       PyObject_SetAttrString does not steal references. */
    PyObject *code = PyInt_FromLong(42);
    PyObject *category = PyString_FromString("nice one");

    PyObject_SetAttrString(MyErr, "code", code);
    PyObject_SetAttrString(MyErr, "category", category);
    Py_XDECREF(code);
    Py_XDECREF(category);
    PyErr_SetString(MyErr, "All is good, I hope");
    return NULL;
}

or

static PyObject *
raise_test(PyObject *self)
{
    PyObject *t = PyTuple_New(3);

    if (t == NULL)
        return NULL;
    /* PyTuple_SetItem steals the reference to each item. */
    PyTuple_SetItem(t, 0, PyString_FromString("error message"));
    PyTuple_SetItem(t, 1, PyInt_FromLong(10));
    PyTuple_SetItem(t, 2, PyString_FromString("category name here"));
    PyErr_SetObject(MyErr, t);
    Py_DECREF(t);
    return NULL;
}

In this second form you check for the args attribute of the exception.

-- -- Guilherme H.
Polo Goncalves From ggpolo at gmail.com Mon Dec 22 13:48:46 2008 From: ggpolo at gmail.com (Guilherme Polo) Date: Mon, 22 Dec 2008 10:48:46 -0200 Subject: [Python-Dev] [capi-sig] Exceptions with additional instance variables In-Reply-To: References: Message-ID: On Mon, Dec 22, 2008 at 10:45 AM, Guilherme Polo wrote: > On Mon, Dec 22, 2008 at 10:06 AM, wrote: >> On Mon, Dec 22, 2008 at 03:29, Guilherme Polo wrote: >>> On Sun, Dec 21, 2008 at 11:02 PM, wrote: >>>> Hello, >>>> >>>> I'm trying to implement custom exception that have to carry some >>>> useful info by means of instance members, to be used like: >>>> >>>> try: >>>> // some code >>>> except MyException, data: >>>> // use data.errorcode, data.errorcategory, data.errorlevel, >>>> data.errormessage and some others >>>> >>>> The question is - how to implement the instance variables with >>>> PyErr_NewException? >>> >>> Using PyErr_NewException is fine. You must understand that an >>> exception is a class, and thus PyErr_NewException creates one for you >>> and returns it. >>> Just like you would do with a class that has __dict__, set some >>> attributes to what you want. That is, use PyObject_SetAttrString or >>> something more appropriated for you. >> >> Ok so I did the following. In init function (forget refcounting and >> error checking for a moment ;-) >> >> PyObject *dict = PyDict_New(); >> PyDict_SetItemString(dict, "errorcode", PyInt_FromLong(0)); >> static PyObject *myexception = >> PyErr_NewException("module.MyException", NULL, dict); > > You do not really have to create a dict here, one will be created for > you if you pass a NULL there. 
> >> PyModule_AddObject(module, "MyException", myexception); >> >> It worked more or less as expected, the help shown: >> >> | ---------------------------------------------------------------------- >> | Data and other attributes defined here: >> | >> | errorcode = 0 >> | >> | ---------------------------------------------------------------------- >> >> Then I did the following when raising the exception: >> >> PyObject_SetAttrString(myexception, "errorcode", PyInt_FromLong(111)); >> PyErr_SetString(myexception, "Bad thing happened"); >> return NULL; >> >> and the test code was: >> try: >> do_bad_thing(); >> except MyException, data: >> >> and you surely already guessed it -- data.errorcode was 0.... Not only >> that, module.MyException.errorcode was also 0... >> >> What I'm doing wrong? I certainly don't get the idea of exceptions in >> Python, especially what is being raised - a class or an instance? > > There are two forms raise can take, both will end up involving a class > and a intsance. > >> If >> the latter - how's the class instantiated? > > You can call a class to instantiate it. > >> If not - what about values >> in different threads? The docs are so vague about that... >> >> >> Thanks again in advance, >> Chojrak >> > > Again, an exception is a class, so you could create a new type in C, > and do anything you wanted. But you probably don't want to create a > new type to achieve this By creating a type I mean one that involves defining a tp_init, and everything else your type needs, not about the simple one created by PyErr_NewException. 
> , so there are two simple ways I'm going to > paste below: > > #include "Python.h" > > static PyObject *MyErr; > > static PyMethodDef module_methods[] = { > {"raise_test", (PyCFunction)raise_test, METH_NOARGS, NULL}, > {NULL}, > }; > > PyMODINIT_FUNC > initfancy_exc(void) > { > PyObject *m; > > m = Py_InitModule("fancy_exc", module_methods); > if (m == NULL) > return; > > MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL); > > Py_INCREF(MyErr); > if (PyModule_AddObject(m, "err", MyErr) < 0) > return; > } > > the raise_test function is missing, pick one of these: > > static PyObject * > raise_test(PyObject *self) > { > PyObject_SetAttrString(MyErr, "code", PyInt_FromLong(42)); > PyObject_SetAttrString(MyErr, "category", PyString_FromString("nice one")); > PyErr_SetString(MyErr, "All is good, I hope"); > return NULL; > } > > or > > static PyObject * > raise_test(PyObject *self) > { > > PyObject *t = PyTuple_New(3); > PyTuple_SetItem(t, 0, PyString_FromString("error message")); > PyTuple_SetItem(t, 1, PyInt_FromLong(10)); > PyTuple_SetItem(t, 2, PyString_FromString("category name here")); > PyErr_SetObject(MyErr, t); > Py_DECREF(t); > return NULL; > } > > In this second form you check for the args attribute of the exception. > > -- > -- Guilherme H. Polo Goncalves > -- -- Guilherme H. Polo Goncalves From skip at pobox.com Mon Dec 22 15:35:25 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 22 Dec 2008 08:35:25 -0600 Subject: [Python-Dev] Releasing 2.5.4 In-Reply-To: <494F6692.8000001@v.loewis.de> References: <494F6692.8000001@v.loewis.de> Message-ID: <18767.42413.194130.39594@montanaro-dyndns-org.local> Martin> It seems r67740 shouldn't have been committed. Since this is a Martin> severe regression, I think I'll have to revert it, and release Martin> 2.5.4 with just that change. Martin> Unless I hear otherwise, I would release Python 2.5.4 (without a Martin> release candidate) tomorrow. 
I don't think there is a test case which fails with it applied and passes with it removed. If not, I think it might be worthwhile to write such a test even if it's used temporarily just to test the change. I wrote a trivial test case:

Index: Lib/test/test_file.py
===================================================================
--- Lib/test/test_file.py (revision 67899)
+++ Lib/test/test_file.py (working copy)
@@ -116,6 +116,8 @@
             except:
                 self.assertEquals(self.f.__exit__(*sys.exc_info()), None)
 
+    def testReadWhenWriting(self):
+        self.assertRaises(IOError, self.f.read)
 
 class OtherFileTests(unittest.TestCase):
 
which segfaults (on Solaris 10 at least) when run with the 2.5.3 released code and which passes after I undo r67740. Should we add this to the active branches (2.6, trunk, py3k, 3.0)? Skip From martin at v.loewis.de Mon Dec 22 15:52:59 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 15:52:59 +0100 Subject: [Python-Dev] Releasing 2.5.4 In-Reply-To: <18767.42413.194130.39594@montanaro-dyndns-org.local> References: <494F6692.8000001@v.loewis.de> <18767.42413.194130.39594@montanaro-dyndns-org.local> Message-ID: <494FA9CB.4010802@v.loewis.de> > Should we add this to the active branches (2.6, trunk, py3k, 3.0)? Sure! Go ahead. For 2.5.3, I'd rather not add an additional test case, but merely revert the patch. Regards, Martin From fdrake at acm.org Mon Dec 22 15:39:18 2008 From: fdrake at acm.org (Fred Drake) Date: Mon, 22 Dec 2008 09:39:18 -0500 Subject: [Python-Dev] Releasing 2.5.4 In-Reply-To: <18767.42413.194130.39594@montanaro-dyndns-org.local> References: <494F6692.8000001@v.loewis.de> <18767.42413.194130.39594@montanaro-dyndns-org.local> Message-ID: On Dec 22, 2008, at 9:35 AM, skip at pobox.com wrote: > I don't think there is a test case which fails with it applied and > passes > with it removed.
If not, I think it might be worthwhile to write > such a > test even if it's used temporarily just to test the change. I wrote a > trivial test case: If this is sufficient to drive a release, then whatever test there is should be part of the release as well. -Fred -- Fred Drake From barry at python.org Mon Dec 22 17:15:27 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 22 Dec 2008 11:15:27 -0500 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <494C5C06.30109@v.loewis.de> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C5C06.30109@v.loewis.de> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 19, 2008, at 9:44 PM, Martin v. L?wis wrote: >> Do you think we can get 3.0.1 out on December 24th? > > I won't have physical access to my build machine from December 24th to > January 3rd. Okay. Let's just push it until after the new year then. In the mean time, please continue to work on fixes for 3.0.1. I'm thinking tentatively to do a release the week of January 5th. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSU+9IHEjvBPtnXfVAQL0vQQAmxcMDP1GUuhCOxCVHqnSGaywdG1mz3f0 iNCNs4lVsRLYV/AVdf/tbpWyLbcUvFL0hUyLDp8PCScOjZReKwe6VpnujL/BwU5E 4P7RtUn493QGqkFJDjHNJ2SIcxOfzk9Y7E3qyS0QHPmsqmNpSD6ZQQd0PkdCoqQo f08Z9HrKZZw= =ujaK -----END PGP SIGNATURE----- From barry at python.org Mon Dec 22 17:16:19 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 22 Dec 2008 11:16:19 -0500 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: <494E2EEF.3080207@hlabs.spb.ru> References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> <16D50043-22B0-4711-BE91-E752953444EA@python.org> <494E2EEF.3080207@hlabs.spb.ru> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 21, 2008, at 6:56 AM, Dmitry Vasiliev wrote: > Barry Warsaw wrote: >> Thanks. I've bumped that to release blocker for now. 
If there are >> any >> other 'high' bugs that you want considered for 3.0.1, please make the >> release blockers too, for now. > > I think wsgiref package needs to be fixed. For now it's totally > broken. > I've already found 4 issues about that: > > http://bugs.python.org/issue3348 > http://bugs.python.org/issue3401 > http://bugs.python.org/issue3795 > http://bugs.python.org/issue4522 Please make sure these issues are release blockers. Fixes before January 5th would be able to make it into 3.0.1. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSU+9U3EjvBPtnXfVAQII5wP+M9tyL169XMIwoibqupyPErAjHNL+zWD1 wydak1MKc/gF6KvSFfs9t6uuI3p8GI42dNxeHXIXsCb1he16YfUgu7xG210ZJ9C3 YkDcr6vDDMYUvMI8XdVJGh9ASnQhrQRiyMI/TtiJTh16t3wnn78EH2F2IyrYcDrD 0xaKQjaK1+k= =t6EL -----END PGP SIGNATURE----- From solipsis at pitrou.net Mon Dec 22 17:38:24 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Dec 2008 16:38:24 +0000 (UTC) Subject: [Python-Dev] Python 3.0.1 References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> <16D50043-22B0-4711-BE91-E752953444EA@python.org> <494E2EEF.3080207@hlabs.spb.ru> Message-ID: Barry Warsaw python.org> writes: > > Please make sure these issues are release blockers. Fixes before > January 5th would be able to make it into 3.0.1. Should http://bugs.python.org/issue4486 be a release blocker as well? (I don't think so, but...) 
From barry at python.org Mon Dec 22 17:59:44 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 22 Dec 2008 11:59:44 -0500 Subject: [Python-Dev] Python 3.0.1 In-Reply-To: References: <920AFFA0-E692-4169-AA4C-B3176596D2F6@python.org> <494C2369.5030901@gmail.com> <16D50043-22B0-4711-BE91-E752953444EA@python.org> <494E2EEF.3080207@hlabs.spb.ru> Message-ID: <99E28236-03DE-4A21-93BF-B94B3114A6DE@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Dec 22, 2008, at 11:38 AM, Antoine Pitrou wrote: > Barry Warsaw python.org> writes: >> >> Please make sure these issues are release blockers. Fixes before >> January 5th would be able to make it into 3.0.1. > > Should http://bugs.python.org/issue4486 be a release blocker as well? > (I don't think so, but...) I don't think so either. It would be nice to have, but it needn't hold up the release. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSU/HgnEjvBPtnXfVAQKzcAP+NThqngryODxF/bKpeMs/EhpjfI9HV4eC Lul5LMocaxEe91ontMjhfnZQo6Tx/jJCGECzVLCLXVmrjKg7/d6/9TFEByc9OWFm zODpRvQ+4u+jd8c8DcBQmEwuFJF4MQZ5x6SUP8HxRTLmWq1KMcGM5WTNHCxMoOVw Gkg8JmknqjM= =6teE -----END PGP SIGNATURE----- From tutufan at gmail.com Mon Dec 22 19:01:56 2008 From: tutufan at gmail.com (Mike Coleman) Date: Mon, 22 Dec 2008 12:01:56 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> Message-ID: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> Thanks for all of the useful suggestions. Here are some preliminary results. With still gc.disable(), at the end of the program I first did a gc.collect(), which took about five minutes. (So, reason enough not to gc.enable(), at least without Antoine's patch.) After that, I did a .clear() on the huge dict. That's where the time is being spent. 
Doing the suggested "poor man's profiling" (repeated backtraces via gdb), for 20 or so samples, one is within libc free, but all of the rest are in the same place (same source line) within PyObject_Free (see below), sometimes within list_dealloc and sometimes within tuple_dealloc. So, apparently a lot of time is being spent in this loop:

        /* Case 3: We have to move the arena towards the end
         * of the list, because it has more free pools than
         * the arena to its right.

        ...

        /* Locate the new insertion point by iterating over
         * the list, using our nextarena pointer.
         */
        while (ao->nextarena != NULL &&
               nf > ao->nextarena->nfreepools) {
            ao->prevarena = ao->nextarena;
            ao->nextarena = ao->nextarena->nextarena;
        }

Investigating further, from one stop, I used gdb to follow the chain of pointers in the nextarena and prevarena directions. There were 5449 and 112765 links, respectively. maxarenas is 131072.

Sampling nf at different breaks gives values in the range(10,20).

This loop looks like an insertion sort. If it's the case that only a "few" iterations are ever needed for any given free, this might be okay--if not, it would seem that this must be quadratic.

I attempted to look further by setting a silent break with counter within the loop and another break after the loop to inspect the counter, but gdb's been buzzing away on that for 40 minutes without coming back. That might mean that there are a lot of passes through this loop per free (i.e., that gdb is taking a long time to process 100,000 silent breaks), or perhaps I've made a mistake, or gdb isn't handling this well.

In any case, this looks like the problem locus.

It's tempting to say "don't do this arena ordering optimization if we're doing final cleanup", but really the program could have done this .clear() at any point. Maybe there needs to be a flag to disable it altogether? Or perhaps there's a smarter way to manage the list of arena/free pool info.

Mike

Program received signal SIGINT, Interrupt.
0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at Objects/obmalloc.c:1064
1064 while (ao->nextarena != NULL &&
(gdb) bt
#0 0x00000000004461dc in PyObject_Free (p=0x5ec043db0) at Objects/obmalloc.c:1064
#1 0x0000000000433478 in list_dealloc (op=0x5ec043dd0) at Objects/listobject.c:281
#2 0x000000000044075b in PyDict_Clear (op=0x74c7cd0) at Objects/dictobject.c:757
#3 0x00000000004407b9 in dict_clear (mp=0x5ec043db0) at Objects/dictobject.c:1776
#4 0x0000000000485905 in PyEval_EvalFrameEx (f=0x746ca50, throwflag=) at Python/ceval.c:3557
#5 0x000000000048725f in PyEval_EvalCodeEx (co=0x72643f0, globals=, locals=, args=0x1, argcount=0, kws=0x72a5770, kwcount=0, defs=0x743eba8, defcount=1, closure=0x0) at Python/ceval.c:2836
#6 0x00000000004855bc in PyEval_EvalFrameEx (f=0x72a55f0, throwflag=) at Python/ceval.c:3669
#7 0x000000000048725f in PyEval_EvalCodeEx (co=0x72644e0, globals=, locals=, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2836
#8 0x00000000004872a2 in PyEval_EvalCode (co=0x5ec043db0, globals=0x543e41f10, locals=0x543b969c0) at Python/ceval.c:494
#9 0x00000000004a844e in PyRun_FileExFlags (fp=0x7171010, filename=0x7ffffaf6b419 "/home/mkc/greylag/main/greylag_reannotate.py", start=, globals=0x7194510, locals=0x7194510, closeit=1, flags=0x7ffffaf69080) at Python/pythonrun.c:1273
#10 0x00000000004a86e0 in PyRun_SimpleFileExFlags (fp=0x7171010, filename=0x7ffffaf6b419 "/home/mkc/greylag/main/greylag_reannotate.py", closeit=1, flags=0x7ffffaf69080) at Python/pythonrun.c:879
#11 0x0000000000412275 in Py_Main (argc=, argv=0x7ffffaf691a8) at Modules/main.c:523
#12 0x00000030fea1d8b4 in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000411799 in _start ()

On Sun, Dec 21, 2008 at 12:44 PM, Adam Olsen wrote:
> On Sat, Dec 20, 2008 at 6:09 PM, Mike Coleman wrote:
>> On Sat, Dec 20, 2008 at 5:40 PM, Alexandre Vassalotti
>>> Have you seen any significant difference in the exit time when the
>>> cyclic GC is
disabled or enabled? >> >> Unfortunately, with GC enabled, the application is too slow to be >> useful, because of the greatly increased time for dict creation. I >> suppose it's theoretically possible that with this increased time, the >> long time for exit will look less bad by comparison, but I'd be >> surprised if it makes any difference at all. I'm confident that there >> are no loops in this dict, and nothing for cyclic gc to collect. > > Try putting an explicit gc.collect() at the end, with the usual > timestamps before and after. > > After that try deleting your dict, then calling gc.collect(), with > timestamps throughout. > > > -- > Adam Olsen, aka Rhamphoryncus > From tutufan at gmail.com Mon Dec 22 19:13:33 2008 From: tutufan at gmail.com (Mike Coleman) Date: Mon, 22 Dec 2008 12:13:33 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494F862B.60701@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> Message-ID: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote: > BTW: Rather than using a huge in-memory dict, I'd suggest to either > use an on-disk dictionary such as the ones found in mxBeeBase or > a database. I really want this to work in-memory. I have 64G RAM, and I'm only trying to use 45G of it ("only" 45G :-), and I don't need the results to persist after the program finishes. Python should be able to do this. I don't want to hear "Just use Perl instead" from my co-workers... 
;-) From rhamph at gmail.com Mon Dec 22 21:22:56 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 22 Dec 2008 13:22:56 -0700 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> Message-ID: On Mon, Dec 22, 2008 at 11:01 AM, Mike Coleman wrote: > Thanks for all of the useful suggestions. Here are some preliminary results. > > With still gc.disable(), at the end of the program I first did a > gc.collect(), which took about five minutes. (So, reason enough not > to gc.enable(), at least without Antoine's patch.) > > After that, I did a .clear() on the huge dict. That's where the time > is being spent. Doing the suggested "poor man's profiling" (repeated > backtraces via gdb), for 20 or so samples, one is within libc free, > but all of the rest are in the same place (same source line) within > PyObjectFree (see below), sometimes within list_dealloc and sometimes > within tuple_dealloc. So, apparently a lot of time is being spent in > this loop: > > > /* Case 3: We have to move the arena towards the end > * of the list, because it has more free pools than > * the arena to its right. > > ... > > /* Locate the new insertion point by iterating over > * the list, using our nextarena pointer. > */ > while (ao->nextarena != NULL && > nf > ao->nextarena->nfreepools) { > ao->prevarena = ao->nextarena; > ao->nextarena = ao->nextarena->nextarena; > } > > Investigating further, from one stop, I used gdb to follow the chain > of pointers in the nextarena and prevarena directions. There were > 5449 and 112765 links, respectively. maxarenas is 131072. > > Sampling nf at different breaks gives values in the range(10,20). > > This loop looks like an insertion sort. 
If it's the case that only a > "few" iterations are ever needed for any given free, this might be > okay--if not, it would seem that this must be quadratic. > > I attempted to look further by setting a silent break with counter > within the loop and another break after the loop to inspect the > counter, but gdb's been buzzing away on that for 40 minutes without > coming back. That might mean that there are a lot of passes through > this loop per free (i.e., that gdb is taking a long time to process > 100,000 silent breaks), or perhaps I've made a mistake, or gdb isn't > handling this well. To make sure that's the correct line please recompile python without optimizations. GCC happily reorders and merges different parts of a function. Adding a counter in C and recompiling would be a lot faster than using a gdb hook. -- Adam Olsen, aka Rhamphoryncus From martin at v.loewis.de Mon Dec 22 21:38:55 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 21:38:55 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> Message-ID: <494FFADF.7020609@v.loewis.de> > Or perhaps there's a smarter way to manage the list of > arena/free pool info. If that code is the real problem (in a reproducible test case), then this approach is the only acceptable solution. Disabling long-running code is not acceptable. 
Regards, Martin From tutufan at gmail.com Mon Dec 22 21:43:33 2008 From: tutufan at gmail.com (Mike Coleman) Date: Mon, 22 Dec 2008 14:43:33 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494FFADF.7020609@v.loewis.de> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <494FFADF.7020609@v.loewis.de> Message-ID: <3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com> On Mon, Dec 22, 2008 at 2:38 PM, "Martin v. Löwis" wrote: >> Or perhaps there's a smarter way to manage the list of >> arena/free pool info. > > If that code is the real problem (in a reproducible test case), > then this approach is the only acceptable solution. Disabling > long-running code is not acceptable. By "disabling", I meant disabling the optimization that's trying to rearrange the arenas so that more memory can be returned to the OS. This presumably wouldn't be any worse than things were in Python 2.4, when memory was never returned to the OS. (I'm working on a test case.)
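For readers following the thread, the cost pattern of the arena re-sorting loop quoted earlier can be sketched in Python. This is a toy model under stated assumptions, not obmalloc itself: the function name resort_steps and the plain list of integers are made up for illustration. The list stands in for the linked list of usable arenas, kept in ascending order of free pools, and the function only counts how far one arena has to slide after one of its pools is freed.

```python
def resort_steps(nfree, i):
    """Free one pool in arena i of an ascending-sorted list; return the slide steps.

    nfree is a list of per-arena free-pool counts kept in ascending order.
    It is a hypothetical stand-in for the usable_arenas linked list.
    """
    nfree[i] += 1
    steps = 0
    # Slide right past every neighbour that now has fewer free pools.
    while i + 1 < len(nfree) and nfree[i] > nfree[i + 1]:
        nfree[i], nfree[i + 1] = nfree[i + 1], nfree[i]
        i += 1
        steps += 1
    return steps


# Worst case for this model: many arenas share the same free-pool count,
# so freeing a pool in the leftmost one walks past all the others.
arenas = [10] * 100_000
print(resort_steps(arenas, 0))  # 99999 slide steps for a single free
```

If the real list does contain long runs of arenas with equal counts (the sampled nf values of 10 to 20 across more than 100,000 arenas would be consistent with that, though the thread does not prove it), each free can cost time proportional to the length of the run, which is the quadratic behaviour Mike suspects.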
From krstic at solarsail.hcs.harvard.edu Mon Dec 22 21:54:35 2008 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=) Date: Mon, 22 Dec 2008 15:54:35 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> Message-ID: <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> On Dec 22, 2008, at 1:13 PM, Mike Coleman wrote: > On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote: >> BTW: Rather than using a huge in-memory dict, I'd suggest to either >> use an on-disk dictionary such as the ones found in mxBeeBase or >> a database. > > I really want this to work in-memory. I have 64G RAM, and I'm only > trying to use 45G of it ("only" 45G :-), and I don't need the results > to persist after the program finishes. It's still not clear to me, from reading the whole thread, precisely what you're seeing. A self-contained test case, preferably with generated random data, would be great, and save everyone a lot of investigation time. In the meantime, can you 1) turn off all swap files and partitions, and 2) confirm positively that your CPU cycles are burning up in userland? (In general, unless you know exactly why your workload needs swap, and have written your program to take swapping into account, having _any_ swap on a machine with 64GB RAM is lunacy. The machine will grind to a complete standstill long before filling up gigabytes of swap.) -- Ivan Krstić | http://radian.org From mal at egenix.com Mon Dec 22 22:07:38 2008 From: mal at egenix.com (M.-A.
Lemburg) Date: Mon, 22 Dec 2008 22:07:38 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> Message-ID: <4950019A.7030509@egenix.com> On 2008-12-22 19:13, Mike Coleman wrote: > On Mon, Dec 22, 2008 at 6:20 AM, M.-A. Lemburg wrote: >> BTW: Rather than using a huge in-memory dict, I'd suggest to either >> use an on-disk dictionary such as the ones found in mxBeeBase or >> a database. > > I really want this to work in-memory. I have 64G RAM, and I'm only > trying to use 45G of it ("only" 45G :-), and I don't need the results > to persist after the program finishes. > > Python should be able to do this. I don't want to hear "Just use Perl > instead" from my co-workers... ;-) What kinds of objects are you storing in your dictionary ? Python instances, strings, integers ? The time it takes to deallocate the objects in your dictionary depends a lot on the types you are using. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 22 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From martin at v.loewis.de Mon Dec 22 22:11:56 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 22:11:56 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <494FFADF.7020609@v.loewis.de> <3c6c07c20812221243s6930407fx5ba9f3a14f48a2d9@mail.gmail.com> Message-ID: <4950029C.5050107@v.loewis.de> >> If that code is the real problem (in a reproducible test case), >> then this approach is the only acceptable solution. Disabling >> long-running code is not acceptable. > > By "disabling", I meant disabling the optimization that's trying to > rearrange the arenas so that more memory can be returned to the OS. I meant the same thing - I'm opposed to giving up one feature or optimization in favor of a different feature or optimization. > This presumably wouldn't be any worse than things were in Python 2.4, > when memory was never returned to the OS. Going back to the state of Python 2.4 would not be acceptable. 
Regards,
Martin

From chojrak11 at gmail.com Mon Dec 22 22:21:21 2008
From: chojrak11 at gmail.com (chojrak11 at gmail.com)
Date: Mon, 22 Dec 2008 22:21:21 +0100
Subject: [Python-Dev] [capi-sig] Exceptions with additional instance variables
In-Reply-To:
References:
Message-ID:

2008/12/22 Guilherme Polo :
> On Mon, Dec 22, 2008 at 10:06 AM, wrote:
>
> #include "Python.h"
>
> static PyObject *MyErr;
>
> static PyMethodDef module_methods[] = {
>     {"raise_test1", (PyCFunction)raise_test1, METH_NOARGS, NULL},
>     {"raise_test2", (PyCFunction)raise_test2, METH_NOARGS, NULL},
>     {"raise_test3", (PyCFunction)raise_test3, METH_NOARGS, NULL},
>     {NULL},
> };
>
> PyMODINIT_FUNC
> initfancy_exc(void)
> {
>     PyObject *m;
>
>     m = Py_InitModule("fancy_exc", module_methods);
>     if (m == NULL)
>         return;
>
>     MyErr = PyErr_NewException("fancy_exc.err", NULL, NULL);
>
>     Py_INCREF(MyErr);
>     if (PyModule_AddObject(m, "err", MyErr) < 0)
>         return;
> }
>
> static PyObject *
> raise_test1(PyObject *self)
> {
>     PyObject_SetAttrString(MyErr, "code", PyInt_FromLong(42));
>     PyObject_SetAttrString(MyErr, "category", PyString_FromString("nice one"));
>     PyErr_SetString(MyErr, "All is good, I hope");
>     return NULL;
> }
>
> static PyObject *
> raise_test2(PyObject *self)
> {
>
>     PyObject *t = PyTuple_New(3);
>     PyTuple_SetItem(t, 0, PyString_FromString("error message"));
>     PyTuple_SetItem(t, 1, PyInt_FromLong(10));
>     PyTuple_SetItem(t, 2, PyString_FromString("category name here"));
>     PyErr_SetObject(MyErr, t);
>     Py_DECREF(t);
>     return NULL;
> }
>
> In this second form you check for the args attribute of the exception.

static PyObject *
raise_test3(PyObject *self)
{
    PyObject *d = PyDict_New();
    PyDict_SetItemString(d, "category", PyInt_FromLong(111));
    PyDict_SetItemString(d, "message", PyString_FromString("error message"));
    PyErr_SetObject(MyErr, d);
    Py_DECREF(d);
    return NULL;
}

(Small changes in the above code to be able to call more variants of
raise_test methods simultaneously.)

Yes! I finally understood this (I think...) So to explain things for
people like me:

1) PyErr_NewException creates *the class* in the module, it's a simple
method of creating exception classes, but classes created that way are
limited in features (i.e. cannot be manipulated from the module in all
ways a 'full' type can). Third argument to PyErr_NewException can be
NULL, in which case API will create an empty dictionary. After creating
the class you need to add it to the module with PyModule_AddObject.
Side note: If you want to specify a help for the class, you do
PyObject_SetAttrString on the class with the key '__doc__'.

2) there's no instantiation anywhere:

a. PyErr_SetString and PyErr_SetObject set the exception *class*
(exception type) and exception data -- see
http://docs.python.org/c-api/exceptions.html which notes that
exceptions are similar in concept to the global 'errno' variable, so
you just set what type of last error was and what error message (or
other data) you want to associate with it

b. the "code" and "category" variables from raise_test1() in the above
example inserted with PyObject_SetAttrString() are *class* variables,
not instance variables:

    try:
        fancy_exc.raise_test1()
    except fancy_exc.err, e:
        print e.code, fancy_exc.err.code
    print fancy_exc.err.code

it prints:

    42 42
    42

c. the data is still present in the fancy_exc.err class after exception
handling is finished, which is ok for now but may be problematic in
case of multithreaded usage patterns (however I probably don't
understand how multithreading in Python works)

3) alternative to the above is to pass all required data to the
exception with PyErr_SetObject - you can prepare a dictionary or a
tuple earlier, which will be accessible with 'args' member:

    try:
        fancy_exc.raise_test2()
    except fancy_exc.err, e:
        print e.args[0]

If it's dictionary, the syntax is a bit weird because e.args is always
a tuple:

    try:
        fancy_exc.raise_test3()
    except fancy_exc.err, e:
        print e.args[0]['category']

The 'args' values are unavailable outside of 'except' clause, however
you can still use the 'e' variable which retains the values. So it's an
instance variable.

4) creating the exception class using a new type in C (PyTypeObject
structure) would give the most robust solution because every nuance of
the class can be manipulated, but it's not worth the trouble now. I can
switch to it transparently at a later time. Transparently means that
nothing will need to be updated in Python solutions written by the
module users.

5) most of the projects I've inspected with Google Code Search use the
PyErr_NewException approach.

6) there's the option of using Cython which simplifies creating
extensions and hides many unnecessary internals.

Many thanks Guilherme and Stefan for your help and for the patience.
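The class-versus-instance distinction in points 2 and 3 can be reproduced without any C. The sketch below is pure Python in Python 3 syntax (unlike the Python 2 snippets in the message), and the names Err, raise_with_class_attrs and raise_with_args are illustrative stand-ins for fancy_exc.err, raise_test1 and raise_test2; they are not part of the module being discussed.

```python
class Err(Exception):
    """Stand-in for the class created by PyErr_NewException."""


def raise_with_class_attrs():
    # Mirrors raise_test1: the data is stored on the class itself,
    # so it is shared by every past and future instance.
    Err.code = 42
    Err.category = "nice one"
    raise Err("All is good, I hope")


def raise_with_args():
    # Mirrors raise_test2: the data travels with this one instance via args.
    raise Err("error message", 10, "category name here")


try:
    raise_with_class_attrs()
except Err as e:
    print(e.code, Err.code)      # both read the same class attribute
print(Err.code)                  # still set after the handler has finished

try:
    raise_with_args()
except Err as e:
    saved = e                    # in Python 3, 'e' is unbound after the block
print(saved.args[1], saved.args[2])
```

Because the first variant mutates shared class state, two threads raising the same exception with different codes could in principle overwrite each other's values, which seems to be the concern raised in point 2c; carrying the data in args avoids that.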
Kind regards, Chojrak From krstic at solarsail.hcs.harvard.edu Mon Dec 22 22:23:12 2008 From: krstic at solarsail.hcs.harvard.edu (=?UTF-8?Q?Ivan_Krsti=C4=87?=) Date: Mon, 22 Dec 2008 16:23:12 -0500 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <4950019A.7030509@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> <4950019A.7030509@egenix.com> Message-ID: <8E0B5EC0-229C-4B31-80F3-569FFC2F43D7@solarsail.hcs.harvard.edu> On Dec 22, 2008, at 4:07 PM, M.-A. Lemburg wrote: > What kinds of objects are you storing in your dictionary ? Python > instances, strings, integers ? Answered in a previous message: On Dec 20, 2008, at 8:09 PM, Mike Coleman wrote: > The dict keys were all uppercase alpha strings of length 7. I don't > have access at the moment, but maybe something like 10-100M of them > (not sure how redundant the set is). The values are all lists of > pairs, where each pair is a (string, int). The pair strings are of > length around 30, and drawn from a "small" fixed set of around 60K > strings (). As mentioned previously, I think the ints are drawn > pretty uniformly from something like range(10000). The length of the > lists depends on the redundancy of the key set, but I think there are > around 100-200M pairs total, for the entire dict. > > (If you're curious about the application domain, see 'http://greylag.org > '.) -- Ivan Krstić
| http://radian.org From chojrak11 at gmail.com Mon Dec 22 22:25:12 2008 From: chojrak11 at gmail.com (chojrak11 at gmail.com) Date: Mon, 22 Dec 2008 22:25:12 +0100 Subject: [Python-Dev] [capi-sig] Exceptions with additional instance variables In-Reply-To: References: Message-ID: Not this list, sorry.... From steve at pearwood.info Mon Dec 22 22:45:42 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 23 Dec 2008 08:45:42 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <494F862B.60701@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> Message-ID: <200812230845.42805.steve@pearwood.info> On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote: > On 2008-12-20 23:16, Martin v. Löwis wrote: > >>> I will try next week to see if I can come up with a smaller, > >>> submittable example. Thanks. > >> > >> These long exit times are usually caused by the garbage collection > >> of objects. This can be a very time consuming task. > > > > I doubt that. The long exit times are usually caused by a bad > > malloc implementation. > > With "garbage collection" I meant the process of Py_DECREF'ing the > objects in large containers or deeply nested structures, not the GC > mechanism for breaking circular references in Python. > > This will usually also involve free() calls, so the malloc > implementation affects this as well. However, I've seen such long > exit times on Linux and Windows, which both have rather good > malloc implementations. > > I don't think there's anything much we can do about it at the > interpreter level. Deleting millions of objects takes time and that's > not really surprising at all. It takes even longer if you have > instances with .__del__() methods written in Python. This behaviour appears to be specific to deleting dicts, not deleting random objects.
I haven't yet confirmed that the problem still exists in trunk (I hope to have time tonight or tomorrow), but in my previous tests deleting millions of items stored in a list of tuples completed in a minute or two, while deleting the same items stored as key:item pairs in a dict took 30+ minutes. I say plus because I never had the patience to let it run to completion, it could have been hours for all I know. > Applications can choose other mechanisms for speeding up the > exit process in various (less clean) ways, if they have a need for > this. > > BTW: Rather than using a huge in-memory dict, I'd suggest to either > use an on-disk dictionary such as the ones found in mxBeeBase or > a database. The original poster's application uses 45GB of data. In my earlier tests, I've experienced the problem with ~ 300 *megabytes* of data: hardly what I would call "huge". -- Steven D'Aprano From chambon.pascal at wanadoo.fr Mon Dec 22 22:49:58 2008 From: chambon.pascal at wanadoo.fr (Pascal Chambon) Date: Mon, 22 Dec 2008 22:49:58 +0100 Subject: [Python-Dev] Hello everyone + little question around Cpython/stackless Message-ID: <49500B86.1070605@wanadoo.fr> Hello snakemen and snakewomen I'm Pascal Chambon, a french engineer just leaving my Telecom School, blatantly fond of Python, of its miscellaneous offsprings and of all what's around dynamic languages and high level programming concepts. I'm currently studying all I can find on stackless python, PYPY and the concepts they've brought to Python, and so far I wonder : since stackless python claims to be 100% compatible with CPython's extensions, faster, and brings lots of fun stuffs (tasklets, coroutines and no C stack), how comes it hasn't been merged back, to become the standard 'fast' python implementation ? Would I have missed some crucial point around there ? Isn't that a pity to maintain two separate branches if they actually complete each other very well ? 
Waiting for your lights on this subject, regards, Pascal From martin at v.loewis.de Mon Dec 22 22:58:24 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 22:58:24 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> Message-ID: <49500D80.2090201@v.loewis.de> > Investigating further, from one stop, I used gdb to follow the chain > of pointers in the nextarena and prevarena directions. There were > 5449 and 112765 links, respectively. maxarenas is 131072. To reduce the time for keeping sorted lists of arenas, I was first thinking of a binheap. I had formulated it all, and don't want to waste that effort, so I attach it below in case my second idea (right below) is flawed. It then occurred that there are only 64 different values for nfreepools, as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than keeping the list sorted, I now propose to maintain 64 lists, accessible in an array double-linked lists indexed by nfreepools. Whenever nfreepools changes, the arena_object is unlinked from its current list, and linked into the new list. This should reduce the overhead for keeping the lists sorted down from O(n) to O(1), with a moderate overhead of 64 pointers (512 Bytes in your case). Allocation of a new pool would have to do a linear search in these pointers (finding the arena with the least number of pools); this could be sped up with a finger pointing to the last index where a pool was found (-1, since that pool will have moved). Regards, Martin a) usable_arenas becomes an arena_object**, pointing to an array of maxarenas+1 arena*. A second variable max_usable_arenas is added. 
arena_object loses the prevarena pointer, and gains a usable_index value of type size_t (which is 0 for unused or completely allocated arena_objects). usable_arenas should stay heap-sorted, with the arena_object with the smallest nfreepools at index 1. b) sink and swim operations are added, which keep usable_index intact whenever arena_object pointers get swapped. c) whenever a pool is allocated in an arena, nfreepools decreases, and swim is called for the arena. whenever a pool becomes free, sink is called. d) when the last pool was allocated in an arena, it is removed from the heap. likewise, when all pools are freed in an arena, it is removed from the heap and returned to the system. e) when the first pool gets freed in an arena, it is added to the heap. On each pool allocation/deallocation, this should get the O(n) complexity of keeping the arena list sorted down to O(log n). From skip at pobox.com Mon Dec 22 23:02:06 2008 From: skip at pobox.com (skip at pobox.com) Date: Mon, 22 Dec 2008 16:02:06 -0600 Subject: [Python-Dev] If I check something in ... Message-ID: <18768.3678.749094.475868@montanaro-dyndns-org.local> I have this trivial little test case for test_file.py: + def testReadWhenWriting(self): + self.assertRaises(IOError, self.f.read) I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x trunk and the py3k branch. What is the preferred way to do that? Do I really have to do the same task four times or can I check it in once (or twice) secure in the belief that someone will come along and do a monster merge? Thx, Skip From musiccomposition at gmail.com Mon Dec 22 23:06:41 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Mon, 22 Dec 2008 16:06:41 -0600 Subject: [Python-Dev] If I check something in ... 
In-Reply-To: <18768.3678.749094.475868@montanaro-dyndns-org.local> References: <18768.3678.749094.475868@montanaro-dyndns-org.local> Message-ID: <1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com> On Mon, Dec 22, 2008 at 4:02 PM, wrote: > > I have this trivial little test case for test_file.py: > > + def testReadWhenWriting(self): > + self.assertRaises(IOError, self.f.read) > > I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x > trunk and the py3k branch. What is the preferred way to do that? Do I > really have to do the same task four times or can I check it in once (or > twice) secure in the belief that someone will come along and do a monster > merge? If you check it into the trunk, it will find its way into 2.6, 3.1, and 3.0. -- Regards, Benjamin Peterson From martin at v.loewis.de Mon Dec 22 23:23:58 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 23:23:58 +0100 Subject: [Python-Dev] Hello everyone + little question around Cpython/stackless In-Reply-To: <49500B86.1070605@wanadoo.fr> References: <49500B86.1070605@wanadoo.fr> Message-ID: <4950137E.8040506@v.loewis.de> > I'm currently studying all I can find on stackless python, PYPY and the > concepts they've brought to Python, and so far I wonder : since > stackless python claims to be 100% compatible with CPython's extensions, > faster, and brings lots of fun stuffs (tasklets, coroutines and no C > stack), how comes it hasn't been merged back, to become the standard > 'fast' python implementation ? There is a long history to it, and multiple reasons influenced that status. In summary, some of the reasons were: - Stackless Python was never officially proposed for inclusion into Python (it may be that parts of it were, and some of those parts actually did get added). - Stackless Python originally was fairly unmaintainable; this prevented its inclusion.
- in its current form, it has limited portability, as it needs to be ported to each microprocessor and operating system separately. CPython has so far avoided using assembler code, and is fairly portable. Regards, Martin From martin at v.loewis.de Mon Dec 22 23:27:07 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 22 Dec 2008 23:27:07 +0100 Subject: [Python-Dev] If I check something in ... In-Reply-To: <18768.3678.749094.475868@montanaro-dyndns-org.local> References: <18768.3678.749094.475868@montanaro-dyndns-org.local> Message-ID: <4950143B.50100@v.loewis.de> > I would like to add it to the 2.6 and 3.0 maintenance branch and the 2.x > trunk and the py3k branch. What is the preferred way to do that? Do I > really have to do the same task four times or can I check it in once (or > twice) secure in the belief that someone will come along and do a monster > merge? You shouldn't check it in four times. But (IMO) you also shouldn't wait for somebody else to merge it (I know some people disagree with that recommendation). Instead, you should commit it into trunk, and then run svnmerge.py three times, namely: - in a release26-maint checkout, run svnmerge.py -r svn commit -F svnmerge-commit-something-press-tab - in a py3k checkout, run svnmerge.py -r svn commit -F svnmerge-commit-something-press-tab - in a release30-maint check, then run svnmerge.py -r svn revert . svn commit -F svnmerge-commit-something-press-tab Regards, Martin From solipsis at pitrou.net Mon Dec 22 23:35:30 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Dec 2008 22:35:30 +0000 (UTC) Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> Message-ID: Martin v. 
Löwis writes: > > It then occurred that there are only 64 different values for nfreepools, > as ARENA_SIZE is 256kiB, and POOL_SIZE is 4kiB. So rather than keeping > the list sorted, I now propose to maintain 64 lists, accessible in > an array double-linked lists indexed by nfreepools. Whenever nfreepools > changes, the arena_object is unlinked from its current list, and linked > into the new list. This should reduce the overhead for keeping the lists > sorted down from O(n) to O(1), with a moderate overhead of 64 pointers > (512 Bytes in your case). > > Allocation of a new pool would have to do a linear search in these > pointers (finding the arena with the least number of pools); You mean the least number of free pools, right? IIUC, the heuristic is to favour a small number of busy arenas rather than a lot of sparse ones. And, by linear search in these pointers, do you mean just probe the 64 lists for the first non-NULL list head? If so, then it's likely fast enough for a rather infrequent operation. Now, we should find a way to benchmark this without having to steal Mike's machine and wait 30 minutes every time. Regards Antoine. From musiccomposition at gmail.com Mon Dec 22 23:39:08 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Mon, 22 Dec 2008 16:39:08 -0600 Subject: [Python-Dev] If I check something in ... In-Reply-To: <4950143B.50100@v.loewis.de> References: <18768.3678.749094.475868@montanaro-dyndns-org.local> <4950143B.50100@v.loewis.de> Message-ID: <1afaf6160812221439p49931977jad433cf95369c071@mail.gmail.com> On Mon, Dec 22, 2008 at 4:27 PM, "Martin v. Löwis" wrote: > You shouldn't check it in four times. But (IMO) you also shouldn't wait > for somebody else to merge it (I know some people disagree with that > recommendation). I don't completely disagree. Certainly, if you want to make sure your change is merged correctly into every branch, then please do merge it yourself.
It's also nice if platform-specific merges (ie Windows build files) are handled by the original committer. However, minor changes to the documentation or code formatting and even simple bug fixes are trivial to merge all at once between branches. In the end, I suppose it doesn't really matter; everyone can do what they are comfortable with. -- Regards, Benjamin From martin at v.loewis.de Mon Dec 22 23:55:40 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Mon, 22 Dec 2008 23:55:40 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> Message-ID: <49501AEC.3010805@v.loewis.de> >> Allocation of a new pool would have to do a linear search in these >> pointers (finding the arena with the least number of pools); > > You mean the least number of free pools, right? Correct. > IIUC, the heuristic is to favour > a small number of busy arenas rather than a lot of sparse ones. Correct. Or, more precisely, the hope is indeed to make most arenas sparse, so that they eventually see all their pools freed. > And, by linear search in these pointers, do you mean just probe the 64 lists for > the first non-NULL list head? Correct. > If so, then it's likely fast enough for a rather infrequent operation. I would hope so, yes. However, the same hope applied to the current code (how much time can it take to sink an arena in a linear list?), so if we have the prospect of using larger arenas some day, this might change. > Now, we should find a way to benchmark this without having to steal Mike's > machine and wait 30 minutes every time. I think this can be simulated by using just arena objects, with no associated arenas, and just adjusting pool counters. 
Allocate 100,000 arena objects, and start out with them all being completely allocated. Then randomly choose one arena to deallocate a pool from; from time to time, also allocate a new pool. Unfortunately, this will require some hacking of the code to take the measurements. Alternatively, make the arena size 4k, and the pool size 32 bytes, and then come up with a pattern to allocate and deallocate 8 byte blocks. Not sure whether the code works for these parameters, though (but it might be useful to fix it for non-standard sizes). This would require only 400MiB of memory to run the test. I think obmalloc is fairly independent from the rest of Python, so it should be possible to link it with a separate main() function, and nothing else of Python. Regards, Martin From tutufan at gmail.com Tue Dec 23 00:28:48 2008 From: tutufan at gmail.com (Mike Coleman) Date: Mon, 22 Dec 2008 17:28:48 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> Message-ID: <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com> On Mon, Dec 22, 2008 at 2:54 PM, Ivan Krstić wrote: > It's still not clear to me, from reading the whole thread, precisely what > you're seeing. A self-contained test case, preferably with generated random > data, would be great, and save everyone a lot of investigation time. I'm still working on a test case.
The first couple of attempts, using a half-hearted attempt to model the application behavior wrt this dict, didn't demonstrate bad behavior.

My impression is that no one's burning much time on this but me at the moment, aside from offering helpful advice. If you are, you might want to wait. I noticed just now that the original hardware was throwing some chipkills, so I'm retesting on something else.

> In the
> meantime, can you 1) turn off all swap files and partitions, and 2) confirm
> positively that your CPU cycles are burning up in userland?

For (1), I don't have that much control over the machine. Plus, based on watching with top, I seriously doubt the process is using swap in any way.

For (2), yes, 100% CPU usage.

> (In general, unless you know exactly why your workload needs swap, and have
> written your program to take swapping into account, having _any_ swap on a
> machine with 64GB RAM is lunacy. The machine will grind to a complete
> standstill long before filling up gigabytes of swap.)

The swap is not there to support my application per se. Clearly if you're swapping, generally you're crawling. This host is used by a reasonably large set of non- and novice programmers, who sometimes vacuum up VM without realizing it. If you have a nice, big swap space, you can 'kill -STOP' these offenders, and allow them to swap out while you have a leisurely discussion with the owner and possibly 'kill -CONT' later, as opposed to having to do a quick 'kill -KILL' to save the machine. That's my thinking, anyway.
Mike

From tutufan at gmail.com Tue Dec 23 01:19:10 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 18:19:10 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To:
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com>
Message-ID: <3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>

On Mon, Dec 22, 2008 at 2:22 PM, Adam Olsen wrote:
> To make sure that's the correct line please recompile python without
> optimizations. GCC happily reorders and merges different parts of a
> function.
>
> Adding a counter in C and recompiling would be a lot faster than using
> a gdb hook.

Okay, I did this. The results are the same, except that now sampling selects the different source statements within this loop, instead of just the top of the loop (which makes sense).

I added a counter (static volatile long) as suggested, and a breakpoint to sample it. Not every pass through PyObject_Free takes case 3, but for those that do, this loop runs around 100-25000 times. I didn't try to graph it, but based on a quick sample, it looks like more than 5000 iterations on most occasions.

The total counter is 12.4 billion at the moment, and still growing. That seems high, but I'm not sure what would be expected or hoped for.

I have a script that demonstrates the problem, but unfortunately the behavior isn't clearly bad until large amounts of memory are used. I don't think it shows at 2G, for example. (A 32G machine is sufficient.)
Here is a log of running the program at different sizes ($1):

1 4.04686999321 0.696660041809
2 8.1575551033 1.46393489838
3 12.6426320076 2.30558800697
4 16.471298933 3.80377006531
5 20.1461620331 4.96685886383
6 25.150053978 5.48230814934
7 28.9099609852 7.41244196892
8 32.283219099 6.31711483002
9 36.6974511147 7.40236377716
10 40.3126089573 9.01174497604
20 81.7559120655 20.3317198753
30 123.67071104 31.4815018177
40 161.935647011 61.4484620094
50 210.610441923 88.6161060333
60 248.89805007 118.821491003
70 288.944771051 194.166989088
80 329.93295002 262.14109993
90 396.209988832 454.317914009
100 435.610564947 564.191882133

If you plot this, it is clearly quadratic (or worse).

Here is the script:

#!/usr/bin/env python

"""
Try to trigger quadratic (?) behavior during .clear() of a large but simple
defaultdict.
"""

from collections import defaultdict
import time
import sys

import gc; gc.disable()

print >> sys.stderr, sys.version

h = defaultdict(list)

n = 0

lasttime = time.time()

megs = int(sys.argv[1])

print megs,
sys.stdout.flush()

# 100M iterations -> ~24GB? on my 64-bit host

for i in xrange(megs * 1024 * 1024):
    s = '%0.7d' % i
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
    h[s].append(('xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', 12345))
#     if (i % 1000000) == 0:
#         t = time.time()
#         print >> sys.stderr, t-lasttime
#         lasttime = t

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

h.clear()

t = time.time()
print t-lasttime,
sys.stdout.flush()
lasttime = t

print

From krstic at solarsail.hcs.harvard.edu Tue Dec 23 01:32:25 2008
From: krstic at solarsail.hcs.harvard.edu (=?ISO-8859-2?Q?Ivan_Krsti=E6?=)
Date: Mon, 22 Dec 2008 19:32:25 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com>
Message-ID: <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>

On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
> For (2), yes, 100% CPU usage.

100% _user_ CPU usage? (I'm trying to make sure we're not chasing some particular degeneration of kmalloc/vmalloc and friends.)

-- 
Ivan Krstić
| http://radian.org

From solipsis at pitrou.net Tue Dec 23 01:34:53 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 23 Dec 2008 00:34:53 +0000 (UTC)
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> <49501AEC.3010805@v.loewis.de>
Message-ID:

> Now, we should find a way to benchmark this without having to steal Mike's
> machine and wait 30 minutes every time.

So, I seem to reproduce it. The following script takes about 15 seconds to run and allocates a 2 GB dict which it deletes at the end (gc disabled of course).

With 2.4, deleting the dict takes ~1.2 seconds while with 2.5 and higher (including 3.0), deleting the dict takes ~3.5 seconds. Nothing spectacular but the difference is clear.

Also, after the dict is deleted and before the program exits, you can witness (with `ps` or `top`) that 2.5 and higher has reclaimed 1GB, while 2.4 has reclaimed nothing. There is a sleep() call at the end so that you have the time :-)

You can tune memory occupation at the beginning of the script, but the lower the more difficult it will be to witness a difference.

Regards

Antoine.

#######

import random
import time
import gc
import itertools


# Adjust this parameter according to your system RAM!
target_size = int(2.0 * 1024**3)  # 2.0 GB

pool_size = 4 * 1024
# This is a ballpark estimate: 60 bytes overhead for each
# { dict entry struct + float object + tuple object header },
# 1.3 overallocation factor for the dict.
target_length = int(target_size / (1.3 * (pool_size + 60)))

def make_dict():
    print ("filling dict up to %d entries..." % target_length)

    # 1. Initialize the dict from a set of pre-computed random keys.
    keys = [random.random() for i in range(target_length)]
    d = dict.fromkeys(keys)

    # 2. Build the values that will constitute the dict. Each value will, as
    #    far as possible, span a contiguous `pool_size` memory area.

    # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc()
    # We avoid that by allocating tuples of smaller longs.
    int_size = 200
    # 24 roughly accounts for the long object overhead (YMMV)
    int_start = 1 << ((int_size - 24) * 8 - 7)
    int_range = range(1, 1 + pool_size // int_size)
    values = [None] * target_length
    # Maximize allocation locality by pre-allocating the values
    for n in range(target_length):
        values[n] = tuple(int_start + j for j in int_range)
        if n % 10000 == 0:
            print ("  %d iterations" % n)

    # The keys are iterated over in their original order rather than in
    # dict order, so as to randomly spread the values in the internal dict
    # table wrt. allocation address.
    for n, k in enumerate(keys):
        d[k] = values[n]

    print ("dict filled!")
    return d

if __name__ == "__main__":
    gc.disable()
    t1 = time.time()
    d = make_dict()
    t2 = time.time()
    print (" -> %.3f s." % (t2 - t1))

    print ("deleting dict...")
    t2 = time.time()
    del d
    t3 = time.time()
    print (" -> %.3f s." % (t3 - t2))

    print ("Finished, you can press Ctrl+C.")
    time.sleep(10.0)

From skip at pobox.com Tue Dec 23 01:41:41 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 18:41:41 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local> <1afaf6160812221406m6f47ff26gfa94f30571f3ca5a@mail.gmail.com>
Message-ID: <18768.13253.753399.192276@montanaro-dyndns-org.local>

    Benjamin> If you check it into the trunk, it will find its way into
    Benjamin> 2.6, 3.1, and 3.0.

Outstanding!
Thx,

Skip

From ncoghlan at gmail.com Tue Dec 23 02:24:44 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 11:24:44 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To: <200812230845.42805.steve@pearwood.info>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <200812230845.42805.steve@pearwood.info>
Message-ID: <49503DDC.7080107@gmail.com>

Steven D'Aprano wrote:
> This behaviour appears to be specific to deleting dicts, not deleting
> random objects. I haven't yet confirmed that the problem still exists
> in trunk (I hope to have time tonight or tomorrow), but in my previous
> tests deleting millions of items stored in a list of tuples completed
> in a minute or two, while deleting the same items stored as key:item
> pairs in a dict took 30+ minutes. I say plus because I never had the
> patience to let it run to completion, it could have been hours for all
> I know.

There's actually an interesting comment in list_dealloc:

    /* Do it backwards, for Christian Tismer.
       There's a simple test case where somehow this reduces
       thrashing when a *very* large list is created and
       immediately deleted. */

The "backwards" the comment is referring to is the fact that it invokes DECREF on the last item in the list first and counts back down to the first item, instead of starting at the first item and incrementing the index each time around the loop.

The revision number on that (13452) indicates that it predates the implementation of PyObject_Malloc and friends, so it was probably avoiding pathological behaviour in platform malloc() implementations by free'ing memory in the reverse order to which it was allocated (assuming the list was built initially starting with the first item).
However, I'm now wondering if it also has the side effect of avoiding the quadratic behaviour Mike has found inside the more recent code to release arenas back to the OS.

I'm working on a simple benchmark that looks for non-linear scaling of the deallocation times - I'll include a case of deallocation of a reversed list along with a normal list and a dictionary.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From tutufan at gmail.com Tue Dec 23 03:05:06 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 20:05:06 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To: <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com> <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu>
Message-ID: <3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>

2008/12/22 Ivan Krstić:
> On Dec 22, 2008, at 6:28 PM, Mike Coleman wrote:
>>
>> For (2), yes, 100% CPU usage.
>
> 100% _user_ CPU usage? (I'm trying to make sure we're not chasing some
> particular degeneration of kmalloc/vmalloc and friends.)

Yes, user. No noticeable sys or wait CPU going on.
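
[Editorial sketch, not part of any message in the thread.] Mike's timing log earlier in the thread (size argument, fill seconds, clear seconds) can be checked for super-linear growth without plotting it, by estimating the local exponent k in t ~ n**k between two rows. The rows below are a subset copied from that log; the helper names (`local_exponent`, `exponents`) are made up for this illustration. An exponent near 1 suggests linear scaling, near 2 quadratic.

```python
import math

# (size argument, fill seconds, clear seconds): a subset of the rows
# from the log posted in the thread, copied verbatim.
LOG = [
    (1, 4.04686999321, 0.696660041809),
    (10, 40.3126089573, 9.01174497604),
    (50, 210.610441923, 88.6161060333),
    (100, 435.610564947, 564.191882133),
]


def local_exponent(n0, t0, n1, t1):
    """Estimate k in t ~ n**k from two (n, t) measurements."""
    return math.log(t1 / t0) / math.log(float(n1) / n0)


def exponents(rows, column):
    """Return the local exponent for each consecutive pair of rows.

    `column` is 1 for the fill times and 2 for the clear times.
    """
    return [
        local_exponent(a[0], a[column], b[0], b[column])
        for a, b in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    print("fill :", ["%.2f" % k for k in exponents(LOG, 1)])
    print("clear:", ["%.2f" % k for k in exponents(LOG, 2)])
```

On this subset the fill phase stays close to linear throughout, while the exponent for the clear phase keeps rising with size, which is consistent with the "quadratic (or worse)" reading. Adjacent rows of the full log are noisier (the clear time at 8 is lower than at 7), so widely spaced rows give a steadier estimate.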
From tutufan at gmail.com Tue Dec 23 03:32:07 2008
From: tutufan at gmail.com (Mike Coleman)
Date: Mon, 22 Dec 2008 20:32:07 -0600
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <3c6c07c20812221013p43e0281akd56aabc2c05402e7@mail.gmail.com> <13B12BDC-E765-499C-B8A2-E73E4DBC7F30@solarsail.hcs.harvard.edu> <3c6c07c20812221528y7f013944vb7cb27fb4ab07e8d@mail.gmail.com> <002EDCFD-21E6-4DFF-93BF-9C86AA625AD5@solarsail.hcs.harvard.edu> <3c6c07c20812221805m12820ca6la5f8643c6fd38af@mail.gmail.com>
Message-ID: <3c6c07c20812221832q79295e4au7e7ba9471749e743@mail.gmail.com>

I unfortunately don't have time to work out how obmalloc works myself, but I wonder if any of the constants in that file might need to scale somehow with memory size. That is, is it possible that some of them that work okay with 1G RAM won't work well with (say) 128G or 1024G (coming soon enough)?

From ncoghlan at gmail.com Tue Dec 23 04:25:47 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 13:25:47 +1000
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To: <3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <3c6c07c20812221619i5388b857vc17fc59884a3323d@mail.gmail.com>
Message-ID: <49505A3B.2000101@gmail.com>

Mike Coleman wrote:
> If you plot this, it is clearly quadratic (or worse).

Here's another comparison script that tries to probe the vagaries of the obmalloc implementation.
It looks at the proportional increases in deallocation times for lists and dicts as the number of contained items increases when using a variety of deallocation orders:
- in hash order (dict)
- in reverse order of allocation (list)
- in order of allocation (list, reversed in place)
- in random order (list, shuffled in place using the random module)

I've included the final output from a run on my own machine below [1], but here are the main points I get out of it:
- at the sizes I can test (up to 20 million items in the containers), this version of the script doesn't show any particularly horrible non-linearity with deallocation of dicts, lists or reversed lists.
- when the items in a list are deallocated in *random* order, however, the deallocation times are highly non-linear - by the time we get to 20 million items, deallocating in random order takes nearly twice as long as deallocation in either order of allocation or in reverse order.
- after the list of items had been deallocated in random order, subsequent deallocation of a dict and the list took significantly longer than when those operations took place on a comparatively "clean" obmalloc state.

I'm going to try making a new version of the script that uses random integers with a consistent number of digits in place of the monotonically increasing values that are currently used and see what effect that has on the dict scaling (that's where I expect to see the greatest effect, since the hash ordering is the one which will be most affected by the change to the item contents).

Cheers,
Nick.
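
[Editorial sketch, not part of Nick's message.] The attachment (dealloc_timing.py) was scrubbed from the archive, so the following is only a minimal illustration of the kind of measurement described above, not Nick's script. The names `ORDERS`, `build` and `measure` are invented here, the item type (zero-padded strings) is an assumption, and the sizes are far smaller than the 20 million items used in the thread. One caveat: on the 2.x interpreters under discussion a dict is torn down in hash-table order, whereas current CPython keeps insertion order, so the "dict" case may behave much like the "reversed" (allocation-order) case there.

```python
import gc
import random
import time

ORDERS = ("dict", "list", "reversed", "shuffled")


def build(n, order):
    """Create n distinct string objects held by a dict or a list.

    A plain list is torn down from its last item to its first, so
    "list" frees in reverse allocation order, "reversed" frees in
    allocation order and "shuffled" frees in random order.
    """
    items = ["%07d" % i for i in range(n)]
    if order == "dict":
        return dict.fromkeys(items)
    if order == "reversed":
        items.reverse()
    elif order == "shuffled":
        random.shuffle(items)
    return items


def measure(n, order):
    """Return the seconds taken to drop the only reference to the container."""
    holder = [build(n, order)]
    start = time.perf_counter()
    holder.pop()  # result is discarded at once, so deallocation happens here
    return time.perf_counter() - start


def main(sizes=(50000, 100000, 200000)):
    gc.disable()  # as in the scripts posted to the thread
    try:
        for order in ORDERS:
            baseline = None
            for n in sizes:
                elapsed = max(measure(n, order), 1e-9)
                if baseline is None:
                    baseline = elapsed
                print("%-9s %8d items: %.6f s (%.1f%% of baseline)"
                      % (order, n, elapsed, 100.0 * elapsed / baseline))
    finally:
        gc.enable()


if __name__ == "__main__":
    main()
```

With linear scaling the percentage column should grow roughly in step with the item count; the effect reported in the thread only became obvious at much larger sizes, so small runs like this mostly serve to check the method.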
[1] Full final results from local test run:

Dict: (Baseline=0.003135 seconds)
  100000=100.0%
  1000000=1020.9%
  2000000=2030.5%
  5000000=5026.7%
  10000000=10039.7%
  20000000=20086.4%
List: (Baseline=0.005764 seconds)
  100000=100.0%
  1000000=1043.7%
  2000000=2090.1%
  5000000=5227.2%
  10000000=10458.1%
  20000000=20942.7%
ReversedList: (Baseline=0.005879 seconds)
  100000=100.0%
  1000000=1015.0%
  2000000=2023.5%
  5000000=5057.1%
  10000000=10114.0%
  20000000=20592.6%
ShuffledList: (Baseline=0.028241 seconds)
  100000=100.0%
  1000000=1296.0%
  2000000=2877.3%
  5000000=7960.1%
  10000000=17216.9%
  20000000=37599.9%
PostShuffleDict: (Baseline=0.016229 seconds)
  100000=100.0%
  1000000=1007.9%
  2000000=2018.4%
  5000000=5075.3%
  10000000=10217.5%
  20000000=20873.1%
PostShuffleList: (Baseline=0.020551 seconds)
  100000=100.0%
  1000000=1021.9%
  2000000=1978.2%
  5000000=4953.6%
  10000000=10262.3%
  20000000=19854.0%

Baseline changes for Dict and List after deallocation of list in random order:
  Dict: 517.7%
  List: 356.5%

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dealloc_timing.py
Type: text/x-python
Size: 2003 bytes
Desc: not available
URL:

From alexandre at peadrop.com Tue Dec 23 04:26:29 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Mon, 22 Dec 2008 22:26:29 -0500
Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2)
In-Reply-To:
References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> <49501AEC.3010805@v.loewis.de>
Message-ID:

On Mon, Dec 22, 2008 at 7:34 PM, Antoine Pitrou wrote:
>
>> Now, we should find a way to benchmark this without having to steal Mike's
>> machine and wait 30 minutes every time.
>
> So, I seem to reproduce it.
> The following script takes about 15 seconds to
> run and allocates a 2 GB dict which it deletes at the end (gc disabled of
> course).
> With 2.4, deleting the dict takes ~1.2 seconds while with 2.5 and higher
> (including 3.0), deleting the dict takes ~3.5 seconds. Nothing spectacular
> but the difference is clear.
>

I modified your script to delete the dictionary without actually deallocating the items in it. You can speed up a dictionary deallocation significantly if you keep a reference to its items and delete the dictionary before deleting its items. In Python 2.4, the same behavior exists, but is not as strongly marked as in Python 2.6 with pymalloc enabled.

I can understand that deallocating the items in the order (or actually, the reverse order) they were allocated is faster than doing so in a rather haphazard manner (i.e., like dict). However, I am not sure why pymalloc accentuates this behavior.

-- Alexandre

Python 2.6 with pymalloc, without pydebug

alex at helios:~$ python2.6 dict_dealloc_test.py
creating 397476 items...
 -> 6.613 s.
building dict...
 -> 0.230 s.
deleting items...
 -> 0.059 s.
deleting dict...
 -> 2.299 s.
total deallocation time: 2.358 seconds.

alex at helios:~$ python2.6 dict_dealloc_test.py
creating 397476 items...
 -> 6.530 s.
building dict...
 -> 0.228 s.
deleting dict...
 -> 0.089 s.
deleting items...
 -> 0.971 s.
total deallocation time: 1.060 seconds.

Python 2.6 without pymalloc, without pydebug

alex at helios:release26-maint$ ./python /home/alex/dict_dealloc_test.py
creating 397476 items...
 -> 5.921 s.
building dict...
 -> 0.244 s.
deleting items...
 -> 0.073 s.
deleting dict...
 -> 1.502 s.
total deallocation time: 1.586 seconds.

alex at helios:release26-maint$ ./python /home/alex/dict_dealloc_test.py
creating 397476 items...
 -> 6.122 s.
building dict...
 -> 0.237 s.
deleting dict...
 -> 0.092 s.
deleting items...
 -> 1.238 s.
total deallocation time: 1.330 seconds.

alex at helios:~$ python2.4 dict_dealloc_test.py
creating 397476 items...
 -> 6.164 s.
building dict...
 -> 0.218 s.
deleting items...
 -> 0.057 s.
deleting dict...
 -> 1.185 s.
total deallocation time: 1.243 seconds.

alex at helios:~$ python2.4 dict_dealloc_test.py
creating 397476 items...
 -> 6.202 s.
building dict...
 -> 0.218 s.
deleting dict...
 -> 0.090 s.
deleting items...
 -> 0.852 s.
total deallocation time: 0.943 seconds.

######

import random
import time
import gc


# Adjust this parameter according to your system RAM!
target_size = int(2.0 * 1024**3)  # 2.0 GB

pool_size = 4 * 1024
# This is a ballpark estimate: 60 bytes overhead for each
# { dict entry struct + float object + tuple object header },
# 1.3 overallocation factor for the dict.
target_length = int(target_size / (1.3 * (pool_size + 60)))

def make_items():
    print ("creating %d items..." % target_length)

    # 1. Initialize a set of pre-computed random keys.
    keys = [random.random() for i in range(target_length)]

    # 2. Build the values that will constitute the dict. Each value will, as
    #    far as possible, span a contiguous `pool_size` memory area.

    # Over 256 bytes per alloc, PyObject_Malloc defers to the system malloc()
    # We avoid that by allocating tuples of smaller longs.
    int_size = 200
    # 24 roughly accounts for the long object overhead (YMMV)
    int_start = 1 << ((int_size - 24) * 8 - 7)
    int_range = range(1, 1 + pool_size // int_size)
    values = [None] * target_length
    # Maximize allocation locality by pre-allocating the values
    for n in range(target_length):
        values[n] = tuple(int_start + j for j in int_range)
    return list(zip(keys,values))

if __name__ == "__main__":
    gc.disable()
    t1 = time.time()
    items = make_items()
    t2 = time.time()
    print " -> %.3f s." % (t2 - t1)

    print "building dict..."
    t1 = time.time()
    testdict = dict(items)
    t2 = time.time()
    print " -> %.3f s." % (t2 - t1)

    def delete_testdict():
        global testdict
        print "deleting dict..."
        t1 = time.time()
        del testdict
        t2 = time.time()
        print " -> %.3f s." % (t2 - t1)

    def delete_items():
        global items
        print "deleting items..."
        t1 = time.time()
        del items
        t2 = time.time()
        print " -> %.3f s." % (t2 - t1)

    t1 = time.time()
    # Swap these, and look at the total time
    delete_items()
    delete_testdict()
    t2 = time.time()
    print "total deallocation time: %.3f seconds." % (t2 - t1)

From skip at pobox.com Tue Dec 23 04:56:18 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Dec 2008 21:56:18 -0600
Subject: [Python-Dev] If I check something in ...
In-Reply-To: <4950143B.50100@v.loewis.de>
References: <18768.3678.749094.475868@montanaro-dyndns-org.local> <4950143B.50100@v.loewis.de>
Message-ID: <18768.24930.356203.736710@montanaro-dyndns-org.local>

    Martin> Instead, you should commit it into trunk, and then run svnmerge.py three
    Martin> times, namely:
    ...

Thanks for that cheat sheet. I never would have figured that out on my own. Well, at least not in a timely fashion.

Skip

From scott+python-dev at scottdial.com Mon Dec 22 15:47:01 2008
From: scott+python-dev at scottdial.com (Scott Dial)
Date: Mon, 22 Dec 2008 09:47:01 -0500
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <494F6692.8000001@v.loewis.de>
References: <494F6692.8000001@v.loewis.de>
Message-ID: <494FA865.2050009@scottdial.com>

Martin v. Löwis wrote:
> It seems r67740 shouldn't have been committed. Since this
> is a severe regression, I think I'll have to revert it, and
> release 2.5.4 with just that change.

My understanding of the problem is that clearerr() needs to be called before any FILE read operations on *some* platforms. The only platform I saw mentioned was OS X. Towards that end, I have attached a much simpler patch onto the tracker issue, which maybe somebody can verify solves the problem because I do not have access to a platform which fails the test that was originally given.
-Scott

-- 
Scott Dial
scott at scottdial.com
scodial at cs.indiana.edu

From lance.ellinghaus at eds.com Tue Dec 23 07:28:19 2008
From: lance.ellinghaus at eds.com (Ellinghaus, Lance)
Date: Tue, 23 Dec 2008 01:28:19 -0500
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
Message-ID: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>

I am hoping someone can assist me. I normally don't care if the _ctypes module builds or not, but I now need to have it build.

I am running Solaris 10 with Sun's C compiler under SunStudio 11.

After running 'configure' and 'make', the _ctypes module fails with the following error:

cc -xcode=pic32 -DNDEBUG -O -I. -I/data/python/Python-2.6.1/./Include -Ibuild/temp.solaris-2.10-sun4u-2.6/libffi/include -Ibuild/temp.solaris-2.10-sun4u-2.6/libffi -I/data/python/Python-2.6.1/Modules/_ctypes/libffi/src -I/usr/local/python/include -I. -IInclude -I./Include -I/usr/local/include -I/data/python/Python-2.6.1/Include -I/data/python/Python-2.6.1 -c /data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c -o build/temp.solaris-2.10-sun4u-2.6/data/python/Python-2.6.1/Modules/_ctypes/_ctypes.o
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257: syntax error before or at: __attribute__
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257: warning: old-style declaration or incorrect type for: __attribute__
"build/temp.solaris-2.10-sun4u-2.6/libffi/include/ffi.h", line 257: warning: syntax error: empty declaration
"/data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c", line 187: cannot recover from previous errors
cc: acomp failed for /data/python/Python-2.6.1/Modules/_ctypes/_ctypes.c

Is there anything special I have to do to get it to compile under Solaris 10 and SunStudio 11?

BTW: I cannot use GCC.

Thank you very much,
Lance

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From martin at v.loewis.de Tue Dec 23 10:37:57 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 23 Dec 2008 10:37:57 +0100
Subject: [Python-Dev] Releasing 2.5.4
In-Reply-To: <494FA865.2050009@scottdial.com>
References: <494F6692.8000001@v.loewis.de> <494FA865.2050009@scottdial.com>
Message-ID: <4950B175.1020704@v.loewis.de>

> My understanding of the problem is that clearerr() needs to be called
> before any FILE read operations on *some* platforms. The only platform I
> saw mentioned was OS X. Towards that end, I have attached a much simpler
> patch onto the tracker issue, which maybe somebody can verify solves the
> problem because I do not have access to a platform which fails the test
> that was originally given.

Thanks. I won't then reject the patch outright, only revert it from 2.5. I can't give this a second try, as 2.5.3 was already supposed to be the last release - I don't want to find myself reverting your patch two weeks from now.

Is the approach that a clearerr call is added for each read operation?

Regards,
Martin

From martin at v.loewis.de Tue Dec 23 10:44:33 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Tue, 23 Dec 2008 10:44:33 +0100
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
In-Reply-To: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com>
Message-ID: <4950B301.4020702@v.loewis.de>

> I am hoping someone can assist me. I normally don't care if the _ctypes
> module builds or not, but I now need to have it build.
>
> I am running Solaris 10 with Sun's C compiler under SunStudio 11.

I don't think ctypes (rather, libffi) supports Sun C. You will need to port it (as you have already ruled out the other options, such as using gcc, or not using ctypes).
Regards,
Martin

From ncoghlan at gmail.com Tue Dec 23 11:43:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 23 Dec 2008 20:43:55 +1000
Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10
In-Reply-To: <4950B301.4020702@v.loewis.de>
References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com> <4950B301.4020702@v.loewis.de>
Message-ID: <4950C0EB.9030901@gmail.com>

Martin v. Löwis wrote:
>> I am hoping someone can assist me. I normally don't care if the _ctypes
>> module builds or not, but I now need to have it build.
>>
>> I am running Solaris 10 with Sun's C compiler under SunStudio 11.
>
> I don't think ctypes (rather, libffi) supports Sun C. You will need to
> port it (as you have already ruled out the other options, such as using
> gcc, or not using ctypes).

There is also an existing issue relating to this: http://bugs.python.org/issue2552 (although it doesn't add much beyond what Martin already said)

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------

From rocky at gnu.org Tue Dec 23 12:55:40 2008
From: rocky at gnu.org (Rocky Bernstein)
Date: Tue, 23 Dec 2008 06:55:40 -0500
Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from?
Message-ID: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com>

Now that there is a package mechanism (are package mechanisms?) like zipimporter that bundle source code into a single file, should the notion of a "file" location be adjusted to include the package and/or importer?

Is there a standard API or routine which can extract this information given a code object?

A use case I am thinking of here is in a stack trace or a debugger, or a tool which wants to show in great detail information from a code object possibly via a frame. For example does this come from a zipped egg? And if so, which one?
For concreteness, here is what I did and here's what I saw. Select one of the zipimporter eggs at http://code.google.com/p/pytracer and install one of these. I did this on GNU/Linux and Python 2.5 and I look at the co_filename of one of the methods: >>> import tracer >>> tracer.__dict__['size'].func_code.co_filename 'build/bdist.linux-i686/egg/tracer.py' But there is no file called "build/bdist.linux-686/egg/tracer.py" in the filesystem. Instead there is a member "tracer.py" inside /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg'. It's possible I caused this egg to get built incorrectly or that setuptools has a bug which entered that misleading information. However, shouldn't there be a standard way to untangle package location, loader and member inside the package? As best as I can tell, PEP 302 which discussed importer hooks and suggests a standard way to get file data. But it doesn't address a standard way to get container package and/or loader information. Also I'm not sure there *is* a standard print string way to show member inside a package. zipimporter may insert co_filename strings like: /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py but the trouble with this is that it means file routines have to scan the path and notice say that /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*, not a directory. And a file stat/reading routine needs to understand what kind of packager that is in order to get tracer.py information. (Are there any file routines in place for doing this?) Thanks. From mal at egenix.com Tue Dec 23 13:47:15 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 23 Dec 2008 13:47:15 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <200812230845.42805.steve@pearwood.info> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <200812230845.42805.steve@pearwood.info> Message-ID: <4950DDD3.7030601@egenix.com> On 2008-12-22 22:45, Steven D'Aprano wrote: > On Mon, 22 Dec 2008 11:20:59 pm M.-A. Lemburg wrote: >> On 2008-12-20 23:16, Martin v. L?wis wrote: >>>>> I will try next week to see if I can come up with a smaller, >>>>> submittable example. Thanks. >>>> These long exit times are usually caused by the garbage collection >>>> of objects. This can be a very time consuming task. >>> I doubt that. The long exit times are usually caused by a bad >>> malloc implementation. >> With "garbage collection" I meant the process of Py_DECREF'ing the >> objects in large containers or deeply nested structures, not the GC >> mechanism for breaking circular references in Python. >> >> This will usually also involve free() calls, so the malloc >> implementation affects this as well. However, I've seen such long >> exit times on Linux and Windows, which both have rather good >> malloc implementations. >> >> I don't think there's anything much we can do about it at the >> interpreter level. Deleting millions of objects takes time and that's >> not really surprising at all. It takes even longer if you have >> instances with .__del__() methods written in Python. > > > This behaviour appears to be specific to deleting dicts, not deleting > random objects. I haven't yet confirmed that the problem still exists > in trunk (I hope to have time tonight or tomorrow), but in my previous > tests deleting millions of items stored in a list of tuples completed > in a minute or two, while deleting the same items stored as key:item > pairs in a dict took 30+ minutes. 
I say plus because I never had the > patience to let it run to completion, it could have been hours for all > I know. That's interesting. The dictionary dealloc routine doesn't give any hint as to why this should take longer than deallocating a list of tuples. However, due to the way dictionary tables are allocated, it is possible that you create a table that is nearly twice the size of the actual number of items needed by the dictionary. At those dictionary sizes, this can result in a lot of extra memory being allocated, certainly more than the corresponding list of tuples would use. >> Applications can choose other mechanisms for speeding up the >> exit process in various (less clean) ways, if they have a need for >> this. >> >> BTW: Rather than using a huge in-memory dict, I'd suggest to either >> use an on-disk dictionary such as the ones found in mxBeeBase or >> a database. > > The original poster's application uses 45GB of data. In my earlier > tests, I've experienced the problem with ~ 300 *megabytes* of data: > hardly what I would call "huge". Times have changed, that's true :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 23 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-12-02: Released mxODBC.Connect 1.0.0 http://python.egenix.com/ ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math.
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From scott+python-dev at scottdial.com Tue Dec 23 15:04:51 2008 From: scott+python-dev at scottdial.com (Scott Dial) Date: Tue, 23 Dec 2008 09:04:51 -0500 Subject: [Python-Dev] Releasing 2.5.4 In-Reply-To: <4950B175.1020704@v.loewis.de> References: <494F6692.8000001@v.loewis.de> <494FA865.2050009@scottdial.com> <4950B175.1020704@v.loewis.de> Message-ID: <4950F003.2060802@scottdial.com> Martin v. Löwis wrote: >> My understanding of the problem is that clearerr() needs to be called >> before any FILE read operations on *some* platforms. The only platform I >> saw mentioned was OS X. Towards that end, I have attached a much simpler >> patch onto the tracker issue, which maybe somebody can verify solves the >> problem because I do not have access to a platform which fails the test >> that was originally given. > > Thanks. I won't then reject the patch outright, only revert it from 2.5. > I can't give this a second try, as 2.5.3 was already supposed to be the > last release - I don't want to find myself reverting your patch two > weeks from now. I agree, and as far as I can tell, the bug (assuming the report is accurate) only occurs on a few platforms and since it's received little attention over the life of the issue on the tracker, I imagine it's not very important to many people. And since I don't have an affected platform to test on, I can't even be sure that it really solves the bug. So I agree: leave it out. > Is the approach that a clearerr call is added for each read > operation? Yes, I merely added clearerr() calls just prior to the first fread, fgets, and getc calls in each of the read methods for files. I'll make a clean patch against the trunk and update the issue on the tracker, then maybe the reporter or someone else with an affected platform can verify my patch.
-Scott -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From p.f.moore at gmail.com Tue Dec 23 15:06:31 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 23 Dec 2008 14:06:31 +0000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> Message-ID: <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> 2008/12/23 Rocky Bernstein : > Now that there is a package mechanism (are package mechanisms?) like > zipimporter that bundle source code into a single file, should the > notion of a "file" location should be adjusted to include the package > and/or importer? Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically the get_source (optional) method. It's not exactly what you describe, but it may help. Please note that it's optional - if you loaded the code from a zipfile containing only bytecode files, there is no source to get, so you have to be prepared for that case. But if the source is available, this should give you a way of getting to it. Paul. From ncoghlan at gmail.com Tue Dec 23 16:29:23 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Dec 2008 01:29:23 +1000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> Message-ID: <495103D3.9000505@gmail.com> Rocky Bernstein wrote: > As best as I can tell, PEP 302 which discussed importer hooks and > suggests a standard way to get file data. But it doesn't address a > standard way to get container package and/or loader information. 
If a "filename" may not be an actual filename, but instead a pseudo-filename created based on the __file__ attribute of a Python module, then there are a few mechanisms for accessing it: 1. Use the package/module name and the relative path from that location, then use pkgutil.get_data to retrieve it. This has the advantage of correctly handling the case where no __loader__ attribute is present (or it is None), which can happen for standard filesystem imports. However, it only works in Python 2.6 and above (since get_data() is a new addition to pkgutil). 2. Implement your own version of pkgutil.get_data - more work, but it is the only way to get something along those lines that works for versions prior to Python 2.6 3. Do what a number of standard library APIs (e.g. linecache) that accept filenames do and also accept an optional "module globals" argument. If the globals argument is passed in and contains a "__loader__" entry, use the appropriate loader method when processing the "filename" that was passed in. > Also I'm not sure there *is* a standard print string way to show > member inside a package. zipimporter may insert co_filename strings > like: > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py > > but the trouble with this is that it means file routines have to scan > the path and notice say that > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg is a *file*, > not a directory. And a file stat/reading routine needs to understand > what kind of packager that is in order to get tracer.py information. > > (Are there any file routines in place for doing this?) Finding a loader given only a pseudo-filename and no module is actually possible in the specific case of zipimport, but is still pretty obscure at this point in time: 1. Scan sys.path looking for an entry that matches the start of the pseudo-filename (remembering to use os.path.normpath). 2.
Once such a path entry has been found, use PEP 302 to find the associated importer object (the undocumented pkgutil.get_importer function does exactly that - although, as with any undocumented feature, the promises of API compatibility across major version changes aren't as strong as they would be for an officially documented and supported interface). 3. Hope that the importer is one like zipimport that allows get_data() to be invoked directly on the importer object, rather than only providing it on a separate loader object after the module has been loaded. If it needs a real loader instead of just the importer, then you're back to the original problem of needing a module or package name (or globals dictionary) in addition to the pseudo filename. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Tue Dec 23 16:41:56 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 23 Dec 2008 15:41:56 +0000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <18768.63272.61558.985690@panix5.panix.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> <18768.63272.61558.985690@panix5.panix.com> Message-ID: <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com> 2008/12/23 : > What is wanted is a uniform way get and describe a file location > from a code object that takes into account the file might be a member > of an archive. But a code object may not have come from a file. Ignoring the interactive prompt (not because it's unimportant, just because people have a tendency to assume it's the only special case :-)) you need to consider code loaded via a PEP302 importer from (say) a sqlite database, or code created using compile(), or possibly even more esoteric means. 
So I'm not sure your request is clearly specified. > Are there even guidelines for saying what string goes into a code > object's co_filename? Clearly it should be related to the source code > that generated the code, and there are various conventions that seem > to exist when the code comes from an "eval" or an "exec". I'm not aware of guidelines - the documentation for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn't read from a file ('<string>' is commonly used)" which is pretty noncommittal. > But empirically it seems as though there's some variation. It could be > an absolute file or a file with no root directory specified. (But is > it possible to have things like "." and ".."?). And in the case of a > member of a package what happens? Should it be just the member without > the package? Or should it include the package name like > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ? > > Or be unspecified? If left unspecified as I gather it is now, it makes > it more important to have some sort of common routine to be able to > pick out the archive part in a filesystem from the member name inside > the archive. I think you need to be clear on *why* you want to know this information. Once it's clear what you're trying to achieve, it will be easier to say what the options are. It sounds like you're trying to propose a stronger convention, to be enforced in the future. (At least, your suggestion of producing stack traces implies that you want stack trace code not to have to deal with the current situation). When PEP 302 was being developed, we were looking at similar issues. That's why I pointed you at get_source() - it was the best we could do with all the various conflicting requirements, and the fact that it's optional is because we had to cater for cases where there simply wasn't a meaningful answer.
Frankly, backward compatibility requirements kill a lot of the options here. Maybe what you want is a *pair* of linked conventions: - co_filename (or a replacement) returns a (notionally opaque, but in practice a filename for file-based cases) token representing "the file or other object the code came from" - xxx.get_source_code(token) is a function (I don't know where, xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source". Or maybe you want a (possibly separate) attribute of a code object, which holds a string containing a human-readable (but quite possibly not machine-parseable) value representing the "place the code came from" - co_filename is essentially this at the moment, and maybe your complaint is merely that you don't find its contents sufficiently human-readable in the case of the zipimport module (in which case you might want to search some of the archives for the discussions on the constraints imposed on zipimport, because objects on sys.path must be strings and cannot be arbitrary objects...) I'm sorry if this is a little rambling. I can appreciate that there's some sort of issue that you see here, but I don't yet see any practical way of changing things that would help. And as always, there's backward compatibility to consider - existing code isn't going to change, so new code has to be prepared to handle that. I hope this is of some help, Paul. From ncoghlan at gmail.com Tue Dec 23 16:42:07 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Dec 2008 01:42:07 +1000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <4950DDD3.7030601@egenix.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <494D6EA9.2040201@v.loewis.de> <494F862B.60701@egenix.com> <200812230845.42805.steve@pearwood.info> <4950DDD3.7030601@egenix.com> Message-ID: <495106CF.5070302@gmail.com> M.-A. 
Lemburg wrote: > On 2008-12-22 22:45, Steven D'Aprano wrote: >> This behaviour appears to be specific to deleting dicts, not deleting >> random objects. I haven't yet confirmed that the problem still exists >> in trunk (I hope to have time tonight or tomorrow), but in my previous >> tests deleting millions of items stored in a list of tuples completed >> in a minute or two, while deleting the same items stored as key:item >> pairs in a dict took 30+ minutes. I say plus because I never had the >> patience to let it run to completion, it could have been hours for all >> I know. > > That's interesting. The dictionary dealloc routine doesn't give > any hint as to why this should take longer than deallocating > a list of tuples. Shuffling the list with random.shuffle before deleting it makes a *massive* difference to how long the deallocation takes. Not only that, but after the shuffled list has been deallocated, deleting an unshuffled list subsequently takes significantly longer. (I posted numbers and a test script showing these effects elsewhere in the thread). The important factor seems to be deallocation order relative to allocation order. A simple list deletes objects in the reverse of the order of creation, while a reversed list deletes them in order of creation. Both of these seem to scale fairly linearly. A dict with a hash order that I believe is a fair approximation of creation order also didn't appear to exhibit particularly poor scaling (at least not within the 20 million objects I could test). The shuffled list, on the other hand, was pretty atrocious, taking nearly twice as long to be destroyed as an unshuffled list of the same size. I'd like to add another dict to the test which eliminates the current coupling between hash order and creation order, and see if it exhibits poor behaviour which is similar to that of the shuffled list, but I'm not sure when I'll get to that (probably post-Christmas). 
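The ordering effect described above can be probed with a small timing harness. What follows is a minimal sketch, not the test script posted elsewhere in the thread: the function name time_dealloc and the object count are assumptions made for illustration, it is written for a current Python 3, and at this small size the two timings may well be indistinguishable. No particular result is claimed.

```python
import random
import time


def time_dealloc(n, shuffle):
    """Build a list of n small tuples, optionally shuffle it, and time `del`.

    Hypothetical helper for illustration; only the deallocation is timed.
    """
    items = [(i, str(i)) for i in range(n)]
    if shuffle:
        # Decouple deallocation order from allocation order.
        random.shuffle(items)
    start = time.perf_counter()
    del items  # last reference: the list and its tuples are freed here
    return time.perf_counter() - start


in_order = time_dealloc(200000, shuffle=False)
shuffled = time_dealloc(200000, shuffle=True)
print("in order: %.4fs  shuffled: %.4fs" % (in_order, shuffled))
```

Raising n toward the tens of millions discussed in the thread is what would actually exercise the allocator; the numbers depend heavily on platform and Python version.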
Note that I think these results are consistent with the theory that the problem lies in the way partially allocated memory pools are tracked in the obmalloc code - it makes sense that deallocating in creation order or in reverse of creation order would tend to clean up each arena in order and keep the obmalloc internal state neat and tidy, while deallocating objects effectively at random would lead to a lot of additional bookkeeping as the "most used" and "least used" arenas change over the course of the deallocation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Tue Dec 23 17:00:25 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 23 Dec 2008 16:00:25 +0000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <495103D3.9000505@gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> Message-ID: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com> 2008/12/23 Nick Coghlan : > Finding a loader given only a pseudo-filename and no module is actually > possible in the specific case of zipimport, but is still pretty obscure > at this point in time: > > 1. Scan sys.path looking for an entry that matches the start of the > pseudo-filename (remembering to use os.path.normpath). > > 2. Once such a path entry has been found, use PEP 302 to find the > associated importer object (the undocumented pkgutil.get_importer > function does exactly that - although, as with any undocumented feature, > the promises of API compatibility across major version changes aren't as > strong as they would be for an officially documented and supported > interface). > > 3. 
Hope that the importer is one like zipimport that allows get_data() > to be invoked directly on the importer object, rather than only > providing it on a separate loader object after the module has been > loaded. If it needs a real loader instead of just the importer, then > you're back to the original problem of needing a module or package name > (or globals dictionary) in addition to the pseudo filename. There were lots of proposals tossed around on python-dev at the time PEP 302 was being developed, which might have made all this easier. Most, if not all, were killed by backward compatibility requirements. I have some hopes that when Brett completes his "import in Python" work, that will add sufficient flexibility to allow people to experiment with all of this machinery, and ultimately maybe move forward with a more modular import mechanism. But the timescales for Brett's changes won't be until at least Python 3.1, and it'll be a release or two after that before any significant change can be eased in in a compatible manner. That's going to take a lot of energy on someone's part. Paul. PS One of these days, I'm going to write an insanely useful importer which takes the least-convenient option wherever PEP 302 allows flexibility. It'll be adopted by everyone because it's so great, and all the software that currently makes unwarranted assumptions about importers will break and get fixed to support it because otherwise its users will rebel, and we'll live in a paradise where everything follows the specs to the letter. Oh, yes, and I'm going to win the lottery every week for the next month :-) PPS Seriously, setuptools and the adoptions of eggs has pushed a lot of code to be much more careful about unwarranted assumptions that code lives in the filesystem. That's an incredibly good thing, and very hard to do right (witness the setuptools "zip_safe" parameter which acts as a get-out clause). Much kudos to setuptools for getting as far as it has. 
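The three steps quoted above (scan the path entries, ask pkgutil.get_importer for an importer, call get_data() on it) can be sketched roughly as follows. This is an illustrative sketch for a current Python 3, not code from the thread: the helper names split_pseudo_filename and read_member are made up, the egg is a throwaway zip built in a temporary directory, and it only works for importers that, like zipimport, expose get_data() directly.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def split_pseudo_filename(pseudo, search_path=None):
    """Step 1: find the path entry that the pseudo-filename starts with.

    Returns (path_entry, member), or None when no entry matches.
    """
    pseudo = os.path.normpath(pseudo)
    for entry in (sys.path if search_path is None else search_path):
        if not entry:
            continue
        entry = os.path.normpath(entry)
        prefix = entry + os.sep
        # Only an archive *file* can have members "below" it.
        if pseudo.startswith(prefix) and os.path.isfile(entry):
            return entry, pseudo[len(prefix):]
    return None


def read_member(pseudo, search_path=None):
    """Steps 2 and 3: get the importer for the entry and ask it for data."""
    found = split_pseudo_filename(pseudo, search_path)
    if found is None:
        return None
    entry, member = found
    importer = pkgutil.get_importer(entry)
    get_data = getattr(importer, "get_data", None)
    if get_data is None:
        # The importer needs a real loader; we are back to needing a module.
        return None
    return get_data(member)


def demo():
    # Build a tiny stand-in for the egg discussed in the thread.
    with tempfile.TemporaryDirectory() as tmp:
        egg = os.path.join(tmp, "tracer-0.1.0-py2.5.egg")
        with zipfile.ZipFile(egg, "w") as zf:
            zf.writestr("tracer.py", "def size():\n    return 0\n")
        return read_member(os.path.join(egg, "tracer.py"), [egg])


data = demo()
print(data)
```

Passing an explicit search_path keeps the demo from depending on the real sys.path; a debugger would normally leave it as None.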
From rocky at gnu.org Tue Dec 23 15:35:20 2008 From: rocky at gnu.org (rocky at gnu.org) Date: Tue, 23 Dec 2008 09:35:20 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> Message-ID: <18768.63272.61558.985690@panix5.panix.com> Paul Moore writes: > 2008/12/23 Rocky Bernstein : > > Now that there is a package mechanism (are package mechanisms?) like > > zipimporter that bundle source code into a single file, should the > > notion of a "file" location should be adjusted to include the package > > and/or importer? > > Check PEP 302 (http://www.python.org/dev/peps/pep-0302/) specifically > the get_source (optional) method. Yes, that's one of the things I was thinking of when I wrote: As best as I can tell, PEP 302 discusses importer hooks and suggests a standard way to get file data. And by "suggests" I was implying that, yes, I know this is optional. > It's not exactly what you describe, > but it may help. Yes, it's not exactly what is desired. > Please note that it's optional - if you loaded the > code from a zipfile containing only bytecode files, there is no source > to get, so you have to be prepared for that case. But if the source is > available, this should give you a way of getting to it. What is wanted is a uniform way to get and describe a file location from a code object that takes into account that the file might be a member of an archive. Are there even guidelines for saying what string goes into a code object's co_filename? Clearly it should be related to the source code that generated the code, and there are various conventions that seem to exist when the code comes from an "eval" or an "exec".
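To make the co_filename question concrete: whatever string is passed as the second argument to compile() ends up, unchecked, as co_filename on the resulting code object and on the code objects of functions defined inside it. The short sketch below (current Python 3; the source text and the second, more descriptive label are invented for illustration) shows the common "<string>" convention next to an arbitrary label.

```python
source = "def size():\n    return 0\n"

# The second argument to compile() is stored verbatim as co_filename.
code = compile(source, "<string>", "exec")
namespace = {}
exec(code, namespace)

module_level = code.co_filename
# Code objects of nested functions inherit the same co_filename.
function_level = namespace["size"].__code__.co_filename

# Nothing stops a caller from passing a more descriptive (hypothetical) label.
labelled = compile(source, "<exec of: def size(): ...>", "exec").co_filename

print(module_level, function_level, labelled)
```

Since any string is accepted, a tool reading co_filename cannot tell from the value alone whether it names a real file, an archive member, or just a label.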
But empirically it seems as though there's some variation. It could be an absolute file or a file with no root directory specified. (But is it possible to have things like "." and ".."?). And in the case of a member of a package what happens? Should it be just the member without the package? Or should it include the package name like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ? Or be unspecified? If left unspecified as I gather it is now, it makes it more important to have some sort of common routine to be able to pick out the archive part in a filesystem from the member name inside the archive. > > Paul. > From pje at telecommunity.com Tue Dec 23 17:19:52 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 23 Dec 2008 11:19:52 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com > References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> Message-ID: <20081223161810.B3B513A409D@sparrow.telecommunity.com> At 06:55 AM 12/23/2008 -0500, Rocky Bernstein wrote: >Now that there is a package mechanism (are package mechanisms?) like >zipimporter that bundle source code into a single file, should the >notion of a "file" location should be adjusted to include the package >and/or importer? > >Is there a standard API or routine which can extract this information >given a code object? The inspect module (in 2.5 and up) supports retrieving the source lines for any object that has module globals. So you could do it for a class, a function, a method, module-level code, or even a frame, but not for a standalone code object. I believe there are also certain inspect module APIs that will return a pseudo-filename, i.e. the zipfile name followed by the path within the zipfile. >Also I'm not sure there *is* a standard print string way to show >member inside a package. 
zipimporter may insert co_filename strings >like: > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py AFAIK, it'll only do this if the zipfile doesn't contain a usable .pyc or .pyo. Ordinarily, co_filename will be the name of the original source file before the zipfile was created. From pje at telecommunity.com Tue Dec 23 17:29:22 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 23 Dec 2008 11:29:22 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.co m> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com> Message-ID: <20081223162739.BA5E83A409D@sparrow.telecommunity.com> At 04:00 PM 12/23/2008 +0000, Paul Moore wrote: >PPS Seriously, setuptools and the adoptions of eggs has pushed a lot >of code to be much more careful about unwarranted assumptions that >code lives in the filesystem. That's an incredibly good thing, and >very hard to do right (witness the setuptools "zip_safe" parameter >which acts as a get-out clause). Much kudos to setuptools for getting >as far as it has. And ironically, if I ever get the time to actually work on a new version of easy_install (as opposed to perpetually tweaking the old one), the default zipping and default sys.path munging will be among the first things to go. ;-) Ironically, my choice of isolated directories and zipfiles for quick-and-dirty uninstall support has ended up costing far too much, compared to if I'd just taken the time to design a decent uninstall feature. Of course, hindsight is 20-20; in order to fully understand the requirements of a problem, you sometimes have to get a rather long way towards solving it the simple, obvious... and wrong way. 
(And, it didn't help that I had significant time constraints pushing me in the direction of the Seemingly-Simplest-At-The-Moment Thing That Could Possibly Work.) From rocky at panix.com Tue Dec 23 17:36:48 2008 From: rocky at panix.com (R. Bernstein) Date: Tue, 23 Dec 2008 11:36:48 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> <18768.63272.61558.985690@panix5.panix.com> <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com> Message-ID: <18769.5024.990970.46864@panix5.panix.com> Paul Moore writes: > 2008/12/23 : > > What is wanted is a uniform way to get and describe a file location > > from a code object that takes into account the file might be a member > > of an archive. > > But a code object may not have come from a file. Right. That's why I mentioned for example "eval" and "exec" that you cite below. So remove the "file" in what is cited above. Replace with: "a uniform way to get information (not necessarily just the source text) about the location/origin of code from a code object". > Ignoring the > interactive prompt (not because it's unimportant, just because people > have a tendency to assume it's the only special case :-)) you need to > consider code loaded via a PEP302 importer from (say) a sqlite > database, or code created using compile(), or possibly even more > esoteric means. > > So I'm not sure your request is clearly specified. Is the above any more clear? > > > Are there even guidelines for saying what string goes into a code > > object's co_filename? Clearly it should be related to the source code > > that generated the code, and there are various conventions that seem > > to exist when the code comes from an "eval" or an "exec".
> > I'm not aware of guidelines - the documentation for compile() says > "The filename argument should give the file from which the code was > read; pass some recognizable value if it wasn't read from a file > ('<string>' is commonly used)" which is pretty noncommittal. > > > But empirically it seems as though there's some variation. It could be > > an absolute file or a file with no root directory specified. (But is > > it possible to have things like "." and ".."?). And in the case of a > > member of a package what happens? Should it be just the member without > > the package? Or should it include the package name like > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ? > > > > Or be unspecified? If left unspecified as I gather it is now, it makes > > it more important to have some sort of common routine to be able to > > pick out the archive part in a filesystem from the member name inside > > the archive. > > I think you need to be clear on *why* you want to know this > information. Once it's clear what you're trying to achieve, it will be > easier to say what the options are. This is what I wrote originally (slightly modified): A use case I am thinking of here is in a stack trace or a debugger, or a tool which wants to show in great detail, information from a code object obtained possibly via a frame object. I find it kind of sucky to see in a traceback: "<string>" as opposed to the text (or prefix of the text) of the actual string that was passed. Or something that has been referred to as a "pseudo-file" like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py when it is really member foo/bar.py of zipped egg /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg. (As a separate issue, it seems that zipimporter file locations inside setuptools may have a problem.) Inside a debugger or an IDE, it is conceivable a person might want loader, and module information, and if the code is part of an archive file, then member information.
(If part of an eval string then, the eval string.) > > It sounds like you're trying to propose a stronger convention, to be > enforced in the future. Well, I wasn't sure if there was one. But I gather from what you write, there isn't. :-) Yes, I would suggest a stronger convention. Or a more up-front statement that none is desired/forthcoming. > (At least, your suggestion of producing stack > traces implies that you want stack trace code not to have to deal with > the current situation). When PEP 302 was being developed, we were > looking at similar issues. That's why I pointed you at get_source() - > it was the best we could do with all the various conflicting > requirements, and the fact that it's optional is because we had to > cater for cases where there simply wasn't a meaningful answer. > Frankly, backward compatibility requirements kill a lot of the options > here. > > Maybe what you want is a *pair* of linked conventions: > > - co_filename (or a replacement) returns a (notionally opaque, but > in practice a filename for file-based cases) token representing "the > file or other object the code came from" This would be nice. > - xxx.get_source_code(token) is a function (I don't know where, > xxx is a placeholder for some "suitable" module) which, given such a > token, returns the source, or None if there's no viable concept of > "the source". There always is a viable concept of a source. It's whatever was done to get the code. For example, if it was via an eval then the source was the eval function and a string, same for exec. If it's via database access, well that then and some summary info about what's known about that. 
> > Or maybe you want a (possibly separate) attribute of a code object, > which holds a string containing a human-readable (but quite possibly > not machine-parseable) value representing the "place the code came > from" - co_filename is essentially this at the moment, and maybe your > complaint is merely that you don't find its contents sufficiently > human-readable in the case of the zipimport module (in which case you > might want to search some of the archives for the discussions on the > constraints imposed on zipimport, because objects on sys.path must be > strings and cannot be arbitrary objects...) There are two problems. One is displaying location information in an unambiguous way -- the pseudo-file above is ambiguous and so is <string> since there's no guarantee that an OS will not name a file that. The second problem is programmatically getting information such as a debugger or an IDE might do so that the information can be conveyed back to a user who might want to inspect surrounding source code or modules. > > I'm sorry if this is a little rambling. I can appreciate that there's > some sort of issue that you see here, but I don't yet see any > practical way of changing things that would help. And as always, > there's backward compatibility to consider - existing code isn't going > to change, so new code has to be prepared to handle that. > > I hope this is of some help, Yes, thanks. At least I now have a clearer idea of the state of where things stand. > Paul. > From p.f.moore at gmail.com Tue Dec 23 17:55:36 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 23 Dec 2008 16:55:36 +0000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from?
In-Reply-To: <18769.5024.990970.46864@panix5.panix.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> <18768.63272.61558.985690@panix5.panix.com> <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com> <18769.5024.990970.46864@panix5.panix.com> Message-ID: <79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com> 2008/12/23 R. Bernstein : > A use case I am thinking of here is in a stack trace or a > debugger, or a tool which wants to show in great detail, information > from a code object obtained possibly via a frame object. Thanks for the clarifications. I see what you're after much better now. > I find it kind of sucky to see in a traceback: "<string>" as opposed > to the text (or prefix of the text) of the actual string that was > passed. Or something that has been referred to as a "pseudo-file" like > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py > when it is really member foo/bar.py of zipped egg > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg. Fair comment. That points to a "human readable" type of string. It's not available at the moment, but I guess it could be. But see below. > > - xxx.get_source_code(token) is a function (I don't know where, > > xxx is a placeholder for some "suitable" module) which, given such a > > token, returns the source, or None if there's no viable concept of > > "the source". > > There always is a viable concept of a source. It's whatever was done > to get the code. For example, if it was via an eval then the source > was the eval function and a string, same for exec. If it's via > database access, well that then and some summary info about what's > known about that. Hmm, "source" colloquially, yes "bytecode loaded from ....\xxx.pyc", for example. But not "source" in the sense of "source code". Some applications run with only bytecode shipped, no source code available at all. > There are two problems. 
One is displaying location information in an > unambiguous way -- the pseudo-file above is ambiguous and so is > <string>, since there's no guarantee that an OS won't name a file > that. The second problem is programmatically getting information, > as a debugger or an IDE might do, so that the information can be > conveyed back to a user who might want to inspect surrounding source > code or modules. This is more than you were asking for above. The first problem is addressed with a "human readable" (narrative) description, as above. The second, however, requires machine-readable access to source code (if it exists). That's what the loader get_source() call does for you. But you have to be prepared for the fact that it may not be possible to get source code, and decide what you want to happen in that case. > > I hope this is of some help, > > Yes, thanks. At least I now have a clearer idea of the state of > where things stand. Good. Sorry it's not better news :-) Paul From rocky at panix.com Tue Dec 23 17:55:00 2008 From: rocky at panix.com (R. Bernstein) Date: Tue, 23 Dec 2008 11:55:00 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <495103D3.9000505@gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> Message-ID: <18769.6116.425537.968778@panix5.panix.com> Nick Coghlan writes: > 3. Do what a number of standard library APIs (e.g. linecache) that > accept filenames do and also accept an optional "module globals" > argument. Actually, I did this and committed a change (to pydb) before posting any of these queries. ;-) If "a number of standard library APIs" are doing the *same* thing, then shouldn't this be exposed as a common routine? If on the other hand, by "a number" you mean "one" as in linecache -- 1 *is* a number too! 
-- then perhaps the relevant code that is buried inside the "updatecache" should be exposed on its own. (As a side benefit that code can be tested separately too!) Should I file a feature request for this? From lance.ellinghaus at eds.com Tue Dec 23 18:03:02 2008 From: lance.ellinghaus at eds.com (Ellinghaus, Lance) Date: Tue, 23 Dec 2008 12:03:02 -0500 Subject: [Python-Dev] Problems compiling 2.6.1 on Solaris 10 In-Reply-To: <4950B301.4020702@v.loewis.de> References: <752A61D5C34D41478E638FC92AF9051B035635A5@usahm207.amer.corp.eds.com> <4950B301.4020702@v.loewis.de> Message-ID: <752A61D5C34D41478E638FC92AF9051B035636C9@usahm207.amer.corp.eds.com> Martin, Thank you very much. At least I know what I need to do now. > From: "Martin v. Löwis" [mailto:martin at v.loewis.de] > I don't think ctypes (rather, libffi) supports Sun C. You will need to > port it (as you have already ruled out the other options, such as using > gcc, or not using ctypes). Lance From tutufan at gmail.com Tue Dec 23 18:54:11 2008 From: tutufan at gmail.com (Mike Coleman) Date: Tue, 23 Dec 2008 11:54:11 -0600 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com> Message-ID: <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com> On Sat, Dec 20, 2008 at 6:22 PM, Mike Coleman wrote: > Re "held" and "intern_it": Haha! That's evil and extremely evil, > respectively. :-) P.S. 
I tried the "held" idea out (interning integers in a list), and unfortunately it didn't make that much difference. In the example I tried, there were 104465178 instances of integers from range(33467). I guess if ints are 12 bytes (per Beazley's book, but not sure if that still holds), then that would correspond to a 1GB reduction. Judging by 'top', it might have been 2 or 3GB instead, from a total of 45G. Mike From tutufan at gmail.com Tue Dec 23 18:59:21 2008 From: tutufan at gmail.com (Mike Coleman) Date: Tue, 23 Dec 2008 11:59:21 -0600 Subject: [Python-Dev] suggest change to "Failed to find the necessary bits to build these modules" message Message-ID: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com> I was thrown by the "Failed to find the necessary bits to build these modules" message at the end of newer Python builds, and thought that this indicated that the Python executable itself was not built. That's arguably stupidity on my part, but I wonder if others will not trip on this, too. Would it be possible to change this wording slightly, to something like Python built, but failed to find the necessary bits to build these modules ? From brett at python.org Tue Dec 23 19:12:09 2008 From: brett at python.org (Brett Cannon) Date: Tue, 23 Dec 2008 10:12:09 -0800 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> <79990c6b0812230800h7ac9ddb1me14733224fe7c53a@mail.gmail.com> Message-ID: On Tue, Dec 23, 2008 at 08:00, Paul Moore wrote: > 2008/12/23 Nick Coghlan : >> Finding a loader given only a pseudo-filename and no module is actually >> possible in the specific case of zipimport, but is still pretty obscure >> at this point in time: >> >> 1. 
Scan sys.path looking for an entry that matches the start of the >> pseudo-filename (remembering to use os.path.normpath). >> >> 2. Once such a path entry has been found, use PEP 302 to find the >> associated importer object (the undocumented pkgutil.get_importer >> function does exactly that - although, as with any undocumented feature, >> the promises of API compatibility across major version changes aren't as >> strong as they would be for an officially documented and supported >> interface). >> >> 3. Hope that the importer is one like zipimport that allows get_data() >> to be invoked directly on the importer object, rather than only >> providing it on a separate loader object after the module has been >> loaded. If it needs a real loader instead of just the importer, then >> you're back to the original problem of needing a module or package name >> (or globals dictionary) in addition to the pseudo filename. > > There were lots of proposals tossed around on python-dev at the time > PEP 302 was being developed, which might have made all this easier. > Most, if not all, were killed by backward compatibility requirements. > > I have some hopes that when Brett completes his "import in Python" > work, that will add sufficient flexibility to allow people to > experiment with all of this machinery, and ultimately maybe move > forward with a more modular import mechanism. I have actually made a good amount of progress as of late. It's a New Years resolution to get importlib done, but I am actually aiming for before January 1 (sans the damn compile() problem I am having).This goal does ignore everything but a compatible __import__, though. > But the timescales for > Brett's changes won't be until at least Python 3.1, and it'll be a > release or two after that before any significant change can be eased > in in a compatible manner. 
I suspect that any import work will be a Pending/DeprecationWarning deal, so 3.3 would be the first version that could have any real changes as the default. > That's going to take a lot of energy on > someone's part. That would be me. =) After importlib is finished I have a couple of PEPs planned plus properly documenting how the import machinery works in the language spec. And I suspect this will lead to some discussions about things, e.g. requirements of the format for __file__ and __path__ in regards to when they point inside of an archive, etc. -Brett From brett at python.org Tue Dec 23 19:13:17 2008 From: brett at python.org (Brett Cannon) Date: Tue, 23 Dec 2008 10:13:17 -0800 Subject: [Python-Dev] suggest change to "Failed to find the necessary bits to build these modules" message In-Reply-To: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com> References: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com> Message-ID: On Tue, Dec 23, 2008 at 09:59, Mike Coleman wrote: > I was thrown by the "Failed to find the necessary bits to build these > modules" message at the end of newer Python builds, and thought that > this indicated that the Python executable itself was not built. > That's arguably stupidity on my part, but I wonder if others will not > trip on this, too. > > Would it be possible to change this wording slightly, to something like > > Python built, but failed to find the necessary bits to build these modules > > ? Sounds reasonable to me. Can you file a bug report at bugs.python.org, Mike, so this doesn't get lost? 
-Brett From tutufan at gmail.com Tue Dec 23 19:22:42 2008 From: tutufan at gmail.com (Mike Coleman) Date: Tue, 23 Dec 2008 12:22:42 -0600 Subject: [Python-Dev] suggest change to "Failed to find the necessary bits to build these modules" message In-Reply-To: References: <3c6c07c20812230959r4185d1act3a27b4dc02a4a82d@mail.gmail.com> Message-ID: <3c6c07c20812231022y53f267bcg1c86339b30b0e074@mail.gmail.com> Done: http://bugs.python.org/issue4731 On Tue, Dec 23, 2008 at 12:13 PM, Brett Cannon wrote: > On Tue, Dec 23, 2008 at 09:59, Mike Coleman wrote: >> I was thrown by the "Failed to find the necessary bits to build these >> modules" message at the end of newer Python builds, and thought that >> this indicated that the Python executable itself was not built. >> That's arguably stupidity on my part, but I wonder if others will not >> trip on this, too. >> >> Would it be possible to change this wording slightly, to something like >> >> Python built, but failed to find the necessary bits to build these modules >> >> ? > > Sounds reasonable to me. Can you file a bug report at bugs.python.org, > Mike, so this doesn't get lost? > > -Brett > From martin at v.loewis.de Tue Dec 23 21:20:40 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Dec 2008 21:20:40 +0100 Subject: [Python-Dev] [ANN] Python 2.5.4 (final) Message-ID: <49514818.7060103@v.loewis.de> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.5.4 (final). Python 2.5.3 unfortunately contained an incorrect patch that could cause interpreter crashes; the only change in Python 2.5.4 relative to 2.5.3 is the reversal of this patch. 2.5.4 is the last bug fix release of Python 2.5. Future 2.5.x releases will only include security fixes. According to the release notes, about 80 bugs and patches have been addressed since Python 2.5.2, many of them improving the stability of the interpreter, and improving its portability. 
See the release notes at the website (also available as Misc/NEWS in the source distribution) for details of bugs fixed; most of them prevent interpreter crashes (and now cause proper Python exceptions in cases where the interpreter may have crashed before). For more information on Python 2.5.4, including download links for various platforms, release notes, and known issues, please see: http://www.python.org/2.5.4 Highlights of the previous major Python releases are available from the Python 2.5 page, at http://www.python.org/2.5/highlights.html Enjoy this release, Martin Martin v. Loewis martin at v.loewis.de Python Release Manager (on behalf of the entire python-dev team) From chambon.pascal at wanadoo.fr Tue Dec 23 21:55:10 2008 From: chambon.pascal at wanadoo.fr (Pascal Chambon) Date: Tue, 23 Dec 2008 21:55:10 +0100 Subject: [Python-Dev] Hello everyone + little question around Cpython/stackless In-Reply-To: <4950137E.8040506@v.loewis.de> References: <49500B86.1070605@wanadoo.fr> <4950137E.8040506@v.loewis.de> Message-ID: <4951502E.2030805@wanadoo.fr> Allright then, I understand the problem... Thanks a lot, regards, Pascal > From kristjan at ccpgames.com Tue Dec 23 22:08:16 2008 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 23 Dec 2008 21:08:16 +0000 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <49501AEC.3010805@v.loewis.de> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> <49501AEC.3010805@v.loewis.de> Message-ID: <930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local> I'd like to suggest here, if you are giving this code a facelift, that on Windows you use VirtualAlloc and friends to allocate the arenas. 
This gives you the most direct access to the VM manager and makes sure that a released arena is immediately available to the rest of the system. It also makes sure that you don't mess with the regular heap and fragment it. Kristján -----Original Message----- From: python-dev-bounces+kristjan=ccpgames.com at python.org [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org] On Behalf Of "Martin v. Löwis" Sent: 22. desember 2008 22:56 To: Antoine Pitrou Cc: python-dev at python.org Subject: Re: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) >> Allocation of a new pool would have to do a linear search in these >> pointers (finding the arena with the least number of pools); > > You mean the least number of free pools, right? Correct. From martin at v.loewis.de Tue Dec 23 22:52:31 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Dec 2008 22:52:31 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <3c6c07c20812201709t847f550r25bbad5835961fa7@mail.gmail.com> <3c6c07c20812221001l29129efj401d1e8b543db427@mail.gmail.com> <49500D80.2090201@v.loewis.de> <49501AEC.3010805@v.loewis.de> <930F189C8A437347B80DF2C156F7EC7F04D1702E12@exchis.ccp.ad.local> Message-ID: <49515D9F.4020207@v.loewis.de> > I'd like to suggest here, if you are giving this code a facelift, > that on Windows you use VirtualAlloc and friends to allocate the > arenas. This gives you the most direct access to the VM manager and > makes sure that a released arena is immediately available to the rest > of the system. It also makes sure that you don't mess with the > regular heap and fragment it. While I'd like to see this done myself, I believe it is independent from the problem at hand. Contributions are welcome. 
Regards, Martin From tjreedy at udel.edu Tue Dec 23 23:03:51 2008 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 23 Dec 2008 17:03:51 -0500 Subject: [Python-Dev] [ANN] Python 2.5.4 (final) In-Reply-To: <49514818.7060103@v.loewis.de> References: <49514818.7060103@v.loewis.de> Message-ID: Martin v. Löwis wrote: > For more information on Python 2.5.4, including download > links for various platforms, release notes, and known issues, please > see: > > http://www.python.org/2.5.4 http://www.python.org/download/releases/2.5.4/ From ncoghlan at gmail.com Tue Dec 23 23:29:36 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Dec 2008 08:29:36 +1000 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <18769.6116.425537.968778@panix5.panix.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> <18769.6116.425537.968778@panix5.panix.com> Message-ID: <49516650.7010005@gmail.com> R. Bernstein wrote: > Nick Coghlan writes: > > 3. Do what a number of standard library APIs (e.g. linecache) that > > accept filenames do and also accept an optional "module globals" > > argument. > > Actually, I did this and committed a change (to pydb) before posting > any of these queries. ;-) > > If "a number of standard library APIs" are doing the *same* thing, > then shouldn't this be exposed as a common routine? > > If on the other hand, by "a number" you mean "one" as in linecache -- > 1 *is* a number too! -- then perhaps the relevant code that is buried > inside the "updatecache" should be exposed on its own. (As a side > benefit that code can be tested separately too!) > > Should I file a feature request for this? The reason for my slightly odd phrasing is that all of the examples I was originally going to mention (traceback, pdb, doctest, inspect) actually all end up calling linecache to do the heavy lifting. 
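The optional "module globals" argument mentioned in point 3 above can be seen in action with a small sketch like the one below. It assumes a current Python 3 standard library; the archive name, the module name zipped_demo_mod, and the helper lines_from_zipped_module are made up for illustration. The pseudo-filename of a module imported from a zip archive is not a real file, so linecache can only find its lines when it is also handed the module globals, from which it can reach the PEP 302 loader.

```python
import importlib
import linecache
import os
import sys
import tempfile
import zipfile

SOURCE = "def answer():\n    return 42\n"


def lines_from_zipped_module():
    """Import a module from a throwaway zip file and fetch its lines
    through linecache, first without and then with the module globals."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo_archive.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("zipped_demo_mod.py", SOURCE)
        sys.path.insert(0, archive)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("zipped_demo_mod")
            # Pseudo-filename: <archive path> + separator + member name.
            filename = module.__file__
            # No real file has this name, so nothing can be read.
            without_globals = linecache.getlines(filename)
            linecache.clearcache()
            # With the globals, linecache can ask the loader for the source.
            with_globals = linecache.getlines(filename, module.__dict__)
            return filename, without_globals, with_globals
        finally:
            sys.path.remove(archive)
            sys.modules.pop("zipped_demo_mod", None)
            linecache.clearcache()


if __name__ == "__main__":
    name, plain, via_loader = lines_from_zipped_module()
    print(name)
    print(plain)
    print(via_loader)
```

The same two-argument call is what a traceback or debugger front end would need to make for code that lives inside an egg or zip file.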
So it is possible that linecache.getlines() actually *is* the common routine you're looking for - it just needs to be added to the documentation and the __all__ attribute for linecache to be officially supported. Currently, only the single line getline() function is documented and exposed via __all__, but I don't see any reason for that restriction - linecache.getlines() has been there with a stable API since at least Python 2.5. For cases where you have an appropriate Python object (i.e. a module, function, method, class, traceback, frame or code object) rather than a pseudo-filename, then inspect.getsource() actually jumps through a lot of hoops to try to find the actual source code for that object - in those cases, using the appropriate inspect function is generally a much better idea than trying to interpret __file__ yourself. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From martin at v.loewis.de Tue Dec 23 23:43:45 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 23 Dec 2008 23:43:45 +0100 Subject: [Python-Dev] [ANN] Python 2.5.4 (final) In-Reply-To: References: <49514818.7060103@v.loewis.de> Message-ID: <495169A1.6000205@v.loewis.de> >> For more information on Python 2.5.4, including download >> links for various platforms, release notes, and known issues, please >> see: >> >> http://www.python.org/2.5.4 > > http://www.python.org/download/releases/2.5.4/ Thanks for pointing that out; the original URL now also works as well (as it does for all other releases). 
Regards, Martin From steve at pearwood.info Wed Dec 24 00:39:30 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 24 Dec 2008 10:39:30 +1100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <200812202155.28024.steve@pearwood.info> Message-ID: <200812241039.31452.steve@pearwood.info> On Sun, 21 Dec 2008 06:45:11 am Antoine Pitrou wrote: > Steven D'Aprano pearwood.info> writes: > > In November 2007, a similar problem was reported on the > > comp.lang.python newsgroup. 370MB was large enough to demonstrate > > the problem. I don't know if a bug was ever reported. > > Do you still reproduce it on trunk? > I've tried your scripts on my machine and they work fine, even if I > leave garbage collecting enabled during the process. > (dual core 64-bit machine but in 32-bit mode) I'm afraid that sometime over the last year, I replaced my computer's motherboard, and now I can't reproduce the behaviour at all. I've tried two different boxes, with both Python 2.6.1 and 2.5.1. -- Steven D'Aprano From rocky at panix.com Wed Dec 24 05:22:09 2008 From: rocky at panix.com (R. Bernstein) Date: Tue, 23 Dec 2008 23:22:09 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <49516650.7010005@gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> <18769.6116.425537.968778@panix5.panix.com> <49516650.7010005@gmail.com> Message-ID: <18769.47345.382346.169427@panix5.panix.com> Nick Coghlan writes: > R. Bernstein wrote: > > Nick Coghlan writes: > > > 3. Do what a number of standard library APIs (e.g. linecache) that > > > accept filenames do and also accept an optional "module globals" > > > argument. 
> > > > Actually, I did this and committed a change (to pydb) before posting > > any of these queries. ;-) > > > > If "a number of standard library APIs" are doing the *same* thing, > > then shouldn't this be exposed as a common routine? > > > > If on the other hand, by "a number" you mean "one" as in linecache -- > > 1 *is* a number too! -- then perhaps the relevant code that is buried > > inside the "updatecache" should be exposed on its own. (As a side > > benefit that code can be tested separately too!) > > > > Should I file a feature request for this? > > The reason for my slightly odd phrasing is that all of the examples I > was originally going to mention (traceback, pdb, doctest, inspect) > actually all end up calling linecache to do the heavy lifting. > > So it is possible that linecache.getlines() actually *is* the common > routine you're looking for I never asked about getting the text lines for the source code, no matter how many times people suggest that as an alternative. :-) Instead, I was asking about a common way to get information about the source location for say a frame or traceback object (which might include package name and type) and suggest that there should be a more unambiguous way to display this information than seems to be in use at present. Part of the work of retrieving or displaying that information has to do some of the same things that are done inside linecache.updatecache() *before* it retrieves the lines of the source code (when possible). And possibly parts of it include parts of what's done in pieces of the inspect module. > - it just needs to be added to the > documentation and the __all__ attribute for linecache to be officially > supported. Currently, only the single line getline() function is > documented and exposed via __all__, but I don't see any reason for that > restriction - linecache.getlines() has been there with a stable API > since at least Python 2.5. > > For cases where you have an appropriate Python object (i.e. 
a module, > function, method, class, traceback, frame or code object) rather than a > pseudo-filename, then inspect.getsource() actually jumps through a lot > of hoops to try to find the actual source code for that object - in > those cases, using the appropriate inspect function is generally a much > better idea than trying to interpret __file__ yourself. > > Cheers, > Nick. Thanks for the information. I will keep in mind those inspect routines. They will probably be helpful for another problem I had been wondering about -- how one can determine if there is no code associated with a given line and file. (In other words, an invalid location for a debugger line breakpoint, such as when the line is part of a comment or the interior line of a string that spans many lines.) > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/rocky%40gnu.org > From steve at holdenweb.com Wed Dec 24 05:37:13 2008 From: steve at holdenweb.com (Steve Holden) Date: Tue, 23 Dec 2008 23:37:13 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <18769.47345.382346.169427@panix5.panix.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <495103D3.9000505@gmail.com> <18769.6116.425537.968778@panix5.panix.com> <49516650.7010005@gmail.com> <18769.47345.382346.169427@panix5.panix.com> Message-ID: R. Bernstein wrote: > Nick Coghlan writes: > > R. Bernstein wrote: > > > Nick Coghlan writes: > > > > 3. Do what a number of standard library APIs (e.g. linecache) that > > > > accept filenames do and also accept an optional "module globals" > > > > argument. 
> > > > > > Actually, I did this and committed a change (to pydb) before posting > > > any of these queries. ;-) > > > > > > If "a number of standard library APIs" are doing the *same* thing, > > > then shouldn't this be exposed as a common routine? > > > > > > If on the other hand, by "a number" you mean "one" as in linecache -- > > > 1 *is* a number too! -- then perhaps the relevant code that is buried > > > inside the "updatecache" should be exposed on its own. (As a side > > > benefit that code can be tested separately too!) > > > > > > Should I file a feature request for this? > > > > The reason for my slightly odd phrasing is that all of the examples I > > was originally going to mention (traceback, pdb, doctest, inspect) > > actually all end up calling linecache to do the heavy lifting. > > > > So it is possible that linecache.getlines() actually *is* the common > > routine you're looking for > > I never asked about getting the text lines for the source code, no > matter how many times people suggest that as an alternative. :-) > > Instead, I was asking about a common way to get information about the > source location for say a frame or traceback object (which might > include package name and type) and suggest that there should be a more > unambiguous way to display this information than seems to be in use at > present. > I agree. Since PEP 302 many parts of Python are rather too file-centric for my liking. I noted almost four years ago, for example, that the interpreter assumes that the os module will be imported from filestore in order to set the prefix. This issue appears to have received no attention since, and I'm certainly not the one with the best skills or knowledge to solve this problem. http://bugs.python.org/issue1116520 > Part of the work of retrieving or displaying that information has to do > some of the same things that are done inside linecache.updatecache() > *before* it retrieves the lines of the source code (when > possible). 
And possibly parts of it include parts of what's done in > pieces of the inspect module. > > > - it just needs to be added to the > > documentation and the __all__ attribute for linecache to be officially > > supported. Currently, only the single line getline() function is > > documented and exposed via __all__, but I don't see any reason for that > > restriction - linecache.getlines() has been there with a stable API > > since at least Python 2.5. > > > > For cases where you have an appropriate Python object (i.e. a module, > > function, method, class, traceback, frame or code object) rather than a > > pseudo-filename, then inspect.getsource() actually jumps through a lot > > of hoops to try to find the actual source code for that object - in > > those cases, using the appropriate inspect function is generally a much > > better idea than trying to interpret __file__ yourself. > > > > Cheers, > > Nick. > > Thanks for the information. I will keep in mind those inspect routines. > > They will probably be helpful for another problem I had been > wondering about -- how one can determine if there is no code > associated with a given line and file. (In other words, an invalid > location for a debugger line breakpoint, such as when the line is > part of a comment or the interior line of a string that spans many > lines.) > Looks like the start of some necessary attention to this issue. The inspect module might indeed offer the right facilities. I'm still wondering what we do about the various prefix settings in an environment where there are no filestore imports at all. In the event I can assist feel free to rope me in. 
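On the breakpoint question quoted above, one possible approach (a sketch, not something proposed in this thread) is to compile the source and collect the line numbers that actually start bytecode, using dis.findlinestarts() and walking nested code objects through co_consts. It assumes a current Python 3, and lines_with_code is a hypothetical helper name.

```python
import dis
import types


def lines_with_code(source, filename="<sketch>"):
    """Return the set of line numbers in source that begin some bytecode.

    Lines missing from the result (comments, blank lines, interior lines
    of a multi-line string) are not usable as line breakpoints.
    """
    pending = [compile(source, filename, "exec")]
    found = set()
    while pending:
        code = pending.pop()
        for _offset, lineno in dis.findlinestarts(code):
            # Skip artificial entries (line 0 or None on some versions).
            if lineno:
                found.add(lineno)
        # Functions and classes are stored as nested code objects.
        pending.extend(
            const for const in code.co_consts
            if isinstance(const, types.CodeType)
        )
    return found


SAMPLE = "\n".join([
    "x = 1",                # line 1
    "# a comment",          # line 2
    'text = """first',      # line 3
    "second",               # line 4
    'third"""',             # line 5
    "def f():",             # line 6
    "    # inner comment",  # line 7
    "    return x",         # line 8
]) + "\n"

if __name__ == "__main__":
    print(sorted(lines_with_code(SAMPLE)))
```

A debugger could reject a requested breakpoint whose line number is not in the returned set, or move it to the next line that is.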
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From rocky at gnu.org Wed Dec 24 06:03:37 2008 From: rocky at gnu.org (rocky at gnu.org) Date: Wed, 24 Dec 2008 00:03:37 -0500 Subject: [Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from? In-Reply-To: <79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com> References: <6cd6de210812230355w594dcda8t4beb389a18faa33@mail.gmail.com> <79990c6b0812230606k679234ebwc7b8e6d03232b23f@mail.gmail.com> <18768.63272.61558.985690@panix5.panix.com> <79990c6b0812230741u12aa01abq93bdf7fb7b7db8f9@mail.gmail.com> <18769.5024.990970.46864@panix5.panix.com> <79990c6b0812230855u1af71ee3pc396fbe2782cdc9f@mail.gmail.com> Message-ID: <18769.49833.290713.414067@panix5.panix.com> Paul Moore writes: > 2008/12/23 R. Bernstein : > > A use case I am thinking of here is in a stack trace or a > > debugger, or a tool which wants to show in great detail, information > > from a code object obtained possibly via a frame object. > > Thanks for the clarifications. I see what you're after much better now. > > > I find it kind of sucky to see in a traceback: "<string>" as opposed > > to the text (or prefix of the text) of the actual string that was > > passed. Or something that has been referred to as a "pseudo-file" like > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py > > when it is really member foo/bar.py of zipped egg > > /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg. > > Fair comment. That points to a "human readable" type of string. It's > not available at the moment, but I guess it could be. > > But see below. > > > > - xxx.get_source_code(token) is a function (I don't know where, > > > xxx is a placeholder for some "suitable" module) which, given such a > > > token, returns the source, or None if there's no viable concept of > > > "the source". 
> > > > There always is a viable concept of a source. It's whatever was done > > to get the code. For example, if it was via an eval then the source > > was the eval function and a string, same for exec. If it's via > > database access, well that then and some summary info about what's > > known about that. > > Hmm, "source" colloquially, yes "bytecode loaded from ....\xxx.pyc", > for example. But not "source" in the sense of "source code". Some > applications run with only bytecode shipped, no source code available > at all. > > > There are two problems. One is displaying location information in an > > unambiguous way -- the pseudo-file above is ambiguous and so is > > since there's no guarentee that OS's make to not name a file > > that. The second problem is programmatically getting information such > > as a debugger or an IDE might do so that the information can be > > conveyed back to a user who might want to inspect surrounding source > > code or modules. > > This is more than you were asking for above. > > The first problem is addressed with a "human readable" (narrative) > description, as above. > > The second, however, requires machine-readable access to source code > (if it exists). That's what the loader get_source() call does for you. > But you have to be prepared for the fact that it may not be possible > to get source code, and decide what you want to happen in that case. I'm missing your point here. When one uses information from a traceback, or is in a debugger, or is in an IDE, it is assumed that in order to use the information given you'll need access to the source code. And IDE's and debuggers have had to deal with the fact that source code is not available from day one, even before there was zipimporter. In order to get the strings of source text that linecache.getlines() gives, it has to prowl around for other information, possibly looking for a loader along the protocol defined in PEP 302 and/or others. 
And it's that information that a debugger, IDE or some tool of that ilk might need. Many IDEs and debuggers nowadays open a socket and pass information back and forth over that. An obvious advantage is that it means you can debug remotely. But in order for this to work, some information is generally passed back and forth regarding the location of the source text. In the Java world and Eclipse for example, it is possible for the jar to be in a different location from the one on the machine on which you might be debugging. And probably too often that jar isn't the same one. So it is helpful in this kind of scenario to break out a location into the name of a jar and the member inside the jar. Perhaps also some information about that jar.

It is possible that instead of passing around locations, debuggers and such tools use get_source(), because that's what Python has to offer. :-) I jest here, but honestly I've been surprised that there is no IDE that I know of that in fact works this way. The machine running the code clearly may have more accurate access to the source than a front-end IDE. Undeterred by the harsh facts of reality, I have hope that someday there *might* be an IDE that has provision for this.

So in a Ruby debugger (ruby-debug) one can request checksum information on the files the debugger thinks are loaded, in order to facilitate checking that the source an IDE might be showing in fact matches the source for the part of the code that is currently under investigation.

> > > > I hope this is of some help, > > > > Yes, thanks. At least I now have a clearer idea of the state of > > where things stand. > > Good. Sorry it's not better news :-) > > Paul >

From skip at pobox.com Thu Dec 25 16:41:54 2008
From: skip at pobox.com (skip at pobox.com)
Date: Thu, 25 Dec 2008 09:41:54 -0600
Subject: [Python-Dev] test message - please ignore
Message-ID: <18771.43458.868053.174950@montanaro-dyndns-org.local>

Merry Christmas everyone. Still, just hit 'd'.
I'm testing the mpo spam filter. Skip From list at qtrac.plus.com Fri Dec 26 09:55:49 2008 From: list at qtrac.plus.com (Mark Summerfield) Date: Fri, 26 Dec 2008 08:55:49 +0000 Subject: [Python-Dev] Python 3 - Mac Installer? Message-ID: <200812260855.49518.list@qtrac.plus.com> Hi, Just wondered if/when there'd be a Mac installer for Python 3? Thanks! -- Mark Summerfield, Qtrac Ltd, www.qtrac.eu C++, Python, Qt, PyQt - training and consultancy "Programming in Python 3" - ISBN 0137129297 From techtonik at gmail.com Fri Dec 26 15:25:34 2008 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 26 Dec 2008 16:25:34 +0200 Subject: [Python-Dev] os.defpath for Windows In-Reply-To: <494E0A2B.4080704@gmail.com> References: <494E0A2B.4080704@gmail.com> Message-ID: I can't see any logical reason for that. There should not be such a hack to avoid "magical bugs" when PATH is empty. On Sun, Dec 21, 2008 at 11:19 AM, Yinon Ehrlich wrote: > Hi, > > just saw that os.defpath for Windows is defined as > Lib/ntpath.py:30:defpath = '.;C:\\bin' > > Most Windows machines I saw has no c:\bin directory. > > Any reason why it was defined this way ? > Thanks, > Yinon > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/techtonik%40gmail.com > -- --anatoly t. From status at bugs.python.org Fri Dec 26 18:07:11 2008 From: status at bugs.python.org (Python tracker) Date: Fri, 26 Dec 2008 18:07:11 +0100 (CET) Subject: [Python-Dev] Summary of Python tracker Issues Message-ID: <20081226170711.DF02C78301@psf.upfronthosting.co.za> ACTIVITY SUMMARY (12/19/08 - 12/26/08) Python tracker at http://bugs.python.org/ To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message. 
2295 open (+38) / 14279 closed (+12) / 16574 total (+50) Open issues with patches: 776 Average duration of open issues: 701 days. Median duration of open issues: 2752 days. Open Issues Breakdown open 2277 (+38) pending 18 ( +0) Issues Created Or Reopened (51) _______________________________ IDLE Code Caching Windows 12/19/08 http://bugs.python.org/issue4691 reopened amaury.forgeotdarc [PATCH] msvc9compiler raises IOError when no compiler found inst 12/19/08 http://bugs.python.org/issue4702 created pjenvey patch Syntax error in sample code for enumerate in documentation. 12/20/08 CLOSED http://bugs.python.org/issue4703 created trenholmes Update pybench for python 3.0 12/20/08 http://bugs.python.org/issue4704 created marketdickinson patch python3.0 -u: unbuffered stdout 12/20/08 http://bugs.python.org/issue4705 created haypo try to build a C module, but don't worry if it doesn't work 12/20/08 http://bugs.python.org/issue4706 created zooko round() shows undocumented behaviour 12/20/08 http://bugs.python.org/issue4707 created dingo patch os.pipe should return inheritable descriptors (Windows) 12/21/08 http://bugs.python.org/issue4708 created castironpi Mingw-w64 and python on windows x64 12/21/08 http://bugs.python.org/issue4709 created cdavid patch [PATCH] zipfile.ZipFile does not extract directories properly 12/21/08 http://bugs.python.org/issue4710 created faw patch Wide literals in the table of contents overflow in documentation 12/21/08 http://bugs.python.org/issue4711 created scottdial Document pickle behavior for subclasses of dicts/lists 12/21/08 http://bugs.python.org/issue4712 created georg.brandl Installing sgmlop can crash xmlrpclib 12/21/08 http://bugs.python.org/issue4713 created cito patch print opcode stats at the end of pybench runs 12/21/08 http://bugs.python.org/issue4714 created pitrou patch optimize bytecode for conditional branches 12/21/08 http://bugs.python.org/issue4715 created pitrou patch Python 3.0 halts on shutdown when settrace is set 
12/22/08 http://bugs.python.org/issue4716 created fabioz execfile conversion is not correct 12/22/08 CLOSED http://bugs.python.org/issue4717 created fabioz wsgiref package totally broken 12/22/08 http://bugs.python.org/issue4718 created hdima patch sys.exc_clear() not flagged in any way 12/22/08 CLOSED http://bugs.python.org/issue4719 created fabioz Extension function optional argument specification | causes Runt 12/22/08 CLOSED http://bugs.python.org/issue4720 created pearu pythonw.exe crash in GUI application(PythonWX) 12/22/08 CLOSED http://bugs.python.org/issue4721 created george _winreg.QueryValue fault while reading mangled registry values 12/22/08 http://bugs.python.org/issue4722 created malicious.wizard os.path.basename error on directory names with numbers 12/22/08 CLOSED http://bugs.python.org/issue4723 created kle_py setting f_exc_traceback aborts in debug builds 12/22/08 http://bugs.python.org/issue4724 created benjamin.peterson reporting file locations in egg (and other package) files 12/22/08 CLOSED http://bugs.python.org/issue4725 created rocky doctest gets line numbers wrong due to quotes in comments 12/22/08 http://bugs.python.org/issue4726 created guyer patch pickle/copyreg doesn't support keyword only arguments in __new__ 12/23/08 http://bugs.python.org/issue4727 created erickt Endianness and universal builds problems 12/23/08 http://bugs.python.org/issue4728 created cdavid Documentation under 'pass' statement talks about exception very 12/23/08 CLOSED http://bugs.python.org/issue4729 created orsenthil cPickle corrupts high-unicode strings 12/23/08 http://bugs.python.org/issue4730 created njs suggest change to "Failed to find the necessary bits to build th 12/23/08 http://bugs.python.org/issue4731 created mkc Object allocation stress leads to segfault on RHEL 12/23/08 http://bugs.python.org/issue4732 created ajg Add a "decode to declared encoding" version of urlopen to urllib 12/23/08 http://bugs.python.org/issue4733 created ajaksu2 patch broken 
link for 2.5.3 doc download 12/24/08 CLOSED http://bugs.python.org/issue4734 created quiver An error occurred during the installation of assembly 12/24/08 http://bugs.python.org/issue4735 created rwpjr66 io.BufferedRWPair.closed broken; tries to call bool writer.close 12/24/08 CLOSED http://bugs.python.org/issue4736 created semanticist documentation and noddy*.c 12/24/08 CLOSED http://bugs.python.org/issue4737 created exe Patch to make zlib-objects better support threads 12/24/08 http://bugs.python.org/issue4738 created ebfe patch [patch] Let users do help('@') and so on for confusing syntax co 12/24/08 http://bugs.python.org/issue4739 created alsuren patch pickle test for protocol 3 (HIGHEST_PROTOCOL in py3k) 12/24/08 http://bugs.python.org/issue4740 created ocean-city patch, easy winsound.SND_PURGE has no effect 12/24/08 CLOSED http://bugs.python.org/issue4741 created Ultrasick 3.0 distutils byte-compiling -> Syntax error: unknown encoding: 12/24/08 http://bugs.python.org/issue4742 created sjmachin intra-pkg multiple import (import local1, local2) not fixed 12/25/08 http://bugs.python.org/issue4743 created sjmachin asynchat documentation needs to be more precise 12/25/08 http://bugs.python.org/issue4744 created beazley socket.send obscure error message 12/25/08 http://bugs.python.org/issue4745 created Luther Misguiding wording 3.0 c-api reference 12/25/08 http://bugs.python.org/issue4746 created ebfe SyntaxError executing a script containing non-ASCII characters i 12/26/08 http://bugs.python.org/issue4747 created gagenellina yield expression vs lambda 12/26/08 http://bugs.python.org/issue4748 created georg.brandl Issue with RotatingFileHandler logging handler on Windows 12/26/08 http://bugs.python.org/issue4749 created mramahi77 tarfile keeps excessive dir structure in compressed files 12/26/08 http://bugs.python.org/issue4750 created techtonik patch Patch for better thread support in hashlib 12/26/08 http://bugs.python.org/issue4751 created ebfe patch Issues Now 
Closed (27) ______________________ ctypes function pointer enhancements 349 days http://bugs.python.org/issue1797 haypo patch [distutils] - error when processing the "--formats=tar" option 340 days http://bugs.python.org/issue1885 techtonik patch IDLE "find in files" output not formatted optimally 206 days http://bugs.python.org/issue2996 loewis patch speedup some comparisons 190 days http://bugs.python.org/issue3106 pitrou patch Cannot start wsgiref simple server in Py3k 163 days http://bugs.python.org/issue3348 pitrou patch create a numbits() method for int and long types 148 days http://bugs.python.org/issue3439 marketdickinson patch, needs review wsgiref.simple_server fails to run demo_app 107 days http://bugs.python.org/issue3795 pitrou Tkinter cannot find Tcl/Tk on Mac OS X 79 days http://bugs.python.org/issue4017 benjamin.peterson library.pdf - Section 17.6.4 Examples - Multiprocessing - Format 62 days http://bugs.python.org/issue4162 benjamin.peterson library/turtle.rst does not format properly in PDF mode 62 days http://bugs.python.org/issue4169 benjamin.peterson 2to3 drops executable bit with --write 15 days http://bugs.python.org/issue4602 benjamin.peterson patch Add Mac OS X Disk Images to Python.org homepage 10 days http://bugs.python.org/issue4627 benjamin.peterson test_bad_address in test_urllib2_localnet often fails 7 days http://bugs.python.org/issue4666 rpetrov Typo in PyObjC URL on "GUI Programming on the Mac" 3 days http://bugs.python.org/issue4689 loewis UnicodeEncodeError in license() 0 days http://bugs.python.org/issue4700 amaury.forgeotdarc Syntax error in sample code for enumerate in documentation. 
0 days http://bugs.python.org/issue4703 benjamin.peterson execfile conversion is not correct 0 days http://bugs.python.org/issue4717 benjamin.peterson sys.exc_clear() not flagged in any way 0 days http://bugs.python.org/issue4719 benjamin.peterson Extension function optional argument specification | causes Runt 0 days http://bugs.python.org/issue4720 benjamin.peterson pythonw.exe crash in GUI application(PythonWX) 0 days http://bugs.python.org/issue4721 loewis os.path.basename error on directory names with numbers 0 days http://bugs.python.org/issue4723 loewis reporting file locations in egg (and other package) files 1 days http://bugs.python.org/issue4725 loewis Documentation under 'pass' statement talks about exception very 1 days http://bugs.python.org/issue4729 benjamin.peterson broken link for 2.5.3 doc download 0 days http://bugs.python.org/issue4734 loewis io.BufferedRWPair.closed broken; tries to call bool writer.close 0 days http://bugs.python.org/issue4736 benjamin.peterson documentation and noddy*.c 0 days http://bugs.python.org/issue4737 benjamin.peterson winsound.SND_PURGE has no effect 2 days http://bugs.python.org/issue4741 georg.brandl Top Issues Most Discussed (10) ______________________________ 23 wsgiref package totally broken 4 days open http://bugs.python.org/issue4718 23 round() shows undocumented behaviour 6 days open http://bugs.python.org/issue4707 10 range objects becomes hashable after attribute access 7 days open http://bugs.python.org/issue4701 10 Added clearerr() to clear EOF state 613 days open http://bugs.python.org/issue1706039 9 sys.exc_clear() not flagged in any way 0 days closed http://bugs.python.org/issue4719 6 zipfile returns string but expects binary 16 days open http://bugs.python.org/issue4621 5 Get rid of more refercenes to __cmp__ 71 days open http://bugs.python.org/issue1717 5 subprocess is not EINTR-safe 1500 days open http://bugs.python.org/issue1068268 4 os.path.basename error on directory names with numbers 0 days 
closed http://bugs.python.org/issue4723 4 Permit to easily use distutils "--formats=tar,gztar,bztar" on a 340 days open http://bugs.python.org/issue1886

From mikko+python at redinnovation.com Fri Dec 26 22:46:14 2008
From: mikko+python at redinnovation.com (Mikko Ohtamaa)
Date: Fri, 26 Dec 2008 23:46:14 +0200
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com>
Message-ID: <7b5b293c0812261346v746a95e9r41751cf29c8d3c55@mail.gmail.com>

On Mon, Dec 22, 2008 at 12:09 PM, Erno Kuusela wrote: > > unexec probably work out of the box on symbian, but...: > > http://mail.python.org/pipermail/python-dev/2003-May/035727.html > >

unexec() is pretty much what I was looking for. However, it looks like it's an old hack from the 80s and cannot be applied as is to a modern environment. Basically unexec() dumps the running application code (not specific to any interpreter) and data segments out as an a.out binary.

1) Generating a binary file is not possible on Symbian and iPhone environments, because all binaries must be signed - however, we can probably use a generic stub exe which loads the data segment only
2) a.out format is deprecated
3) Dynamic DLLs are not managed - basically a show stopper

I hope I can find someone with enough OS fu to tell whether this is possible with DLLs at all and what data pointers must be patched on each unexec() call.

-Mikko
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From mikko+python at redinnovation.com Fri Dec 26 22:56:13 2008
From: mikko+python at redinnovation.com (Mikko Ohtamaa)
Date: Fri, 26 Dec 2008 23:56:13 +0200
Subject: [Python-Dev] VM imaging based launch optimizations for CPython?
In-Reply-To: <494D69D2.5090601@v.loewis.de>
References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> <494D69D2.5090601@v.loewis.de>
Message-ID: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com>

> > > > Of course, you still have the actual interpretation of > the top-level module code - if it's not the marshalling > but this part that actually costs performance, this > efficient marshalling algorithm won't help. It would be > interesting to find out which modules have a particularly > high startup cost - perhaps they can be rewritten

I am afraid this is the case. I hope we could marshal an arbitrary application state (not even Python specific) into a fast loading dump file (hibernation/snapshot).

We have tried to use lazy importing as much as possible to distribute the importing cost across the application UI states. Off the top of my head, I know of at least two particular modules which could be refactored. I'd recommend as the best practice that everything should be imported lazily if it's possible. However, it looks like the Python community is currently moving in another direction, since doing explicit imports in __init__ etc. makes APIs cleaner (think Django) and debugging a saner task - Python is mainly used on the server, and limited environments haven't been particularly interesting until lately.

logging - defines lots of classes which are used only if they are specified by logging options. I once hacked this for my personal use to be a little lighter.

urllib - particularly heavy, imports httplib, ftplib and stuff even if it is not used

Nokia has just released Python 2.5 based PyS60. I think we'll come back to this after a while with a nice generic profiler which will tell the import cost.

Merry XMas,
-Mikko
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Sat Dec 27 00:06:49 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Dec 2008 09:06:49 +1000 Subject: [Python-Dev] VM imaging based launch optimizations for CPython? In-Reply-To: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com> References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> <494D69D2.5090601@v.loewis.de> <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com> Message-ID: <49556389.8080206@gmail.com> Mikko Ohtamaa wrote: > Out of my head I know at least two particular module which could be > refactored. I'd recommend as the best practice that everything should be > imported lazily if it's possible. We actually have a reason for discouraging lazy imports - using them carelessly makes it much easier to accidentally deadlock yourself on the import lock. I agree that this contributes to the problem of long startup times though. One sledgehammer approach to lazy imports is to modify the actual import system to use lazy imports by default, rather than having to explicitly enable them in a given module or for each particular import. Mercurial does this quite nicely by overriding the __import__ implementation [1]. Perhaps PyS60 could install something similar in site.py? The trade-off will be whether enough time is saved in avoiding "wasted" module loads to make up for the extra time spent managing the bookkeeping for the lazy imports. Cheers, Nick. [1] From a recent thread on Python-Ideas that Google found for me: http://selenic.com/repo/index.cgi/hg-stable/file/967adcf5910d/mercurial/demandimport.py#l1 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From benjamin at python.org Sat Dec 27 00:30:45 2008 From: benjamin at python.org (Benjamin Peterson) Date: Fri, 26 Dec 2008 17:30:45 -0600 Subject: [Python-Dev] Python 3 - Mac Installer? 
In-Reply-To: <200812260855.49518.list@qtrac.plus.com> References: <200812260855.49518.list@qtrac.plus.com> Message-ID: <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com> On Fri, Dec 26, 2008 at 2:55 AM, Mark Summerfield wrote: > Hi, > > Just wondered if/when there'd be a Mac installer for Python 3? I think there should be one eventually. Unfortunately, the 3.x build process is not ironed out. If somebody wants to make a patch which makes the build script in Mac/BuildScript/ work, I'd be very happy. :) > > Thanks! -- Regards, Benjamin Peterson From skip at pobox.com Sat Dec 27 00:40:51 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 26 Dec 2008 17:40:51 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? Message-ID: <18773.27523.297588.265405@montanaro-dyndns-org.local> The doc for os.path.commonprefix states: Return the longest path prefix (taken character-by-character) that is a prefix of all paths in list. If list is empty, return the empty string (''). Note that this may return invalid paths because it works a character at a time. I remember encountering this in an earlier version of Python 2.x (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components instead of by characters. That had to be reverted because it was a behavior change and broke code which used it for strings which didn't represent paths. After the reversion I then forgot about it. I just stumbled upon it again. It seems to me this would have been a good thing to fix in 3.0. Is this something which could change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? Skip From martin at v.loewis.de Sat Dec 27 00:52:28 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 27 Dec 2008 00:52:28 +0100 Subject: [Python-Dev] VM imaging based launch optimizations for CPython? 
In-Reply-To: <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com> References: <7b5b293c0812201127i97ccb2ep4fa2d3d31dc1a154@mail.gmail.com> <494D69D2.5090601@v.loewis.de> <7b5b293c0812261356u46793362rea06ee8ac3785f0b@mail.gmail.com> Message-ID: <49556E3C.7030903@v.loewis.de> > Of course, you still have the actual interpretation of > the top-level module code - if it's not the marshalling > but this part that actually costs performance, this > efficient marshalling algorithm won't help. It would be > interesting to find out which modules have a particularly > high startup cost - perhaps they can be rewritten > > > I am afraid this is the case. Is that an unfounded or a founded fear? IOW, do you have hard numbers proving that it is the actual interpretation time (rather than the marshaling time) that causes the majority of the startup cost? > I hope we could marshal an arbitary > application state (not even Python specific) into a fast loading dump > file (hibernation/snapshot). I understand that this is what you want to get. I'm proposing that there might be a different approach to achieve a similar speedup. > logging - defines lots of classes which are used only if they are > specified by logging options. I once hacked this for my personal use to > be little lighter. So what speedup did you gain by rewriting it? (i.e. how many microseconds did "import logging" take before, how much afterwards?) How much of it was parsing/unmarshaling, and how much time byte code interpretation? Of the byte code interpretation, what opcodes in particular? > urllib - particular heavy, imports httplib, ftplib and stuff even if it > is not used Same questions here. This doesn't sound like any heavy computation is being done during startup. > Nokia has just released Python 2.5 based PyS60. I think we'll come back > this after a while with a nice generic profiler which will tell the > import cost. Looking forward to hear your numbers! 
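One way to gather the kind of per-module numbers asked for above is sketched here. It is only an illustration and not something posted in the thread: the helper name time_import is made up, and it assumes a Python 3 interpreter (importlib.import_module and time.perf_counter). Dependencies that are already imported stay cached, so the figure covers only the named module's own loading and top-level execution, not the cost of everything it pulls in.

```python
import importlib
import sys
import time


def time_import(name):
    """Return the seconds spent importing `name` (hypothetical helper)."""
    # Drop cached copies of the module and its submodules so the import
    # machinery really loads and executes the module again.
    cached = [m for m in sys.modules if m == name or m.startswith(name + ".")]
    for mod in cached:
        del sys.modules[mod]
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start


# ftplib is one of the modules mentioned as being pulled in by urllib.
elapsed = time_import("ftplib")
print("import ftplib took %.1f microseconds" % (elapsed * 1e6))
```

Running each measurement in a freshly started interpreter would probably give numbers closer to real startup cost, since nothing would be cached at all.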
Regards, Martin From ncoghlan at gmail.com Sat Dec 27 00:58:07 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Dec 2008 09:58:07 +1000 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> Message-ID: <49556F8F.5090709@gmail.com> skip at pobox.com wrote: > The doc for os.path.commonprefix states: > > Return the longest path prefix (taken character-by-character) that is a > prefix of all paths in list. If list is empty, return the empty string > (''). Note that this may return invalid paths because it works a > character at a time. > > I remember encountering this in an earlier version of Python 2.x (maybe 2.2 > or 2.3?) and "fixed" it to work by pathname components instead of by > characters. That had to be reverted because it was a behavior change and > broke code which used it for strings which didn't represent paths. After > the reversion I then forgot about it. > > I just stumbled upon it again. It seems to me this would have been a good > thing to fix in 3.0. Is this something which could change in 3.1 (or be > deprecated in 3.1 with deletion in 3.2)? Why can't we add an "allow_fragment" keyword that defaults to True? Then "allow_fragment=False" will stop at the last full directory name and ignore any partial matches on the filenames or the next subdirectory (depending on where the common prefix ends). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From skip at pobox.com Sat Dec 27 01:04:40 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 26 Dec 2008 18:04:40 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? 
In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> Message-ID: <18773.28952.937116.329215@montanaro-dyndns-org.local> skip> I just stumbled upon it again. It seems to me this would have skip> been a good thing to fix in 3.0. Is this something which could skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? Hmmm... I didn't really mean "deletion". I meant, could a behavior change be implemented in 3.2 with a warning emitted in 3.1? Skip From skip at pobox.com Sat Dec 27 03:49:55 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 26 Dec 2008 20:49:55 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <49556F8F.5090709@gmail.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> Message-ID: <18773.38867.117021.560152@montanaro-dyndns-org.local> Nick> Why can't we add an "allow_fragment" keyword that defaults to Nick> True? Then "allow_fragment=False" will stop at the last full Nick> directory name and ignore any partial matches on the filenames or Nick> the next subdirectory (depending on where the common prefix ends). You could I suppose though that would just be adding another hack on top of existing questionable behavior. I wasn't so concerned with implementation as whether or not a change to the semantics of the function was possible. Skip From skip at pobox.com Sat Dec 27 05:03:03 2008 From: skip at pobox.com (skip at pobox.com) Date: Fri, 26 Dec 2008 22:03:03 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <18773.27523.297588.265405@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> Message-ID: <18773.43255.196790.18980@montanaro-dyndns-org.local> skip> I just stumbled upon it again. It seems to me this would have skip> been a good thing to fix in 3.0. 
Is this something which could skip> change in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? This new issue in the tracker: http://bugs.python.org/issue4755 implements a commonpathprefix function. As explained in the submission this would be my second choice should it be decided that something should change. Skip From steve at pearwood.info Sat Dec 27 07:37:20 2008 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 27 Dec 2008 17:37:20 +1100 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <49556F8F.5090709@gmail.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> Message-ID: <200812271737.22101.steve@pearwood.info> On Sat, 27 Dec 2008 10:58:07 am Nick Coghlan wrote: > skip at pobox.com wrote: > > The doc for os.path.commonprefix states: > > > > Return the longest path prefix (taken character-by-character) > > that is a prefix of all paths in list. If list is empty, return the > > empty string (''). Note that this may return invalid paths because > > it works a character at a time. > > > > I remember encountering this in an earlier version of Python 2.x > > (maybe 2.2 or 2.3?) and "fixed" it to work by pathname components > > instead of by characters. That had to be reverted because it was a > > behavior change and broke code which used it for strings which > > didn't represent paths. After the reversion I then forgot about > > it. > > > > I just stumbled upon it again. It seems to me this would have been > > a good thing to fix in 3.0. Is this something which could change > > in 3.1 (or be deprecated in 3.1 with deletion in 3.2)? > > Why can't we add an "allow_fragment" keyword that defaults to True? > Then "allow_fragment=False" will stop at the last full directory name > and ignore any partial matches on the filenames or the next > subdirectory (depending on where the common prefix ends). 
For what it's worth, I think that the two pieces of functionality are different enough that in an ideal world they should be two different functions rather than one function with a switch. I think os.path.commonprefix should only operate on path components, and if character-by-character prefix matching on general strings is useful, then it should be a string method.

--
Steven D'Aprano

From solipsis at pitrou.net Sat Dec 27 17:10:20 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Dec 2008 16:10:20 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local>
Message-ID:

<skip at pobox.com> writes:
> You could I suppose though that would just be adding another hack on top of
> existing questionable behavior.

Agreed. We should fix the original function so that it has the obvious, intended effect. Leaving the buggy function in place and adding another function with the proper behaviour sounds ridiculous.

From skip at pobox.com Sat Dec 27 17:57:40 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 27 Dec 2008 10:57:40 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local>
Message-ID: <18774.24196.530208.708594@montanaro-dyndns-org.local>

>> You could I suppose though that would just be adding another hack on
>> top of existing questionable behavior.

Antoine> Agreed. We should fix the original function so that it has the
Antoine> obvious, intended effect. Leaving the buggy function in place
Antoine> and adding another function with the proper behaviour sounds
Antoine> ridiculous.
If we add commonpath or commonpathprefix or pathprefix, or whatever, then find someplace to move the existing commonprefix function (maybe to the string module or as a class method of string objects?) then could we make a 2to3 fixer for this? Skip From solipsis at pitrou.net Sat Dec 27 18:07:15 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 27 Dec 2008 17:07:15 +0000 (UTC) Subject: [Python-Dev] A wart which should have been repaired in 3.0? References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> Message-ID: pobox.com> writes: > > If we add commonpath or commonpathprefix or pathprefix, or whatever, then > find someplace to move the existing commonprefix function (maybe to the > string module or as a class method of string objects?) then could we make a > 2to3 fixer for this? IMHO it's a bug, the py3k migration process needn't apply. From ncoghlan at gmail.com Sat Dec 27 21:44:00 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Dec 2008 06:44:00 +1000 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> Message-ID: <49569390.1080805@gmail.com> Antoine Pitrou wrote: > pobox.com> writes: >> If we add commonpath or commonpathprefix or pathprefix, or whatever, then >> find someplace to move the existing commonprefix function (maybe to the >> string module or as a class method of string objects?) then could we make a >> 2to3 fixer for this? > > IMHO it's a bug, the py3k migration process needn't apply. 
The current behaviour is exactly what one would need to implement
bash-style tab completion [1], so I don't get why anyone is calling it
"useless" or "obviously broken". Its brokenness isn't obvious at all to
me - it just doesn't do what you want it to do.

Adding a separate function called "os.path.commonpath" with the
behaviour Skip wants sounds like *exactly* the right answer to me.

Cheers,
Nick.

[1]
import os

# 'typed' is whatever the user has entered so far
entries = os.listdir('.')
candidates = [e for e in entries if e.startswith(typed)]
if len(candidates) > 1:
    # complete as far as the matching entries agree
    tab_result = os.path.commonprefix(candidates)
elif candidates:
    tab_result = candidates[0]
else:
    tab_result = typed

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net Sat Dec 27 21:59:28 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 27 Dec 2008 20:59:28 +0000 (UTC)
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com>
Message-ID:

Nick Coghlan <ncoghlan at gmail.com> writes:
>
> The current behaviour is exactly what one would need to implement
> bash-style tab completion [1], so I don't get why anyone is calling it
> "useless" or "obviously broken".

Point taken.
Although the fact that it lives in os.path suggests that the function should
know about path components instead of ignoring their existence... A generic
longest common prefix function would belong elsewhere.

The issue people are having with the proposal to create a separate function is
that it's a bloat of the API.
I don't think the os.path module claims to give
utilities for implementing bash-style tab completion, however it is supposed to
make manipulation of paths easier -- which returning invalid answers (or, worse,
valid but intuitively wrong answers) does not.

From ncoghlan at gmail.com Sun Dec 28 07:26:10 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 16:26:10 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To:
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com>
Message-ID: <49571C02.7090205@gmail.com>

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan at gmail.com> writes:
>> The current behaviour is exactly what one would need to implement
>> bash-style tab completion [1], so I don't get why anyone is calling it
>> "useless" or "obviously broken".
>
> Point taken.
> Although the fact that it lives in os.path suggests that the function should
> know about path components instead of ignoring their existence... A generic
> longest common prefix function would belong elsewhere.
>
> The issue people are having with the proposal to create a separate function is
> that it's a bloat of the API. I don't think the os.path module claims to give
> utilities for implementing bash-style tab completion, however it is supposed to
> make manipulation of paths easier -- which returning invalid answers (or, worse,
> valid but intuitively wrong answers) does not.

True, but it's a matter of weighing up the migration cost of the two
options:

a) Add a new function (e.g. os.path.commonpath) which works on a path
component basis. Zero migration cost, minor ongoing cost in explaining
the difference between commonpath (with path component based semantics)
and commonprefix (with character based semantics).
That ongoing cost can largely be handled just by referencing the two
functions from each other's documentation (note that they will actually
be next to each other in the alphabetical list of os.path functions, and
the path component based one will appear before the character based one).

b) Deprecate the current semantics of os.path.commonprefix (which will
likely involve changing the name anyway, since it is easier to deprecate
the old semantics when the new semantics have a different name), add the
new path component based semantics, add the character-based semantics
back somewhere else. This imposes a major migration cost (since the old
commonprefix will at least change its name) with significant potential
for confusion due to the semantic changes across versions (if the
commonprefix name is reused for the new semantics).

If we're going to end up with two functions anyway, why mess with the
one which is already there and in use for real programs? Just add a new
function with the new semantics and be done with it. Anything else will
just cause migration pain without any significant counterbalancing benefit.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------

From solipsis at pitrou.net Sun Dec 28 11:29:01 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 28 Dec 2008 11:29:01 +0100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <49571C02.7090205@gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com>
Message-ID: <1230460141.6361.4.camel@localhost>

On Sunday 28 December 2008 at 16:26 +1000, Nick Coghlan wrote:
> If we're going to end up with two functions anyway, why mess with the
> one which is already there and in use for real programs?

Well, agreed.
I was just hoping we could get away with "fixing" the existing function
and voilà :)

From ncoghlan at gmail.com Sun Dec 28 11:47:46 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 28 Dec 2008 20:47:46 +1000
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1230460141.6361.4.camel@localhost>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost>
Message-ID: <49575952.7070405@gmail.com>

Antoine Pitrou wrote:
> On Sunday 28 December 2008 at 16:26 +1000, Nick Coghlan wrote:
>> If we're going to end up with two functions anyway, why mess with the
>> one which is already there and in use for real programs?
>
> Well, agreed.
> I was just hoping we could get away with "fixing" the existing function
> and voilà :)

I'm all for breaking backwards compatibility when it allows some genuine
improvements that would otherwise be impossible, but in this particular
case a little API bloat seems like the least of the available evils :)

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From solipsis at pitrou.net Sun Dec 28 11:51:48 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 28 Dec 2008 10:51:48 +0000 (UTC) Subject: [Python-Dev] =?utf-8?q?Hello_everyone_+_little_question_around=09?= =?utf-8?q?Cpython/stackless?= References: <49500B86.1070605@wanadoo.fr> Message-ID: Hello, > I'm currently studying all I can find on stackless python, PYPY and the > concepts they've brought to Python, and so far I wonder : since > stackless python claims to be 100% compatible with CPython's extensions, > faster, and brings lots of fun stuffs (tasklets, coroutines and no C > stack), how comes it hasn't been merged back, to become the standard > 'fast' python implementation ? I'm not sure Stackless ever claimed to be faster than CPython for standard tasks (i.e., not coroutine-related). Do you have any pointers to this? As for coroutines, the greenlets (*) package is said to bring them to the standard interpreter. (*) http://codespeak.net/py/dist/greenlet.html Regards Antoine. From ncoghlan at gmail.com Sun Dec 28 13:09:44 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Dec 2008 22:09:44 +1000 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? In-Reply-To: <494C9C08.5030702@gmail.com> References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com> Message-ID: <49576C88.30503@gmail.com> Nick Coghlan wrote: > Nick Coghlan wrote: >> Is there a specific reason for not fully initialising the builtin types? >> Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init? > > I need to correct this slightly: some builtin types *are* initialised > properly by _Py_ReadyTypes. > > So the question is actually whether or not the missing builtin types > should be added to that function. 
I'm probably going to fix the specific problem with hashing of range objects in Py3k just by initialising xrange/range properly in _Py_ReadyTypes. However, I wonder how many other builtin types have the same problem - for example, the enumerate type is also missing a call to PyType_Ready: Python 3.1a0 (py3k, Dec 14 2008, 21:35:11) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> x = enumerate([]) >>> hash(x) Traceback (most recent call last): File "", line 1, in TypeError: unhashable type: 'enumerate' >>> enumerate.__name__ # implicit call to PyType_Ready 'enumerate' >>> hash(x) -1212398692 Rather than playing whack-a-mole with this, does anyone have any ideas on how to systematically find types which are defined in the core, but are missing an explicit PyType_Ready call? (I guess one way would be to remove all the implicit calls in a local build and see what blows up... that seems a little drastic though) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Sun Dec 28 13:46:29 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Dec 2008 22:46:29 +1000 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? In-Reply-To: <49576C88.30503@gmail.com> References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com> <49576C88.30503@gmail.com> Message-ID: <49577525.1080800@gmail.com> Nick Coghlan wrote: > Rather than playing whack-a-mole with this, does anyone have any ideas > on how to systematically find types which are defined in the core, but > are missing an explicit PyType_Ready call? (I guess one way would be to > remove all the implicit calls in a local build and see what blows up... 
> that seems a little drastic though) The whack-a-mole tactic did pick up a couple more though - the two "builtin" types that iter() can return (the basic sequence iterator and the callable with sentinel result iterator). Perhaps the path of least resistance is to change PyObject_Hash to be yet another place where PyType_Ready will be called implicitly if it hasn't been called already? That approach would get us back to the Python 2.x status quo where calling PyType_Ready was only absolutely essential if you wanted to correctly inherit a slot from a type other than object itself. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From eric at trueblade.com Sun Dec 28 14:54:01 2008 From: eric at trueblade.com (Eric Smith) Date: Sun, 28 Dec 2008 08:54:01 -0500 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? In-Reply-To: <49577525.1080800@gmail.com> References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com> <49576C88.30503@gmail.com> <49577525.1080800@gmail.com> Message-ID: <495784F9.8010008@trueblade.com> Nick Coghlan wrote: > Nick Coghlan wrote: >> Rather than playing whack-a-mole with this, does anyone have any ideas >> on how to systematically find types which are defined in the core, but >> are missing an explicit PyType_Ready call? (I guess one way would be to >> remove all the implicit calls in a local build and see what blows up... >> that seems a little drastic though) > > The whack-a-mole tactic did pick up a couple more though - the two > "builtin" types that iter() can return (the basic sequence iterator and > the callable with sentinel result iterator). > > Perhaps the path of least resistance is to change PyObject_Hash to be > yet another place where PyType_Ready will be called implicitly if it > hasn't been called already? I think that's the best thing to do. 
It would bring PyObject_Hash in line with PyObject_Format, for example. Eric. From solipsis at pitrou.net Sun Dec 28 20:11:55 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 28 Dec 2008 19:11:55 +0000 (UTC) Subject: [Python-Dev] Use -M option on buildbots? Message-ID: Hi all, Could we use the -M option (with a suitable value depending on the amount of physical RAM) for regression tests on the buildbots? It would help avoid the kind of situation described in http://bugs.python.org/issue3700 cheers Antoine. From martin at v.loewis.de Sun Dec 28 21:21:25 2008 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sun, 28 Dec 2008 21:21:25 +0100 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <49575952.7070405@gmail.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> Message-ID: <4957DFC5.9030405@v.loewis.de> > I'm all for breaking backwards compatibility when it allows some genuine > improvements that would otherwise be impossible, but in this particular > case a little API bloat seems like the least of the available evils :) I don't think any change is necessary. os.path.commonprefix works just fine on path components: py> p = ["/usr/bin/ls", "/usr/bin/ln"] py> os.path.commonprefix([f.split('/') for f in p]) ['', 'usr', 'bin'] py> p.append("/usr/local/bin/ls") py> os.path.commonprefix([f.split('/') for f in p]) ['', 'usr'] Of course, using it that way would require a library function that reliably splits a path into components; I think one would have to do abspath on arbitrary inputs. 
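The component-wise usage shown above can be wrapped in a small helper. This is only a sketch under stated assumptions, not code from the thread or the standard library: the name `common_components` is hypothetical, and the paths are assumed to be POSIX-style so that splitting on "/" is enough.

```python
import posixpath


def common_components(paths):
    """Hypothetical helper: leading path components shared by all paths.

    Assumes POSIX-style paths separated by "/" only.
    """
    split_paths = [p.split("/") for p in paths]
    # commonprefix compares its arguments element by element, so lists of
    # components are compared component by component, not character by
    # character.
    return posixpath.commonprefix(split_paths)


paths = ["/usr/bin/ls", "/usr/bin/ln"]
print(posixpath.commonprefix(paths))       # character-wise result
print(common_components(paths))            # component-wise result
print("/".join(common_components(paths)))  # joined back into a path
```

The character-wise call ends in the middle of a file name, while the component-wise one stops at a directory boundary. Splitting on "/" alone is the weak point, which is why a reliable path-splitting function would be needed for arbitrary inputs.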
Regards, Martin From rhamph at gmail.com Sun Dec 28 21:59:07 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sun, 28 Dec 2008 13:59:07 -0700 Subject: [Python-Dev] Call PyType_Ready on builtin types during interpreter startup? In-Reply-To: <49576C88.30503@gmail.com> References: <494C1CE4.5080102@gmail.com> <494C9C08.5030702@gmail.com> <49576C88.30503@gmail.com> Message-ID: On Sun, Dec 28, 2008 at 5:09 AM, Nick Coghlan wrote: > Nick Coghlan wrote: >> Nick Coghlan wrote: >>> Is there a specific reason for not fully initialising the builtin types? >>> Or should we be calling PyType_Ready on each of them from _PyBuiltin_Init? >> >> I need to correct this slightly: some builtin types *are* initialised >> properly by _Py_ReadyTypes. >> >> So the question is actually whether or not the missing builtin types >> should be added to that function. > > I'm probably going to fix the specific problem with hashing of range > objects in Py3k just by initialising xrange/range properly in > _Py_ReadyTypes. > > However, I wonder how many other builtin types have the same problem - > for example, the enumerate type is also missing a call to PyType_Ready: > > Python 3.1a0 (py3k, Dec 14 2008, 21:35:11) > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> x = enumerate([]) >>>> hash(x) > Traceback (most recent call last): > File "", line 1, in > TypeError: unhashable type: 'enumerate' >>>> enumerate.__name__ # implicit call to PyType_Ready > 'enumerate' >>>> hash(x) > -1212398692 > > Rather than playing whack-a-mole with this, does anyone have any ideas > on how to systematically find types which are defined in the core, but > are missing an explicit PyType_Ready call? (I guess one way would be to > remove all the implicit calls in a local build and see what blows up... > that seems a little drastic though) What I did with safethread was replace the implicit calls with assertions. 
That with the test suite should pick everything up. -- Adam Olsen, aka Rhamphoryncus From skip at pobox.com Mon Dec 29 00:01:52 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 28 Dec 2008 17:01:52 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <4957DFC5.9030405@v.loewis.de> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> Message-ID: <18776.1376.724926.669345@montanaro-dyndns-org.local> Martin> I don't think any change is necessary. os.path.commonprefix Martin> works just fine on path components: ... Ummm... >>> os.path.commonprefix(["/export/home", "/etc/passwd"]) '/e' I suppose that's correct given the defined behavior of the function, but it certainly doesn't seem to be very path-like to me. Martin> Of course, using it that way would require a library function Martin> that reliably splits a path into components; I think one would Martin> have to do abspath on arbitrary inputs. See for what I think is a function with more predictable behavior given that we are discussing paths and not just strings. Skip From skip at pobox.com Mon Dec 29 00:14:00 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 28 Dec 2008 17:14:00 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? 
In-Reply-To: <18776.1376.724926.669345@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local>
Message-ID: <18776.2104.439166.518935@montanaro-dyndns-org.local>

>>>>> "skip" == skip writes:

    Martin> I don't think any change is necessary. os.path.commonprefix
    Martin> works just fine on path components:

    skip> Ummm...

        >>> os.path.commonprefix(["/export/home", "/etc/passwd"])
        '/e'

    skip> I suppose that's correct given the defined behavior of the
    skip> function, but it certainly doesn't seem to be very path-like to
    skip> me.

I should also point out that most people will not have the foresight to use
it the way Martin demonstrated. Documentation or not, I'll bet a fair
fraction of all usage assumes the return value represents a valid path.

    Martin> Of course, using it that way would require a library function
    Martin> that reliably splits a path into components; I think one would
    Martin> have to do abspath on arbitrary inputs.

Kinda what I think os.path.split ought to do. Should I tackle that next?
;-)

Skip

From martin at v.loewis.de Mon Dec 29 00:16:22 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Mon, 29 Dec 2008 00:16:22 +0100
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18776.1376.724926.669345@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> Message-ID: <495808C6.4050304@v.loewis.de> > Martin> I don't think any change is necessary. os.path.commonprefix > Martin> works just fine on path components: > ... > > Ummm... > > >>> os.path.commonprefix(["/export/home", "/etc/passwd"]) > '/e' This calls it with strings, not with path components. As I said, it works fine for path components: py> os.path.commonprefix([f.split('/') for f in ["/export/home", "/etc/passwd"]]) [''] > See for what I think is a function with > more predictable behavior given that we are discussing paths and not just > strings. See above: the function works for lists as well. Regards, Martin From skip at pobox.com Mon Dec 29 00:21:11 2008 From: skip at pobox.com (skip at pobox.com) Date: Sun, 28 Dec 2008 17:21:11 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <495808C6.4050304@v.loewis.de> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49556F8F.5090709@gmail.com> <18773.38867.117021.560152@montanaro-dyndns-org.local> <18774.24196.530208.708594@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> Message-ID: <18776.2535.459306.987378@montanaro-dyndns-org.local> >> See for what I think is a function >> with more predictable behavior given that we are discussing paths and >> not just strings. 
    Martin> See above: the function works for lists as well.

But as you yourself pointed out, Python lacks a reliable split function for
filesystem paths. The patch implements different versions for Windows and
other platforms because Python supports two separators on Windows.

Skip

From hall.jeff at gmail.com Mon Dec 29 22:49:26 2008
From: hall.jeff at gmail.com (Jeff Hall)
Date: Mon, 29 Dec 2008 16:49:26 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <18776.2535.459306.987378@montanaro-dyndns-org.local>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local>
Message-ID: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>

I think Nick's solution is "Don't let the best be the enemy of the good".
Had this been caught before the 3.0 release it might be a different solution.
Let's just add a new function that works "correctly".

Martin, it seems to me that a path. method shouldn't require me to pass path
components but instead should accept a "path" as its input (or in this case
multiple paths). The current usage feels like a string method to me. Not
saying it's not useful but it isn't "intuitive".

For those that prefer not to add functions all willy-nilly, would it not be
better to add a "delimiter" keyword that defaults to False? Then
"delimiter=False" will function with the current functionality unchanged
while

    os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/")

would properly return

    'bob/'

From Scott.Daniels at Acm.Org Mon Dec 29 23:02:20 2008
From: Scott.Daniels at Acm.Org (Scott David Daniels)
Date: Mon, 29 Dec 2008 14:02:20 -0800
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
Message-ID:

Jeff Hall wrote:
>... For those that prefer not to add functions all willy-nilly, would it not
> be better to add a "delimiter" keyword that defaults to False? Then
> "delimiter=False" will function with the current functionality unchanged
> while
>
> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter =
> "/")

The proper call should be:

    os.path.commonprefix(["bob/example", "bob/etc/passwd"], delimiter=True)

and output:

    'bob'   (path to the common directory)

Perhaps even call the keyword arg "delimited," rather than "delimiter."
On Windows, I'd like to see:

    os.path.commonprefix(['a/b/c.d/e/f', r'a\b\c.d\eve'], delimited=True)

return either 'a/b/c.d' or r'a\b\c.d'.
Perhaps even ['a', 'b', 'c.d'] (suitable for os.path.join).

--Scott David Daniels
Scott.Daniels at Acm.Org

From hall.jeff at gmail.com Mon Dec 29 23:07:50 2008
From: hall.jeff at gmail.com (Jeff Hall)
Date: Mon, 29 Dec 2008 17:07:50 -0500
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To:
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
Message-ID: <1bc395c10812291407i53edf28bv60af385f405df4a9@mail.gmail.com>

I was thinking that the user could just define the delimiter character due
to the differences amongst delimiters used in OS's... but if that isn't a
problem (Skip seemed to think it wouldn't be) then my solution is
functionally identical to the first one he proposed

From skip at pobox.com Mon Dec 29 23:46:01 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 29 Dec 2008 16:46:01 -0600
Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com>
Message-ID: <18777.21289.504321.865439@montanaro-dyndns-org.local>

    Jeff> For those that prefer not to add functions all willy-nilly, would
    Jeff> it not be better to add a "delimiter" keyword that defaults to
    Jeff> False?
Then "delimiter=False" will function with the current Jeff> functionality unchanged while Jeff> os.path.commonprefix(["bob/export/home", "bob/etc/passwd"], delimiter = "/") Jeff> would properly return Jeff> 'bob/' On Windows what would you do with this crazy, but valid, path? c:/etc\\passwd I don't do Windows, so don't have any idea if there is even an /etc/passwd file on Windows. I'd guess not, but that's not the point. The point is that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in Windows pathnames. See my patch (issue 4755) for a version of os.path. which works as at least I expect and should work cross-platform. Skip From pje at telecommunity.com Tue Dec 30 02:02:07 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 29 Dec 2008 20:02:07 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <18777.21289.504321.865439@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <49569390.1080805@gmail.com> <49571C02.7090205@gmail.com> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> Message-ID: <20081230010023.B46883A406C@sparrow.telecommunity.com> You know, all this path separator and list complication isn't really necessary, when you can just take the os.path.dirname() of the return from commonprefix(). Perhaps we could just add that recommendation to the docs? At 04:46 PM 12/29/2008 -0600, skip at pobox.com wrote: > Jeff> For those that prefer not to add functions all willy-nilly, would > Jeff> it not be better to add a "delimiter" keyword that defaults to > Jeff> False? 
Then "delimiter=False" will function with the current
>     Jeff> functionality unchanged while
>
>     Jeff> os.path.commonprefix(["bob/export/home",
> "bob/etc/passwd"], delimiter = "/")
>
>     Jeff> would properly return
>
>     Jeff> 'bob/'
>
>On Windows what would you do with this crazy, but valid, path?
>
>     c:/etc\\passwd
>
>I don't do Windows, so don't have any idea if there is even an /etc/passwd
>file on Windows. I'd guess not, but that's not the point. The point is
>that you can use both / (aka ntpath.sep) and \ (aka ntpath.altsep) in
>Windows pathnames. See my patch (issue 4755) for a version of
>os.path. which works as at least I expect and should work
>cross-platform.
>
>Skip

From ncoghlan at gmail.com Tue Dec 30 08:35:45 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 30 Dec 2008 17:35:45 +1000
Subject: [Python-Dev] Commands for correctly merging to the python 3.0 maintenance branch
Message-ID: <4959CF51.4030507@gmail.com>

Getting the svnmerge-integrated property right when merging
trunk->py3k->release30 is a little tricky. The most concise set of
instructions I have found which gets it right is to do the following in
the 3.0 maintenance branch after committing to the py3k branch:

    svn update
    svnmerge merge -r
    svn revert .
    svnmerge -M -F
    svn commit -F svnmerge-commit-message.txt

Reverting the property changes on "." and running that second svnmerge
line is also useful if you do a "svn update" after the first svnmerge
and get a conflict on the svnmerge-integrated property.

The -M option tells the utility to only make the property changes, while
the -F tells it to go ahead despite the existence of local modifications.

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From p.f.moore at gmail.com Tue Dec 30 10:36:26 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 30 Dec 2008 09:36:26 +0000 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <20081230010023.B46883A406C@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> Message-ID: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> 2008/12/30 Phillip J. Eby : > You know, all this path separator and list complication isn't really > necessary, when you can just take the os.path.dirname() of the return from > commonprefix(). > > Perhaps we could just add that recommendation to the docs? Actually, consider the following (on Windows): >python Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo' >>> This very clearly shows that commonprefix is a string operation rather than a path operation, as it does not respect the equivalence of os.sep and os.altsep. In path semantics, the common prefix is "foo/bar" (or equivalently "foo\\bar"). I'm not sure how to deal with this, except by recommending that all paths passed to os.path.commonprefix should at the very least be normalised via os.path.normpath first - which starts to get clumsy fast. 
So the "recommended" usage to get the common directory is paths = [...] common = os.path.dirname(os.path.commonprefix([os.path.normpath(p) for p in paths])) Hmm... Paul. From martin at v.loewis.de Tue Dec 30 10:42:23 2008 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 30 Dec 2008 10:42:23 +0100 Subject: [Python-Dev] Commands for correctly merging to the python 3.0 maintenance branch In-Reply-To: <4959CF51.4030507@gmail.com> References: <4959CF51.4030507@gmail.com> Message-ID: <4959ECFF.2010803@v.loewis.de> > svn revert . > svnmerge -M -F [are you sure you don't need a command for svnmerge here?] Instead of these two, I always do svn resolved . Regards, Martin From skip at pobox.com Tue Dec 30 13:14:24 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 30 Dec 2008 06:14:24 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> Message-ID: <18778.4256.215698.798495@montanaro-dyndns-org.local> Paul demonstrates the shortcoming of commonprefix: >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo' With the patch in issue4755: >>> import ntpath >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"]) 'foo\\bar' Ta da ... Skip From pje at telecommunity.com Tue Dec 30 13:33:36 2008 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 30 Dec 2008 07:33:36 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <18778.4256.215698.798495@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <18778.4256.215698.798495@montanaro-dyndns-org.local> Message-ID: <20081230123153.252893A406C@sparrow.telecommunity.com> At 06:14 AM 12/30/2008 -0600, skip at pobox.com wrote: >Paul demonstrates the shortcoming of commonprefix: > > >>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) > 'foo' > >With the patch in issue4755: > > >>> import ntpath > >>> ntpath.commonpathprefix(["foo\\bar\\baz", "foo/bar/boink"]) > 'foo\\bar' But it doesn't handle the fact that Windows paths are case-insensitive, or that Posix paths can have symlinks... or that one path might be relative and another absolute... As soon as you move away from being a string operation, you get an endless series of gotchas... none of which are currently documented. From doomster at knuut.de Tue Dec 30 13:18:51 2008 From: doomster at knuut.de (Ulrich Eckhardt) Date: Tue, 30 Dec 2008 13:18:51 +0100 Subject: [Python-Dev] WinCE port (issues #4075 #4051) Message-ID: <200812301318.51367.doomster@knuut.de> Hi! I'm currently working again on the CE port, and since 2.6 and 3.0 are now out of the door, could you apply the patches in #4075 & #4051? Both patches are fairly isolated and easy to review and I'm pretty sure they won't cause any inconveniences. 
Note: this is far from everything that is necessary for Python to rock on CE, but these are prerequisites, as explained in both bugs' histories. thanks Uli From skip at pobox.com Tue Dec 30 13:58:11 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 30 Dec 2008 06:58:11 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <20081230123153.252893A406C@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <18778.4256.215698.798495@montanaro-dyndns-org.local> <20081230123153.252893A406C@sparrow.telecommunity.com> Message-ID: <18778.6883.440406.175729@montanaro-dyndns-org.local> Phillip> But it doesn't handle the fact that Windows paths are Phillip> case-insensitive, or that Posix paths can have symlinks... or Phillip> that one path might be relative and another absolute... Phillip> As soon as you move away from being a string operation, you get Phillip> an endless series of gotchas... none of which are currently Phillip> documented. Well, then we can document (some of?) the gotchas* and work on a better implementation of commonpathprefix. I don't do Windows. You're lucky I got as far as I did with the Windows side of things. ;-) Skip * I would argue that symlinks should be transparent. By the very nature of the operations and the fact that they might be performed on other platforms (import posixpath on Windows for instance) there is not much, if anything, you can infer about the paths themselves other than their structure. 
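The component-wise behaviour discussed in this thread can be illustrated with a minimal sketch. This is not the patch from issue 4755; the name common_path_prefix and its pathmod parameter are hypothetical and chosen only for illustration. It assumes that normalising each path with normpath() and comparing whole components is an acceptable approximation, and it deliberately ignores the gotchas Phillip lists (case-insensitivity, symlinks, mixing relative and absolute paths).

```python
import ntpath
import posixpath


def common_path_prefix(paths, pathmod=posixpath):
    """Return the longest common leading run of whole path components.

    Hypothetical helper for illustration only; `pathmod` is either
    posixpath or ntpath and supplies normpath() and the separator.
    """
    if not paths:
        return ""
    sep = pathmod.sep
    # normpath() collapses redundant separators and, for ntpath,
    # converts the alternate separator "/" to "\\".
    parts = [pathmod.normpath(p).split(sep) for p in paths]
    common = []
    # zip() stops at the shortest path, so a path that is itself the
    # prefix of the others is handled naturally.
    for level in zip(*parts):
        if any(name != level[0] for name in level):
            break
        common.append(level[0])
    if common == [""]:
        # Only the root of an absolute POSIX path was shared.
        return sep
    return sep.join(common)


print(common_path_prefix(["/a/b/c", "/a/b/cd"]))
print(common_path_prefix(["foo\\bar\\baz", "foo/bar/boink"], ntpath))
```

Unlike taking os.path.dirname() of the commonprefix() result, this treats every component the same way, so a trailing component that all paths share is kept whether it names a file or a directory.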
From ncoghlan at gmail.com Tue Dec 30 15:19:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 31 Dec 2008 00:19:08 +1000
Subject: [Python-Dev] Commands for correctly merging to the python 3.0 maintenance branch
In-Reply-To: <4959ECFF.2010803@v.loewis.de>
References: <4959CF51.4030507@gmail.com> <4959ECFF.2010803@v.loewis.de>
Message-ID: <495A2DDC.6080401@gmail.com>

Martin v. Löwis wrote:
>> svn revert .
>> svnmerge -M -F
>
> [are you sure you don't need a command for svnmerge here?]

D'oh, I thought I fixed that before sending the message. Yes, that line should indeed be:

    svnmerge merge -M -F

> Instead of these two, I always do
>
> svn resolved .

That's what I had been doing before today, and I believe it works correctly so long as you never get the svn update and svnmerge merge operations out of sequence (i.e. always update and only then merge).

However, I encountered the case today where I had already merged to the maintenance branch and did the svn update afterwards. In that situation, reverting the property changes and reapplying them was the only way for me to avoid losing the record of the changes everyone else had already merged. If I hadn't checked the property diff and noticed that several merged revisions were no longer listed in the property in my working copy, then svnmerge may have become very confused.

The revert+redo-merge-bookkeeping approach is definitely slower than just marking the conflict as resolved, but has a definite advantage in doing the right thing even if the earlier update+merge operations were performed out of sequence (or if an extra update becomes necessary due to checkins after the merge was performed).

Cheers,
Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From ncoghlan at gmail.com Tue Dec 30 22:20:25 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 31 Dec 2008 07:20:25 +1000 Subject: [Python-Dev] test_subprocess and sparc buildbots Message-ID: <495A9099.1030907@gmail.com> Does anyone have local access to a sparc machine to try to track down the ongoing buildbot failures in test_subprocess? (I think the problem is specific to 3.x builds on sparc machines, but I haven't checked the buildbots all that closely - that assessment is just based on what I recall of the buildbot failure emails). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From barry at barrys-emacs.org Tue Dec 30 22:45:44 2008 From: barry at barrys-emacs.org (Barry Scott) Date: Tue, 30 Dec 2008 21:45:44 +0000 Subject: [Python-Dev] Python 3 - Mac Installer? In-Reply-To: <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com> References: <200812260855.49518.list@qtrac.plus.com> <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com> Message-ID: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org> On 26 Dec 2008, at 23:30, Benjamin Peterson wrote: > On Fri, Dec 26, 2008 at 2:55 AM, Mark Summerfield > wrote: >> Hi, >> >> Just wondered if/when there'd be a Mac installer for Python 3? > > I think there should be one eventually. Unfortunately, the 3.x build > process is not ironed out. If somebody wants to make a patch which > makes the build script in Mac/BuildScript/ work, I'd be very happy. :) Since I've been building 3.0 for a while now I looked at the script. build-install.py seems to have been half converted to py 3.0. Going full 3.0 was not hard but then there is the problem of the imports. Python 3.0 does not have MacOS or Carbon modules. Seems that there are two ways to go. 
Put back the Carbon and MacOS modules into 3.0. Use Python 2 to build the python 3 package. Barry From benjamin at python.org Tue Dec 30 22:59:51 2008 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 30 Dec 2008 15:59:51 -0600 Subject: [Python-Dev] Python 3 - Mac Installer? In-Reply-To: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org> References: <200812260855.49518.list@qtrac.plus.com> <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com> <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org> Message-ID: <1afaf6160812301359r36d3b5b9k98afb21b517a69ce@mail.gmail.com> On Tue, Dec 30, 2008 at 3:45 PM, Barry Scott wrote: > > build-install.py seems to have been half converted to py 3.0. > Going full 3.0 was not hard but then there is the problem of > the imports. Thanks for your help, but just today Ronald Oussoren, the Mac maintainer, spent some time making the installer work. As a result, we should be ready to go for 3.0.1! > > Python 3.0 does not have MacOS or Carbon modules. > > Seems that there are two ways to go. > > Put back the Carbon and MacOS modules into 3.0. > Use Python 2 to build the python 3 package. I've converted it back to 2.x for the time being. Eventually, I think some 3.x bindings should be released. -- Regards, Benjamin Peterson From Scott.Daniels at Acm.Org Tue Dec 30 23:32:02 2008 From: Scott.Daniels at Acm.Org (Scott David Daniels) Date: Tue, 30 Dec 2008 14:32:02 -0800 Subject: [Python-Dev] A wart which should have been repaired in 3.0? 
In-Reply-To: <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> Message-ID: Paul Moore wrote: > 2008/12/30 Phillip J. Eby : >> You know, all this path separator and list complication isn't really >> necessary, when you can just take the os.path.dirname() of the return from >> commonprefix().... > > Actually, consider: ... >>>> os.path.commonprefix(["foo\\bar\\baz", "foo/bar/boink"]) > 'foo' > > ... I'm not sure how to deal with this, except by recommending that all > paths passed to os.path.commonprefix should at the very least be > normalised via os.path.normpath first - which starts to get clumsy > fast. So the "recommended" usage to get the common directory is > > paths = [...] > common = os.path.dirname(os.path.commonprefix([ > os.path.normpath(p) for p in paths])) More trouble with the "just take the dirname": paths = ['/a/b/c', '/a/b/d', '/a/b'] os.path.dirname(os.path.commonprefix([ os.path.normpath(p) for p in paths])) give '/a', not '/a/b'. --Scott David Daniels Scott.Daniels at Acm.Org From pje at telecommunity.com Tue Dec 30 23:51:48 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 30 Dec 2008 17:51:48 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? 
In-Reply-To: References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> Message-ID: <20081230225006.B5D043A405E@sparrow.telecommunity.com> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >More trouble with the "just take the dirname": > > paths = ['/a/b/c', '/a/b/d', '/a/b'] > os.path.dirname(os.path.commonprefix([ > os.path.normpath(p) for p in paths])) > >give '/a', not '/a/b'. ...because that's the correct answer. From jcea at jcea.es Wed Dec 31 01:08:38 2008 From: jcea at jcea.es (Jesus Cea) Date: Wed, 31 Dec 2008 01:08:38 +0100 Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) In-Reply-To: <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com> References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com> <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com> Message-ID: <495AB806.7050603@jcea.es> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Mike Coleman wrote: > I guess if ints are 12 bytes (per Beazley's book, but not sure if that > still holds), then that would correspond to a 1GB reduction. 
Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07) [GCC 4.2.3] on sunos5 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.getsizeof(0) 12 - -- Jesus Cea Avion _/_/ _/_/_/ _/_/_/ jcea at jcea.es - http://www.jcea.es/ _/_/ _/_/ _/_/ _/_/ _/_/ jabber / xmpp:jcea at jabber.org _/_/ _/_/ _/_/_/_/_/ . _/_/ _/_/ _/_/ _/_/ _/_/ "Things are not so easy" _/_/ _/_/ _/_/ _/_/ _/_/ _/_/ "My name is Dump, Core Dump" _/_/_/ _/_/_/ _/_/ _/_/ "El amor es poner tu felicidad en la felicidad de otro" - Leibniz -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iQCVAwUBSVq3+Zlgi5GaxT1NAQLYUAP+Jc0JPYf2GPdNCKypORO+mD887xs81hQ0 MM7QBbRgLflcQ6g2tijpWPhN2/INscbtFn41lptHEYFTv/kka9EICuxgoNP1COYT Or+1uChnSsx1Z7Xxr8YwLFe6ZW/LDyvPjCMpIT32mGSlc1/mfPZk3WjpqTJPeCwY vqu9xD0T0iw= =gXQ5 -----END PGP SIGNATURE----- From solipsis at pitrou.net Wed Dec 31 01:40:02 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Dec 2008 00:40:02 +0000 (UTC) Subject: [Python-Dev] extremely slow exit for program having huge (45G) dict (python 2.5.2) References: <3c6c07c20812191529t30aea97flf2a722b7000b4490@mail.gmail.com> <930F189C8A437347B80DF2C156F7EC7F04D1702BD8@exchis.ccp.ad.local> <3c6c07c20812200857y327b2f8cp6c6b8a5bb4f34048@mail.gmail.com> <494D4FD0.4020202@egenix.com> <18765.21740.137339.943481@montanaro-dyndns-org.local> <3c6c07c20812201605g34b2a049qf3b8836634c90fc5@mail.gmail.com> <3c6c07c20812201622i4cf17aefo8f9b62ee4560df45@mail.gmail.com> <3c6c07c20812230954h216d784w183ca8952d89c793@mail.gmail.com> <495AB806.7050603@jcea.es> Message-ID: Jesus Cea jcea.es> writes: > > Mike Coleman wrote: > > I guess if ints are 12 bytes (per Beazley's book, but not sure if that > > still holds), then that would correspond to a 1GB reduction. > > Python 2.6.1 (r261:67515, Dec 11 2008, 20:28:07) > [GCC 4.2.3] on sunos5 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import sys
> >>> sys.getsizeof(0)
> 12

On a 32-bit system, sure, but given Mike creates a 45 GB dict, he has a 64-bit system, where ints are 24 bytes:

>>> sys.getsizeof(0)
24
>>> sys.getsizeof(100000)
24

cheers

Antoine.

From victor.stinner at haypocalc.com Wed Dec 31 01:49:32 2008
From: victor.stinner at haypocalc.com (Victor Stinner)
Date: Wed, 31 Dec 2008 01:49:32 +0100
Subject: [Python-Dev] Missing FAQ about Python3 and unicode
Message-ID: <200812310149.32686.victor.stinner@haypocalc.com>

Hi,

Slowly, we get recurrent questions about Python3 and unicode. It's maybe time to start a FAQ? Here is an ugly draft to start it ;-)

(1) Exit on undecodable command line arguments

    $ LANG=en_GB.UTF-8 python3.0 test.py $'\xff'
    Could not convert argument 2 to string$

Is it an expected behaviour? Yes! Example of the question: http://bugs.python.org/issue3023

(2) Undecodable filenames

os.listdir(str)->str raises an exception on undecodable filenames. Solution: use os.listdir(bytes)->bytes. To display the filename to the user, use a function like:

    import sys

    def humanFilename(filename):
        # filename is bytes (as returned by os.listdir(bytes)), so it
        # has to be decoded, not encoded, to get a displayable str.
        encoding = sys.getfilesystemencoding()
        return filename.decode(encoding, "replace")

See also http://bugs.python.org/issue3187

(3) Bytes environment variables

Python 3.0 only supports decodable variables for os.environ. Undecodable variables are skipped for the creation of os.environ but original variables still exist at the C level.
$ A=$(echo -e "\xff") B=c ./python Python 3.1a0 (py3k:67973M, Dec 31 2008, 00:51:49) >>> import os >>> os.environ.get('A'), os.environ.get('B') (None, 'c') >>> retcode=os.system('echo -n $A|hexdump -C') 00000000 ff |.| 00000001 >>> retcode=os.system('echo -n $B|hexdump -C') 00000000 63 |c| 00000001 Discussion to support bytes environment variables: http://mail.python.org/pipermail/python-dev/2008-December/083856.html -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From victor.stinner at haypocalc.com Wed Dec 31 01:55:40 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 31 Dec 2008 01:55:40 +0100 Subject: [Python-Dev] I would like an svn account Message-ID: <200812310155.40206.victor.stinner@haypocalc.com> Hi, I already asked in September to get an svn account to be able to commit directly patches to trunk (or other branches like py3k). My query was rejected because I didn't know Python core enough (and maybe other reasons that I don't know). I helped to fix many issues using the bug tracker. The bigger patch was the bytes filename support for Python3, accepted by Guido (after a long review ;-)). Why an svn account instead of just using the amazing bug tracker? Just because there are not enough people to review/commit patches on the tracker and so there are more and more open issues (and so more and more lost patches) :-( I will be able to work faster using the svn. -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From ncoghlan at gmail.com Wed Dec 31 02:30:06 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 31 Dec 2008 11:30:06 +1000 Subject: [Python-Dev] I would like an svn account In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com> References: <200812310155.40206.victor.stinner@haypocalc.com> Message-ID: <495ACB1E.1020300@gmail.com> Victor Stinner wrote: > Hi, > > I already asked in September to get an svn account to be able to commit > directly patches to trunk (or other branches like py3k). 
My query was > rejected because I didn't know Python core enough (and maybe other reasons > that I don't know). > > I helped to fix many issues using the bug tracker. The bigger patch was the > bytes filename support for Python3, accepted by Guido (after a long > review ;-)). > > Why an svn account instead of just using the amazing bug tracker? Just because > there are not enough people to review/commit patches on the tracker and so > there are more and more open issues (and so more and more lost patches) :-( I > will be able to work faster using the svn. +1 here Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- From jnoller at gmail.com Wed Dec 31 02:47:55 2008 From: jnoller at gmail.com (Jesse Noller) Date: Tue, 30 Dec 2008 20:47:55 -0500 Subject: [Python-Dev] I would like an svn account In-Reply-To: <495ACB1E.1020300@gmail.com> References: <200812310155.40206.victor.stinner@haypocalc.com> <495ACB1E.1020300@gmail.com> Message-ID: <42758A82-7A0D-4079-889A-EE5D618E76C0@gmail.com> On Dec 30, 2008, at 8:30 PM, Nick Coghlan wrote: > Victor Stinner wrote: >> Hi, >> >> I already asked in September to get an svn account to be able to >> commit >> directly patches to trunk (or other branches like py3k). My query was >> rejected because I didn't know Python core enough (and maybe other >> reasons >> that I don't know). >> >> I helped to fix many issues using the bug tracker. The bigger patch >> was the >> bytes filename support for Python3, accepted by Guido (after a long >> review ;-)). >> >> Why an svn account instead of just using the amazing bug tracker? >> Just because >> there are not enough people to review/commit patches on the tracker >> and so >> there are more and more open issues (and so more and more lost >> patches) :-( I >> will be able to work faster using the svn. > > +1 here > > Cheers, > Nick. 
> > Also +1 FWIW Jesse From rdmurray at bitdance.com Wed Dec 31 03:30:21 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Tue, 30 Dec 2008 21:30:21 -0500 (EST) Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <20081230225006.B5D043A405E@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> Message-ID: On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote: > At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >> More trouble with the "just take the dirname": >> >> paths = ['/a/b/c', '/a/b/d', '/a/b'] >> os.path.dirname(os.path.commonprefix([ >> os.path.normpath(p) for p in paths])) >> >> give '/a', not '/a/b'. > > ...because that's the correct answer. But not the answer that is wanted. So the challenge now is to write a single expression that will yield '/a/b' when passed the above paths list, and also produce '/a/b' when passed the following paths list: paths = ['/a/b/c', '/a/b/cd'] --RDM From alexandre at peadrop.com Wed Dec 31 03:37:01 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Tue, 30 Dec 2008 21:37:01 -0500 Subject: [Python-Dev] test_subprocess and sparc buildbots In-Reply-To: <495A9099.1030907@gmail.com> References: <495A9099.1030907@gmail.com> Message-ID: Here is what I found just by analyzing the logs. 
It seems the first failures appeared after this change:

http://svn.python.org/view/python/branches/release30-maint/Objects/object.c?rev=67888&view=diff&r1=67888&r2=67887&p1=python/branches/release30-maint/Objects/object.c&p2=/python/branches/release30-maint/Objects/object.c

The logs of failing test runs all show the same error message:

    [31481 refs]
    * ob
    object :
    type : str
    refcount: 0
    address : 0x3a97728
    * op->_ob_prev->_ob_next
    object :
    type : str
    refcount: 0
    address : 0x3a97728
    * op->_ob_next->_ob_prev
    object : [31776 refs]

This is the output of _Py_ForgetReference (which calls _PyObject_Dump) called either from _PyUnicode_New or unicode_subtype_new. In both cases, this implies PyObject_MALLOC returned NULL when allocating the internal array of a str object. However, I have no idea why malloc() is failing there.

By counting the number of [reftotal] printed in the log, I found that the failing test could be one of the following: test_invalid_args, test_invalid_bufsize, test_list2cmdline, test_no_leaking. Looking at the tests, it seems only test_no_leaking could be problematic:

* test_list2cmdline checks if the subprocess.list2cmdline function works correctly, only Python code is involved here;
* test_invalid_args checks if using an option unsupported by a platform raises an exception, only Python code is involved here;
* test_invalid_bufsize only checks whether Popen rejects non-integer bufsize, only Python code is involved here.
And unsurprisingly, that is the failing test: test test_subprocess failed -- Traceback (most recent call last): File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/test/test_subprocess.py", line 423, in test_no_leaking data = p.communicate(b"lime")[0] File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 671, in communicate return self._communicate(input) File "/home/pybot/buildarea-sid/3.0.klose-debian-sparc/build/Lib/subprocess.py", line 1171, in _communicate bytes_written = os.write(self.stdin.fileno(), chunk) OSError: [Errno 32] Broken pipe It seems one of the spawned processes goes out of memory while allocating a new PyUnicode object. I believe we don't see the usual MemoryError because the parent process catches stderr and stdout of the children. Also, only klose-*-sparc buildbots are failing this way; loewis-sun is failing too but for a different reason. So, how much memory is available on this machine (or actually, on this virtual machine)? Now, I wonder why manipulating the GIL caused the bug to appear in 3.0, but not in 2.x. Maybe it is related to the new I/O library in Python 3.0. Regards, -- Alexandre On Tue, Dec 30, 2008 at 4:20 PM, Nick Coghlan wrote: > Does anyone have local access to a sparc machine to try to track down > the ongoing buildbot failures in test_subprocess? > > (I think the problem is specific to 3.x builds on sparc machines, but I > haven't checked the buildbots all that closely - that assessment is just > based on what I recall of the buildbot failure emails). > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/alexandre%40peadrop.com > From rdmurray at bitdance.com Wed Dec 31 03:40:07 2008 From: rdmurray at bitdance.com (rdmurray at bitdance.com) Date: Tue, 30 Dec 2008 21:40:07 -0500 (EST) Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> Message-ID: On Tue, 30 Dec 2008 at 21:30, rdmurray at bitdance.com wrote: > On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote: >> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >> > More trouble with the "just take the dirname": >> > >> > paths = ['/a/b/c', '/a/b/d', '/a/b'] >> > os.path.dirname(os.path.commonprefix([ >> > os.path.normpath(p) for p in paths])) >> > >> > give '/a', not '/a/b'. >> >> ...because that's the correct answer. > > But not the answer that is wanted. > > So the challenge now is to write a single expression that will yield > '/a/b' when passed the above paths list, and also produce '/a/b' when > passed the following paths list: > > paths = ['/a/b/c', '/a/b/cd'] Sorry, now I see what you are saying: that in '/a/b' the 'b' is the filename. 
Clearly that wasn't what I intuitively expected our notional 'commonpathprefix' command to produce, for whatever that is worth :) --RDM From skip at pobox.com Wed Dec 31 03:57:45 2008 From: skip at pobox.com (skip at pobox.com) Date: Tue, 30 Dec 2008 20:57:45 -0600 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <20081230225006.B5D043A405E@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> Message-ID: <18778.57257.227598.592245@montanaro-dyndns-org.local> Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >> More trouble with the "just take the dirname": >> >> paths = ['/a/b/c', '/a/b/d', '/a/b'] >> os.path.dirname(os.path.commonprefix([ >> os.path.normpath(p) for p in paths])) >> >> give '/a', not '/a/b'. Phillip> ...because that's the correct answer. I don't understand. If you search for os.path.commonprefix at codesearch.google.com you'll find uses like this: if os.path.commonprefix([basedir, somepath]) != basedir: ... which leads me to believe that other people using the current function in the real world would be confused by your interpretation. 
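The containment idiom Skip quotes can be compared with a component-aware check in a short sketch. The function names below are hypothetical and used only for illustration, and posixpath is used directly so the behaviour does not depend on the platform running the example. The sketch assumes the caller wants to know whether somepath lies at or below basedir, which is one plausible reading of that idiom, not something the thread states.

```python
import posixpath


def is_under_by_string(basedir, somepath):
    # The idiom found in real-world code: a pure string comparison.
    return posixpath.commonprefix([basedir, somepath]) == basedir


def is_under_by_component(basedir, somepath):
    # Hypothetical alternative: compare normalised paths and require
    # the match to end on a separator boundary.
    base = posixpath.normpath(basedir)
    path = posixpath.normpath(somepath)
    return path == base or path.startswith(base.rstrip("/") + "/")


# "/home/bobby" merely shares leading characters with "/home/bob".
print(is_under_by_string("/home/bob", "/home/bobby/file"))
print(is_under_by_component("/home/bob", "/home/bobby/file"))
```

The string version accepts a sibling directory whose name merely starts with the same characters, which is the string-versus-path distinction this thread keeps returning to. Neither version resolves symlinks or handles case-insensitive filesystems.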
Skip From benjamin at python.org Wed Dec 31 04:29:19 2008 From: benjamin at python.org (Benjamin Peterson) Date: Tue, 30 Dec 2008 21:29:19 -0600 Subject: [Python-Dev] Missing FAQ about Python3 and unicode In-Reply-To: <200812310149.32686.victor.stinner@haypocalc.com> References: <200812310149.32686.victor.stinner@haypocalc.com> Message-ID: <1afaf6160812301929u509378fbxd0794c76ee13af82@mail.gmail.com> On Tue, Dec 30, 2008 at 6:49 PM, Victor Stinner wrote: > Hi, > > Slowly, we get recurrent questions about Python3 and unicode. It's maybe time > to start a FAQ? Here is an ugly draft to start it ;-) Looks like good stuff! It would probably make a good addition to the meager porting docs in development on the wiki. [1] ... [1] http://wiki.python.org/moin/PortingToPy3k -- Regards, Benjamin Peterson From ajaksu at gmail.com Wed Dec 31 04:41:41 2008 From: ajaksu at gmail.com (Daniel (ajax) Diniz) Date: Wed, 31 Dec 2008 01:41:41 -0200 Subject: [Python-Dev] test_subprocess and sparc buildbots In-Reply-To: References: <495A9099.1030907@gmail.com> Message-ID: <2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com> Alexandre Vassalotti wrote: > The logs of failing test runs all shows the same error message: > > [31481 refs] > * ob > object : > type : str > refcount: 0 > address : 0x3a97728 > * op->_ob_prev->_ob_next > object : > type : str > refcount: 0 > address : 0x3a97728 > * op->_ob_next->_ob_prev > object : [31776 refs] A reliable way to get that in a --with-pydebug build seems to be: ~/py3k$ ./python -c "import locale; locale.format_string(1,1)" * ob object : type : tuple refcount: 0 address : 0x825c76c * op->_ob_prev->_ob_next NULL * op->_ob_next->_ob_prev object : type : tuple refcount: 0 address : 0x825c76c Fatal Python error: UNREF invalid object TypeError: expected string or buffer Aborted Found using Fusil in a very quick run on top of: Python 3.1a0 (py3k:68055M, Dec 31 2008, 01:34:52) [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 So kudos to Victor again :) 
HTH, Daniel From python at rcn.com Wed Dec 31 05:05:12 2008 From: python at rcn.com (Raymond Hettinger) Date: Tue, 30 Dec 2008 20:05:12 -0800 Subject: [Python-Dev] I would like an svn account References: <200812310155.40206.victor.stinner@haypocalc.com> Message-ID: <9A42531762714CADABF8A6F40C08AD23@RaymondLaptop1> From: "Victor Stinner" > Why an svn account instead of just using the amazing bug tracker? Just because > there are not enough people to review/commit patches on the tracker and so > there are more and more open issues (and so more and more lost patches) :-( I > will be able to work faster using the svn. Based on the work I've seen so far, my preference is that you continue to use the tracker instead of directly committing patches. Raymond From pje at telecommunity.com Wed Dec 31 05:08:04 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 30 Dec 2008 23:08:04 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> Message-ID: <20081231040622.4C8143A405E@sparrow.telecommunity.com> At 09:30 PM 12/30/2008 -0500, rdmurray at bitdance.com wrote: >On Tue, 30 Dec 2008 at 17:51, Phillip J. 
Eby wrote: >>At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >>>More trouble with the "just take the dirname": >>> >>> paths = ['/a/b/c', '/a/b/d', '/a/b'] >>> os.path.dirname(os.path.commonprefix([ >>> os.path.normpath(p) for p in paths])) >>>give '/a', not '/a/b'. >> >>...because that's the correct answer. > >But not the answer that is wanted. > >So the challenge now is to write a single expression that will yield >'/a/b' when passed the above paths list, and also produce '/a/b' when >passed the following paths list: > > paths = ['/a/b/c', '/a/b/cd'] Change that to [os.path.normpath(p)+'/' for p in paths] and you've got yourself a winner. From pje at telecommunity.com Wed Dec 31 05:11:34 2008 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 30 Dec 2008 23:11:34 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <18778.57257.227598.592245@montanaro-dyndns-org.local> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> <18778.57257.227598.592245@montanaro-dyndns-org.local> Message-ID: <20081231040951.36EB43A410E@sparrow.telecommunity.com> At 08:57 PM 12/30/2008 -0600, skip at pobox.com wrote: > Phillip> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: > >> More trouble with the "just take the dirname": > >> > >> paths = ['/a/b/c', '/a/b/d', '/a/b'] > >> os.path.dirname(os.path.commonprefix([ > >> os.path.normpath(p) for p in paths])) > >> > >> give '/a', not '/a/b'. 
> > Phillip> ...because that's the correct answer. > >I don't understand. If you search for os.path.commonprefix at >codesearch.google.com you'll find uses like this: > > if os.path.commonprefix([basedir, somepath]) != basedir: > ... > >which leads me to believe that other people using the current function in >the real world would be confused by your interpretation. It never would've occurred to me to use it for that, versus checking for somepath.startswith(basedir+sep). The only thing I've ever used commonprefix for is to find the most-specific directory that contains all the specified paths. Never occurred to me that there was any other use for it, actually. From stephen at xemacs.org Wed Dec 31 08:46:09 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 31 Dec 2008 16:46:09 +0900 Subject: [Python-Dev] I would like an svn account In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com> References: <200812310155.40206.victor.stinner@haypocalc.com> Message-ID: <87fxk4ss72.fsf@xemacs.org> Victor Stinner writes: > I already asked in September to get an svn account to be able to > commit directly patches to trunk (or other branches like py3k). My > query was rejected because I didn't know Python core enough (and > maybe other reasons that I don't know). One possible reason is that commit privilege is not about quality of code, it's about quality of review. Would you review your own code in the same way that other committers review their own? Would you make the same decisions about which fixes to commit, which changes to wait for others' review, and which to propose on Python-Dev first? Remember, to be appropriate for Python, a patch needs not only to be good code, it must also be "Pythonic". Does your personal sense of code quality result in Pythonic patches? (I can't answer that, because my own sense of Pythonicity is dubiously reliable at best.) 
Another possible reason is that, while it's not an absolute requirement, in my projects I'm always a lot more supportive of candidates who have a track record of helping others get their patches committed. Of course if your patches have a history of being accepted often without substantial change, then implicitly you are doing good self-review, and that might be enough. But in my book, that path *should* take longer and demand higher standards than the "review others' patches" path. > The bigger patch was the bytes filename support for Python3, > accepted by Guido (after a long review ;-)). Would you have committed that patch if nobody else had reviewed it? > Just because there are not enough people to review/commit patches > on the tracker and Are you planning to review and commit other people's patches, and help reduce this backlog? Or just your own? Your emphasis on your own working speed suggests the latter. Again, I'm more supportive of people who want commit privileges in part to help improve the project's process, as well as to remove obstacles to their own work. > so there are more and more open issues (and so more and more lost > patches) :-( An open issue is not a lost patch. It's an open issue. In my own projects, I oppose candidates who seem to think that the presumption is that a patch should be applied quickly unless there's good reason given not to. Your phrasing suggests that attitude to me. You don't have to pay attention to me, since I don't have a vote in the matter. And I don't mean to be negatively critical of you, because I'm not in a position to speak for the Powers That Be in Python. Those are my criteria, and other people and projects use different ones. But it seems to me that the committers in Python do mostly conform to my criteria, and thus it's *possible* that those criteria are somewhat representative of the "maybe other reasons [you] don't know." 
If so, I suppose an explicit explanation may be of use to you (and others in your position). Happy New Year to you! From alexandre at peadrop.com Wed Dec 31 08:50:54 2008 From: alexandre at peadrop.com (Alexandre Vassalotti) Date: Wed, 31 Dec 2008 02:50:54 -0500 Subject: [Python-Dev] test_subprocess and sparc buildbots In-Reply-To: <2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com> References: <495A9099.1030907@gmail.com> <2d75d7660812301941r3c133eaw7094609bd6bc51ce@mail.gmail.com> Message-ID: On Tue, Dec 30, 2008 at 10:41 PM, Daniel (ajax) Diniz wrote: > A reliable way to get that in a --with-pydebug build seems to be: > > ~/py3k$ ./python -c "import locale; locale.format_string(1,1)" > * ob > object : > type : tuple > refcount: 0 > address : 0x825c76c > * op->_ob_prev->_ob_next > NULL > * op->_ob_next->_ob_prev > object : > type : tuple > refcount: 0 > address : 0x825c76c > Fatal Python error: UNREF invalid object > TypeError: expected string or buffer > Aborted > Nice catch! I reduced your example to: "import _sre; _sre.compile(0, 0, [])". And it doesn't seem to be an input validation problem with _sre. From what I saw, it's actually a bug in Py_TRACE_REFS's code. Now it's getting interesting! It seems something is breaking the refchain. However, I don't know exactly what is causing the problem. > Found using Fusil in a very quick run on top of: > Python 3.1a0 (py3k:68055M, Dec 31 2008, 01:34:52) > [GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2 > > So kudos to Victor again :) > Could you share the details of how you used Fusil to find another crasher? It sounds like a useful tool. Thanks! -- Alexandre From p.f.moore at gmail.com Wed Dec 31 09:49:43 2008 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 31 Dec 2008 08:49:43 +0000 Subject: [Python-Dev] A wart which should have been repaired in 3.0?
In-Reply-To: <20081231040622.4C8143A405E@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> <20081231040622.4C8143A405E@sparrow.telecommunity.com> Message-ID: <79990c6b0812310049g9c22991n21356ccba1cf6376@mail.gmail.com> 2008/12/31 Phillip J. Eby : > Change that to [os.path.normpath(p)+'/' for p in paths] and you've got > yourself a winner. s#'/'#os.sep# to make it work on Windows as well :-) Have we established yet that this is hard enough to get right to warrant a stdlib implementation? Paul From solipsis at pitrou.net Wed Dec 31 14:08:50 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Dec 2008 13:08:50 +0000 (UTC) Subject: [Python-Dev] A wart which should have been repaired in 3.0? References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> <18778.57257.227598.592245@montanaro-dyndns-org.local> Message-ID: pobox.com> writes: > > which leads me to believe that other people using the current function in > the real world would be confused by your interpretation. ... and are vulnerable to security hazards. 
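[Editorial sketch, not part of the original thread.] The hazard mentioned above can be illustrated with a few lines of Python. os.path.commonprefix compares strings character by character, so the "is somepath under basedir?" idiom quoted earlier in the thread also accepts a sibling directory whose name merely begins with the base directory's name. The helper names naive_is_inside and is_inside are hypothetical and chosen only for this illustration; the second follows the startswith(basedir+sep) approach suggested earlier in the thread and is a minimal sketch, not a vetted security check (it does not resolve symlinks, for instance).

```python
import os.path


def naive_is_inside(basedir, somepath):
    # The idiom quoted in the thread: a character-wise prefix comparison.
    return os.path.commonprefix([basedir, somepath]) == basedir


def is_inside(basedir, somepath):
    # Hypothetical helper: compare against basedir plus a trailing
    # separator, so that only whole path components can match.
    base = os.path.normpath(basedir)
    path = os.path.normpath(somepath)
    return path == base or path.startswith(base.rstrip(os.sep) + os.sep)


# commonprefix works on characters, not on path components.
print(os.path.commonprefix(['/a/b/c', '/a/b/cd']))

# The naive check accepts a sibling directory; the component-aware one does not.
print(naive_is_inside('/home/user', '/home/user2/secret'))
print(is_inside('/home/user', '/home/user2/secret'))
print(is_inside('/home/user', '/home/user/notes.txt'))
```

With these inputs the first call shows the character-wise prefix '/a/b/c', which is not a directory common to both paths, and the naive check reports the sibling directory /home/user2 as lying inside /home/user.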
From steve at holdenweb.com Wed Dec 31 14:21:49 2008 From: steve at holdenweb.com (Steve Holden) Date: Wed, 31 Dec 2008 08:21:49 -0500 Subject: [Python-Dev] A wart which should have been repaired in 3.0? In-Reply-To: <20081231040622.4C8143A405E@sparrow.telecommunity.com> References: <18773.27523.297588.265405@montanaro-dyndns-org.local> <1230460141.6361.4.camel@localhost> <49575952.7070405@gmail.com> <4957DFC5.9030405@v.loewis.de> <18776.1376.724926.669345@montanaro-dyndns-org.local> <495808C6.4050304@v.loewis.de> <18776.2535.459306.987378@montanaro-dyndns-org.local> <1bc395c10812291349t149bf3fcm7926934cef9fd6be@mail.gmail.com> <18777.21289.504321.865439@montanaro-dyndns-org.local> <20081230010023.B46883A406C@sparrow.telecommunity.com> <79990c6b0812300136i323cb7eem76d2889262fd2175@mail.gmail.com> <20081230225006.B5D043A405E@sparrow.telecommunity.com> <20081231040622.4C8143A405E@sparrow.telecommunity.com> Message-ID: Phillip J. Eby wrote: > At 09:30 PM 12/30/2008 -0500, rdmurray at bitdance.com wrote: >> On Tue, 30 Dec 2008 at 17:51, Phillip J. Eby wrote: >>> At 02:32 PM 12/30/2008 -0800, Scott David Daniels wrote: >>>> More trouble with the "just take the dirname": >>>> >>>> paths = ['/a/b/c', '/a/b/d', '/a/b'] >>>> os.path.dirname(os.path.commonprefix([ >>>> os.path.normpath(p) for p in paths])) >>>> give '/a', not '/a/b'. >>> >>> ...because that's the correct answer. >> >> But not the answer that is wanted. >> >> So the challenge now is to write a single expression that will yield >> '/a/b' when passed the above paths list, and also produce '/a/b' when >> passed the following paths list: >> >> paths = ['/a/b/c', '/a/b/cd'] > > Change that to [os.path.normpath(p)+'/' for p in paths] and you've got > yourself a winner. > Or possibly [os.path.normpath(p)+os.path.sep for p in paths]? 
regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ From victor.stinner at haypocalc.com Wed Dec 31 14:26:58 2008 From: victor.stinner at haypocalc.com (Victor Stinner) Date: Wed, 31 Dec 2008 14:26:58 +0100 Subject: [Python-Dev] I would like an svn account In-Reply-To: <87fxk4ss72.fsf@xemacs.org> References: <200812310155.40206.victor.stinner@haypocalc.com> <87fxk4ss72.fsf@xemacs.org> Message-ID: <200812311426.58779.victor.stinner@haypocalc.com> On Wednesday 31 December 2008 08:46:09, Stephen J. Turnbull wrote: > Would you review your own code in the same way that other committers > review their own? I'm unable to review my own code. I always re-read my code and test it, but I cannot see every possible case. That's why I prefer external eyes to review my code for parts of the code that I don't understand/know well enough. > Would you make the same decisions about which fixes to commit, > which changes to wait for others' review, and which to propose > on Python-Dev first? I think that I'm able to know if a patch needs a review or not. Especially if the patch changes the behaviour or the API (or if the patch is complex), I always prefer a review. I will not use svn as I use the tracker. Sometimes, I write a quick and dirty patch to demonstrate a feature or to propose a solution to fix the bug. If the solution is accepted, I try to write a better patch. > > The bigger patch was the bytes filename support for Python3, > > accepted by Guido (after a long review ;-)). > > Would you have committed that patch if nobody else had reviewed it? Certainly not. The patch changed the behaviour of most functions related to files. The mailing list + the bug tracker were the right tools. > > Just because there are not enough people to review/commit patches > > on the tracker and > > Are you planning to review and commit other people's patches, and help > reduce this backlog? Or just your own? It depends on the issue.
There are many trivial fixes that don't change the behaviour / API but just improve the project, and that are waiting for a review or are reviewed but not committed yet. About my own patches: yes, I would like to commit directly to svn, without using the tracker, to fix trivial bugs. Example: for one month, there were two gcc warnings in the _testcapi module. The fix was trivial, and it takes too much effort to open an issue for such a stupid warning. > Again, I'm more supportive of > people who want commit privileges in part to help improve the > project's process, as well as to remove obstacles to their own work. My not-so-secret goal is also to improve Python's stability against fuzzing. I stopped working on fuzzing because it sometimes took months to fix a dummy bug (dummy: easy to understand but also easy to fix without side effects). Example of such an issue: "import _tkinter; _tkinter.mainloop()" crashes Python (maybe not directly, but later on garbage collection). I opened the issue (with a patch) in August, gpolo reviewed the patch ("Looks fine to me.") two weeks later, but 4 months later the issue is still open: http://bugs.python.org/issue3638 Is this what you called "An open issue is not a lost patch."? > An open issue is not a lost patch. It's an open issue. In my own > projects, I oppose candidates who seem to think that the presumption > is that a patch should be applied quickly unless there's good reason > given not to. Your phrasing suggests that attitude to me. Even after a review, some issues stay open for months or years. Another example: nntplib doesn't support IPv6; dmorr proposed a simple and good patch reusing the nice function socket.create_connection() one year ago. In this case, I think that nobody was able to test the change. But even without testing it, I'm sure that the patch is better than the current situation. Well, if I have to commit the patch, I will test it first.
My computer has a public IPv6 address :-) http://bugs.python.org/issue1664 > You don't have to pay attention to me, No, your opinion is interesting. I hope that my answers will help you to understand my expectations about an svn account :-) -- Victor Stinner aka haypo http://www.haypocalc.com/blog/ From solipsis at pitrou.net Wed Dec 31 14:47:30 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Dec 2008 13:47:30 +0000 (UTC) Subject: [Python-Dev] opcode dispatch optimization Message-ID: Hello, I would like to mention that I've written a patch which enables "threaded interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2 3600+), it is good for a 15-20% speedup of the interpreter on pystone and pybench. I also had the opportunity to test it on a Core2-derived CPU, where it doesn't make a difference (I conjecture it's because Core2 CPUs have hardware-based indirect branch optimizations). It will make no difference if the interpreter is compiled with something else than gcc (I tested on Windows). The additional complexity is very small. There's a separate script which is run to build the dispatch table (only if needed, that is if dis.py has been modified). In ceval.c, there are a couple of macros and some #ifdef's. That's all. It breaks no test in the regression suite. Could other people test and report their results here? (the patch is for py3k, btw). Also, what are your thoughts for/against integrating this patch in the standard interpreter? Regards Antoine. (*) please note: it has nothing to do with multithreading. From solipsis at pitrou.net Wed Dec 31 14:49:53 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Dec 2008 13:49:53 +0000 (UTC) Subject: [Python-Dev] opcode dispatch optimization References: Message-ID: Antoine Pitrou pitrou.net> writes: > > I would like to mention that I've written a patch which enables "threaded > interpretation" ...
and I forgot to give the URL: http://bugs.python.org/issue4753 Regards Antoine. From stephen at xemacs.org Wed Dec 31 16:04:42 2008 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 01 Jan 2009 00:04:42 +0900 Subject: [Python-Dev] I would like an svn account In-Reply-To: <200812311426.58779.victor.stinner@haypocalc.com> References: <200812310155.40206.victor.stinner@haypocalc.com> <87fxk4ss72.fsf@xemacs.org> <200812311426.58779.victor.stinner@haypocalc.com> Message-ID: <87y6xwqtbp.fsf@xemacs.org> Victor Stinner writes: > On Wednesday 31 December 2008 08:46:09, Stephen J. Turnbull wrote: > > Would you review your own code in the same way that other committers > > review their own? > > I'm unable to review my own code. Of course not, in the formal "software process" sense. But in some sense to commit code you have to have reviewed it, that's all I meant. > Is this what you called "An open issue is not a lost patch."? Yes, and I'll say it again: > > An open issue is not a lost patch. It's an open issue. > Even after a review, some issues stay open for months or years. There *is* a process problem, though I don't claim to have an idea how to solve it. Some developers (especially well-known is Martin van Loewis) are trying to address this with the "one committer's review for five reviews" offer, but maybe there are even better ways to do it. However, this is a *different problem* from "lost patches", which many projects do suffer from, and shouldn't be called by that name, which is insulting to the Python committers. In particular, we know that effort is devoted to tracking open issues by the developers, both individually and as a formal matter (the weekly report). It is insufficient in some sense, but way better than, say, in XEmacs (a project I'm supposed to be leading :-/ ). And IIRC the statistics show that the number of issues closed is of the same order of magnitude as those opened, although consistently lower by 10-20%.
Actually, I think that's pretty amazing for a project that has nobody whose salary depends on getting the numbers up. > > You don't have to pay attention to me, > > No, your opinion is interresting. I hope that my answers will help you to > understand my expectations about an svn account :-) Well, as I say I have no vote. But I hope your answers will help to convince any doubters among the committers. From solipsis at pitrou.net Wed Dec 31 16:11:31 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Dec 2008 15:11:31 +0000 (UTC) Subject: [Python-Dev] lost patches References: <200812310155.40206.victor.stinner@haypocalc.com> <87fxk4ss72.fsf@xemacs.org> <200812311426.58779.victor.stinner@haypocalc.com> <87y6xwqtbp.fsf@xemacs.org> Message-ID: Hi, Stephen J. Turnbull xemacs.org> writes: > > There *is* a process problem, though I don't claim to have an idea how > to solve it. Some developers (especially well-known is Martin van > Loewis) are trying to address this with the "one committer's review > for five reviews" offer, but maybe there are even better ways to do > it. However, this is a *different problem* from "lost patches", which > many projects do suffer from, and shouldn't be called by that name, > which is insulting to the Python committers. I don't think it is insulting (I say that as a young Python committer), and I do think it is fair to call them "lost patches". Perhaps not after four months, but when a good patch hasn't been committed after two years, it is potentially lost because the code base has changed a lot since that and 1) the patch doesn't apply completely anymore 2) it must be reassessed whether the patch is good/useful/necessary with respect to the current code base, which can be tricky. As for reviews, we don't seem to use Rietveld a lot, although it offers a nice interface for comfortably viewing changes, and possibly commenting them. 
The overhead of having to open a separate issue in Rietveld and upload the patch there is a bit annoying, though. Regards Antoine. From lists at cheimes.de Wed Dec 31 18:44:27 2008 From: lists at cheimes.de (Christian Heimes) Date: Wed, 31 Dec 2008 18:44:27 +0100 Subject: [Python-Dev] opcode dispatch optimization In-Reply-To: References: Message-ID: <495BAF7B.5090405@cheimes.de> Antoine Pitrou wrote: > I would like to mention that I've written a patch which enables "threaded > interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2 > 3600+), it is good for a 15-20% speedup of the interpreter on pystone and > pybench. I also had the opportunity to test it on a Core2-derived CPU, where it > doesn't make a difference (I conjecture it's because Core2 CPUs have > hardware-based indirect branch optimizations). It will make no difference if the > interpreter is compiled with something else than gcc (I tested on Windows). The patch makes use of a GCC feature where labels can be used as values: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know about the feature and got confused by the unary && operator. A happy new year to you all! Christian From jason.orendorff at gmail.com Wed Dec 31 19:51:28 2008 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Wed, 31 Dec 2008 12:51:28 -0600 Subject: [Python-Dev] opcode dispatch optimization In-Reply-To: <495BAF7B.5090405@cheimes.de> References: <495BAF7B.5090405@cheimes.de> Message-ID: On Wed, Dec 31, 2008 at 11:44 AM, Christian Heimes wrote: > The patch makes use of a GCC feature where labels can be used as values: > http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know > about the feature and got confused by the unary && operator. Right. SpiderMonkey (Mozilla's JavaScript interpreter) does this, and it was good for a similar win on platforms that use GCC.
(It took me a while to figure out why it was so much faster, so I think this patch would be better with a few very specific comments!) SpiderMonkey calls this optimization "threaded code" too, but this isn't the standard meaning of that term. See: http://en.wikipedia.org/wiki/Threaded_code -j From brett at python.org Wed Dec 31 21:19:41 2008 From: brett at python.org (Brett Cannon) Date: Wed, 31 Dec 2008 12:19:41 -0800 Subject: [Python-Dev] lost patches In-Reply-To: References: <200812310155.40206.victor.stinner@haypocalc.com> <87fxk4ss72.fsf@xemacs.org> <200812311426.58779.victor.stinner@haypocalc.com> <87y6xwqtbp.fsf@xemacs.org> Message-ID: On Wed, Dec 31, 2008 at 07:11, Antoine Pitrou wrote: > > Hi, > > Stephen J. Turnbull xemacs.org> writes: >> >> There *is* a process problem, though I don't claim to have an idea how >> to solve it. Some developers (especially well-known is Martin van >> Loewis) are trying to address this with the "one committer's review >> for five reviews" offer, but maybe there are even better ways to do >> it. However, this is a *different problem* from "lost patches", which >> many projects do suffer from, and shouldn't be called by that name, >> which is insulting to the Python committers. > > I don't think it is insulting (I say that as a young Python committer), and I do > think it is fair to call them "lost patches". Perhaps not after four months, but > when a good patch hasn't been committed after two years, it is potentially lost > because the code base has changed a lot since that and 1) the patch doesn't > apply completely anymore 2) it must be reassessed whether the patch is > good/useful/necessary with respect to the current code base, which can be tricky. > It is unfortunate when a good patch for a real issue doesn't get applied during the current development cycle. But I honestly think, in general, the important ones do get looked at and handled. Yes, some slip through the cracks, but overall I think we do pretty well. 
> As for reviews, we don't seem to use Rietveld a lot, although it offers a nice > interface for comfortably viewing changes, and possibly commenting them. The > overhead of having to open a separate issue in Rietveld and upload the patch > there is a bit annoying, though. My hope is that some day we get around to fixing this and getting a code review application tied into the issue workflow so it is no more than pressing a button. -Brett From brett at python.org Wed Dec 31 22:20:54 2008 From: brett at python.org (Brett Cannon) Date: Wed, 31 Dec 2008 13:20:54 -0800 Subject: [Python-Dev] I would like an svn account In-Reply-To: <200812310155.40206.victor.stinner@haypocalc.com> References: <200812310155.40206.victor.stinner@haypocalc.com> Message-ID: On Tue, Dec 30, 2008 at 16:55, Victor Stinner wrote: > Hi, > > I already asked in September to get an svn account to be able to commit > directly patches to trunk (or other branches like py3k). My query was > rejected because I didn't know Python core enough (and maybe other reasons > that I don't know). > I am going to stick my neck out on this one and say why I have not spoken up for giving you commit privs, Victor, and my general thoughts on handing them out since I don't think this has been stated by anyone before. When it comes to commit privs in general, I am of the school that they should be handed out carefully. I for one do not want to have to babysit other committers to make sure that they did something correctly. That's a waste of my time since that defeats the purpose of having more committers. This is why I think Benjamin got his privs too soon. Luckily Georg took it upon himself, I assume because he gave Benjamin the privileges, to double-check all of Benjamin's checkins and fix them until Benjamin absorbed enough of the development process to no longer need to be watched over.
But I was honestly rather close to suggesting Benjamin lose his privileges early on until he had more time to figure out how things worked. Luckily it didn't come to that and Benjamin has turned out to be a good developer. I also want people who have no agenda. It's okay to have an area you care about, but that doesn't mean you should necessarily say "I will only work on math, ever, even if something is staring me right in the face!", etc. There is also dedication. I don't like giving commit privileges to people who I don't think will definitely stick around. It's fine if they come and go, but if I am not sure if they will typically come back I would prefer to not bother giving them the privilege of saying they are a developer of Python. Typically this takes a year of regular contributions for me to believe this. And lastly, general cohesion with the other committers. Once you become a committer you become a co-worker in a way and that means getting along with everybody. And since we don't have some manager who forces a new co-worker down our throats we tend to be very picky about this. Plus I already lived through high school and I don't want that kind of drama here. So those are my personal criteria on whether or not I speak up for someone getting commit privileges. How do you play into all of this in my head? To start, your focus on security, for me at least, goes too far sometimes. I have disagreed with some of your decisions in the name of security in the past and I am not quite ready to say that if you committed something I wouldn't feel compelled to double-check it to make sure you didn't go too far. This worry, though, has gone down a lot compared to the last time you asked for commit privs. And I do worry about your attitude. I remember at one point you basically threatened to stop helping because your patches were not being looked at quickly. That really pissed me off personally.
You have improved here and are a lot less abrasive than you were, but I am still smarting a little from some comments you made a few months back that came off as pushy. And as I said, I prefer to give commit privileges to people who I think will stick around and have been contributing regularly for a year (I just checked bugs.python.org and it looks like you got really involved only five months ago). Saying you stopped doing your fuzzing work simply because the turn-around was not to your liking does not cause me to instantly think you will stick around when it gets nasty around here (which it invariably does a couple of times a year). In other words I think you are on the right track to get commit privileges in the future, but just not right now (although if you did get them right now I wouldn't throw up a roadblock). -Brett From nicko at nicko.org Wed Dec 31 23:34:40 2008 From: nicko at nicko.org (Nicko van Someren) Date: Wed, 31 Dec 2008 14:34:40 -0800 Subject: [Python-Dev] Python 3 - Mac Installer? In-Reply-To: <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org> References: <200812260855.49518.list@qtrac.plus.com> <1afaf6160812261530r4f72eca8nf7cc519683bcbb16@mail.gmail.com> <74A762C2-585A-479D-BA3E-E0658E212A16@barrys-emacs.org> Message-ID: On 30 Dec 2008, at 13:45, Barry Scott wrote: ... > Since I've been building 3.0 for a while now I looked at the script. > > build-install.py seems to have been half converted to py 3.0. > Going full 3.0 was not hard but then there is the problem of > the imports. > > Python 3.0 does not have MacOS or Carbon modules. > > Seems that there are two ways to go. > > Put back the Carbon and MacOS modules into 3.0. > Use Python 2 to build the python 3 package. As far as I can tell the Carbon and MacOS modules are _only_ used in the setIcon() function, which is used to give a pretty icon to the python folder. Perhaps it might be better to have a fully Python 3 build system and lose the prettiness for the time being. Nicko