From rhamph at gmail.com Sun Jun 1 00:28:15 2008 From: rhamph at gmail.com (Adam Olsen) Date: Sat, 31 May 2008 16:28:15 -0600 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au>

Message-ID:

On Sat, May 31, 2008 at 2:03 PM, Antoine Pitrou wrote:
> Adam Olsen writes:
>>
>> The bytecode generation for "raise" could be changed to literally be the
>> same as "except Foo as e: raise e". Reuse our existing stack, not add
>> another one.
>
> As someone else pointed out, there is a difference between the two constructs: the
> latter appends a line to the traceback while the former doesn't. I suppose in
> some contexts it can be useful (especially if the exception is re-raised several
> times because of a complex architecture, e.g. a framework).

Yeah. If anything it seems like a positive, not a negative.

>> The commented out raise should use the outer except block (and thus be
>> lexically based), but sys.exc_info() doesn't have to be.
>
> But would you object to sys.exc_info() being lexically based as well?
> I say that because the bare "raise" statement and sys.exc_info() use the same
> attributes internally, so they will have the same semantics unless we decide
> it's better to do otherwise.

I'm trying to eliminate complexity, paring it down to the bare minimum
of supported functionality.

However, I was also confusing sys.exc_info() with sys.last_*. That
leads to another thought..

>> > Also, "yield" cannot blindly clear the exception state, because the frame
>> > calling the generator may expect the exception state to be non-None.
>> > Consequently, we might have to keep the f_exc_* members solely for the
>> > generator case.
>>
>> Why? Why should the frame calling the generator be inspecting the
>> exception state of the generator? What's the use case?
>
> You misunderstood me. The f_exc_* fields will be used internally to swap between
> the inner generator's exception state and the calling frame's own exception
> state. They will have no useful meaning for outside code so I suggest they are
> not accessible from Python code anymore.

Why not move f_exc_* into the PyTryBlock struct?
We can eliminate the per-thread exception and have sys.exc_info() search the stack for an active except block. No need to swap anything because the stack is always current. (tstate->curexc_* would be unchanged of course, as it represents the "hot" exception, not the last caught exception.) -- Adam Olsen, aka Rhamphoryncus From timothy.c.delaney at gmail.com Sun Jun 1 00:42:19 2008 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Sun, 1 Jun 2008 08:42:19 +1000 Subject: [Python-3000] sys.exc_info() References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: Antoine Pitrou wrote: > sys.exc_info() will remain, it's just that the returned value will be > (None, None, None) if we are not in an except block in any of the > currently active frames in the thread. In the case above it would > return the current exception (the one caught in one of the enclosing > frames). This reminds me of something I've thought a few times - maybe the tuple returned from sys.exc_info() should be a named tuple. Tim Delaney From solipsis at pitrou.net Sun Jun 1 00:52:51 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 31 May 2008 22:52:51 +0000 (UTC) Subject: [Python-3000] =?utf-8?b?c3lzLmV4Y19pbmZvKCk=?= References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au>

Message-ID: Hello again, > Why not move f_exc_* into the PyTryBlock struct? We can eliminate the > per-thread exception and have sys.exc_info() search the stack for an > active except block. No need to swap anything because the stack is > always current. Yes it's a possible implementation. At the expense of a performance hit for operations which currently use tstate->exc_* (sys.exc_info() itself, bare "raise"...). Right now I have a patch using my original implementation proposal. I'll post it soon. regards Antoine. From tjreedy at udel.edu Sun Jun 1 01:29:47 2008 From: tjreedy at udel.edu (tjreedy) Date: Sat, 31 May 2008 19:29:47 -0400 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 References: Message-ID: "Georg Brandl" wrote in message news:g1rr4o$956$1 at ger.gmane.org... > Of course, it would also be nice for ``help("if")`` to work effortlessly, > which it currently only does if the generated HTML documentation is > available somewhere, which it typically isn't -- on Unix most > distributions > put it in a separate package (from which pydoc won't always find it > of its own), on Windows only the CHM file is distributed and must be > decompiled to get single HTML files. For 3.0a5, it does not work even after decompiling (and setting the ENV var) as given in the instructions (which are inadequate for many users anyway). From wescpy at gmail.com Sun Jun 1 01:41:38 2008 From: wescpy at gmail.com (wesley chun) Date: Sat, 31 May 2008 16:41:38 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? 
In-Reply-To:
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
 <483EAF95.5050503@trueblade.com>
 <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
 <483F06FF.9090007@trueblade.com>
 <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>
 <483F4E3A.9090403@trueblade.com>
 <48405420.8010800@trueblade.com>
Message-ID: <78b3a9580805311641ob06e7faw80c9d5e41e7e64fe@mail.gmail.com>

>>> I'd be fine with adding '#' back to the formatting language for hex and oct.
>>
>> And bin, I assume?
>
> Of course.

somewhat on-topic, can i hear from some of you as far as use-cases for
oct() and hex() [plus bin()] in Python code? i find "%x" or "%o" (and
its variants) sufficient in serving my needs. in other words, why
oct() and hex() built-in functions instead of elsewhere like in
operator for those who desire a functional interface?

another related inquiry, if we're going to have hex(), can its
signature be expanded to include "%#X" functionality, i.e.,
hex(number, cap=False), as the default and someone who wants the "#"
can do hex(123, cap=True)?

on top of that, can hex() also support "%x" and '%X' functionality,
i.e., hex(number, cap=False, leading=True), as the default, so i can
do hex(123, leading=False) for '7b'?

do you see how i'm trying to make life difficult and lead people down
the path of not having hex(), oct(), or bin()? or are those three
functions intended to obsolete "#"? :-)

writing this message made me realize that i could have just done the
following in my original post that started this whole thread:

>>> i = 45
>>> 'dec: {0}/oct: {1}/hex: {2}'.format(i, oct(i), hex(i))
'dec: 45/oct: 0o55/hex: 0x2d'

it's definitely better than the "uglier" code in that post although
this is less elegant than being able to specify the variable 'i' just
once.
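
[editor's sketch] A minimal illustration of what "specifying 'i' just once" could look like, assuming the '#' alternate-form flag is accepted by str.format() for the o, x, X and b presentation types as proposed earlier in this thread. The variable names `line` and `bare` are only illustrative.

```python
# Sketch only: assumes '#' is supported by str.format() for o, x, X and b.
i = 45

# A single argument, referenced three times by its index.
line = 'dec: {0}/oct: {0:#o}/hex: {0:#x}'.format(i)
print(line)

# Without '#' the prefix is omitted; 'X' upper-cases the digits (and prefix).
bare = '{0:x} {0:X} {0:#X} {0:#b}'.format(i)
print(bare)
```

With this spelling the prefix and the letter case are chosen in the format string, so no extra keyword arguments on hex() or oct() are needed.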
cheers, -- wesley - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "Core Python Programming", Prentice Hall, (c)2007,2001 http://corepython.com wesley.j.chun :: wescpy-at-gmail.com python training and technical consulting cyberweb.consulting : silicon valley, ca http://cyberwebconsulting.com From mhammond at skippinet.com.au Sun Jun 1 03:08:15 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Sun, 1 Jun 2008 11:08:15 +1000 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au>
Message-ID: <00ab01c8c383$fb909db0$f2b1d910$@com.au>

Antoine:
> Mark Hammond writes:
> > In both Python 2.x and 3 (a few months old build of Py3k though), the
> > traceback isn't the same. For Python 2.0 you could write it like:
> >
> > def handle_exception():
> >     ...
> >     raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]
> >
> > It's not clear how that would be spelt in py3k though (and from what I can
> > see, sys.exc_info() itself has an uncertain future in py3k).
>
> sys.exc_info() will remain, it's just that the returned value will be
> (None, None, None) if we are not in an except block in any of the
> currently active frames in the thread.

If I look at Guido's py3k status update of almost a year ago
(http://www.artima.com/weblogs/viewpost.jsp?thread=208549) it tells me:

* "sys.exc_info() becomes redundant (or may disappear)". Even if it doesn't
  actually disappear, the implication is that the new way (with the traceback
  being an attribute) will be preferred.

* "The old raise syntax variants raise E, e and raise E, e, tb are gone" - but
  I can't see anything which indicates what the replacement is for the 2 cases
  of (a) raising with the original traceback and (b) raising the same
  exception with a "new" traceback reflecting the position of the 'raise' - I
  admit I didn't look *that* hard though...

Mark

From guido at python.org Sun Jun 1 03:49:22 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 31 May 2008 18:49:22 -0700
Subject: [Python-3000] doctest portability
In-Reply-To:
References:
Message-ID:

2to3 lets you specify exactly which fixers to run with the -f command line flag.

I really don't like the idea of having Py3k code with doctests written
in a dialect of Py2k...
--Guido On Sat, May 31, 2008 at 8:47 AM, Stefan Behnel wrote: > Georg Brandl wrote: >> Stefan Behnel schrieb: >>> I know, I could use the lib2to3 package, but it a) is a one-way tool >>> in the >>> wrong direction if you have to distinguish bytes/str literals, b) lacks >>> configurability stating exactly what changes need to be done and c) >>> seemed >>> harder to set up for doctests than doing the conversion by hand. >> >> Shouldn't the -d option handle doctests without further set-up? > > If you start 2to3 from the command prompt to convert the files that contain > the doctests and copy them to a new location, then yes. But the question is: > how do you run a Py2 doctest in Py3 without first copying your doctests or > doctest containing sources to new files and then running the tests from there? > You can't require people to put such a work-around into every test script in > the world. Adding an option, fine. Copying files, adapting paths and all that, > why? > > Stefan > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Jun 1 03:50:40 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 31 May 2008 18:50:40 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Message-ID: On Sat, May 31, 2008 at 2:33 PM, Adam Olsen wrote: > I think the reason why strict/backslashreplace (respectively) work > well is that you 
can print a unicode string to stdout, have it fail > (encoding can't handle it), then get an exception printed to stderr > with the string escaped. > > Making stderr stricter would make it unable to print the string and > making stdout less strict would let the error pass silently (printing > potential garbage instead). You've got it exactly right, better than I said it. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Sun Jun 1 03:59:05 2008 From: guido at python.org (Guido van Rossum) Date: Sat, 31 May 2008 18:59:05 -0700 Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'? In-Reply-To: <78b3a9580805311641ob06e7faw80c9d5e41e7e64fe@mail.gmail.com> References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com> <483EAF95.5050503@trueblade.com> <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com> <483F06FF.9090007@trueblade.com> <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com> <483F4E3A.9090403@trueblade.com> <48405420.8010800@trueblade.com> <78b3a9580805311641ob06e7faw80c9d5e41e7e64fe@mail.gmail.com> Message-ID: On Sat, May 31, 2008 at 4:41 PM, wesley chun wrote: > somewhat on-topic, can i hear from some of you as far as use-cases for > oct() and hex() [plus bin()] in Python code? i find "%x" or "%o" (and > its variants) sufficient in serving my needs. in other words, why > oct() and hex() built-in functions instead of elsewhere like in > operator for those who desire a functional interface? I use oct() and hex() all the time at the interactive prompt when I have a funny number given in decimal and wonder if it's really a magic bit pattern. 
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From stefan_ml at behnel.de Sun Jun 1 08:44:35 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2008 08:44:35 +0200 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: Guido van Rossum wrote: > 2to3 lets you specify exactly which fixers to run with the -f command line flag. Nice, one problem solved then. I assume that's also available in the library version? object. I find that very inconvenient.BTW, last time I checked, options were passed into lib2to3 as attributes of an A dict or keyword arguments would work much better. > I really don't like the idea of having Py3k code with doctests written > in a dialect of Py2k... There's two types of doctests, one that is meant as user readable examples and one where the doctest module is used for convenience. I agree that doctests should use a consistent syntax, either Py2 or Py3, especially if they are meant as documentation. But that's currently not easy to achieve in a portable way. Py2 byte strings are the most obvious problem. I don't think that every software package that supports Py3 will convert its doctests to Py3 syntax, at least as long as there is no perfectly working 3to2 doctest converter that converts byte/unicode strings correctly. And even then, it would have to be integrated with the doctest module. Having to split your code base just because your tests don't run on a new target platform is not an option IMHO. That would rather keep people from supporting Py3. 
Stefan From ishimoto at gembook.org Sun Jun 1 09:23:10 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Sun, 1 Jun 2008 16:23:10 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Message-ID: <797440730806010023h3f9cf85qc03b2a68c695a3b5@mail.gmail.com> On Sun, Jun 1, 2008 at 6:33 AM, Adam Olsen wrote: > I think the reason why strict/backslashreplace (respectively) work > well is that you can print a unicode string to stdout, have it fail > (encoding can't handle it), then get an exception printed to stderr > with the string escaped. > > Making stderr stricter would make it unable to print the string and > making stdout less strict would let the error pass silently (printing > potential garbage instead). > I agree these points. I know my preference was already denied by important persons in the python developers :), so may be I'm wrong. The release date is approaching fast. As Marc-Andre pointed out, the default error handler for sys.stdout may be out of scope. So I'm going to shut up and see whether community starts mumbling or not. But if you have spare time and interested, following is my points. I may be come back for Python 3.1 or 3.2 :). ====== How many scripts want to care about encoding error? If you ship your scripts with 'strict' error handler, users of the script should determine a certain encoding of output and should be responsible to set up correct runtime environment, which are not always possible. 
For example, we cannot guarantee that this trivial script will work on
Windows:

    for filename in os.listdir("."):
        print(filename)

The 'filename' can contain ASCII, Greek, Chinese or any other
characters, so any encoding other than utf-8 (the only sane encoding for
Unicode applications) may raise exceptions. In practice, the best thing
we can do is print the escaped string silently and close the can of
worms :).

Raising exceptions would be desired in some cases, but for such cases,
the input data and environment should be examined by the script *before*
the data are printed. Printing a curt UnicodeEncodeError message may
not be what you want for your scripts. Thus, the current restriction is
too defensive for most use cases, IMO.

As I wrote before, other languages I am familiar with (Java, .NET
languages, Perl) don't raise exceptions by default, but silently print
converted characters. Ditto for utilities such as 'ls'.

From phd at phd.pp.ru Sun Jun 1 10:15:23 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sun, 1 Jun 2008 12:15:23 +0400
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000
In-Reply-To: <797440730806010023h3f9cf85qc03b2a68c695a3b5@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
 <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
 <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
 <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
 <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
 <797440730806010023h3f9cf85qc03b2a68c695a3b5@mail.gmail.com>
Message-ID: <20080601081523.GA25882@phd.pp.ru>

On Sun, Jun 01, 2008 at 04:23:10PM +0900, Atsuo Ishimoto wrote:
> silently print converted
> characters. Ditto for utilities such as 'ls'.
$ ls -lF work/
total 72
drwxr-x--- 7 phd phd 4096 May 27 11:14 ?????/
drwx------ 9 phd phd 4096 May 30 17:30 ?????/
drwxr-xr-x 4 phd phd 4096 May 13 18:35 books/
[truncate]

Filesystem encoding is koi8-r, terminal encoding is UTF-8, ls doesn't
convert (because it doesn't know filesystem encoding) but simply replaces,
like in Python filename.encode(LC_CTYPE, "replace"). No error reported.

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.

From ishimoto at gembook.org Sun Jun 1 11:05:17 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sun, 1 Jun 2008 18:05:17 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000
In-Reply-To: <20080601081523.GA25882@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
 <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
 <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
 <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
 <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
 <797440730806010023h3f9cf85qc03b2a68c695a3b5@mail.gmail.com>
 <20080601081523.GA25882@phd.pp.ru>
Message-ID: <797440730806010205u43d2604ajfaf7bd859d757311@mail.gmail.com>

On Sun, Jun 1, 2008 at 5:15 PM, Oleg Broytmann wrote:
>
> Filesystem encoding is koi8-r, terminal encoding is UTF-8, ls doesn't
> convert (because it doesn't know filesystem encoding) but simply replaces,
> like in Python filename.encode(LC_CTYPE, "replace"). No error reported.
>

Sorry for my bad wording. I wanted to say "ls doesn't report an error, but
silently prints '?'", not "ls converts the filename as per the terminal
encoding, without reporting conversion errors".

In my case, ls checks the characters in the file name and converts invalid
characters to '?'.

[ishimoto at host test]$ export LANG=ja_JP.eucJP
[ishimoto at host test]$ ls
???
[ishimoto at host test]$ export LANG=C
[ishimoto at host test]$ ls
??????
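
[editor's sketch] The three error handlers being compared in this thread can be tried side by side. This is only an illustration, assuming Python 3 string semantics; the three-letter Cyrillic `name` is a hypothetical filename, not one taken from the listings above.

```python
# Hypothetical non-ASCII filename (three Cyrillic letters).
name = "\u043a\u043e\u0442"

# 'strict' raises instead of producing any output.
try:
    name.encode("ascii", "strict")
    strict_result = "encoded"
except UnicodeEncodeError:
    strict_result = "UnicodeEncodeError"

# 'replace' silently substitutes '?', much like the ls listings above.
replaced = name.encode("ascii", "replace")

# 'backslashreplace' keeps the code points as escape sequences.
escaped = name.encode("ascii", "backslashreplace")

print(strict_result, replaced, escaped)
```

Only 'backslashreplace' keeps enough information to recover the original characters, which is the argument made for using it on sys.stderr.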
From qrczak at knm.org.pl Sun Jun 1 11:34:22 2008 From: qrczak at knm.org.pl (=?UTF-8?Q?Marcin_=E2=80=98Qrczak=E2=80=99_Kowalczyk?=) Date: Sun, 1 Jun 2008 11:34:22 +0200 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730806010205u43d2604ajfaf7bd859d757311@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010023h3f9cf85qc03b2a68c695a3b5@mail.gmail.com> <20080601081523.GA25882@phd.pp.ru> <797440730806010205u43d2604ajfaf7bd859d757311@mail.gmail.com> Message-ID: <3f4107910806010234g7dfa96ds4b6d5b35685f4906@mail.gmail.com> 2008/6/1 Atsuo Ishimoto : > In my case, ls checks characters in the file name and convert invalid > characters to '?'. GNU ls has more options for displaying filenames with weird characters: http://www.gnu.org/software/coreutils/manual/coreutils.html#Formatting-the-file-names -- Marcin Kowalczyk qrczak at knm.org.pl http://qrnik.knm.org.pl/~qrczak/ From oren at hishome.net Sun Jun 1 11:42:00 2008 From: oren at hishome.net (Oren Tirosh) Date: Sun, 1 Jun 2008 09:42:00 +0000 Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String ABC) In-Reply-To: References: Message-ID: <20080601094159.GA9825@hishome.net> On Tue, May 27, 2008 at 12:42:48PM -0700, Guido van Rossum wrote: > [+python-3000] > > On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher > wrote: ... > > A problem comes up as soon as user defined strings (such as UserString) is > > passed to the function. In my opinion a good solution would be a "String" > > ABC one could test against. > > I'm not against this, but so far I've not been able to come up with a > good set of methods to endow the String ABC with. 
Another problem is > that not everybody draws the line in the same place -- how should > instances of bytes, bytearray, array.array, memoryview (buffer in 2.6) > be treated? > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) The issue goes beyond iterability. If a user defined string such as UserString that is not derived from one of the "true" builtin string types is passed to a builtin function that expects a string it will be rejected. PyArgs_ParseTuple discriminates against such user defined types :-) If a String ABC is implemented it could be used as a signal that the object is not just convertable to a string (virtually all objects are) but IS a string and its __str__ should be used during builtin function argument parsing. - Oren From stefan_ml at behnel.de Sun Jun 1 13:51:09 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2008 13:51:09 +0200 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: Hi again, sorry, mail editing disorder. I meant to say this: Stefan Behnel wrote: > BTW, last time I checked, options were passed into lib2to3 as attributes of > an object. I find that very inconvenient. A dict or keyword arguments would > work much better. And just to give a hint on what I mean here: > There's two types of doctests, one that is meant as user readable examples and > one where the doctest module is used for convenience. An example of the convenience part is that we use doctests as compiler tests in Cython, for example. The idea is to compile a source file into a C extension module using Cython and then execute the module docstring in Python's doctest module to test the extension. http://hg.cython.org/cython-devel/file/tip/tests/run/ For example, here is a test for unicode strings, which currently uses both the u'' and b'' prefix and replaces one of them based on the Python version it runs in. http://hg.cython.org/cython-devel/file/tip/tests/run/unicodeliterals.pyx That's simple to do. 
However, doctests in user documentation are much harder to write in a portable way, as all that overhead of (e.g.) encoding byte strings to unicode strings for normalised output comparison is very distracting for readers, so it would be much better if you could just say "this is a doctest in Py3 syntax" or "in Py2 syntax", and have doctest do the rest for you at runtime. Stefan From ncoghlan at gmail.com Sun Jun 1 14:46:31 2008 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 01 Jun 2008 22:46:31 +1000 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: <48429A27.8090507@gmail.com> Stefan Behnel wrote: > That's simple to do. However, doctests in user documentation are much harder > to write in a portable way, as all that overhead of (e.g.) encoding byte > strings to unicode strings for normalised output comparison is very > distracting for readers, so it would be much better if you could just say > "this is a doctest in Py3 syntax" or "in Py2 syntax", and have doctest do the > rest for you at runtime. Doctest just uses 'exec' under the covers though - the only way for it to run code using non-native syntax would be for it to be able to invoke a non-native parser and then run the resulting AST directly, or for it to invoke 2to3 on the docstring. Sphinx (the tool used to build the Python docs) gets around this by allowing code to be included in the doc source that is used when running the doctests, but hidden when generating output for human consumption (HTML, PDF, etc) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From stefan_ml at behnel.de Sun Jun 1 15:18:26 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2008 15:18:26 +0200 Subject: [Python-3000] doctest portability In-Reply-To: <48429A27.8090507@gmail.com> References: <48429A27.8090507@gmail.com> Message-ID: Hi, Nick Coghlan wrote: > Doctest just uses 'exec' under the covers though - the only way for it > to run code using non-native syntax would be for it to be able to invoke > a non-native parser and then run the resulting AST directly, or for it > to invoke 2to3 on the docstring. I think it should do the latter. > Sphinx (the tool used to build the Python docs) gets around this by > allowing code to be included in the doc source that is used when running > the doctests, but hidden when generating output for human consumption > (HTML, PDF, etc) That's what I do, too, in a couple of places, e.g. to import StringIO/BytesIO or to make unicode() available. But having to copy syntax work-around code into each source file that has doctests is the wrong trade-off, IMHO. This should be an external option passed to the doctest run. Stefan From stefan_ml at behnel.de Sun Jun 1 16:39:39 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 01 Jun 2008 16:39:39 +0200 Subject: [Python-3000] doctest portability In-Reply-To: References: Message-ID: Stefan Behnel wrote: > It would be > really nice if the doctest module had a simple option that specified if the > doctests of a test suite are in Py2 or Py3 syntax, and then just did the right > thing under Py3 (and maybe also 2.6). I filed a feature request on this for now. 
http://bugs.python.org/issue3020

Stefan

From stefan_ml at behnel.de Sun Jun 1 17:28:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 01 Jun 2008 17:28:06 +0200
Subject: [Python-3000] doctest portability
In-Reply-To:
References:
Message-ID:

Guido van Rossum wrote:
> I really don't like the idea of having Py3k code with doctests written
> in a dialect of Py2k...

BTW, this argument might hold for code that was written for Py3, of which
there currently is close to nothing.

Stefan

From ishimoto at gembook.org Sun Jun 1 17:32:27 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Mon, 2 Jun 2008 00:32:27 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
 <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
 <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
 <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
 <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
Message-ID: <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com>

On Sun, Jun 1, 2008 at 2:30 AM, Atsuo Ishimoto wrote:
>
> I'll update the PEP and the patch on Sunday. Thank you!

Here's new PEP, and new patch is uploaded at
http://bugs.python.org/issue2630. (codereview.appspot.com refused to
create new issue for this patch, btw.)

----------------------------------------------
PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-May-2008
Post-History:

Abstract
========

This PEP proposes a new string representation form for Python 3000. In
Python prior to Python 3000, the repr() built-in function converted
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.

Motivation
==========

The current repr() converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
  characters(>=0x80) to '\\xXX'.

- Backslash-escape quote characters (apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters(>=0x100) to '\\uXXXX'.

- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and repr() is used
as a handy and safe way to print strings for debugging or for logging.
Although all non-ASCII characters are escaped, this does not matter when
most of the string's characters are ASCII. But for other languages, such
as Japanese where most characters in a string are not ASCII, this is very
inconvenient.

We can use ``print(aJapaneseString)`` to get a readable string, but we
don't have a similar workaround for printing strings from collections such
as lists or tuples. ``print(listOfJapaneseStrings)`` uses repr() to build
the string to be printed, so the resulting strings are always hex-escaped.
Or when ``open(japaneseFilename)`` raises an exception, the error message
is something like ``IOError: [Errno 2] No such file or directory:
'\u65e5\u672c\u8a9e'``, which isn't helpful.

Python 3000 has a lot of nice features for non-Latin users such as
non-ASCII identifiers, so it would be helpful if Python could also progress
in a similar way for printable output.

Some users might be concerned that such output will mess up their console
if they print binary data like images.
But this is unlikely to happen in practice because bytes and strings are
different types in Python 3000, so printing an image to the console won't
mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.

Specification
=============

- Add a new function to the Python C API ``int PY_UNICODE_ISPRINTABLE
  (Py_UNICODE ch)``. This function returns 0 if repr() should escape the
  Unicode character ``ch``; otherwise it returns 1. Characters that should
  be escaped are defined in the Unicode character database as:

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR ('\\u2029').
  * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
    this category should be escaped to avoid ambiguity.

- The algorithm to build repr() strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters(0x00-0x1f, 0x7f) to '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert non-printable characters(PY_UNICODE_ISPRINTABLE() returns 0)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters (apostrophe, 0x27) and add quote
    character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
  default.

- Add ``'%a'`` string format operator. ``'%a'`` converts any python object
  to a string using repr() and then hex-escapes all non-ASCII characters.
  The ``'%a'`` format operator generates the same string as ``'%r'`` in
  Python 2.

- Add a new built-in function, ``ascii()``. This function converts any
  python object to a string using repr() and then hex-escapes all non-ASCII
  characters. ``ascii()`` generates the same string as ``repr()`` in
  Python 2.
- Add an ``isprintable()`` method to the string type.
  ``str.isprintable()`` returns False if repr() should escape any
  character in the string; otherwise returns True.  The
  ``isprintable()`` method calls the ``PY_UNICODE_ISPRINTABLE()``
  function internally.


Rationale
=========

The repr() in Python 3000 should be Unicode not ASCII based, just like
Python 3000 strings.  Also, conversion should not be affected by the
locale setting, because the locale is not necessarily the same as the
output device's locale.  For example, it is common for a daemon process
to be invoked in an ASCII setting, but to write UTF-8 to its log files.
Also, web applications might want to report the error information in
more readable form based on the HTML page's encoding.

Characters not supported by the user's console could be hex-escaped on
printing, by the Unicode encoder's error-handler.  If the error-handler
of the output file is 'backslashreplace', such characters are
hex-escaped without raising UnicodeEncodeError.  For example, if your
default encoding is ASCII, ``print('Hello ¢')`` will print
'Hello \\xa2'.  If your encoding is ISO-8859-1, 'Hello ¢' will be
printed.

The default error-handler of sys.stdout is 'strict'.  Other
applications reading the output might not understand hex-escaped
characters, so unsupported characters should be trapped when writing.
If you need to escape unsupported characters, you should change the
error-handler explicitly.  For sys.stderr, the default error-handler is
set to 'backslashreplace', so printing exceptions or error messages
won't fail.


Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing repr(), other
suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists
  or dicts, but also in many other types of object.  File objects
  contain a file name in Unicode, exception objects contain a message
  in Unicode, etc.
  These strings should be printed in readable form when repr()ed.  It
  is unlikely to be possible to implement a tool to print all possible
  object types.

- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex-escaped
  characters to the original characters.  But these hooks are called
  only when printing the result of evaluating an expression entered in
  an interactive Python session, and don't work for the print()
  function, for non-interactive sessions or for
  logging.debug("%r", ...), etc.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time it's
  a string to undo the escaping correctly in all cases.  For example,
  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'.  But
  there is no chance to tell file objects apart.

- Make the encoding used by unicode_repr() adjustable, and make the
  existing repr() the default.

  With adjustable repr(), the result of using repr() is unpredictable
  and would make it impossible to write correct code involving repr().
  And if the current repr() is the default, then the old convention
  remains intact and users may expect ASCII strings as the result of
  repr().  Third party applications or libraries could be confused when
  a custom repr() function is used.


Backwards Compatibility
=======================

Changing repr() may break some existing code, especially testing code.
Five of Python's regression tests fail with this modification.  If you
need repr() strings without non-ASCII characters, as in Python 2, you
can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise
UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.
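
As a rough illustration of the two snippets above, here is a minimal sketch. It assumes a Python 3 interpreter in which repr() already leaves printable non-ASCII characters unescaped, and it uses an in-memory ASCII text stream as a stand-in for a log file opened on a system whose default encoding is ASCII; the sample string is only an example.

```python
import io


def repr_ascii(obj):
    # Same helper as above: escape everything outside ASCII.
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


data = "\u65e5\u672c\u8a9e"  # three Japanese characters

# ASCII-only form, comparable to what Python 2's repr() produced.
print(repr_ascii(data))

# Stand-in for open("logfile", "w") with an ASCII default encoding
# and the default 'strict' error-handler.
log = io.TextIOWrapper(io.BytesIO(), encoding="ascii", errors="strict")
try:
    log.write(repr(data))
    log.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

print(failed)
```

The helper keeps working regardless of the stream's encoding because it never hands non-ASCII text to the stream in the first place.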
To avoid exceptions being raised, you can explicitly specify the
error-handler. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))  # Unsupported characters will be escaped.

For a console that uses a Unicode-based encoding, for example,
en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
no printable characters are escaped.  This will cause a problem with
similar-looking characters in Western, Greek and Cyrillic languages.
These languages use similar (but different) alphabets (descended from
the common ancestor) and contain letters that look similar but have
different character codes.  For example, it is hard to distinguish
Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'.  (The visual
representation, of course, very much depends on the fonts used but
usually these letters are almost indistinguishable.)  To avoid the
problem, the user can adjust the terminal encoding to get a result
suitable for their environment.


Open Issues
===========

- Is the ``ascii()`` function necessary, or is it sufficient to
  document how to do it?  If necessary, should ``ascii()`` belong to
  the builtin namespace?


Rejected Proposals
==================

- Add encoding and errors arguments to the builtin print() function,
  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

  Complicated to implement, and in general, this is not seen as a good
  idea. [2]_

- Use character names to escape characters, instead of hex character
  codes.  For example, ``repr('\u03b1')`` can be converted to
  ``"\N{GREEK SMALL LETTER ALPHA}"``.

  Using character names can be very verbose compared to hex-escape.
  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
  FORM}"``.

- Default error-handler of sys.stdout should be 'backslashreplace'.

  Stuff written to stdout might be consumed by another program that
  might misinterpret the \ escapes.
  For interactive sessions, it is possible to make the
  'backslashreplace' error-handler the default, but this may add
  confusion of the kind "it works in interactive mode but not when
  redirecting to a file".


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string::string_print
   (http://bugs.python.org/issue479898)

.. [2] [Python-3000] Displaying strings containing unicode escapes
   (http://mail.python.org/pipermail/python-3000/2008-April/013366.html)


Copyright
=========

This document has been placed in the public domain.

From solipsis at pitrou.net Sun Jun 1 22:15:30 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 01 Jun 2008 22:15:30 +0200
Subject: [Python-3000] Exception re-raising woes
In-Reply-To:
References:

Message-ID: <1212351330.5862.1.camel@fsol> Hello, A patch is now at http://bugs.python.org/issue3021 . Antoine. From qgallet at gmail.com Mon Jun 2 00:45:53 2008 From: qgallet at gmail.com (Quentin Gallet-Gilles) Date: Mon, 2 Jun 2008 00:45:53 +0200 Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108 In-Reply-To: <8b943f2b0805290839s7a1f3238g9e21407a56c34159@mail.gmail.com> References: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com> <483EC414.7080603@ibp.de> <8b943f2b0805290839s7a1f3238g9e21407a56c34159@mail.gmail.com> Message-ID: <8b943f2b0806011545x6a11f019r3c412bc3ebc1a3ab@mail.gmail.com> I've uploaded a patch for the aifc module (http://bugs.python.org/issue2847). I'm still working on the testsuite. Comments are welcome! Quentin On Thu, May 29, 2008 at 5:39 PM, Quentin Gallet-Gilles wrote: > > On Thu, May 29, 2008 at 4:56 PM, Lars Immisch wrote: > >> >> >>> Issue 2847 - the aifc module still imports the cl module in 3.0. >>> Problem is that the cl module is gone. =) So it seems silly to >>> have >>> the imports lying about. This can probably be changed to critical. >>> >>> >>> It shouldn't be a problem to rip everything cl-related out of aifc. >>> The question is how useful aifc will be after that ... >>> >>> >>> Has someone already used that module ? I took a look into it, but I'm a >>> bit confused about the various compression types, case-sensitivity and >>> compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the >>> same encoding ? Can we use the audioop module for ALAW, just like it's >>> already done for ULAW ? >>> >> >> There is just one alaw I've ever come across (G.711), and the audioop >> implementation could be used (audioop's alaw support is younger than the >> aifc module, BTW) >> >> The capitalisation is confusing, but your document [1] says: "Apple >> Computer's QuickTime player recognize only the Apple compression types. 
>> Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" and >> "ulaw" formats and were in use long before Apple introduced the new codes, >> QuickTime does not recognize them." >> >> So this seems just a matter of naming in the AIFC, but not a matter of two >> different alaw implementations. >> >> - Lars >> > > Ok, I'll handle this issue. I'll be using the audioop implementation as a > replacement of the SGI compression library. I'll also create a test suite, > as Brett mentioned in the bug tracker the module was missing one. > > Quentin > > >> >> [1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Jun 2 01:30:54 2008 From: greg at krypto.org (Gregory P. Smith) Date: Sun, 1 Jun 2008 16:30:54 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <483FBCB4.5020007@egenix.com> References: <48397ECC.9070805@cheimes.de> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> Message-ID: <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> On Fri, May 30, 2008 at 1:37 AM, M.-A. Lemburg wrote: > On 2008-05-30 00:57, Nick Coghlan wrote: >> >> M.-A. Lemburg wrote: >>> >>> * Why can't we have both PyString *and* PyBytes exposed in 2.x, >>> with one redirecting to the other ? >> >> We do have that - the PyString_* names still work perfectly fine in 2.x. >> They just won't be used in the Python core codebase anymore - everything in >> the Python core will use either PyBytes_* or PyUnicode_* regardless of which >> branch (2.x or 3.x) you're working on. 
I think that's a good thing for ease >> of maintenance in the future, even if it takes people a while to get their >> heads around it right now. > > Sorry, I probably wasn't clear enough: > > Why can't we have both PyString *and* PyBytes exposed as C > APIs (ie. visible in code and in the linker) in 2.x, with one redirecting > to the other ? > >>> * Why should the 2.x code base turn to hacks, just because 3.x wants >>> to restructure itself ? >> >> With the better explanation from Greg of what the checked in approach >> achieves (i.e. preserving exact ABI compatibility for PyString_*, while >> allowing PyBytes_* to be used at the source code level), I don't see what >> has been done as being any more of a hack than the possibly more common >> "#define " (which *would* break binary compatibility). >> >> The only things that I think would tidy it up further would be to: >> - include an explanation of the approach and its effects on API and ABI >> backward and forward compatibility within 2.x and between 2.x and 3.x in >> stringobject.h >> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 > > Which is what I was suggesting all along; sorry if I wasn't > clear enough on that. > > The standard approach is that you provide #define redirects from the > old APIs to the new ones (which are then picked up by the compiler) > *and* add function wrappers to the same affect (to make linkers, > dynamic load APIs such ctypes and debuggers happy). > > > Example from pythonrun.h|c: > --------------------------- > > /* Use macros for a bunch of old variants */ > #define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, NULL) > > /* Deprecated C API functions still provided for binary compatiblity */ > > #undef PyRun_String > PyAPI_FUNC(PyObject *) > PyRun_String(const char *str, int s, PyObject *g, PyObject *l) > { > return PyRun_StringFlags(str, s, g, l, NULL); > } > Okay, how about this? 
http://codereview.appspot.com/1521 Using that patch, both PyString_ and PyBytes_ APIs are available using function stubs similar to the above. I opted to define the stub functions right next to the ones they were stubbing rather than putting them all at the end of the file or in another file but they could be moved if someone doesn't like them that way. > I still believe that we should *not* make "easy of merging" the > primary motivation for backporting changes in 3.x to 2.x. Software > design should not be guided by restrictions in the tool chain, > if not absolutely necessary. > > The main argument for a backport needs to be general usefulness > to the 2.x users, IMHO... just like any other feature that > makes it into 2.x. > > If merging is difficult then this needs to be addressed, but > there are more options to that than always going back to the > original 2.x trunk code. I've given a few suggestions on how > this could be approached in other emails on this thread. I am not the one doing the merging or working on merge tools so I'll leave this up to those that are. 
-gps From ishimoto at gembook.org Mon Jun 2 01:32:09 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Mon, 2 Jun 2008 08:32:09 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> Message-ID: <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> On Mon, Jun 2, 2008 at 2:39 AM, Alexandre Vassalotti wrote: > On Sun, Jun 1, 2008 at 11:32 AM, Atsuo Ishimoto wrote: >> >> ---------------------------------------------- >> PEP: 3138 >> >> Title: String representation in Python 3000 >> Version: $Revision$ >> Last-Modified: $Date$ >> Author: Atsuo Ishimoto >> Status: Draft >> Type: Standards Track >> Content-Type: text/x-rst >> Created: 05-May-2008 >> Post-History: >> > [SNIP] >> - Add a new function to the Python C API ``int PY_UNICODE_ISPRINTABLE >> (Py_UNICODE ch)``. > > Shouldn't the name be Py_UNICODE_ISPRINTABLE? Oh, yes. It has correct name in the patch. > > I know that I am a bit late in the whole discussion, but isn't, > whether or not a character is "printable", actually defined as a > property of the output device (i.e., does it have the necessary glyphs > to render the characters)? > > I don't have a problem with allowing more characters to be represented > unescaped (in fact, I think this is a great idea). But, I just don't > like using "printable" as a character property. Maybe, "readable" or > "legible" would be more appropriate. Anyway, that's only nitpicking > from my part. > I'm not comfortable with "printable", too. Is "legible" better? This is first time for me to see this word in my life :). 
>> - Add ``'%a'`` string format operator. ``'%a'`` converts any python >> object to a string using repr() and then hex-escapes all non-ASCII >> characters. The ``'%a'`` format operator generates the same string as >> ``'%r'`` in Python 2. >> >> - Add a new built-in function, ``ascii()``. This function converts any >> python object to a string using repr() and then hex-escapes all non- >> ASCII characters. ``ascii()`` generates the same string as ``repr()`` >> in Python 2. >> > > Why ascii() has to use repr()? Couldn't we simply rename the old 2.x > repr() function to ascii()? No. repr() simply calls obj.__repr__(), and obj.__repr() returns non-ASCII string now. So to get ASCII string, we should convert result of repr(). > >> - Add an ``isprintable()`` method to the string type. ``str.isprintable()`` >> returns False if repr() should escape any character in the string; >> otherwise returns True. The ``isprintable()`` method calls the >> `` PY_UNICODE_ISPRINTABLE()`` function internally. >> > > Quick thought, what should become of string.printable? Should it be > renamed to string.ascii_printable or removed? > I agree string.ascii_printable is better name, but I'm not motivated enough to break compatibility. > Overall, I think the PEP is good. So, +1 from me. Thank you! 
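
For readers following the exchange, the behaviour discussed in this message can be sketched as below. This assumes a Python 3 interpreter in which the PEP's ``ascii()`` builtin and ``str.isprintable()`` method exist as proposed; the ``Name`` class and the sample strings are hypothetical and only serve as illustration.

```python
samples = ["abc", "\u3042", "\u00a0", "\u2028", "\n"]

for s in samples:
    # ascii() starts from repr() and then escapes whatever is not ASCII;
    # isprintable() reports whether repr() would leave the string unescaped.
    print(ascii(s), s.isprintable())


class Name:
    # A __repr__ that returns non-ASCII text, which is why ascii() has to
    # post-process the result of repr() rather than be a renamed 2.x repr().
    def __repr__(self):
        return "<Name \u3042>"


print(ascii(Name()))
```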
From greg.ewing at canterbury.ac.nz Mon Jun 2 03:21:57 2008 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 02 Jun 2008 13:21:57 +1200 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> Message-ID: <48434B35.5080400@canterbury.ac.nz> Atsuo Ishimoto wrote: > I'm not comfortable with "printable", too. Is "legible" better? This > is first time for me to see this word in my life :). The term "printable" has a long history in computing of meaning that a character code corresponds to some visual glyph, even if the display process involved isn't literally printing. It would be confusing to replace it with something else now, I think. -- Greg From collinw at gmail.com Mon Jun 2 04:47:16 2008 From: collinw at gmail.com (Collin Winter) Date: Sun, 1 Jun 2008 19:47:16 -0700 Subject: [Python-3000] Exception re-raising woes In-Reply-To: References:

Message-ID: <43aa6ff70806011947o20af27fahac889ae609cdaa7@mail.gmail.com> On Fri, May 30, 2008 at 6:33 PM, Antoine Pitrou wrote: > Guido van Rossum python.org> writes: >> I would be okay as well with restricting bare raise syntactically to >> appearing only inside an except block, to emphasize the change in >> semantics that was started when we decided to make the optional >> variable disappear at the end of the except block. >> >> This would render the following code illegal: >> >> def f(): >> try: 1/0 >> except: pass >> raise > > But you may want to use bare raise in a function called from an exception > handler, e.g.: > > def handle_exception(): > if user() == "Albert": > # Albert likes his exceptions uncooked > raise > else: > logging.exception("an exception occurred") > > def f(): > try: > raise KeyError > except: > handle_exception() I think it's perfectly fine to require that such code use sys.exc_info() or have the exception information passed in. The latter is more testable and more readable, in any case. Collin From collinw at gmail.com Mon Jun 2 04:52:48 2008 From: collinw at gmail.com (Collin Winter) Date: Sun, 1 Jun 2008 19:52:48 -0700 Subject: [Python-3000] sys.exc_info() In-Reply-To: <00ab01c8c383$fb909db0$f2b1d910$@com.au> References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <00ab01c8c383$fb909db0$f2b1d910$@com.au> Message-ID: <43aa6ff70806011952r4f3fdd49q7f5e0456689e90c7@mail.gmail.com> On Sat, May 31, 2008 at 6:08 PM, Mark Hammond wrote: > Antoine: >> Mark Hammond skippinet.com.au> writes: >> > In both Python 2.x and 3 (a few months old build of Py3k though), the >> > traceback isn't the same. For Python 2.0 you could write it like: >> > >> > def handle_exception(): >> > ... >> > raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2] >> > >> > Its not clear how that would be spelt in py3k though (and from what I >> can >> > see, sys.exc_info() itself has an uncertain future in py3k). >> >> sys.exc_info() will remain, it's just that the returned value will be >> (None, None, None) if we are not in an except block in any of the >> currently active frames in the thread. > > If I look at Guido's py3k status update of almost a year ago > (http://www.artima.com/weblogs/viewpost.jsp?thread=208549) it tells me: > > * "sys.exc_info() becomes redundant (or may disappear)". Even if it doesn't > actually dissappear, the implication is that the new way (with the traceback > being an attribute) will be preferred. > > * "The old raise syntax variants raise E, e and raise E, e, tb are gone" - > but I can't see anything which indicates what the replacement is for the 2 > cases of (a) raising with the original traceback and (b) raising the same > exception with a "new" traceback reflecting the position of the 'raise'- > I admit I didn't look *that* hard though... 
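
The two cases asked about above could be spelled roughly as follows in Python 3. This is only a sketch, assuming the ``with_traceback()`` method and the ``__traceback__`` attribute described in PEP 3109 and PEP 3134; the function names (``fail``, ``reraise_original``, ``reraise_fresh``, ``frames``) are made up for illustration.

```python
import sys
import traceback


def fail():
    raise KeyError("boom")


def reraise_original():
    # (a) re-raise the exception together with its original traceback
    try:
        fail()
    except KeyError:
        exc_type, exc, tb = sys.exc_info()
        raise exc.with_traceback(tb)


def reraise_fresh():
    # (b) raise the same exception object, discarding the old traceback
    try:
        fail()
    except KeyError as e:
        saved = e
    saved.__traceback__ = None
    raise saved


def frames(func):
    # Names of the functions recorded in the traceback func() raises with.
    try:
        func()
    except KeyError as e:
        return [f.name for f in traceback.extract_tb(e.__traceback__)]


print(frames(reraise_original))
print(frames(reraise_fresh))
```

In case (a) the frame where the error first occurred is still part of the traceback; in case (b) the traceback starts at the new ``raise``.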
See PEP 3109: http://www.python.org/dev/peps/pep-3109/ Collin From jimjjewett at gmail.com Mon Jun 2 04:56:09 2008 From: jimjjewett at gmail.com (Jim Jewett) Date: Sun, 1 Jun 2008 22:56:09 -0400 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) In-Reply-To: <20080529195757.GB17896@phd.pp.ru> References: <20080529192157.GA17896@phd.pp.ru> <20080529195757.GB17896@phd.pp.ru> Message-ID: On 5/29/08, Oleg Broytmann wrote: > On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote: >> ... I'm opposed to this change, and that I believe >> that it would cause way too much >> disturbance to be accepted this close to beta. > That's ok. A rejected PEP has its purpose, too. Yes, but it might be better to defer instead. "Too close to beta" is a strong argument, but doesn't apply to 3.1 or 3.2 -- by which time the unicode-keyed dicts or oddly-converted print statements may change the importance. -jJ From musiccomposition at gmail.com Mon Jun 2 04:58:13 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Sun, 1 Jun 2008 21:58:13 -0500 Subject: [Python-3000] PEP: str(container) should call str(item), not repr(item) In-Reply-To: References: <20080529192157.GA17896@phd.pp.ru> <20080529195757.GB17896@phd.pp.ru> Message-ID: <1afaf6160806011958n4f53bdb8vd88668503730232b@mail.gmail.com> On Sun, Jun 1, 2008 at 9:56 PM, Jim Jewett wrote: > On 5/29/08, Oleg Broytmann wrote: >> On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote: >>> ... I'm opposed to this change, and that I believe >>> that it would cause way too much >>> disturbance to be accepted this close to beta. > >> That's ok. A rejected PEP has its purpose, too. > > Yes, but it might be better to defer instead. > > "Too close to beta" is a strong argument, but doesn't apply to 3.1 or > 3.2 -- by which time the unicode-keyed dicts or oddly-converted print > statements may change the importance. 
Notice how Guido said he's opposed to it, *and* it would cause too much disturbance. I believe he has previously shunned this idea. -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From dalcinl at gmail.com Mon Jun 2 18:17:47 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 2 Jun 2008 13:17:47 -0300 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> Message-ID: Are you completelly sure of adding those guys: PyBytes_InternXXX ??? On 6/1/08, Gregory P. Smith wrote: > On Fri, May 30, 2008 at 1:37 AM, M.-A. Lemburg wrote: > > On 2008-05-30 00:57, Nick Coghlan wrote: > >> > >> M.-A. Lemburg wrote: > >>> > >>> * Why can't we have both PyString *and* PyBytes exposed in 2.x, > >>> with one redirecting to the other ? > >> > >> We do have that - the PyString_* names still work perfectly fine in 2.x. > >> They just won't be used in the Python core codebase anymore - everything in > >> the Python core will use either PyBytes_* or PyUnicode_* regardless of which > >> branch (2.x or 3.x) you're working on. I think that's a good thing for ease > >> of maintenance in the future, even if it takes people a while to get their > >> heads around it right now. > > > > Sorry, I probably wasn't clear enough: > > > > Why can't we have both PyString *and* PyBytes exposed as C > > APIs (ie. visible in code and in the linker) in 2.x, with one redirecting > > to the other ? > > > >>> * Why should the 2.x code base turn to hacks, just because 3.x wants > >>> to restructure itself ? 
> >> > >> With the better explanation from Greg of what the checked in approach > >> achieves (i.e. preserving exact ABI compatibility for PyString_*, while > >> allowing PyBytes_* to be used at the source code level), I don't see what > >> has been done as being any more of a hack than the possibly more common > >> "#define " (which *would* break binary compatibility). > >> > >> The only things that I think would tidy it up further would be to: > >> - include an explanation of the approach and its effects on API and ABI > >> backward and forward compatibility within 2.x and between 2.x and 3.x in > >> stringobject.h > >> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 > > > > Which is what I was suggesting all along; sorry if I wasn't > > clear enough on that. > > > > The standard approach is that you provide #define redirects from the > > old APIs to the new ones (which are then picked up by the compiler) > > *and* add function wrappers to the same affect (to make linkers, > > dynamic load APIs such ctypes and debuggers happy). > > > > > > Example from pythonrun.h|c: > > --------------------------- > > > > /* Use macros for a bunch of old variants */ > > #define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, NULL) > > > > /* Deprecated C API functions still provided for binary compatiblity */ > > > > #undef PyRun_String > > PyAPI_FUNC(PyObject *) > > PyRun_String(const char *str, int s, PyObject *g, PyObject *l) > > { > > return PyRun_StringFlags(str, s, g, l, NULL); > > } > > > > > Okay, how about this? http://codereview.appspot.com/1521 > > Using that patch, both PyString_ and PyBytes_ APIs are available using > function stubs similar to the above. I opted to define the stub > functions right next to the ones they were stubbing rather than > putting them all at the end of the file or in another file but they > could be moved if someone doesn't like them that way. 
> >
> > I still believe that we should *not* make "easy of merging" the
> > primary motivation for backporting changes in 3.x to 2.x. Software
> > design should not be guided by restrictions in the tool chain,
> > if not absolutely necessary.
> >
> > The main argument for a backport needs to be general usefulness
> > to the 2.x users, IMHO... just like any other feature that
> > makes it into 2.x.
> >
> > If merging is difficult then this needs to be addressed, but
> > there are more options to that than always going back to the
> > original 2.x trunk code. I've given a few suggestions on how
> > this could be approached in other emails on this thread.
> >
> I am not the one doing the merging or working on merge tools so I'll
> leave this up to those that are.
>
> -gps
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: http://mail.python.org/mailman/options/python-dev/dalcinl%40gmail.com
>

--
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From mal at egenix.com Mon Jun 2 14:33:08 2008
From: mal at egenix.com (M.-A.
Lemburg) Date: Mon, 02 Jun 2008 14:33:08 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> Message-ID: <4843E884.1060705@egenix.com> On 2008-06-02 01:30, Gregory P. Smith wrote: > On Fri, May 30, 2008 at 1:37 AM, M.-A. Lemburg wrote: >> Sorry, I probably wasn't clear enough: >> >> Why can't we have both PyString *and* PyBytes exposed as C >> APIs (ie. visible in code and in the linker) in 2.x, with one redirecting >> to the other ? >> >>>> * Why should the 2.x code base turn to hacks, just because 3.x wants >>>> to restructure itself ? >>> With the better explanation from Greg of what the checked in approach >>> achieves (i.e. preserving exact ABI compatibility for PyString_*, while >>> allowing PyBytes_* to be used at the source code level), I don't see what >>> has been done as being any more of a hack than the possibly more common >>> "#define " (which *would* break binary compatibility). >>> >>> The only things that I think would tidy it up further would be to: >>> - include an explanation of the approach and its effects on API and ABI >>> backward and forward compatibility within 2.x and between 2.x and 3.x in >>> stringobject.h >>> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 >> Which is what I was suggesting all along; sorry if I wasn't >> clear enough on that. 
>> >> The standard approach is that you provide #define redirects from the >> old APIs to the new ones (which are then picked up by the compiler) >> *and* add function wrappers to the same affect (to make linkers, >> dynamic load APIs such ctypes and debuggers happy). >> >> >> Example from pythonrun.h|c: >> --------------------------- >> >> /* Use macros for a bunch of old variants */ >> #define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, NULL) >> >> /* Deprecated C API functions still provided for binary compatiblity */ >> >> #undef PyRun_String >> PyAPI_FUNC(PyObject *) >> PyRun_String(const char *str, int s, PyObject *g, PyObject *l) >> { >> return PyRun_StringFlags(str, s, g, l, NULL); >> } >> > > Okay, how about this? http://codereview.appspot.com/1521 > > Using that patch, both PyString_ and PyBytes_ APIs are available using > function stubs similar to the above. I opted to define the stub > functions right next to the ones they were stubbing rather than > putting them all at the end of the file or in another file but they > could be moved if someone doesn't like them that way. Thanks. I was working on a similar patch. Looks like you beat me to it. The only thing I'm not sure about is having the wrappers in the same file - this is likely to cause merge conflicts when doing direct merging and even with an automated renaming approach, the extra code would be easier to remove if it were e.g. at the end of the file or even better: in a separate file. My patch worked slightly differently: it adds wrappers PyString* that forward calls to the PyBytes* APIs and they all live in stringobject.c. stringobject.h then also provides aliases so that recompiled extensions pick up the new API names. While working on my patch I ran into an issue that I haven't been able to resolve: the wrapper functions got optimized away by the linker and even though they appear in the libpython2.6.a, they don't end up in the python binary itself. 
As a result, importing Python 2.5 in the resulting 2.6 binary still fails with an unresolved PyString symbol. Please check whether that's the case for your patch as well. >> I still believe that we should *not* make "ease of merging" the >> primary motivation for backporting changes in 3.x to 2.x. Software >> design should not be guided by restrictions in the tool chain, >> if not absolutely necessary. >> >> The main argument for a backport needs to be general usefulness >> to the 2.x users, IMHO... just like any other feature that >> makes it into 2.x. >> >> If merging is difficult then this needs to be addressed, but >> there are more options to that than always going back to the >> original 2.x trunk code. I've given a few suggestions on how >> this could be approached in other emails on this thread. > > I am not the one doing the merging or working on merge tools so I'll > leave this up to those that are. I'm not sure whether there are any specific merge tools around - apart from the 2to3.py script. There also doesn't seem to be any documentation on the merge process itself (at least nothing that Google can find in the PEPs), so it's difficult to make any suggestions. Thanks, -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 02 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 34 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From greg at krypto.org Tue Jun 3 00:21:18 2008 From: greg at krypto.org (Gregory P.
Smith) Date: Mon, 2 Jun 2008 15:21:18 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <4843E884.1060705@egenix.com> References: <48397ECC.9070805@cheimes.de> <483D300B.5090309@egenix.com> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> <4843E884.1060705@egenix.com> Message-ID: <52dc1c820806021521g810d9f1wd282508f8452c13@mail.gmail.com> On Mon, Jun 2, 2008 at 5:33 AM, M.-A. Lemburg wrote: > >> Okay, how about this? http://codereview.appspot.com/1521 >> >> Using that patch, both PyString_ and PyBytes_ APIs are available using >> function stubs similar to the above. I opted to define the stub >> functions right next to the ones they were stubbing rather than >> putting them all at the end of the file or in another file but they >> could be moved if someone doesn't like them that way. >> > > Thanks. I was working on a similar patch. Looks like you beat > me to it. > > The only thing I'm not sure about is having the wrappers in the > same file - this is likely to cause merge conflicts when doing > direct merging and even with an automated renaming approach, > the extra code would be easier to remove if it were e.g. at > the end of the file or even better: in a separate file. > > My patch worked slightly differently: it adds wrappers PyString* > that forward calls to the PyBytes* APIs and they all live in > stringobject.c. stringobject.h then also provides aliases > so that recompiled extensions pick up the new API names. > > While working on my patch I ran into an issue that I haven't > been able to resolve: the wrapper functions got optimized away > by the linker and even though they appear in the libpython2.6.a, > they don't end up in the python binary itself. 
> > As a result, importing Python 2.5 in the resulting 2.6 > binary still fails with an unresolved PyString symbol. > > Please check whether that's the case for your patch as well. I think that is going to happen no matter which approach is used (yours or mine) unless we force some included code to call each of the stubs (needlessly inefficient). One way to do that is to reference them all from a section of code that is called only under an always-false condition which the compiler and linker cannot determine to be false ahead of time, so that the code cannot be eliminated as dead. Given that, should we bother? I don't think we really need PyBytes_ to show up in the binary ABI for 2.x even if that is how we write the calls in the python internals code. The argument that debugging is easier if you can just set a breakpoint on what you read may be true, but including stub functions doesn't help with this when most of the time the code is compiled under the alternate name using #defines, so a breakpoint set on the stub name will not actually trigger. API wise we're really providing the PyBytes* names to make module authors' work of writing code that targets 2.6 and 3.x easier, but isn't it reasonable for authors to just be told that they're just #defined aliases for PyString*? There is no goal, nor should there be, of a module binary compiled against 2.x loading and working in 3.x. I expect most module authors, code generators and such will want to target Python 2.x earlier than 2.6 as well, so should we provide PyBytes_ names as a public API in 2.6 at all? (regardless of whether we use the PyBytes names internally for any reason) -gps -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Tue Jun 3 00:30:11 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jun 2008 15:30:11 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <48434B35.5080400@canterbury.ac.nz> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> <48434B35.5080400@canterbury.ac.nz> Message-ID: On Sun, Jun 1, 2008 at 6:21 PM, Greg Ewing wrote: > Atsuo Ishimoto wrote: > >> I'm not comfortable with "printable", too. Is "legible" better? This >> is first time for me to see this word in my life :). > > The term "printable" has a long history in computing of > meaning that a character code corresponds to some visual > glyph, even if the display process involved isn't literally > printing. It would be confusing to replace it with something > else now, I think. Agreed. I'm +1 on everything the PEP specifies. I'll accept it tomorrow. Other developers, please review Atsuo's patch in http://bugs.python.org/issue2630 . -- --Guido van Rossum (home page: http://www.python.org/~guido/) From stephen at xemacs.org Mon Jun 2 07:47:59 2008 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 02 Jun 2008 14:47:59 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> Message-ID: <87fxrwe6b4.fsf@uwakimon.sk.tsukuba.ac.jp> Atsuo Ishimoto writes: > Okay, we'll keep 'strict' as default error handler for stdout always, > then. I can live with it. > But, my $0.02, I expect this issue will be revisited after people > start to develop real applications with Python 3.x. I agree, I expect it to be revisited too. But in the meantime - people who need it will have an obvious signal that they need it, - it will be obvious to people who know what's going on how to fix it, - and when the experience needed to decide what use cases matter and which ones are subject to confusion of the kind Guido mentions accumulates, and it gets changed, programs that previously did not throw an error will continue to work as they did. From barry at python.org Tue Jun 3 00:51:46 2008 From: barry at python.org (Barry Warsaw) Date: Mon, 2 Jun 2008 18:51:46 -0400 Subject: [Python-3000] Postponing the first betas Message-ID: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 We are going to postpone the first beta releases by one week. We had some problems with mail.python.org today, which prompted a query to Guido from me about the postponement. mail.python.org should be back up and running normally now, as evidenced by the email flood, but in the meantime, Guido said: "I'd also like to see PEP 3138 (CJK-friendly repr()) and the pyprocessing PEP implemented by then, and perhaps some other small stuff."
So we're going to do the first beta releases next Wednesday, June 11. Please take this time to stabilize all APIs and features, both in Python and C. Next week, I'll do a gut check on critical and release blocker bugs, so please also take a look at those and try to fix what you can. Cheers, - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Darwin) iQCVAwUBSER5gnEjvBPtnXfVAQJlBAQAgfmRwGQzwNFrwvMusIoDNVRuyIObkKO0 FeDYb26RAL1jLXt0x/7jE0fBc5FvhDzUJnnNj3sydfyKU5MCb0eB0VeBTmjHU05l yncX6zYSoU14OUW+bkG4y7vf+aLD9zlFsj/ybMEZTQh0RMpZ+HBNhup3NJFEDTBM 97q4SIvltAg= =NBRW -----END PGP SIGNATURE----- From dalcinl at gmail.com Tue Jun 3 01:00:05 2008 From: dalcinl at gmail.com (Lisandro Dalcin) Date: Mon, 2 Jun 2008 20:00:05 -0300 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <52dc1c820806021522u77e94406q7575a34bcaee79c1@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> <52dc1c820806021522u77e94406q7575a34bcaee79c1@mail.gmail.com> Message-ID: On 6/2/08, Gregory P. Smith wrote: > I believe those APIs are already there in the existing interface. Why does > that concern you? Just because PyBytes_InternXXX are not in the Py3K C API. If the whole point of this patch is easier merges, then I believe there is a problem here. Please note I'm definitely +1 for your patch, but the string interning API seems to need a bit more care. Am I wrong? > > > On Mon, Jun 2, 2008 at 9:17 AM, Lisandro Dalcin wrote: > > > Are you completely sure about adding those guys: PyBytes_InternXXX ??? > > > > > > > > > > > > On 6/1/08, Gregory P. Smith wrote: > > > On Fri, May 30, 2008 at 1:37 AM, M.-A. Lemburg wrote: > > > > On 2008-05-30 00:57, Nick Coghlan wrote: > > > >> > > > >> M.-A.
Lemburg wrote: > > > >>> > > > >>> * Why can't we have both PyString *and* PyBytes exposed in 2.x, > > > >>> with one redirecting to the other ? > > > >> > > > >> We do have that - the PyString_* names still work perfectly fine in > 2.x. > > > >> They just won't be used in the Python core codebase anymore - > everything in > > > >> the Python core will use either PyBytes_* or PyUnicode_* regardless > of which > > > >> branch (2.x or 3.x) you're working on. I think that's a good thing > for ease > > > >> of maintenance in the future, even if it takes people a while to get > their > > > >> heads around it right now. > > > > > > > > Sorry, I probably wasn't clear enough: > > > > > > > > Why can't we have both PyString *and* PyBytes exposed as C > > > > APIs (ie. visible in code and in the linker) in 2.x, with one > redirecting > > > > to the other ? > > > > > > > >>> * Why should the 2.x code base turn to hacks, just because 3.x > wants > > > >>> to restructure itself ? > > > >> > > > >> With the better explanation from Greg of what the checked in > approach > > > >> achieves (i.e. preserving exact ABI compatibility for PyString_*, > while > > > >> allowing PyBytes_* to be used at the source code level), I don't see > what > > > >> has been done as being any more of a hack than the possibly more > common > > > >> "#define " (which *would* break binary > compatibility). > > > >> > > > >> The only things that I think would tidy it up further would be to: > > > >> - include an explanation of the approach and its effects on API and > ABI > > > >> backward and forward compatibility within 2.x and between 2.x and > 3.x in > > > >> stringobject.h > > > >> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0 > > > > > > > > Which is what I was suggesting all along; sorry if I wasn't > > > > clear enough on that. 
> > > > > > > > The standard approach is that you provide #define redirects from the > > > > old APIs to the new ones (which are then picked up by the compiler) > > > > *and* add function wrappers to the same affect (to make linkers, > > > > dynamic load APIs such ctypes and debuggers happy). > > > > > > > > > > > > Example from pythonrun.h|c: > > > > --------------------------- > > > > > > > > /* Use macros for a bunch of old variants */ > > > > #define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, > NULL) > > > > > > > > /* Deprecated C API functions still provided for binary compatiblity > */ > > > > > > > > #undef PyRun_String > > > > PyAPI_FUNC(PyObject *) > > > > PyRun_String(const char *str, int s, PyObject *g, PyObject *l) > > > > { > > > > return PyRun_StringFlags(str, s, g, l, NULL); > > > > } > > > > > > > > > > > > > Okay, how about this? > http://codereview.appspot.com/1521 > > > > > > Using that patch, both PyString_ and PyBytes_ APIs are available using > > > function stubs similar to the above. I opted to define the stub > > > functions right next to the ones they were stubbing rather than > > > putting them all at the end of the file or in another file but they > > > could be moved if someone doesn't like them that way. > > > > > > > > > > I still believe that we should *not* make "easy of merging" the > > > > primary motivation for backporting changes in 3.x to 2.x. Software > > > > design should not be guided by restrictions in the tool chain, > > > > if not absolutely necessary. > > > > > > > > The main argument for a backport needs to be general usefulness > > > > to the 2.x users, IMHO... just like any other feature that > > > > makes it into 2.x. > > > > > > > > If merging is difficult then this needs to be addressed, but > > > > there are more options to that than always going back to the > > > > original 2.x trunk code. I've given a few suggestions on how > > > > this could be approached in other emails on this thread. 
> > > > > > > > > I am not the one doing the merging or working on merge tools so I'll > > > leave this up to those that are. > > > > > > -gps > > > _______________________________________________ > > > Python-Dev mailing list > > > Python-Dev at python.org > > > http://mail.python.org/mailman/listinfo/python-dev > > > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/dalcinl%40gmail.com > > > > > > > > > > > > > > > -- > > Lisandro Dalcín > > --------------- > > Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) > > Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) > > Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) > > PTLC - Güemes 3450, (3000) Santa Fe, Argentina > > Tel/Fax: +54-(0)342-451.1594 > > > > -- Lisandro Dalcín --------------- Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC) Instituto de Desarrollo Tecnológico para la Industria Química (INTEC) Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) PTLC - Güemes 3450, (3000) Santa Fe, Argentina Tel/Fax: +54-(0)342-451.1594 From guido at python.org Tue Jun 3 01:09:18 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jun 2008 16:09:18 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: <52dc1c820806021521g810d9f1wd282508f8452c13@mail.gmail.com> References: <48397ECC.9070805@cheimes.de> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> <4843E884.1060705@egenix.com> <52dc1c820806021521g810d9f1wd282508f8452c13@mail.gmail.com> Message-ID: I will freely admit that I haven't followed this thread in any detail, but if it were up to me, I'd have the 2.6 internal code use PyString (as both what the linker sees and what the human reads
in the source code) and the 3.0 code use PyBytes for the same thing. Let the merges be damned -- most changes to 2.6 these days seem to be blocked explicitly from being merged anyway. I'd prefer the 2.6 code base to stay true to 2.x, and the 3.0 code base start afresh where it makes sense. We should reindent more of the 3.0 code base to use 4-space-indents in C code too. I would also add macros that map the PyBytes_* APIs to PyString_*, but I would not start using these internally except in code newly written for 2.6 and intended to be "in the spirit of 3.0". IOW use PyString for 8-bit strings containing text, and PyBytes for 8-bit strings containing binary data. For 8-bit strings that could contain either text or data, I'd use PyString, in the spirit of 2.x. --Guido On Mon, Jun 2, 2008 at 3:21 PM, Gregory P. Smith wrote: > > > On Mon, Jun 2, 2008 at 5:33 AM, M.-A. Lemburg wrote: >>> >>> Okay, how about this? http://codereview.appspot.com/1521 >>> >>> Using that patch, both PyString_ and PyBytes_ APIs are available using >>> function stubs similar to the above. I opted to define the stub >>> functions right next to the ones they were stubbing rather than >>> putting them all at the end of the file or in another file but they >>> could be moved if someone doesn't like them that way. >> >> Thanks. I was working on a similar patch. Looks like you beat >> me to it. >> >> The only thing I'm not sure about is having the wrappers in the >> same file - this is likely to cause merge conflicts when doing >> direct merging and even with an automated renaming approach, >> the extra code would be easier to remove if it were e.g. at >> the end of the file or even better: in a separate file. >> >> My patch worked slightly differently: it adds wrappers PyString* >> that forward calls to the PyBytes* APIs and they all live in >> stringobject.c. stringobject.h then also provides aliases >> so that recompiled extensions pick up the new API names.
>> >> While working on my patch I ran into an issue that I haven't >> been able to resolve: the wrapper functions got optimized away >> by the linker and even though they appear in the libpython2.6.a, >> they don't end up in the python binary itself. >> >> As a result, importing Python 2.5 in the resulting 2.6 >> binary still fails with a unresolved PyString symbol. >> >> Please check whether that's the case for your patch as well. > > I think that is going to happen no matter which approach is used (yours or > mine) unless we force some included code to call each of the stubs > (needlessly inefficient). One way to do that is to reference them all from > a section of code called conditionally based upon an always false condition > that the compiler and linker can never predetermine is false so that it > cannot be eliminated as dead code. > > Given that, should we bother? I don't think we really need PyBytes_ to show > up in the binary ABI for 2.x even if that is how we write the calls in the > python internals code. The arguments put forth that debugging is easier if > you can just set a breakpoint on what you read may be true but including > stub functions doesn't help this when most of the time they're compiled > under the alternate name using #defines so a breakpoint set on the stub name > will not actually trigger. > > API wise we're really providing the PyBytes* names to make module author's > work of writing code that targets 2.6 and 3.x easier but isn't it reasonable > for authors to just be told that they're just #defined aliases for > PyString*. There is no goal, nor should there be, of a module binary > compiled against 2.x loading and working in 3.x. > > I expect most module authors, code generators and such will want to target > Python 2.x earlier than 2.6 as well so should we provide PyBytes_ names as a > public API in 2.6 at all? 
(regardless of if we use the PyBytes names > internally for any reason) > > -gps > > > _______________________________________________ > Python-3000 mailing list > Python-3000 at python.org > http://mail.python.org/mailman/listinfo/python-3000 > Unsubscribe: > http://mail.python.org/mailman/options/python-3000/guido%40python.org > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From greg at krypto.org Tue Jun 3 01:29:16 2008 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 2 Jun 2008 16:29:16 -0700 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: References: <48397ECC.9070805@cheimes.de> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> <4843E884.1060705@egenix.com> <52dc1c820806021521g810d9f1wd282508f8452c13@mail.gmail.com> Message-ID: <52dc1c820806021629k5491e8c0u67e8a6f5247d2368@mail.gmail.com> On Mon, Jun 2, 2008 at 4:09 PM, Guido van Rossum wrote: > I will freely admit that I haven't followed this thread in any detail, > but if it were up to me, I'd have the 2.6 internal code use PyString ... Should we read this as a BDFL pronouncement and make it so? All that would mean change wise is that trunk r63675 as well as possibly r63672 and r63677 would need to be rolled back and this whole discussion over if such a big change should have happened would turn into a moot point. I would also add macros that map the PyBytes_* APIs to PyString_*, but > I would not start using these internally except in code newly written > for 2.6 and intended to be "in the spirit of 3.0". IOW use PyString > for 8-bit strings containing text, and PyBytes for 8-bit strings > containing binary data. For 8-bit strings that could contain either > text or data, I'd use PyString, in the spirit of 2.x. 
> > --Guido > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ishimoto at gembook.org Tue Jun 3 01:35:48 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Tue, 3 Jun 2008 08:35:48 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> <48434B35.5080400@canterbury.ac.nz> Message-ID: <797440730806021635u64184235w24ecc8e87094f03e@mail.gmail.com> On Tue, Jun 3, 2008 at 7:30 AM, Guido van Rossum wrote: > On Sun, Jun 1, 2008 at 6:21 PM, Greg Ewing wrote: >> Atsuo Ishimoto wrote: >> >>> I'm not comfortable with "printable", too. Is "legible" better? This >>> is first time for me to see this word in my life :). >> >> The term "printable" has a long history in computing of >> meaning that a character code corresponds to some visual >> glyph, even if the display process involved isn't literally >> printing. It would be confusing to replace it with something >> else now, I think. > > Agreed. I'm +1 on everything the PEP specifies. I'll accept it > tomorrow. Other developers, please review Atsuo's patch in > http://bugs.python.org/issue2630 . > Thank you! Mark Summerfield suggested to add "!a" conversion flag to the str.format() method. I'll add the conversion flag to the patch and PEP later today. 
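Editorial illustration of the "!a" conversion flag mentioned above: a minimal sketch, assuming the semantics that Python 3 ended up with, where "!r" applies repr() and "!a" applies ascii(). The sample string and the variable names are illustrative only and do not come from the patch or the PEP.

```python
# Sketch of the repr()/ascii() split discussed for PEP 3138.
# Assumes Python 3 semantics; the sample text is illustrative only.
text = "caf\u00e9"  # "café": one printable non-ASCII character

# "!r" applies repr(), which keeps printable non-ASCII characters as-is.
as_repr = "{!r}".format(text)

# "!a" applies ascii(), which escapes every non-ASCII character.
as_ascii = "{!a}".format(text)

print(as_repr)
print(as_ascii)
```

The design point is that code which needs pure-ASCII output (logs, 7-bit terminals) can ask for it explicitly with "!a", while the default repr() stays readable for non-Latin text.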
From brett at python.org Tue Jun 3 02:22:51 2008 From: brett at python.org (Brett Cannon) Date: Mon, 2 Jun 2008 17:22:51 -0700 Subject: [Python-3000] Postponing the first betas In-Reply-To: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org> References: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org> Message-ID: On Mon, Jun 2, 2008 at 3:51 PM, Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > We are going to postpone the first beta releases by one week. We had some > problems with mail.python.org today, which prompted a query to Guido from me > about the postponement. mail.python.org should be back up and running normally now, > as evidenced by the email flood, but in the meantime, Guido said: > > "I'd also like to see PEP 3138 (CJK-friendly repr()) and the > pyprocessing PEP implemented by then, and perhaps some other small > stuff." > > So we're going to do the first beta releases next Wednesday, June 11. > Please take this time to stabilize all APIs and features, both in Python > and C. Next week, I'll do a gut check on critical and release blocker bugs, > so please also take a look at those and try to fix what you can. > Now is as good a time as any to mention that on Wednesday I am flying out to help my mother move. I don't know when she is going to have her Internet connection set up, so I might not be back online until June 16. But thanks to all the help I have been receiving on PEP 3108, I trust the various people involved to continue to do the right thing in my absence.
-Brett From musiccomposition at gmail.com Tue Jun 3 02:26:22 2008 From: musiccomposition at gmail.com (Benjamin Peterson) Date: Mon, 2 Jun 2008 19:26:22 -0500 Subject: [Python-3000] Postponing the first betas In-Reply-To: References: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org> Message-ID: <1afaf6160806021726w2ed8d453p9467ee006b90aaee@mail.gmail.com> On Mon, Jun 2, 2008 at 7:22 PM, Brett Cannon wrote: >> > > Now is as good a time as any to mention that on Wednesday I am flying > out to help my mother move. I don't know when she is going to have her > Internet connection set up, so I might not be back online until June > 16. But thanks to all the help I have been receiving on PEP 3108, I > trust the various people involved to continue to do the right thing in > my absence. That reminds me of those Dilbert cartoons where his mother ends up knowing much more about computers than he does. :) -- Cheers, Benjamin Peterson "There's no place like 127.0.0.1." From fdrake at acm.org Tue Jun 3 04:56:56 2008 From: fdrake at acm.org (Fred Drake) Date: Mon, 2 Jun 2008 22:56:56 -0400 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> Message-ID: <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> On May 31, 2008, at 6:42 PM, Tim Delaney wrote: > This reminds me of something I've thought a few times - maybe the > tuple returned from sys.exc_info() should be a named tuple. +1 -Fred -- Fred Drake From rhamph at gmail.com Tue Jun 3 05:16:25 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 2 Jun 2008 21:16:25 -0600 Subject: [Python-3000] sys.exc_info() In-Reply-To: <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> Message-ID: On Mon, Jun 2, 2008 at 8:56 PM, Fred Drake wrote: > On May 31, 2008, at 6:42 PM, Tim Delaney wrote: >> >> This reminds me of something I've thought a few times - maybe the tuple >> returned from sys.exc_info() should be a named tuple. > > +1 It should be replaced with a function that returns only the value - type and traceback are both redundant now. I don't think anything's been proposed yet though. -- Adam Olsen, aka Rhamphoryncus From rasky at develer.com Tue Jun 3 02:00:15 2008 From: rasky at develer.com (Giovanni Bajo) Date: Tue, 3 Jun 2008 00:00:15 +0000 (UTC) Subject: [Python-3000] -t command line option Message-ID: Hello, Python 3.0 defaults to "-tt" (error on inconsistent usage of tabs and spaces). So why are there still "-t" and "-tt" command line options? Are they just a relic that should be removed? Thanks! -- Giovanni Bajo Develer S.r.l. http://www.develer.com From stefan_ml at behnel.de Mon Jun 2 21:15:31 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 02 Jun 2008 21:15:31 +0200 Subject: [Python-3000] Single buffer implied in new buffer protocol? In-Reply-To: References: Message-ID: Travis Oliphant wrote: > This should be clarified in the PEP. Can you take a stab at it? Would this work? Stefan -------------- next part -------------- A non-text attachment was scrubbed... Name: pep-3113-locking.patch Type: text/x-patch Size: 8527 bytes Desc: not available URL: From stefan_ml at behnel.de Mon Jun 2 14:38:05 2008 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 2 Jun 2008 12:38:05 +0000 (UTC) Subject: [Python-3000] Single buffer implied in new buffer protocol? References: Message-ID: Travis Oliphant ieee.org> writes: > This should be clarified in the PEP. Can you take a stab at it? Here's a patch for what I think the locking protocol part should look like.
It changes the "passing NULL as Py_buffer" bit into a requirement to always pass a Py_buffer structure, and distinguishes the two cases where you set LOCK and pass either a NULL buf pointer or a valid buf pointer into getbuffer. It also adds a new UNLOCK flag for an explicit unlock without invalidating the Py_buffer view. The idea it that the provider can (!) decide to return different buffers for locked and non-locked access and can switch between the two during a call to getbuffer. That matches the existing section about copying buffers somewhere near the end. Stefan Index: pep-3118.txt =================================================================== --- pep-3118.txt (Revision 63861) +++ pep-3118.txt (Arbeitskopie) @@ -153,10 +153,8 @@ This function returns ``0`` on success and ``-1`` on failure (and raises an error). The first variable is the "exporting" object. The second -argument is the address to a bufferinfo structure. If view is ``NULL``, -then no information is returned but a lock on the memory is still -obtained. In this case, the corresponding releasebuffer should also -be called with ``NULL``. +argument is the address to a bufferinfo structure. Both arguments must +never be NULL. The third argument indicates what kind of buffer the consumer is prepared to deal with and therefore what kind of buffer the exporter @@ -178,6 +176,19 @@ structure (with defaults or NULLs if nothing else is requested). The PyBuffer_FillInfo function can be used for simple cases. +The second function is called to release the Py_buffer view, which may +allow the provider to clean up the buffer itself:: + + typedef void (*releasebufferproc)(Py_buffer *view) + +Any existing lock on the buffer will be released by this call. The +Py_buffer struct will be invalidated and can no longer be used by the +caller. 
+ + +Access flags +------------ + Some flags are useful for requesting a specific kind of memory segment, while others indicate to the exporter what kind of information the consumer can deal with. If certain information is not @@ -185,14 +196,6 @@ without that information, then a ``PyErr_BufferError`` should be raised. -``PyBUF_SIMPLE`` - - This is the default flag state (0). The returned buffer may or may - not have writable memory. The format will be assumed to be - unsigned bytes . This is a "stand-alone" flag constant. It never - needs to be \|'d to the others. The exporter will raise an error if - it cannot provide such a contiguous buffer of bytes. - ``PyBUF_WRITABLE`` The returned buffer must be writable. If it is not writable, @@ -221,6 +224,54 @@ necessary (especially the exclusive write lock) as it makes the object unable to share its memory until the lock is released. + The ``PyBUF_LOCK`` flag is the only case where a Py_buffer struct + with an initialised ``buf`` field can be passed. This enables two + general locking cases: + + * lock a new buffer: The caller requests a lock at the same time as + requesting the buffer (i.e. ``buf`` is NULL), in an atomic + operation. + + * lock an existing buffer: The caller has already received a buffer + view and now wants to gain a lock on the existing buffer (i.e. + ``buf`` is a valid buffer pointer). Note that the provider is + free to change the ``buf`` pointer during this call, so the + previously used buffer may become invalid. + + If the call succeeds, this means that the consumer now has the + exclusive requested rights on the buffer. The lock can be released + by either calling ``releasebuffer`` on the Py_buffer, or by + explicitly releasing the lock in a subsequent call to ``getbuffer`` + that sets the ``PyBUF_UNLOCK`` flag. + +``PyBUF_UNLOCK`` + + This flag requests to release the lock on an existing buffer, while + keeping the Py_buffer view alive. 
The ``buf`` field must be + initialised by a previous call to ``getbuffer`` and should have + been locked before (it is not an error if the buffer is not + currently locked). Similar to the LOCK call, the provider may + decide to change the ``buf`` field in this case, so the previous + buffer may become invalid. + + The provider is free to ignore any flags except for the WRITABLE + flag, so the caller cannot request a new buffer layout with an + UNLOCK call. If the WRITABLE flag is set, only an existing + exclusive write lock will be released, but an existing read lock + will be kept. No new locks can be acquired with an UNLOCK call. + + +Memory layout flags +------------------- + +``PyBUF_SIMPLE`` + + This is the default flag state (0). The returned buffer may or may + not have writable memory. The format will be assumed to be + unsigned bytes . This is a "stand-alone" flag constant. It never + needs to be \|'d to the others. The exporter will raise an error if + it cannot provide such a contiguous buffer of bytes. + ``PyBUF_FORMAT`` The returned buffer must have true format information if this flag @@ -256,7 +307,6 @@ All of these flags imply PyBUF_STRIDES and guarantee that the strides buffer info structure will be filled in correctly. - ``PyBUF_INDIRECT`` (implies ``PyBUF_STRIDES``) The returned buffer must have suboffsets information (which can be @@ -307,6 +357,10 @@ buffer info structure correctly according to the provided flags if a contiguous chunk of "unsigned bytes" is all that can be exported. + +The Py_buffer struct +-------------------- + The bufferinfo structure is:: struct bufferinfo { @@ -322,14 +376,15 @@ void *internal; } Py_buffer; -Before calling the bf_getbuffer function, the bufferinfo structure can be -filled with whatever. Upon return from bf_getbuffer, the bufferinfo -structure is filled in with relevant information about the buffer. 
-This same bufferinfo structure must be passed to bf_releasebuffer (if -available) when the consumer is done with the memory. The caller is -responsible for keeping a reference to obj until releasebuffer is -called (i.e. the call to bf_getbuffer does not alter the reference -count of obj). +Before calling the bf_getbuffer function, the bufferinfo structure can +be filled with whatever, but the ``buf`` field must be NULL when +requesting a new buffer. Upon return from bf_getbuffer, the +bufferinfo structure is filled in with relevant information about the +buffer. This same bufferinfo structure must be passed to +bf_releasebuffer (if available) when the consumer is done with the +memory. The caller is responsible for keeping a reference to obj until +releasebuffer is called (i.e. the call to bf_getbuffer does not alter +the reference count of obj). The members of the bufferinfo structure are: @@ -344,13 +399,13 @@ ``readonly`` an integer variable to hold whether or not the memory is readonly. 1 means the memory is readonly, zero means the memory is writable, - -1 means the memory was read "locked" when this Py_buffer - structure was filled-in therefore should be unlocked when this - Py_buffer structure is "released." A -2 means this Py_buffer - structure has an exclusive-write lock on the memory. This should - be unlocked when the Py_buffer structure is released. The concept - of locking is not supported by all objects that expose the buffer - protocol. + -1 means the memory was read "locked" either when this Py_buffer + structure was filled-in or later on with an explicit LOCK flag, + therefore should be unlocked when this Py_buffer structure is + "released". A -2 means this Py_buffer structure has an + exclusive-write lock on the memory. This should be unlocked when + the Py_buffer structure is released. The concept of locking is + not supported by all objects that expose the buffer protocol. 
``format`` a NULL-terminated format-string (following the struct-style syntax @@ -571,7 +626,7 @@ :: PyObject * PyMemoryView_GetContiguous(PyObject *obj, int buffertype, - char fort) + char fortran) Return a memoryview object to a contiguous chunk of memory represented by obj. If a copy must be made (because the memory pointed to by obj @@ -818,10 +873,10 @@ The proposed locking mechanism relies entirely on the exporter object to not invalidate any of the memory pointed to by the buffer structure -until a corresponding releasebuffer is called. If it wants to be able -to change its own shape and/or strides arrays, then it needs to create -memory for these in the bufferinfo structure and copy information -over. +until a corresponding releasebuffer is called or the UNLOCK flag is +passed to a getbuffer call. If it wants to be able to change its own +shape and/or strides arrays, then it needs to create memory for these +in the bufferinfo structure and copy information over. The sharing of strided memory and suboffsets is new and can be seen as a modification of the multiple-segment interface. It is motivated by From solipsis at pitrou.net Mon Jun 2 11:58:14 2008 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 2 Jun 2008 09:58:14 +0000 (UTC) Subject: [Python-3000] sys.exc_info() References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <00ab01c8c383$fb909db0$f2b1d910$@com.au> <43aa6ff70806011952r4f3fdd49q7f5e0456689e90c7@mail.gmail.com> Message-ID: Collin Winter gmail.com> writes: > > See PEP 3109: http://www.python.org/dev/peps/pep-3109/ By the way, this document mentions a "raise ... from ..." form, but it doesn't seem to me it has been implemented. Perhaps the document should be corrected? Also, it doesn't mention the with_traceback() method of exception objects. Antoine. From guido at python.org Tue Jun 3 05:46:31 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jun 2008 20:46:31 -0700 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> Message-ID: On Mon, Jun 2, 2008 at 8:16 PM, Adam Olsen wrote: > On Mon, Jun 2, 2008 at 8:56 PM, Fred Drake wrote: >> On May 31, 2008, at 6:42 PM, Tim Delaney wrote: >>> >>> This reminds me of something I've thought a few times - maybe the tuple >>> returned from sys.exc_info() should be a named tuple. >> >> +1 > > It should be replaced with a function that returns only the value - > type and traceback are both redundant now. I don't think anything's > been proposed yet though. Since I expect that in a while we will be able to deprecate sys.exc_info() and later kill it, I would rather not meddle with it now. There is tons of code out there that does fairly obscure things with it, and keeping that code happy is higher on my list than cleaning up an API that's eventually doomed. I'm similarly underwhelmed by the idea having it return a named tuple. I personally don't have any trouble keeping three values apart, so I don't think it adds much. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jun 3 05:47:38 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jun 2008 20:47:38 -0700 Subject: [Python-3000] -t command line option In-Reply-To: References: Message-ID: On Mon, Jun 2, 2008 at 5:00 PM, Giovanni Bajo wrote: > Python 3.0 defaults to "-tt" (error on inconsistent usage of tab and > spaces). Then: why is there still a "-t" and "-tt" command line option? > Is just a relic that should be removed? Probably. Though there are plenty of precedents for leaving such inactive options in for a long time, to avoid unnecessarily breaking hairy shell scripts. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From rhamph at gmail.com Tue Jun 3 05:56:39 2008 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 2 Jun 2008 21:56:39 -0600 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> Message-ID: On Mon, Jun 2, 2008 at 9:46 PM, Guido van Rossum wrote: > On Mon, Jun 2, 2008 at 8:16 PM, Adam Olsen wrote: >> On Mon, Jun 2, 2008 at 8:56 PM, Fred Drake wrote: >>> On May 31, 2008, at 6:42 PM, Tim Delaney wrote: >>>> >>>> This reminds me of something I've thought a few times - maybe the tuple >>>> returned from sys.exc_info() should be a named tuple. >>> >>> +1 >> >> It should be replaced with a function that returns only the value - >> type and traceback are both redundant now. I don't think anything's >> been proposed yet though. > > Since I expect that in a while we will be able to deprecate > sys.exc_info() and later kill it, I would rather not meddle with it > now. There is tons of code out there that does fairly obscure things > with it, and keeping that code happy is higher on my list than > cleaning up an API that's eventually doomed. > > I'm similarly underwhelmed by the idea having it return a named tuple. > I personally don't have any trouble keeping three values apart, so I > don't think it adds much. So keep the old sys.exc_info() (at least for a few more releases) and add a new function that only returns the value? Just need to find a name we can be happy with for a long time.. maybe sys.exception_block()? -- Adam Olsen, aka Rhamphoryncus From guido at python.org Tue Jun 3 05:59:02 2008 From: guido at python.org (Guido van Rossum) Date: Mon, 2 Jun 2008 20:59:02 -0700 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> Message-ID: On Mon, Jun 2, 2008 at 8:56 PM, Adam Olsen wrote: > On Mon, Jun 2, 2008 at 9:46 PM, Guido van Rossum wrote: >> On Mon, Jun 2, 2008 at 8:16 PM, Adam Olsen wrote: >>> On Mon, Jun 2, 2008 at 8:56 PM, Fred Drake wrote: >>>> On May 31, 2008, at 6:42 PM, Tim Delaney wrote: >>>>> >>>>> This reminds me of something I've thought a few times - maybe the tuple >>>>> returned from sys.exc_info() should be a named tuple. >>>> >>>> +1 >>> >>> It should be replaced with a function that returns only the value - >>> type and traceback are both redundant now. I don't think anything's >>> been proposed yet though. >> >> Since I expect that in a while we will be able to deprecate >> sys.exc_info() and later kill it, I would rather not meddle with it >> now. There is tons of code out there that does fairly obscure things >> with it, and keeping that code happy is higher on my list than >> cleaning up an API that's eventually doomed. >> >> I'm similarly underwhelmed by the idea having it return a named tuple. >> I personally don't have any trouble keeping three values apart, so I >> don't think it adds much. > > So keep the old sys.exc_info() (at least for a few more releases) and > add a new function that only returns the value? Just need to find a > name we can be happy with for a long time.. maybe > sys.exception_block()? Actually I think we won't need that function once we're used to just passing exception instances around. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fdrake at acm.org Tue Jun 3 06:05:05 2008 From: fdrake at acm.org (Fred Drake) Date: Tue, 3 Jun 2008 00:05:05 -0400 Subject: [Python-3000] sys.exc_info() In-Reply-To: References:

<003a01c8c2e9$afc2d910$0f488b30$@com.au> <44F57FF6-1DC0-4956-942B-3ABE5B6F9F34@acm.org> Message-ID: On Jun 2, 2008, at 11:59 PM, Guido van Rossum wrote: > Actually I think we won't need that function once we're used to just > passing exception instances around. Ah, so many differences that I've lost track of. I feel so... py2k. :-( -Fred -- Fred Drake From g.brandl at gmx.net Tue Jun 3 11:25:45 2008 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 03 Jun 2008 11:25:45 +0200 Subject: [Python-3000] -t command line option In-Reply-To: References: Message-ID: Guido van Rossum schrieb: > On Mon, Jun 2, 2008 at 5:00 PM, Giovanni Bajo wrote: >> Python 3.0 defaults to "-tt" (error on inconsistent usage of tab and >> spaces). Then: why is there still a "-t" and "-tt" command line option? >> Is just a relic that should be removed? > > Probably. Though there are plenty of precedents for leaving such > inactive options in for a long time, to avoid unnecessarily breaking > hairy shell scripts. It's even stranger: you can use -ttt to disable the errors again. ------------------------------------------------------------------------ r45381 | thomas.wouters | 2006-04-14 13:33:28 +0200 (Fr, 14 Apr 2006) | 9 lines Make 'python -tt' the default, meaning Python won't allow mixing tabs and spaces for indentation. Adds a '-ttt' option to turn the errors back into warnings; I'm not yet sure whether that's desireable for Py3K. ... Should this stay? In any case, the usage string and the docs for -t and sys.flags must be corrected. Georg From guido at python.org Tue Jun 3 16:35:36 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 3 Jun 2008 07:35:36 -0700 Subject: [Python-3000] -t command line option In-Reply-To: References: Message-ID: On Tue, Jun 3, 2008 at 2:25 AM, Georg Brandl wrote: > Guido van Rossum schrieb: >> >> On Mon, Jun 2, 2008 at 5:00 PM, Giovanni Bajo wrote: >>> >>> Python 3.0 defaults to "-tt" (error on inconsistent usage of tab and >>> spaces). 
Then: why is there still a "-t" and "-tt" command line option? >>> Is just a relic that should be removed? >> >> Probably. Though there are plenty of precedents for leaving such >> inactive options in for a long time, to avoid unnecessarily breaking >> hairy shell scripts. > > It's even stranger: you can use -ttt to disable the errors again. > > ------------------------------------------------------------------------ > r45381 | thomas.wouters | 2006-04-14 13:33:28 +0200 (Fr, 14 Apr 2006) | 9 > lines > > > Make 'python -tt' the default, meaning Python won't allow mixing tabs and > spaces for indentation. Adds a '-ttt' option to turn the errors back into > warnings; I'm not yet sure whether that's desireable for Py3K. > ... > > Should this stay? > > In any case, the usage string and the docs for -t and sys.flags must > be corrected. I think by now it can be removed. Just last week in an App Engine code lab, I was helping someone who had just started to use Python on a Windows box using some Windows-only editor (not Notepad :-) who ran into trouble by mixing tabs and spaces. His editor most unhelpfully displayed a tab as four spaces. Had Python defaulted to -tt we would have known much quicker that this was the case. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From mal at egenix.com Tue Jun 3 19:43:45 2008 From: mal at egenix.com (M.-A. 
Lemburg) Date: Tue, 03 Jun 2008 19:43:45 +0200 Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0 In-Reply-To: References: <48397ECC.9070805@cheimes.de> <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com> <483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de> <483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com> <483FBCB4.5020007@egenix.com> <52dc1c820806011630y7957ef90n2b7b3441ba9451b5@mail.gmail.com> <4843E884.1060705@egenix.com> <52dc1c820806021521g810d9f1wd282508f8452c13@mail.gmail.com> Message-ID: <484582D1.7000309@egenix.com> On 2008-06-03 01:09, Guido van Rossum wrote: > I will freely admit that I haven't followed this thread in any detail, > but if it were up to me, I'd have the 2.6 internal code use PyString > (as both what the linker sees and what the human reads in the source > code) and the 3.0 code use PyBytes for the same thing. Let the merges > be damed -- most changes to 2.6 these days seem to be blocked > explicitly from being merged anyway. I'd prefer the 2.6 code base to > stay true to 2.x, and the 3.0 code base start afresh where it makes > sense. We should reindent more of the 3.0 code base to use > 4-space-indents in C code too. > > I would also add macros that map the PyBytes_* APIs to PyString_*, but > I would not start using these internally except in code newly written > for 2.6 and intended to be "in the spirit of 3.0". IOW use PyString > for 8-bit strings containing text, and PyBytes for 8-bit strings > containing binary data. For 8-bit strings that could contain either > text or data, I'd use PyString, in the spirit of 2.x. +1 Let's work on better merge tools that edit the trunk code base into shape for a 3.x checkin. Using automated tools for this is likely going to lower the probability of bugs introduced due to unnoticed merge conflicts and in the end is also going to be a benefit to everyone wanting to maintain a single code base for both targets. 
Perhaps we could revive the old Tools/scripts/fixcid.py that was used for the 1.4->1.5 renaming ?! -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 03 2008) >>> Python/Zope Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2008-07-07: EuroPython 2008, Vilnius, Lithuania 33 days to go :::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 From ishimoto at gembook.org Tue Jun 3 19:53:56 2008 From: ishimoto at gembook.org (Atsuo Ishimoto) Date: Wed, 4 Jun 2008 02:53:56 +0900 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730806021635u64184235w24ecc8e87094f03e@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> <48434B35.5080400@canterbury.ac.nz> <797440730806021635u64184235w24ecc8e87094f03e@mail.gmail.com> Message-ID: <797440730806031053m443e8c09g2016b464759ff1ed@mail.gmail.com> > Mark Summerfield suggested to add "!a" conversion flag to the > str.format() method. I'll add the conversion flag to the patch and > PEP later today. I updated PEP and patch. New patch is updated to http://bugs.python.org/issue2630. Changes are:: - Added conversion flag to the str.format() and PyUnicode_FromFormat() C API. - Added new C API PyObject_ASCII() which called from ascii() builtin function for consistency. 
If PyObject_ASCII() does not need to be a public API, I'll make it an
internal function.

Questions::

- The error-handler of sys.stderr is now configurable by
  PYTHONIOENCODING, but I think the default error-handler of sys.stderr
  should be 'backslashreplace'. PYTHONIOENCODING is fine for sys.stdout,
  but sys.stderr does not need to have the same error-handler as
  sys.stdout.

- Should the new C APIs/function/method/string format operators be
  back-ported to Python 2.6, without modifying repr() itself? If so,
  I'll prepare a patch for Python 2.6.

---------------------------------------------------

PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created:
Post-History:


Abstract
========

This PEP proposes a new string representation form for Python 3000.
In Python prior to Python 3000, the repr() built-in function converted
arbitrary objects to printable ASCII strings for debugging and
logging. For Python 3000, a wider range of characters, based on the
Unicode standard, should be considered 'printable'.


Motivation
==========

The current repr() converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters (0x00-0x1f, 0x7f) and
  non-ASCII characters (>=0x80) to '\\xXX'.

- Backslash-escape quote characters (apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters (>=0x100) to '\\uXXXX'.

- Convert 21-bit characters (>=0x10000) and surrogate pair characters
  to '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
logging.
Although all non-ASCII characters are escaped, this does not matter
when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient.

We can use ``print(aJapaneseString)`` to get a readable string, but we
don't have a similar workaround for printing strings from collections
such as lists or tuples. ``print(listOfJapaneseStrings)`` uses repr()
to build the string to be printed, so the resulting strings are always
hex-escaped. Or when ``open(japaneseFilename)`` raises an exception,
the error message is something like ``IOError: [Errno 2] No such file
or directory: '\u65e5\u672c\u8a9e'``, which isn't helpful.

Python 3000 has a lot of nice features for non-Latin users such as
non-ASCII identifiers, so it would be helpful if Python could also
progress in a similar way for printable output.

Some users might be concerned that such output will mess up their
console if they print binary data like images. But this is unlikely to
happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was
rejected.


Specification
=============

- Add a new function to the Python C API ``int PY_UNICODE_ISPRINTABLE
  (Py_UNICODE ch)``. This function returns 0 if repr() should escape
  the Unicode character ``ch``; otherwise it returns 1. Characters
  that should be escaped are defined in the Unicode character database
  as:

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line), refers to LINE SEPARATOR ('\\u2028').
  * Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR
    ('\\u2029').
  * Zs (Separator, Space) other than ASCII space ('\\x20'). Characters
    in this category should be escaped to avoid ambiguity.
- The algorithm to build repr() strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to
    '\\xXX'.

  * Convert leading surrogate pair characters without trailing
    character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to
    '\\uXXXX'.

  * Convert non-printable characters (PY_UNICODE_ISPRINTABLE() returns
    0) to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters (apostrophe, 0x27) and add the
    quote character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace'
  by default.

- Add a new function to the Python C API ``PyObject *PyObject_ASCII
  (PyObject *o)``. This function converts any python object to a
  string using PyObject_Repr() and then hex-escapes all non-ASCII
  characters. ``PyObject_ASCII()`` generates the same string as
  ``PyObject_Repr()`` in Python 2.

- Add a new built-in function, ``ascii()``. This function converts any
  python object to a string using repr() and then hex-escapes all
  non-ASCII characters. ``ascii()`` generates the same string as
  ``repr()`` in Python 2.

- Add ``'%a'`` string format operator. ``'%a'`` converts any python
  object to a string using repr() and then hex-escapes all non-ASCII
  characters. The ``'%a'`` format operator generates the same string
  as ``'%r'`` in Python 2. Also, add the ``'!a'`` conversion flag to
  the ``str.format()`` method and add the ``'%A'`` operator to
  PyUnicode_FromFormat(). They convert any object to an ASCII string
  in the same way as the ``'%a'`` string format operator.

- Add an ``isprintable()`` method to the string type.
  ``str.isprintable()`` returns False if repr() should escape any
  character in the string; otherwise returns True. The
  ``isprintable()`` method calls the ``PY_UNICODE_ISPRINTABLE()``
  function internally.


Rationale
=========

The repr() in Python 3000 should be Unicode not ASCII based, just like
Python 3000 strings.
Also, conversion should not be affected by the locale setting, because
the locale is not necessarily the same as the output device's locale.
For example, it is common for a daemon process to be invoked in an
ASCII setting, but to write UTF-8 to its log files. Also, web
applications might want to report the error information in more
readable form based on the HTML page's encoding.

Characters not supported by the user's console could be hex-escaped on
printing, by the Unicode encoder's error-handler. If the error-handler
of the output file is 'backslashreplace', such characters are
hex-escaped without raising UnicodeEncodeError. For example, if your
default encoding is ASCII, ``print('Hello ¢')`` will print 'Hello
\\xa2'. If your encoding is ISO-8859-1, 'Hello ¢' will be printed.

The default error-handler for sys.stdout is 'strict'. Other
applications reading the output might not understand hex-escaped
characters, so unsupported characters should be trapped when writing.
If you need to escape unsupported characters, you should explicitly
change the error-handler. Unlike sys.stdout, sys.stderr doesn't raise
UnicodeEncodeError by default, because the default error-handler is
'backslashreplace'. So printing error messages containing non-ASCII
characters to sys.stderr will not raise an exception. Also,
information about uncaught exceptions (exception object, traceback) is
printed by the interpreter without raising exceptions.


Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing repr(),
other suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists
  or dicts, but also in many other types of object. File objects
  contain a file name in Unicode, exception objects contain a message
  in Unicode, etc. These strings should be printed in readable form
  when repr()ed. It is unlikely to be possible to implement a tool to
  print all possible object types.
- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex escaped
  characters to the original characters. But these hooks are called
  only when printing the result of evaluating an expression entered in
  an interactive Python session, and don't work for the print()
  function, for non-interactive sessions or for logging.debug("%r",
  ...), etc.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time
  it's a string to undo the escaping correctly in all cases. For
  example, ``print("\\"+"u0041")`` should be printed as '\\u0041', not
  'A'. But there is no chance to tell file objects apart.

- Make the encoding used by unicode_repr() adjustable, and make the
  existing repr() the default.

  With adjustable repr(), the result of using repr() is unpredictable
  and would make it impossible to write correct code involving repr().
  And if current repr() is the default, then the old convention
  remains intact and users may expect ASCII strings as the result of
  repr(). Third party applications or libraries could be confused when
  a custom repr() function is used.


Backwards Compatibility
=======================

Changing repr() may break some existing code, especially testing code.
Five of Python's regression tests fail with this modification. If you
need repr() strings without non-ASCII characters as in Python 2, you
can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise
UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.

To avoid exceptions being raised, you can explicitly specify the
error-handler. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))     # Unsupported characters will be escaped.
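The compatibility notes above can be tried out with a small sketch like the following. It assumes an interpreter that already implements this PEP (any released Python 3 should do). The ``write_log`` helper and the in-memory ASCII "log file" are made up for illustration only and are not part of the PEP or the patch; ``repr_ascii`` is the helper from the text above.

```python
import io


def repr_ascii(obj):
    # Helper from the PEP text: repr() first, then escape every
    # non-ASCII character via the backslashreplace error handler.
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


# Written with escapes so this snippet itself stays pure ASCII.
text = "\u65e5\u672c\u8a9e"

readable = repr(text)              # printable non-ASCII characters are kept
escaped = ascii(text)              # Python 2 style: non-ASCII is hex-escaped
via_format = "{!a}".format(text)   # '!a' conversion flag
via_percent = "%a" % text          # '%a' format operator
printable = text.isprintable()     # True: repr() escapes nothing here


def write_log(data, errors):
    # Hypothetical stand-in for open("logfile", "w", errors=...) on an
    # ASCII-only device; BytesIO keeps the example off the file system.
    raw = io.BytesIO()
    log = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    log.write(repr(data))
    log.flush()
    return raw.getvalue()


try:
    write_log(text, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True           # the 'strict' handler refuses to encode

logged = write_log(text, "backslashreplace")
```

With 'strict' the write fails, while 'backslashreplace' should leave the same escaped text in the log that ``ascii()`` and ``repr_ascii()`` produce.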
For a console that uses a Unicode-based encoding, for example,
en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn't work and
no printable characters are escaped. This will cause a problem with
similar-looking characters in Western, Greek and Cyrillic languages.
These languages use similar (but different) alphabets (descended from
a common ancestor) and contain letters that look similar but have
different character codes. For example, it is hard to distinguish
Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual
representation, of course, very much depends on the fonts used but
usually these letters are almost indistinguishable.) To avoid the
problem, the user can adjust the terminal encoding to get a result
suitable for their environment.


Rejected Proposals
==================

- Add encoding and errors arguments to the builtin print() function,
  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

  Complicated to implement, and in general, this is not seen as a good
  idea. [2]_

- Use character names to escape characters, instead of hex character
  codes. For example, ``repr('\u03b1')`` can be converted to
  ``"\N{GREEK SMALL LETTER ALPHA}"``.

  Using character names can be very verbose compared to hex-escape.
  e.g., ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE
  UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED
  FORM}"``.

- Default error-handler of sys.stdout should be 'backslashreplace'.

  Stuff written to stdout might be consumed by another program that
  might misinterpret the \ escapes. For interactive sessions, it is
  possible to make the 'backslashreplace' error-handler the default,
  but this may add confusion of the kind "it works in interactive mode
  but not when redirecting to a file".


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string::string_print
   (http://bugs.python.org/issue479898)

.. 
[2] [Python-3000] Displaying strings containing unicode escapes (http://mail.python.org/pipermail/python-3000/2008-April/013366.html) Copyright ========= This document has been placed in the public domain. From guido at python.org Tue Jun 3 20:55:00 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 3 Jun 2008 11:55:00 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com> <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com> <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> Message-ID: On Sun, Jun 1, 2008 at 8:32 AM, Atsuo Ishimoto wrote: > Here's new PEP, and new patch is uploaded at http://bugs.python.org/issue2630. > (codereview.appspot.com refused to create new issue for this patch, btw.) Thanks for the report! I had made codereview UTF-8-aware, but it choked on Latin-1. I now use Latin-1 as a fallback. 
You can see for yourself here: http://codereview.appspot.com/1465 -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Tue Jun 3 23:02:20 2008 From: guido at python.org (Guido van Rossum) Date: Tue, 3 Jun 2008 14:02:20 -0700 Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000 In-Reply-To: <797440730806031053m443e8c09g2016b464759ff1ed@mail.gmail.com> References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com> <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com> <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com> <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com> <48434B35.5080400@canterbury.ac.nz> <797440730806021635u64184235w24ecc8e87094f03e@mail.gmail.com> <797440730806031053m443e8c09g2016b464759ff1ed@mail.gmail.com> Message-ID: On Tue, Jun 3, 2008 at 10:53 AM, Atsuo Ishimoto wrote: > I updated PEP and patch. New patch is updated to > http://bugs.python.org/issue2630. > > Changes are:: > > - Added conversion flag to the str.format() and PyUnicode_FromFormat() C API. > > - Added new C API PyObject_ASCII() which called from ascii() builtin > function for consistency. If PyObject_ASCII() is not necessary to be > public API, I'll make it internal function. I've accepted the PEP, meaning implementation can now go ahead. Hopefully it will make it into 3.0b1. Congratulations, Atsuo! > Questions:: > > - The error-handler of the sys.stderr is now configurable by > PYTHONIOENCODING, but I think default error-handler of sys.stderr > should be 'backslashreplace'. PYTHONIOENCODING is fine for sys.stdout, > but sys.stderr is not necessary to have same error-handler with > sys.stdout. Correct. You can fix this in the PEP or just in the code. :-) > - Should new C APIs/function/method/string format operaters be > back-ported to Python 2.6, without modifying repr() itself? If so, ll > prepare a patch for Python 2.6. 
I don't think the C level features need to be backported. Backporting
the .format() operations is probably a good idea.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ishimoto at gembook.org Wed Jun 4 03:43:19 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Wed, 4 Jun 2008 10:43:19 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in Python 3000
In-Reply-To:
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
 <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
 <797440730806010832t1137891fi8e47458b8dafb1b2@mail.gmail.com>
 <797440730806011632o76e5dcferca8b6c4dc015a4bf@mail.gmail.com>
 <48434B35.5080400@canterbury.ac.nz>
 <797440730806021635u64184235w24ecc8e87094f03e@mail.gmail.com>
 <797440730806031053m443e8c09g2016b464759ff1ed@mail.gmail.com>
Message-ID: <797440730806031843o1ad6aa47w8b0f93749dae1d8@mail.gmail.com>

On Wed, Jun 4, 2008 at 6:02 AM, Guido van Rossum wrote:
> On Tue, Jun 3, 2008 at 10:53 AM, Atsuo Ishimoto wrote:
>> I updated PEP and patch. New patch is updated to
>> http://bugs.python.org/issue2630.
>>
>> Changes are::
>>
>> - Added conversion flag to the str.format() and PyUnicode_FromFormat() C API.
>>
>> - Added new C API PyObject_ASCII() which called from ascii() builtin
>> function for consistency. If PyObject_ASCII() is not necessary to be
>> public API, I'll make it internal function.
>
> I've accepted the PEP, meaning implementation can now go ahead.
> Hopefully it will make it into 3.0b1. Congratulations, Atsuo!

Thank you very much! The repr() issue has annoyed me for more than ten
years, but now it's gone at last!

I very much appreciate everybody who participated in the discussion, in
spite of my crazy English. And thanks to Mark Summerfield, who patiently
helped me by fixing my English.
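
For readers following the PEP 3138 thread above, here is a small sketch
of the behaviour under discussion. It is written against Python 3 as it
was eventually released, not against the patch in issue 2630, so treat
it as an illustration rather than as the patch's own test. It assumes
that repr() leaves printable non-ASCII characters alone, that ascii()
and the "!a" conversion flag escape them, and that a text stream opened
with the 'backslashreplace' error handler writes escapes instead of
raising. The sample string and the helper name write_escaped are made up
for this example.

```python
import io

# Sample text (assumption: any printable non-ASCII string would do).
text = "\u65e5\u672c\u8a9e"

# repr() keeps printable non-ASCII characters and only adds the quotes.
shown = repr(text)

# ascii() escapes every non-ASCII character, as repr() did in Python 2.
escaped = ascii(text)

# The "!a" conversion flag applies ascii() inside str.format().
formatted = "{0!a}".format(text)


def write_escaped(s):
    """Hypothetical helper: write s to an ASCII-only stream that uses the
    'backslashreplace' error handler and return the bytes written."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")
    stream.write(s)
    stream.flush()
    return raw.getvalue()


# Only ASCII output is printed, so this runs on any terminal encoding.
print(escaped)
print(formatted)
print(write_escaped(text))
```

The last call hints at why 'backslashreplace' was suggested for
sys.stderr: a message containing characters the stream cannot encode
still gets written, just in escaped form.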
From greg.ewing at canterbury.ac.nz Wed Jun 4 03:49:00 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 04 Jun 2008 13:49:00 +1200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To:
References:
Message-ID: <4845F48C.6030500@canterbury.ac.nz>

I don't understand all this stuff about getting unlocked
buffers and unlocking buffers while keeping them alive, etc.

The way I thought this was supposed to work is that the
buffer is *always* locked while the client is accessing
it, the only choice being whether it's a read-only or
read-write lock.

So the usage sequence is:

1) Client calls getbuffer() and receives a buffer
   pointer. The memory referred to by the pointer
   is now locked and will not move.

2) Client accesses the memory via the returned pointer.

3) Client calls releasebuffer(). The memory is now
   unlocked and the pointer is no longer valid.

If the client wants to access the buffer again, it
must go back to step 1.

Are there use cases for which this sequence is not
adequate?

-- 
Greg

From vdedaniya at gmail.com Tue Jun 3 08:44:29 2008
From: vdedaniya at gmail.com (vdedaniya)
Date: Mon, 2 Jun 2008 23:44:29 -0700 (PDT)
Subject: [Python-3000] Python certifications
Message-ID: <6ab54660-6b02-4231-b21d-80ac7592248d@l42g2000hsc.googlegroups.com>

Hello Experts,

I want to give Python certifications. Can you please suggest how i
should proceed?

I do not want any training on Python as I am already working on it
since last two years. I just want to know the authorized center in
INDIA where i can give the Python certifications.

Thanks in advance,
Vishal

From r.m.oudkerk at googlemail.com Tue Jun 3 21:16:31 2008
From: r.m.oudkerk at googlemail.com (r.m.oudkerk)
Date: Tue, 3 Jun 2008 20:16:31 +0100
Subject: [Python-3000] [Python-Dev] Postponing the first betas
In-Reply-To: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org>
References: <06BB1BB8-2C9C-4588-9B38-49DE234752E0@python.org>
Message-ID:

On 02/06/2008, Barry Warsaw wrote:
> meantime, Guido said:
>
> "I'd also like to see PEP 3138 (CJK-friendly repr()) and the
> pyprocessing PEP implemented by then, and perhaps some other small
> stuff."

The pyprocessing unit tests crash with a fatal error when run on Linux
with a debug version of the interpreter. This is because the GILState
stuff is not fork-aware. I submitted a patch some months ago:

    http://bugs.python.org/issue1683

Could somebody review it, please?

Cheers,

Richard.

From cvrebert at gmail.com Wed Jun 4 08:57:52 2008
From: cvrebert at gmail.com (Chris Rebert)
Date: Tue, 3 Jun 2008 23:57:52 -0700
Subject: [Python-3000] Python certifications
In-Reply-To: <6ab54660-6b02-4231-b21d-80ac7592248d@l42g2000hsc.googlegroups.com>
References: <6ab54660-6b02-4231-b21d-80ac7592248d@l42g2000hsc.googlegroups.com>
Message-ID: <47c890dc0806032357n2d3f3508y478a625c9f72d509@mail.gmail.com>

This mailing list is specifically about discussing Python v3.0, not
general Python interest topics. As such, your question would be better
suited to the comp.lang.python newsgroup than this list.

- Chris Rebert

On Mon, Jun 2, 2008 at 11:44 PM, vdedaniya wrote:
> Hello Experts,
>
> I want to give Python certifications. Can you please suggest how i
> should proceed?
>
> I do not want any training on Python as I am already working on it
> since last two years. I just want to know the authorized center in
> INDIA where i can give the Python certifications.
>
> Thanks in advance,
> Vishal
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/cvrebert%40gmail.com
>

From stefan_ml at behnel.de Wed Jun 4 09:29:27 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 4 Jun 2008 07:29:27 +0000 (UTC)
Subject: [Python-3000] Single buffer implied in new buffer protocol?
References:
 <4845F48C.6030500@canterbury.ac.nz>
Message-ID:

Hi Greg,

Greg Ewing canterbury.ac.nz> writes:
> The way I thought this was supposed to work is that the
> buffer is *always* locked while the client is accessing
> it, the only choice being whether it's a read-only or
> read-write lock.

I don't think there should always be a lock in the sense that the
requestor is the only permitted accessor. Concurrent read access is
common and easy to allow. Such an "always lock" scheme would disallow
long-living buffer references.

> So the usage sequence is:
>
> 1) Client calls getbuffer() and receives a buffer
> pointer. The memory referred to by the pointer
> is now locked and will not move.

"not move" is just fine, but any locks should be requested explicitly
using LOCK and maybe WRITABLE.

> 2) Client accesses the memory via the returned pointer.
>
> 3) Client calls releasebuffer(). The memory is now
> unlocked and the pointer is no longer valid.
>
> If the client wants to access the buffer again, it
> must go back to step 1.

or if the client wants to acquire a lock that it currently does not
hold.

That's pretty much what my locking scheme is proposing, with the
difference of not caring about a possibly valid "buf" pointer being
passed into getbuffer (which I implicitly allow in my proposal anyway).

I like that scheme BTW. It works without an explicit UNLOCK and thus
simplifies my proposal at the cost of a separate Py_buffer allocation
for the case of a postponed LOCK request /after/ requesting the buffer
for the first time. And the Py_buffer allocation will most likely
happen on the stack anyway, as a LOCK is commonly held only during a
function call life-time.

I'll fix up the PEP and send another patch ASAP.

Thanks for the feed-back,
Stefan

From stefan_ml at behnel.de Wed Jun 4 10:53:29 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 4 Jun 2008 08:53:29 +0000 (UTC)
Subject: [Python-3000] Single buffer implied in new buffer protocol?
References:
 <4845F48C.6030500@canterbury.ac.nz>
Message-ID:

Greg Ewing canterbury.ac.nz> writes:
> The way I thought this was supposed to work is that the
> buffer is *always* locked while the client is accessing
> it, the only choice being whether it's a read-only or
> read-write lock.

Thinking about this some more while updating the PEP:

This scheme has the advantage of always guaranteeing a consistent
buffer state, as no consumer can be granted write access while any
other consumer has read access.

However, getting exclusive write access is hard, as this requires that
there are no reading consumers, including the consumer who wants to get
write access. So you'd always have to release your own read buffer
before acquiring a write buffer (which also implies that the provider
must never deallocate the buffer just because all readers have released
their buffer view).

This encourages a short-read, short-write use pattern and always
requires all consumers to call releasebuffer before any write access
can be granted. I don't know if that scenario is realistic and/or
efficient, but it would work and does not even require the LOCK flag,
the WRITABLE flag would be enough.

Comments?

Stefan

From greg.ewing at canterbury.ac.nz Thu Jun 5 03:02:43 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 05 Jun 2008 13:02:43 +1200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To:
References:
 <4845F48C.6030500@canterbury.ac.nz>
Message-ID: <48473B33.8010001@canterbury.ac.nz>

Stefan Behnel wrote:

> I don't think there should always be a lock in the sense that the requestor is
> the only permitted accessor.

No, but there's always a lock in the sense that the
provider is not allowed to move the memory while the
buffer is in use.

As for the other forms of locking, I'm still not sure
whether that's something the buffer protocol ought to be
concerning itself with. It's actually an orthogonal
concept -- there's no logical reason you couldn't hold a
read or write lock on an object *without* holding a
move-lock on its buffer the whole time.

I'm wondering whether it would be better to separate the
two kinds of lock and have a different api for dealing
with read/write locks. Conflating them seems to be making
the buffer api confusing to talk about and complicated
for a provider to implement.

For example, consider a provider whose memory never moves.
If the buffer api confines itself to move-locking, then
that provider can ignore locking altogether. But if locking
can include concepts of read/write access, it has to
maintain state for the lock and handle the logic of
managing it.

Furthermore, the read/write aspect of the lock management
logic is going to be pretty much identical for all buffer
providers, suggesting it ought to be factored out somehow
and implemented in one place.

> Concurrent read access is common and easy to allow.
> Such an "always lock" scheme would disallow long-living
> buffer references.

But you shouldn't be keeping long-lived buffer references
in the first place.

-- 
Greg

From greg.ewing at canterbury.ac.nz Thu Jun 5 03:22:55 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 05 Jun 2008 13:22:55 +1200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To:
References:
 <4845F48C.6030500@canterbury.ac.nz>
Message-ID: <48473FEF.3060503@canterbury.ac.nz>

Stefan Behnel wrote:

> So you'd always have to release your own read buffer before acquiring a
> write buffer

Yes, you really want to be able to upgrade your own lock
from a read lock to a write lock, which means the provider
has to keep track of who the lock holder is somehow.

The more I think about this, the more I feel that implementing
this form of locking is far too big a burden to place on every
buffer provider. So I propose that it be declared outside the
scope of the buffer protocol altogether.

The only form of locking known to the buffer protocol should
be move-locking, which is implicit in every call to getbuffer.
Supporting this requires at most a simple counter, and if the
provider never moves its memory anyway, it can ignore locking
completely.

> This encourages a short-read, short-write use pattern and always requires all
> consumers to call releasebuffer before any write access can be granted. I don't
> know if that scenario is realistic and/or efficient, but it would work and does
> not even require the LOCK flag, the WRITABLE flag would be enough.

It sounds like you're heading towards the same conclusion,
if I understand what you're saying correctly.

-- 
Greg

From stefan_ml at behnel.de Thu Jun 5 09:12:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 05 Jun 2008 09:12:06 +0200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <48473FEF.3060503@canterbury.ac.nz>
References:
 <4845F48C.6030500@canterbury.ac.nz>
 <48473FEF.3060503@canterbury.ac.nz>
Message-ID:

Greg Ewing wrote:
> Stefan Behnel wrote:
>> So you'd always have to release your own read buffer before acquiring a
>> write buffer
>
> Yes, you really want to be able to upgrade your own lock
> from a read lock to a write lock, which means the provider
> has to keep track of who the lock holder is somehow.

That's why I was initially proposing to pass in the original Py_buffer
when requesting a lock.

> The more I think about this, the more I feel that implementing
> this form of locking is far too big a burden to place on every
> buffer provider.

I agree, especially since I expect read-only buffers to be quite
common. Locking can always be done in Python space. That may require a
Python function call and may thus be less efficient, but locking
semantics can be pretty diverse and correctness is usually more
important than absolute speed here.

So I wouldn't mind leaving the locking business entirely to a separate
protocol between providers and consumers, maybe with a short note in
the PEP that the API of the provider should follow the locking API in
the threading module as far as appropriate.

One question: what does that mean for the mutable bytearray class? How
would that handle locking?

Stefan

From ncoghlan at gmail.com Thu Jun 5 14:16:01 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 05 Jun 2008 22:16:01 +1000
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To:
References:
<4845F48C.6030500@canterbury.ac.nz>